From SimonStL at classic.msn.com Sun Nov 2 01:45:01 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID:

While looking over the release notes for the 31 October 97 version of the Java MSXML parser, I noticed that they've added a 'feature' that allows for 'Short end tags,' using </>. This won't be too difficult to implement, perhaps, but it seems like an odd break with XML's (so far) rather strict rules for start and end tags, particularly 3.1 of the 7 August 97 Working Draft:

>The end of every element may (for elements which are not empty, must) be marked by an end-tag containing a name that echoes the element's type as given in the start-tag...
>Well-Formedness Constraint - GI Match:
>The Name in an element's end-tag must match that in the start-tag.

Is this something new going on with the spec, or is it just Microsoft? It looks like they fixed a lot of the bugs, but this may introduce some new problems. (They also allow ampersands in PCDATA, as long as they're 'not followed by a valid name character.') It seems a little early for XML to begin fragmenting.

Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.

Simon St.Laurent

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From btrafford at worldnet.att.net Sun Nov 2 06:10:04 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <345C18F0.DFF9C0DD@worldnet.att.net>

Hello!

Has anybody tried out Microsoft's latest download of MSXML? I'm finding that parsing the HTML 4.0 DTD causes it to crash out.
Here's the error:

JVIEW caused an invalid page fault in module MSJAVA.DLL at 014f:7c009445.

Anybody else seeing this error? Any ideas why? For info's sake, the version of MSJAVA.DLL I'm using is:

4.79.1518 of May 5th of 1997.

--->Ben Trafford

From jjc at jclark.com Sun Nov 2 07:00:42 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:47 2004
Subject: MSXML comments
Message-ID: <345C2331.9C0F2D9A@jclark.com>

I played a little today with MSXML and have a couple of suggestions:

- MSXML wrongly rejects this

]>

It appears to require notations to be declared before use in entity declarations rather than just declared in the DTD. The XML spec could probably be clearer here, but this definitely is not desirable: you often need to declare external entities in the DTD subset that use notations declared in the DTD. It's also incompatible with SGML.

- It appears to be impossible to prevent MSXML performing certain validity checks. Worse, MSXML appears to apply Draconian error handling to validity errors not just to well-formedness errors. This makes it impossible to parse some well-formed XML documents. For example:

]>

I would suggest that applications should be able to control whether validation is performed. I would also suggest that validity errors not be handled as fatal errors using exceptions; instead, the parser should continue processing in the presence of validity errors, and should make information about validity errors available in the object model.

- It requires the version attribute in the XML declaration to be in upper-case. The draft still has "version" in lower case.
James

From clovett at microsoft.com Mon Nov 3 01:21:42 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: ANNOUNCE: New MSXML Java Parser Available
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA7@red-17-msg.dns.microsoft.com>

See http://www.microsoft.com/standards/xml/xmlparse.htm for details. I would like to thank all those people who sent bug reports and suggestions. This is a newer version of the parser than the one included in IE 4.0. A lot went into this version including:

* Case sensitivity
* Conditional sections in the DTD (INCLUDE and IGNORE keywords)
* Full support for NAMESPACES (see http://www.microsoft.com/standards/xml/Namespaces.htm).
* Support for the ENCODING attribute on the XML tag
* Support for the XML-SPACE attribute in regular XML and in the DTD
* Support for the RMD attribute on the XML tag
* Support for W3C DOM ignorable whitespace nodes.
* Support for processing of external text entities.
* Optimization on Windows that makes parsing 4 times faster.

Other non-spec experimental things that were added:

* New Document save options for COMPACT and PRETTY save formats (the default save option uses the ignorable whitespace nodes to save in exactly the same format as the original document).
* Support for floating ampersands, e.g., "This & that"
* Support for empty end tags, e.g., bar
* New helper classes like ElementCollection, TreeEnumeration, etc.

For a detailed description of changes see http://www.microsoft.com/standards/xml/xmlchgs.htm.

Enjoy !!

From clovett at microsoft.com Mon Nov 3 01:27:10 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>

This is totally optional and experimental. The only rationale is that for large documents or documents with long tag names, this saves a lot of bytes. Think of it as a kind of compression technique that would only be enabled when both ends of the pipe can handle it.

As for the ampersands, this is a real problem. We found with our experience with CDF that customers just can't handle putting &amp; inside their URL's. We want to comply with XML standards, but we also want XML to be successful in the marketplace.

One area that we didn't compromise is with case sensitivity. The new parser is fully case sensitive - but with a switch that sets it back to case insensitive for those people that are reading XML that was generated before case sensitivity was decided. You have to make some tough compromises sometimes.

> -----Original Message-----
> From: Simon St.Laurent [SMTP:SimonStL@classic.msn.com]
> Sent: Saturday, November 01, 1997 5:43 PM
> To: Xml-Dev (E-mail)
> Subject: </> as end tag
>
> While looking over the release notes for the 31 October 97 version of the Java MSXML parser, I noticed that they've added a 'feature' that allows for 'Short end tags,' using </>. This won't be too difficult to implement, perhaps, but it seems like an odd break with XML's (so far) rather strict rules for start and end tags, particularly 3.1 of the 7 August 97 Working Draft:
>
> >The end of every element may (for elements which are not empty, must) be marked by an end-tag containing a name that echoes the element's type as given in the start-tag...
> >Well-Formedness Constraint - GI Match:
> >The Name in an element's end-tag must match that in the start-tag.
>
> Is this something new going on with the spec, or is it just Microsoft? It looks like they fixed a lot of the bugs, but this may introduce some new problems. (They also allow ampersands in PCDATA, as long as they're 'not followed by a valid name character.') It seems a little early for XML to begin fragmenting.
>
> Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.
>
> Simon St.Laurent

From clovett at microsoft.com Mon Nov 3 01:29:39 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA9@red-17-msg.dns.microsoft.com>

Try turning off the I/O optimization as follows:

c:\msxml> regsvr32 /u classes\com\ms\xml\xmlstream\xmlurlstream\xmlurlstream.dll

> -----Original Message-----
> From: Ben Trafford [SMTP:btrafford@worldnet.att.net]
> Sent: Saturday, November 01, 1997 10:09 PM
> To: xml-dev@ic.ac.uk
> Subject: Unusual error with MSXML
>
> Hello!
>
> Has anybody tried out Microsoft's latest download of MSXML? I'm finding that parsing the HTML 4.0 DTD causes it to crash out. Here's the error:
>
> JVIEW caused an invalid page fault in module MSJAVA.DLL at 014f:7c009445.
>
> Anybody else seeing this error? Any ideas why? For info's sake, the version of MSJAVA.DLL I'm using is:
>
> 4.79.1518 of May 5th of 1997.
>
> --->Ben Trafford

From clovett at microsoft.com Mon Nov 3 01:38:34 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: MSXML comments
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCAA@red-17-msg.dns.microsoft.com>

> I played a little today with MSXML and have a couple of suggestions:
> - MSXML wrongly rejects this
>
> ]>
>
> It appears to require notations to be declared before use in entity declarations rather than just declared in the DTD. The XML spec could probably be clearer here, but this definitely is not desirable: you often need to declare external entities in the DTD subset that use notations declared in the DTD. It's also incompatible with SGML.

Yes, order independence of DTD's is not yet implemented. We're still trying to figure out how to implement this without slowing down the parser ?!

> - It appears to be impossible to prevent MSXML performing certain validity checks. Worse, MSXML appears to apply Draconian error handling to validity errors not just to well-formedness errors. This makes it impossible to parse some well-formed XML documents. For example:
>
> ]>
>
> I would suggest that applications should be able to control whether validation is performed. I would also suggest that validity errors not be handled as fatal errors using exceptions; instead, the parser should continue processing in the presence of validity errors, and should make information about validity errors available in the object model.

Yes, I agree. Perhaps we should add a handleValidityError method to the ElementFactory, and if the subclass returns true we continue on parsing or something.
You can turn off DTD processing altogether using RMD="NONE"

> - It requires the version attribute in the XML declaration to be in upper-case. The draft still has "version" in lower case.

What's the story here anyway ? We added support for case sensitivity, and decided to make all XML reserved keywords upper case for consistency.

From Jon.Bosak at eng.Sun.COM Mon Nov 3 01:54:13 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com> (message from Chris Lovett on Sun, 2 Nov 1997 17:27:01 -0800)
Message-ID: <199711030153.RAA09117@boethius.eng.sun.com>

[Chris Lovett:]

| This is totally optional and experimental. The only rationale is that
| for large documents or documents with long tag names, this saves a lot
| of bytes.

Tests have shown that this difference disappears under compression.

| Think of it as a kind of compression technique that would
| only be enabled when both ends of the pipe can handle it.

Empty end tags are a well-formedness error, and the behavior of a conforming XML processor upon encountering such an error is to stop parsing. The prohibition on empty end tags was adopted specifically to enable users to perform a large class of maintenance operations on XML documents without having to buy commercial software.

I'm having a very difficult time seeing this as anything but a blatant attempt to subvert the standard by implementing a nonstandard feature in a widely disseminated parser. Please help me to understand this differently.
Jon

From tbray at textuality.com Mon Nov 3 05:24:17 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <3.0.32.19971102212301.00b323a8@pop.intergate.bc.ca>

At 05:53 PM 02/11/97 -0800, Jon Bosak wrote:
>| Think of it as a kind of compression technique that would
>| only be enabled when both ends of the pipe can handle it.

>Empty end tags are a well-formedness error, and the behavior of a
>conforming XML processor upon encountering such an error is to stop
>parsing.

Seconded. I am flabbergasted. In November 1997, we should be forgiving about well-intentioned parsers missing details of compliance with a spec which we keep changing, but this apparently-deliberate step out of bounds is incomprehensible; let us assume that it is a transient error which will soon be rectified.

-Tim

From peter at ursus.demon.co.uk Mon Nov 3 08:50:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To:
Message-ID: <3.0.1.16.19971103094614.295f36e0@pop3.demon.co.uk>

At 01:43 02/11/97 UT, Simon St.Laurent wrote:
>While looking over the release notes for the 31 October 97 version of the Java MSXML parser, I noticed that they've added a 'feature' that allows for 'Short end tags,' using </>. This won't be too difficult to implement, perhaps, but it seems like an odd break with XML's (so far) rather strict rules for start and end tags, particularly 3.1 of the 7 August 97 Working Draft:
>
>>The end of every element may (for elements which are not empty, must) be marked by an end-tag containing a name that echoes the element's type as given in the start-tag...
>>Well-Formedness Constraint - GI Match:
>>The Name in an element's end-tag must match that in the start-tag.
>
>Is this something new going on with the spec, or is it just Microsoft? It looks like they fixed a lot of the bugs, but this may introduce some new problems. (They also allow ampersands in PCDATA, as long as they're 'not followed by a valid name character.') It seems a little early for XML to begin fragmenting.

I would urge readers of this list to adhere to the specs absolutely. [I pass no comments on the motivation for msxml supporting </> or &.] XML can ONLY be implemented if everyone adheres totally to the specs. I believe this list shares the view that both data and software can be modularised so that different parts of the effort can be investigated by different people.
For example I intend to rely completely on parsers (e.g. Lark, NXP) to provide the parsing part of JUMBO at present. In similar vein I am developing JUMBO with the clear motivation that it tracks everything in the spec and implements it as far as possible. [I hope to release a new JUMBO in a few days - I want to see how it sits on top of Lark first.]

Even when everyone agrees to implement the spec it is not easy. Any spec has ambiguities and special cases. There are also genuine differences of opinion about procedure in areas where the spec makes no comment. For example there is an uncertainty about when a document can be validated - can the author or the document assert that validation should/not take place?

It is going to be EXTREMELY important that documents circulating in the developers' community adhere to the specs as closely as possible. We already have challenges from capitalisation (we are case-sensitive now, but await a final resolution on the exact case of some names and a possible policy more generally.) I know that when a policy is announced I am going to have to transform many of my current prototype documents - that's the price of being a developer :-) But I do not intend to change any to deal with software that *knowingly* does not conform to the spec, nor do I intend to test my software on any documents that knowingly do not conform to the spec. There is quite enough problem with ones (including mine:-) that do it unknowingly :-)

P.

>
>Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.
>
>Simon St.Laurent

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From richard at light.demon.co.uk Mon Nov 3 09:21:12 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:47 2004
Subject: Unescaped '&'
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID:

In message <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>, Chris Lovett writes
>As for the ampersands, this is a real problem. We found with our experience
>with CDF that customers just can't handle putting &amp; inside their URL's.

I don't follow the logic here. My understanding is that spaces within URLs have to be escaped (hence most of the changes, for XPointers, to the TEI Extended Pointer spec). So if '&' has to be followed by a space in order to be unescaped, but that space itself has to be escaped - what exactly are you gaining?

Richard.

Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From neil at bradley.co.uk Mon Nov 3 09:46:53 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:58:47 2004
Subject: more link questions
Message-ID: <199711030946.JAA19093@andromeda.ndirect.co.uk>

I am wondering why it is stated that a STEPS value of 2 is appropriate when using a hub document that contains all the extended links. Surely, STEPS=1 is all that is required, as it should not be necessary to process the other documents in the collection to look for extended links.

When using a URL to locate a document on the local system, should the protocol be 'file://', or is this a default if no protocol is given? This question applies to entities as well as links, of course.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk

From crism at ora.com Mon Nov 3 16:12:58 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
In-Reply-To: <345C18F0.DFF9C0DD@worldnet.att.net> (message from Ben Trafford on Sat, 01 Nov 1997 23:08:48 -0700)
Message-ID: <199711031616.LAA13873@geode.ora.com>

[Ben Trafford]
> Has anybody tried out Microsoft's latest download of MSXML?
> I'm finding that parsing the HTML 4.0 DTD causes it to crash
> out. Here's the error:

Why are you parsing the HTML 4.0 DTD with an XML parser? It's not XML; it uses AND groups and exclusions. True, MSXML should probably fail more gracefully on non-XML data, but hey - it's beta.

-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

From aray at q2.net Mon Nov 3 17:25:16 1997
From: aray at q2.net (Arjun Ray)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID:

On Sun, 2 Nov 1997, Chris Lovett wrote:
> This is totally optional and experimental. The only rationale is that for
> large documents or documents with long tag names, this saves a lot of bytes.

Sorry, this rationale (among others) was discussed to death in Sept 96 on the old XML-WG list and found inadequate. Please review the archives () for anything we might have missed.

The good arguments for empty end-tags have nothing to do with byte economy, but they involve other design issues that impinge in a non-trivial way on SGML's minimization rules -- the upshot is that empty end-tags as an *isolated* option (i.e. just an option per se) is a very bad idea for XML. I say this even though I was one of those arguing for empty end-tags back then.

Please reconsider.

Arjun

From clovett at microsoft.com Mon Nov 3 18:44:43 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com>

Enough already !! I can tell that none of you actually tried the latest MSXML parser. To even get short end tags the programmer has to explicitly turn on the option as follows:

    Document d = new Document();
    d.load("http://www.somewhere.com/somexml.xml");
    OutputStream o = new FileOutputStream("test.xml");
    XMLOutputStream out = d.createOutputStream(o);
    out.setShortEndTags(true);
    d.save(out);

In other words, it is a completely experimental feature that is thoroughly buried in the API and the naive user won't even know it exists. The only reason it is there is because of the very fact that there was a lot of discussion about short end tags in the first place. So I decided to play with the idea and quite frankly I thought it was kind of cool that XML was so simple that end tags were redundant. I think this further emphasizes the simplicity of XML.

As for blatant attempts at subversion, I'm just a country boy from Australia, I don't get involved in that sort of thing :-) So, enough politics. I'm more interested in constructive feedback from people who have actually played with the new parser....

From Jon.Bosak at eng.Sun.COM Mon Nov 3 19:12:47 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com> (message from Chris Lovett on Mon, 3 Nov 1997 10:43:57 -0800)
Message-ID: <199711031911.LAA09574@boethius.eng.sun.com>

[Chris Lovett:]

| As for blatant attempts at subversion, I'm just a country boy from
| Australia, I don't get involved in that sort of thing :-) So, enough
| politics. I'm more interested in constructive feedback from people
| who have actually played with the new parser....

Thank you so much for providing the alternative interpretation that I asked for. I really didn't want to see the inclusion of a nonstandard extension in your parser as an attempt to make changes to the standard by going outside of the process. I'm very glad to hear that that is not what you had in mind.

So, now that you're aware that the effect of leaving the code as an option in the parser will be to encourage nonstandard implementations, when do you intend to remove it?

Jon

From clovett at microsoft.com Mon Nov 3 19:36:01 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCBA@red-17-msg.dns.microsoft.com>

I notice you are using a JDK 1.0.2 version of MSJAVA.DLL. I've reproduced the problem, it crashes on the line Document d = new Document() and so I'm guessing this has something to do with JavaBeans, since there is a DocumentBeanInfo. I'll try building another version that works with JDK 1.0.2. In the meantime you should be able to get it to work using Microsoft's Java SDK 2.0.

> -----Original Message-----
> From: Ben Trafford [SMTP:btrafford@worldnet.att.net]
> Sent: Saturday, November 01, 1997 10:09 PM
> To: xml-dev@ic.ac.uk
> Subject: Unusual error with MSXML
>
> Hello!
>
> Has anybody tried out Microsoft's latest download of MSXML? I'm finding that parsing the HTML 4.0 DTD causes it to crash out. Here's the error:
>
> JVIEW caused an invalid page fault in module MSJAVA.DLL at 014f:7c009445.
>
> Anybody else seeing this error? Any ideas why? For info's sake, the version of MSJAVA.DLL I'm using is:
>
> 4.79.1518 of May 5th of 1997.
>
> --->Ben Trafford

From Ingo.Macherius at TU-Clausthal.de Mon Nov 3 20:18:35 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com>
Message-ID: <199711032018.VAA20687@sinfonix.rz.tu-clausthal.de>

> From: Chris Lovett
> To: xml-dev@ic.ac.uk
> Subject: RE: </> as end tag
> Date: Mon, 3 Nov 1997 10:43:57 -0800

Chris Lovett said:
> I'm more interested in constructive feedback from people
> who have actually played with the new parser....

I spent half of the day playing with msxml, which means I tried to get it running on Linux with Java SDK 1.1.3. Here are some points I found:

1) Case folding
The filenames in the msxml.tar.gz are all folded to lowercase. This might not matter with DOS filesystems, Unix is a bit harsher here. One has to rebuild all *.class files.

2) Makefile
Rebuilding the *.class files from source is not easy (haven't managed yet) because of various cross-dependencies. Is it possible for you to provide a Unix style Makefile, or give me a pointer to docs on how to read your MS specific Makefile ?

3) Missing imports
These imports are hidden in the source.
Surely I could copy them from IE4's Java files, but it'd be nice if msxml was self-contained.

    import com.ms.com.*;
    import com.ms.com.IUnknown;
    import com.ms.com.Variant;
    import com.ms.osp.*;
    import netscape.javascript.JSObject;

Has anyone successfully run msxml on non-windows platforms ? Is this at least intended ?

++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)

From agreene at bitstream.com Mon Nov 3 20:27:01 1997
From: agreene at bitstream.com (Andrew Greene)
Date: Mon Jun 7 16:58:48 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
Message-ID: <19971103195249.AAA18429@AGREENE-PC.bitstream.com>

If you have a Unicode-friendly XML environment, then users can create elements whose GIs or attribute names contain "interesting" characters. (Yes? A NAME token can contain "BaseChars", which includes characters beyond ASCII and even beyond Latin-1.)

So, if a user requests that the document instance be saved as an ASCII file, what is the best way for a Unicode-aware and standards-compliant application to represent these characters? It's not legal to say and the user may already have an element type called "Strasse" so it would be inappropriate to "reduce" it. [I chose this example because it is easy to describe in email; the problem is much more difficult if, instead of German, the user has used Cyrillic or Hebrew NAMEs.]

I've thought of three solutions:

1. It's an error.
Tell the user "Sorry, your file could not be saved in that character encoding because the element name 'StraBe' could not be represented."

Advantages: It's fully compliant and no data can get lost.

Disadvantages: No data can get out, either. Perhaps the user has an 8-bit app to massage the data in a particular way, and she doesn't want to rename all her elements.

2. Rename all the offending elements and attributes, and use PIs to ensure that when they're read back in we can patch things up. So, for example, the file could contain:

<?GoodCitizen MangledGI Strae1="Straße"?>
<Strae1>foo bar</Strae1>

Advantages: It's fully compliant.

Disadvantages: It assumes that all other processing applications will be nice and won't lose my processing instructions, and it makes the file hard to read. It's also non-portable; unless we as a community decide on a "semi-standard" PI to use, no one else will know how to interpret this convention. (On the other hand, this is exactly why I'm bringing the issue up here. Maybe we can all agree on a semi-standard and I'll feel less uneasy about doing something like this....)

3. Violate the standard and use character entities to represent the ineffable, for example: foo bar

Advantages: It's compact and unambiguous (even if it's illegal :-).

Disadvantages: It violates both XML and 8879 in a new and perverse way. The user's file will not be usable by any other piece of standards-compliant software. That's worse than refusing to write the file at all (number 1).

My questions to the assembled multitudes are:

* Is there a need for a "semi-standard" solution to this problem, or am I the only one struggling with it?

* Is there interest in adopting some variation of number 2 so that we're better able to exchange such data?

* I can't help but think that number 3 would be the most elegant solution if it were only legal. Yet I'm also sure that the XML committee had a good reason for disallowing it. I'd be interested in hearing what their reason was, so that I may become enlightened.
:-)

Thanks for your thoughts,
Andrew Greene

From SimonStL at classic.msn.com Tue Nov 4 00:37:09 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:48 2004
Subject: </> as end tag
Message-ID:

>In other words, it is a completely experimental feature that is thoroughly
>buried in the API and the naive user won't even know it exists.

It is deeply buried in the API, yes, but it was shown vividly in the site demonstrating XML with IE 4.0, where it's presented as perfectly ordinary XML and parsed as such. For instance, the DSO example:

<BOOKS>
<ITEM>
<TITLE>Number, the Language of Science</>
<AUTHOR>Danzig</>
<PRICE>5.95</>
<QUANTITY>3</>
</>
<DSIG>192817265</>
etc....

(from http://www.microsoft.com/standards/xml/ - XML Parser, Samples, DSO example.)

(I just revisited the site, and the files all still seem to be there, though my IE 4.0 browser is exploding with JavaScript errors that I hope are a sign that the site is under construction and coming down.)

>As for blatant attempts at subversion, I'm just a country boy from
>Australia, I don't get involved in that sort of thing :-) So, enough
>politics. I'm more interested in constructive feedback from people you have
>actually played with the new parser....

On Sunday, I ran a lot of my files that had demonstrated freaky parsing behavior in the past - files that parsed as valid when they were explosively wrong (some referred to the wrong DTD, for instance), files that used parameter entities, and files that wouldn't parse at all. They all seem to work properly now - so at least the bugs I had found are now dead.
(I only used jview - I'll run it through Sun's JDK and see what happens when I have a chance.) Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From btrafford at worldnet.att.net Tue Nov 4 00:37:45 1997 From: btrafford at worldnet.att.net (Ben Trafford) Date: Mon Jun 7 16:58:48 2004 Subject: Unusual error with MSXML References: <199711031616.LAA13873@geode.ora.com> Message-ID: <345E6F88.6C4D8B18@worldnet.att.net> Chris Maden wrote: > > [Ben Trafford] > > Has anybody tried out Microsoft's latest download of MSXML? > > I'm finding that parsing the HTML 4.0 DTD causes it to crash > > out. Here's the error: > > Why are you parsing the HTML 4.0 DTD with an XML parser? It's not > XML; it uses AND groups and exclusions. > > True, MSXML should probably fail more gracefully on non-XML data, but > hey - it's beta. Well, I often use parsers to find errors in DTDs. Since XML is nominally SGML compatible, an XML parser should find the errors in an SGML DTD (even if just to say that it's got a bunch of stuff XML doesn't recognize). What I was hoping to do was to parse the DTD, read the errors, then figure out what I need to change in the HTML DTD to make it XML-compliant. I've already made a number of changes according to the revised note on the differences between XML and SGML. As I'm currently working with other people's SGML in my professional life, I've found it very useful to parse their DTDs with James Clark's NSGMLS, and to correct their DTDs from that. I'd hoped to do the same thing with MSXML. 
In addition, not everyone's copy of MSXML crashes on this DTD; I've been working with Simon St.Laurent on the problem, and his MSXML parsed without crashing, using an older version of MSJAVA.DLL. Oh, and just in case Chris Lovett is reading this message, thanks for your initial advice, Chris, but it doesn't appear to have had an impact. I get the same error. Do you have any other suggestions? --->Ben Trafford xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From btrafford at worldnet.att.net Tue Nov 4 00:44:36 1997 From: btrafford at worldnet.att.net (Ben Trafford) Date: Mon Jun 7 16:58:48 2004 Subject: Unusual error with MSXML References: <41135C785691CF11B73B00805FD4D2D703E4FCBA@red-17-msg.dns.microsoft.com> Message-ID: <345E7134.3E8606EE@worldnet.att.net> Chris Lovett wrote: > > I notice you are using a JDK 1.0.2 version of MSJAVA.DLL. I've > reproduced the problem, it crashes on the line Document d = new > Document() and so I'm guessing this has something to do with JavaBeans, > since there is a DocumentBeanInfo. I'll trying building another version > that works with JDK 1.0.2. In the meantime you should be able to get it > to work using Microsoft's Java SDK 2.0. Please ignore my previous plea for advice in a letter directed to Chris Maden. I'll download the 2.0 SDK tonight and give 'er a whirl. Is there any public documentation on the error messages that MSXML gives out? Simon St.Laurent sent me some the other night, and they were a little. . .confusing. 
Here's the copy of the error message he sent me:

Error: null(24,9)
Context: - <null>
com.ms.xml.ParseException: Expected: Doctype
        at com.ms.xml.Parser.error(Parser.java:110)
        at com.ms.xml.Parser.parseToken(Parser.java:583)
        at com.ms.xml.Parser.parseKeyword(Parser.java:599)
        at com.ms.xml.Parser.tryDocTypeDecl(Parser.java:748)
        at com.ms.xml.Parser.parseProlog(Parser.java:676)
        at com.ms.xml.Parser.parseDocument(Parser.java:642)
        at com.ms.xml.Parser.parse(Parser.java:58)
        at com.ms.xml.Document.load(Document.java:183)
        at msxml.main(msxml.java:48)

Can you explain to me what that error represents, in terms of what each field means? It's obviously sorted in a rational fashion, but I'm not very good at guesswork. Any help you could proffer would be very much appreciated, even if it's just pointing me to some more narrative-style documentation (if any exists).

--->Ben Trafford

From clovett at microsoft.com Tue Nov 4 01:12:21 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: </> as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCCC@red-17-msg.dns.microsoft.com>

> 1) Case folding
>
> The filenames in the msxml.tar.gz are all folded to lowercase. This
> might not matter with DOS filesystems, Unix is a bit harsher here.
> One has to rebuild all *.class files.

I have a new tar file going out at 6pm. Turns out I had the wrong environment variables set in my C-Shell for Windows.

> 2) Makefile
>
> Rebuilding the *.class files from source is not easy (haven't managed
> yet) because of various cross-dependencies.
> Is it possible for you to
> provide a Unix style Makefile, or give me a pointer to docs on how to
> read your MS specific Makefile ?

Any volunteers?

> 3) Missing imports
>
> These imports are hidden in the source. Surely I could copy them from
> IE4's Java files, but I'd be nice if msxml was self-contained.
>
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;

All these go away if you remove XMLDSO.java, I believe. You shouldn't even try to build this file on other platforms anyway, since it is designed to work only with the Data Binding features of IE 4.0.

From clovett at microsoft.com Tue Nov 4 01:22:58 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCCF@red-17-msg.dns.microsoft.com>

> Is there
> any public documentation on the error messages that MSXML gives out?
> Simon St.Laurent sent me some the other night, and they were a little. .
> .confusing.
> > Here's the copy of the error message he sent me: > > Error: null(24,9) > Context: - <null> > com.ms.xml.ParseException: Expected: Doctype > at com.ms.xml.Parser.error(Parser.java:110) > at com.ms.xml.Parser.parseToken(Parser.java:583) > at com.ms.xml.Parser.parseKeyword(Parser.java:599) > at com.ms.xml.Parser.tryDocTypeDecl(Parser.java:748) > at com.ms.xml.Parser.parseProlog(Parser.java:676) > at com.ms.xml.Parser.parseDocument(Parser.java:642) > at com.ms.xml.Parser.parse(Parser.java:58) > at com.ms.xml.Document.load(Document.java:183) > at msxml.main(msxml.java:48) > > Can you explain to me what that error represents, in terms of what > each > field means? It's obviously sorted in a rational fashion, but I'm not > very good at guesswork. Any help you could proffer would be very much > appreciated, even if it's just pointing me to some more narrative-style > documentation (if any exists). > [Chris Lovett] This is the old error message format are you sure it installed ok ? The new msxml.java doesn't print out the exception stack any more. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From clovett at microsoft.com Tue Nov 4 01:23:35 1997 From: clovett at microsoft.com (Chris Lovett) Date: Mon Jun 7 16:58:48 2004 Subject: </> as end tag Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCD0@red-17-msg.dns.microsoft.com> > >In other words, it is a completely experimental feature that is thoroughly > >buried in the API and the naive user won't even know it exists. > > It is deeply buried in the API, yes, but it was shown vividly in the site > demonstrating XML with IE 4.0, where it's presented as perfectly ordinary XML > and parsed as such. 
> For instance, the DSO example:
>
> <BOOKS>
> <ITEM>
> <TITLE>Number, the Language of Science</>
> <AUTHOR>Danzig</>
> <PRICE>5.95</>
> <QUANTITY>3</>
> </>
> <DSIG>192817265</>
> etc.... (from http://www.microsoft.com/standards/xml/ - XML Parser, Samples,
> DSO example.)

Whoops - this was an oversight. Turns out this XML is generated dynamically via JavaScript and I forgot to update this script. This will be fixed tonight.

> >As for blatant attempts at subversion, I'm just a country boy from
> >Australia, I don't get involved in that sort of thing :-) So, enough
> >politics. I'm more interested in constructive feedback from people you have
> >actually played with the new parser....
>
> On Sunday, I ran a lot of my files that had demonstrated freaky parsing
> behavior in the past - files that parsed as valid when they were explosively
> wrong (some referred to the wrong DTD, for instance), files that used
> parameter entities, and files that wouldn't parse at all. They all seem to
> work properly now - so at least the bugs I had found are now dead. (I only
> used jview - I'll run it through Sun's JDK and see what happens when I have a
> chance.)

Music to my ears. Finally some good news :-)

From clovett at microsoft.com Tue Nov 4 01:54:55 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCC6@red-17-msg.dns.microsoft.com>

I was able to fix the error on my machine by removing all references to java.io.Serializable. I will be posting a fixed version soon.
-----Original Message----- From: Ben Trafford [SMTP:btrafford@worldnet.att.net] Sent: Monday, November 03, 1997 4:43 PM To: xml-dev@ic.ac.uk Subject: Re: Unusual error with MSXML Chris Maden wrote: > > [Ben Trafford] > > Has anybody tried out Microsoft's latest download of MSXML? > > I'm finding that parsing the HTML 4.0 DTD causes it to crash > > out. Here's the error: > > Why are you parsing the HTML 4.0 DTD with an XML parser? It's not > XML; it uses AND groups and exclusions. > > True, MSXML should probably fail more gracefully on non-XML data, but > hey - it's beta. Well, I often use parsers to find errors in DTDs. Since XML is nominally SGML compatible, an XML parser should find the errors in an SGML DTD (even if just to say that it's got a bunch of stuff XML doesn't recognize). What I was hoping to do was to parse the DTD, read the errors, then figure out what I need to change in the HTML DTD to make it XML-compliant. I've already made a number of changes according to the revised note on the differences between XML and SGML. As I'm currently working with other people's SGML in my professional life, I've found it very useful to parse their DTDs with James Clark's NSGMLS, and to correct their DTDs from that. I'd hoped to do the same thing with MSXML. In addition, not everyone's copy of MSXML crashes on this DTD; I've been working with Simon St.Laurent on the problem, and his MSXML parsed without crashing, using an older version of MSJAVA.DLL. Oh, and just in case Chris Lovett is reading this message, thanks for your initial advice, Chris, but it doesn't appear to have had an impact. I get the same error. Do you have any other suggestions? --->Ben Trafford xml-dev: A list for W3C XML Developers. 
From SimonStL at classic.msn.com Tue Nov 4 02:44:01 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID: <UPMAIL17.199711040241580946@classic.msn.com>

My apologies to all, especially Ben; it looks like I dropped into the wrong directory when I ran MSXML on Ben's DTD - there are two copies floating on each of the hard drives of two machines, one with IE 4 and one with IE 3. I now get the same weird Java errors he did running the correct combination of the new version of MSXML with the old version of jview.

I used the viewer applet included in the MSXML package under IE 4 to test a number of files. When there are errors, the viewer still brings up quite a list of errors that look a lot like the list from the previous version, but the errors seem accurate, a significant improvement on version 1.0. The viewer is much handier than the control line was.

A preliminary run-through of the new version using Sun's JDK 1.1.3 produced:

java.lang.NoClassDefFoundError: com/ms/xml/om/Document
        at msxml.main(msxml.java:28)

No idea why, yet.

Simon St.Laurent
From donpark at quake.net Tue Nov 4 04:10:20 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:58:48 2004
Subject: </> as end tag
Message-ID: <01bce8ce$45bad370$0100007f@localhost>

>> 3) Missing imports
>>
>> These imports are hidden in the source. Surely I could copy them from
>> IE4's Java files, but I'd be nice if msxml was self-contained.
>>
>> import com.ms.com.*;
>> import com.ms.com.IUnknown;
>> import com.ms.com.Variant;
>> import com.ms.osp.*;
>> import netscape.javascript.JSObject;
>
>All these go away if you remove XMLDSO.java I believe. You shouldn't even
>try and build this file on other platforms anyway since it designed to only
>work with the the Data Binding features of IE 4.0.

I am not sure where JSObject is imported from, but the only problem I had running MSXML under JDK 1.1.4 was with XMLInputStream's use of XMLStream (and IXMLStream), which has native methods. I made a minor change to XMLInputStream and it now runs wonderfully under the JDK. I could post my changes if someone wants them and if MS approves.

Don Park
From btrafford at worldnet.att.net Tue Nov 4 04:34:40 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
References: <41135C785691CF11B73B00805FD4D2D703E4FCCF@red-17-msg.dns.microsoft.com>
Message-ID: <345EA735.9CFF15B0@worldnet.att.net>

Chris Lovett wrote:
> > very good at guesswork. Any help you could proffer would be very much
> > appreciated, even if it's just pointing me to some more narrative-style
> > documentation (if any exists).
> >
> [Chris Lovett] This is the old error message format are you sure it
> installed ok ? The new msxml.java doesn't print out the exception stack any
> more.

> of the hard drives of two machines, one with IE 4 and one with IE 3. I now
> get the same weird Java errors he did running the correct combination of the
> new version of MSXML with the old version of jview.

So, let's see if I get this straight: Simon has experienced the same crashes as myself, which Chris hopes to fix with his latest build. Now, the error message I got from Simon was from an old version of MSXML?

Regardless, is there more documentation on the error messages MSXML is giving?

--->Ben Trafford
From jjc at jclark.com Tue Nov 4 12:12:03 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:48 2004
Subject: XML processing experiments
Message-ID: <345F0F2E.C3FF9AD7@jclark.com>

One nice feature of XML is that it is easily processable by the Desperate C/C++/Java/Perl hacker: the syntax is simple enough that you can do useful things with XML without a full XML parser. I've been exploring this sort of processing. If all you want to do is be able to correctly parse well-formed XML, and you don't care about detecting whether or not it is well-formed, how much code does it take and is it significantly faster than using an XML parser that does proper well-formedness or validity checking?

I used Jon's Old Testament XML file as test data (after removing the doctype line), which is about 3.7Mb. I ran the tests on a Toshiba Tecra 720CDT (133MHz Pentium, 80Mb RAM) with Windows NT 4.0. I used the IE 4.0 Java VM. The timings I give are after a couple of runs, so there's little or no disk I/O involved. Lark 0.97 parsed the file in about 10.5 seconds, MSXML in about 24 seconds. I suspect the difference is partly because MSXML is building a tree (I didn't see any command line switch to turn this off). By comparison nsgmlsu -s took about 8 seconds. I also tried LT XML (which is written in C). I didn't find a program that did nothing but parsing. The fastest one I found was the sgcount program (which counts the number of each element type); it took about 11 seconds. That's much slower than I expected; I suspect there may be some Windows-specific performance problems.
The code I wrote is available at <URL:ftp://ftp.jclark.com/pub/test/xmltok.zip>. First I wrote a little library in C for doing XML "tokenization". This code just splits the input up into "tokens" where each token is data or some kind of XML markup (start-tag, end-tag, comment etc). The idea is that it does the minimum necessary to do any kind of useful XML-aware processing. I wrote a little application xmlec that just counts the number of elements in an XML document. This can be compiled either to use Win32 file mapping (if FILEMAP is defined) or normal read() calls. You'll probably have to tweak the code a little if you're using anything other than Visual C++. I then translated this into Java (I'm not much of a Java programmer, so there's probably plenty of scope for improvement in the Java version).

xmlec parses the test file in about 0.5 seconds. Using read() instead of file mapping increases the time to about 0.65 seconds. The Java version takes about 1.5 seconds.

I also wrote a Java version of the LT XML textonly program (which extracts the non-markup of an XML document). The LT XML version ran in about 13.5 seconds. My Java version ran in about 3.5 seconds.

The class files for the Java element counting program total about 6k. The source for the C version is about 750 lines, including both the file mapping and read()ing version.

I was quite surprised that there was such a big performance difference between real, conforming XML processing that does well-formedness checking, and quick and dirty XML processing that does the minimum necessary to get the correct result. This doesn't seem right to me...

James
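
For readers without access to xmltok.zip, the tokenize-and-count idea described in the message above can be sketched roughly as follows. This is not James Clark's code: the class and method names (ElementCounter, countElements, skipTag, skipPast) are invented for illustration. It assumes the input is already well-formed and, as in the test described above, carries no doctype declaration with an internal subset. It reads the whole file into a string instead of using file mapping or read() calls.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical sketch: counts elements without checking well-formedness. */
final class ElementCounter {

    /** Counts start-tags and empty-element tags in input assumed to be well-formed. */
    static int countElements(String xml) {
        int count = 0;
        int i = 0;
        while (i < xml.length()) {
            int lt = xml.indexOf('<', i);
            if (lt < 0) {
                break;                                 // only character data is left
            }
            if (xml.startsWith("<!--", lt)) {
                i = skipPast(xml, "-->", lt + 4);      // comment
            } else if (xml.startsWith("<![CDATA[", lt)) {
                i = skipPast(xml, "]]>", lt + 9);      // CDATA section
            } else if (xml.startsWith("<?", lt)) {
                i = skipPast(xml, "?>", lt + 2);       // PI or XML declaration
            } else if (xml.startsWith("<!", lt) || xml.startsWith("</", lt)) {
                i = skipPast(xml, ">", lt + 2);        // simple declaration or end-tag
            } else {
                count++;                               // start-tag or empty-element tag
                i = skipTag(xml, lt + 1);
            }
        }
        return count;
    }

    /** Skips to just after the '>' that closes a tag, ignoring '>' inside quoted attribute values. */
    private static int skipTag(String xml, int from) {
        char quote = 0;
        for (int i = from; i < xml.length(); i++) {
            char c = xml.charAt(i);
            if (quote != 0) {
                if (c == quote) {
                    quote = 0;                         // closing quote of the attribute value
                }
            } else if (c == '"' || c == '\'') {
                quote = c;                             // opening quote of an attribute value
            } else if (c == '>') {
                return i + 1;
            }
        }
        return xml.length();
    }

    /** Returns the index just after the next occurrence of terminator, or the end of input. */
    private static int skipPast(String xml, String terminator, int from) {
        int end = xml.indexOf(terminator, from);
        return end < 0 ? xml.length() : end + terminator.length();
    }

    public static void main(String[] args) throws IOException {
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countElements(xml));
    }
}
```

Because nothing is validated, malformed input simply yields a meaningless count; that trade-off is the one the message above is measuring.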
From jjc at jclark.com Tue Nov 4 12:31:17 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:48 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
References: <19971103195249.AAA18429@AGREENE-PC.bitstream.com>
Message-ID: <345E9AA0.96BDDE55@jclark.com>

Andrew Greene wrote:
>
> If you have a Unicode-friendly XML environment, then users can create
> elements whose GIs or attribute names contain "interesting"
> characters. (Yes? A NAME token can contain "BaseChars", which includes
> characters beyond ASCII and even beyond Latin-1.)
>
> So, if a user requests that the document instance be saved as an ASCII
> file, what is the best way for a Unicode-aware and standards-compliant
> application to represent these characters?

I would use numeric character references wherever XML allows them; if there are non-ASCII characters in places where numeric character references aren't allowed I would use UTF-8 and give a warning to the user. The ASCII characters will still be there as ASCII, and the non-ASCII characters won't get lost, although they will look a bit funny in an 8-bit editor.

An interesting case is when there are non-ASCII characters in places where numeric character references are not recognized but do not cause an error (eg PIs, comments); one could have an application convention that recognizes numeric character references in these cases.

> 2. Rename all the offending elements and attributes, and use PIs to
> ensure that when they're read back in we can patch things up.
> So, for example, the file could contain: > > <?GoodCitizen MangledGI Strae1="Straße"?> > <Strae1>foo bar</Strae1> > > Advantages: It's fully compliant. If I was going to do this sort of thing, I think I would use a variation on URL % encoding. I would have a convention that underscore (say) followed by 4 hex digits represented the Unicode character with that hex code. James xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Tue Nov 4 14:43:34 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:58:48 2004 Subject: How best to represent unrepresentable characters in NAME tokens? Message-ID: <199711041438.BAA25777@jawa.chilli.net.au> > From: Andrew Greene <agreene@bitstream.com> > * Is there a need for a "semi-standard" solution to this problem, or am > I the only one struggling with it? > > * Is there interest in adopting some variation of number 2 so that we're > better able to exchange such data? > > * I can't help but think that number 3 would be the most elegant solution > if it were only legal. Yet I'm also sure that the XML committee had a > good reason for disallowing it. I'd be interested in hearing what their > reason was, so that I may become enlightened. :-) When I proposed the "native language markup" scheme (for the ERCS project of the Standardization Project Regarding East Asian Documents of the China/Japan/ Korea Document Processing Group) which XML implements, we also developed the idea of "lowest-common-denominator naming". This means that you should only use characters in names which are available in all the systems through which the document will pass. 
So, if you have a requirement (known upfront) to save in ASCII, then you should use "ue" not "u umlaut". The best solution is to not create one in the first place! (For example, Japanese users should restrict themselves to only using characters in Shift JIS for names, not in JIS 212 or the additional sets coming.) I do not think there is any requirement for global interoperability of DTDs: if there is, then some system of numeric character references in names would be appropriate. However, I can suggest a 4th approach that may be better than your three. It is to provide a language or encoding specific fixed attribute, giving the ASCII form of the GI for use in dumping. OF course, it requires a minimum of smarts to convert to the new IDs. You might have an "also known as" aka attribute (I'll use B instead of esszet): <!ATTLIST straBe aka CDATA #FIXED "en-646 street de-8859-1 straBe de-646 strasse" > Rick Jelliffe xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at cogsci.ed.ac.uk Tue Nov 4 15:46:09 1997 From: richard at cogsci.ed.ac.uk (Richard Tobin) Date: Mon Jun 7 16:58:48 2004 Subject: XML processing experiments Message-ID: <199711041546.PAA29474@stevenson.cogsci.ed.ac.uk> >I also tried LT XML (which is written in C). I didn't find a program that >did nothing but parsing. The fastest one I found was the sgcount >program (which counts the number of each element type); it took about 11 >seconds. That's much slower than I expected; I suspect there may be >some Windows-specific performance problems. It's true that we do our development under unix, and I don't have any benchmarks for MS Windows. 
I just ran "sgcount <ot.xml" on an AMD K5 PR-100 (supposedly equivalent to a 100MHz Pentium) under FreeBSD, and it took 6.8 seconds. This suggests that we run about twice as fast under unix as MS Windows, which is something we will have to look into.

But in any case, the currently-released version of LT-XML (0.9.5) is far too slow on all platforms. The next version, which we hope to release by the end of the year, has a completely new parser and is roughly three times as fast.

Why is the old version so slow?

- It's written in yacc and lex. I didn't expect this to be slow, but profiling shows that it's spending most of its time in the yacc and lex internals, which we can't do much about. The new version is written in plain C, and I actually think it's much clearer. Yacc is not well-suited to the sort of context-dependent tokenising that is required in DTDs. We had to abandon lex anyway to handle 16-bit characters.

- It does a malloc() and free() for every start tag, end tag, attribute name, attribute value, and pcdata. The new version only does that for attribute values and pcdata.

Another reason that both versions are slower than the desperate C hacker's programs is that they maintain a stack of input sources to implement entity expansion. This adds an overhead even when entities are not being expanded.

The figures above are all for 8-bit-character systems. The next release will have a compile-time option to support 16-bit characters. I expect the 16-bit version to be about 30% slower than the 8-bit version (for the same 8-bit data).

We also plan to release the parser itself separately from the rest of the LT-XML/LT-NSL toolkit, for use in programs that just need an XML parser. I expect it to be about 25% faster than the LT-XML version, just because a layer is removed.
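
To make the "stack of input sources" point concrete, here is a minimal sketch in Java of how entity expansion can be layered under a character reader. It is not LT XML's implementation (which is in C); the names EntityInput, Source, read and readAll are hypothetical, the entity table is a plain map supplied by the caller, and there is no guard against an entity that refers to itself. What it shows is that every read() goes through the stack, which is the overhead mentioned above even when no entity is ever expanded.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

/** Hypothetical sketch of entity expansion with a stack of input sources. */
final class EntityInput {

    /** One open source of characters: the document itself or an entity's replacement text. */
    private static final class Source {
        final String text;
        int pos;

        Source(String text) {
            this.text = text;
        }
    }

    private final Deque<Source> sources = new ArrayDeque<>();
    private final Map<String, String> entities;

    EntityInput(String document, Map<String, String> entities) {
        this.entities = entities;
        sources.push(new Source(document));
    }

    /** Returns the next character after entity expansion, or -1 at end of input. */
    int read() {
        while (!sources.isEmpty()) {
            Source top = sources.peek();
            if (top.pos >= top.text.length()) {
                sources.pop();                     // source exhausted; resume the one below it
                continue;
            }
            char c = top.text.charAt(top.pos++);
            if (c == '&') {
                int semi = top.text.indexOf(';', top.pos);
                String value = semi < 0 ? null : entities.get(top.text.substring(top.pos, semi));
                if (value != null) {
                    top.pos = semi + 1;            // step over "name;" in the current source
                    sources.push(new Source(value));
                    continue;                      // next character comes from the entity text
                }
            }
            return c;                              // ordinary character, or an unknown reference left as-is
        }
        return -1;
    }

    /** Convenience helper: reads all remaining input into a string. */
    String readAll() {
        StringBuilder out = new StringBuilder();
        for (int c = read(); c >= 0; c = read()) {
            out.append((char) c);
        }
        return out.toString();
    }
}
```

A tokenizer that skips entity expansion altogether can scan one flat buffer instead, which is part of why the quick and dirty programs come out ahead.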
> >I was quite surprised that there was such a big performance difference > >between real, conforming XML processing that does well-formedness > >checking, and quick and dirty XML processing that does the minimum > >necessary to get the correct result. This doesn't seem right to me... It isn't, and we're hoping to reduce it. -- Richard xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From istvanc at microsoft.com Tue Nov 4 16:57:42 1997 From: istvanc at microsoft.com (Istvan Cseri) Date: Mon Jun 7 16:58:49 2004 Subject: XML processing experiments Message-ID: <91B7E292027DCF1195CD08002BB690B00298B81E@red-93-msg.dns.microsoft.com> I can offer a couple of reasons why a 'real' parser would be slower then an ad-hoc processor: - abstraction for encapsulating different encodings - keeping track of line and column information for error reporting - storing attributes and checking for uniqueness - checking for valid element close tags - processing entity references In addition to this the MSXML parser is building the tree. We are going to have a version where this can be turned off but when XML is used as data it is extremely useful to have the tree around so you can actually do different kinds of lookups, navigation on it and can update it. Istvan > ---------- > From: James Clark[SMTP:jjc@jclark.com] > Reply To: James Clark > Sent: Tuesday, November 04, 1997 4:03 AM > To: XML Developers' List > Subject: XML processing experiments > > One nice feature of XML is that it is easily processable by the > Desperate C/C++/Java/Perl hacker: the syntax is simple enough that you > can do useful things with XML without a full XML parser. 
I've been > exploring this sort processing. If all you want to do is be able to > correctly parse well-formed XML, and you don't care about detecting > whether or not it is well-formed, how much code does it take and is it > significantly faster than using an XML parser that does proper > well-formedness or validity checking? > > I used Jon's Old Testament XML file as test data (after removing the > doctype line), which is about 3.7Mb. I ran the tests on a Toshiba > Tecra > 720CDT (133MHz Pentium, 80Mb RAM) with Windows NT 4.0. I used the IE > 4.0 Java VM. The timings I give are after a couple of runs, so there's > little or no disk I/O involved. Lark 0.97 parsed the file in about > 10.5 > seconds, MSXML in about 24 seconds. I suspect the difference is > partly > because MSXML is building a tree (I didn't see any command line switch > to turn this off). By comparison nsgmlsu -s took about 8 seconds. I > also tried LT XML (which is written in C). I didn't find a program > that > did nothing but parsing. The fastest one I found was the sgcount > program (which counts the number of each element type); it took about > 11 > seconds. That's much slower than I expected; I suspect there may be > some Windows-specific performance problems. > > The code I wrote is available at > <URL:ftp://ftp.jclark.com/pub/test/xmltok.zip>. First I wrote a little > library in C for doing XML "tokenization". This code just splits the > input up into "tokens" where each token is data or some kind of XML > markup (start-tag, end-tag, comment etc). The idea is that it does > the > minimum necessary to do any kind of useful XML-aware processing. I > wrote > a little application xmlec that just counts the number of elements in > an > XML document. This can compiled either to use Win32 file mapping (if > FILEMAP is defined) or normal read() calls. You'll probably have to > tweak the code a little if you're using anything other than Visual > C++. 
> I then translated this into Java (I'm not much of a Java programmer, > so > there's probably plenty of scope for improvement in the Java version). > > xmlec parses the test file in about 0.5 seconds. Using read() instead > of > file mapping increases the time to about 0.65 seconds. The Java > version takes about 1.5 seconds. > > I also wrote a Java version of the LT XML textonly program (which > extracts the non-markup of an XML document). The LT XML version ran > in > about 13.5 seconds. My Java version ran in about 3.5 seconds. > > The class files for the Java element counting program total about 6k. > The source for the C version is about 750 lines, including both the > file > mapping and read()ing version. > > I was quite surprised that there was such a big performance difference > between real, conforming XML processing that does well-formedness > checking, and quick and dirty XML processing that does the minimum > necessary to get the correct result. This doesn't seem right to me... > > > James > > xml-dev: A list for W3C XML Developers. To post, > mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following > message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dgd at cs.bu.edu Tue Nov 4 17:37:09 1997 From: dgd at cs.bu.edu (David G. Durand) Date: Mon Jun 7 16:58:49 2004 Subject: How best to represent unrepresentable characters in NAME tokens? 
In-Reply-To: <19971103195249.AAA18429@AGREENE-PC.bitstream.com> Message-ID: <v03007802b084fbebc910@[205.181.197.107]> At 2:52 PM -0500 11/3/97, Andrew Greene wrote: >If you have a Unicode-friendly XML environment, then users can create >elements whose GIs or attribute names contain "interesting" >characters. (Yes? A NAME token can contain "BaseChars", which includes >characters beyond ASCII and even beyond Latin-1.) Sure can... I'l give my solution at the end, but first, a few comments on the suggestions. >So, if a user requests that the document instance be saved as an ASCII >file, what is the best way for a Unicode-aware and standards-compliant >application to represent these characters? <snip> >I've thought of three solutions: > >1. It's an error. Tell the user "Sorry, your file could not be saved > in that character encoding because the element name 'StraBe' could > not be represented. > > Advantages: It's fully compliant and no data can get lost. > > Disadvantages: No data can get out, either. Perhaps the user has > an 8-bit app to massage the data in a particular way, and she > doesn't want to rename all her elements. This works, but isn't needed. >2. Rename all the offending elements and attributes, and use PIs to > ensure that when they're read back in we can patch things up. > So, for example, the file could contain: > > <?GoodCitizen MangledGI Strae1="Straße"?> > <Strae1>foo bar</Strae1> > > Advantages: It's fully compliant. > > Disadvantages: It assumes that all other processing applications > will be nice and won't lose my processing instructions, and it > makes the file hard to read. It's also non-portable; unless we > as a community decide on a "semi-standard" PI to use, no one else > will know how to interpret this convention. (On the other hand, > this is exactly why I'm bringing the issue up here. Maybe we can > all agree on a semi-standard and I'll feel less uneasy about > doing something like this....) 
This is actively evil, in that it obfuscates the markup, and makes it impossible to validate against the original DTD. Validating against a DTD at all requires a DTD translation tool to change element and attribute names there as well. The use of PIs to affect the meaning of markup (as opposed to enabling additional application processing that can't be expressed in markup) is generally a bad idea. In fact, most SGML experts concur that PIs are best used in _exceptional_ cases. The reason for this is that applications are allowed to (and usually do) ignore any PIs that they are not specialized for.

>
>3. Violate the standard and use character entities to represent the
>   ineffable, for example:
>
>     <Stra&#xDF;e>foo bar</Stra&#xDF;e>
>
>   Advantages: It's compact and unambiguous (even if it's illegal :-).
>
>   Disadvantages: It violates both XML and 8879 in a new and perverse
>   way. The user's file will not be usable by any other piece of
>   standards-compliant software. That's worse than refusing to write
>   the file at all (number 1).

Yes, this is not good.

>* Is there a need for a "semi-standard" solution to this problem, or am
>  I the only one struggling with it?

Yes, but it's already built into XML.

>* Is there interest in adopting some variation of number 2 so that we're
>  better able to exchange such data?

Not from me...

>* I can't help but think that number 3 would be the most elegant solution
>  if it were only legal. Yet I'm also sure that the XML committee had a
>  good reason for disallowing it. I'd be interested in hearing what their
>  reason was, so that I may become enlightened. :-)

Part of it is simply compatibility -- this cannot be done in SGML. The argument about SGML compatibility is not worth rehashing here; the archive of the working group discussions includes many messages on it.

So now that I've objected to all three solutions, you may think I'm a negative kind of guy... But I do have a suggestion.
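
The underlying problem in this thread is easy to demonstrate: whether a given name survives depends entirely on the target encoding. Below is a minimal sketch in present-day Java (the java.nio.charset classes are real but did not exist when this thread was written; the class NameEncodingCheck and the method representable are made up for illustration). It is the check an application would need before deciding between "refuse to save" and "save as is".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; only the java.nio.charset calls are real library API.
public class NameEncodingCheck {
    // True if every character of the name can be written in the target encoding.
    public static boolean representable(String name, Charset target) {
        return target.newEncoder().canEncode(name);
    }

    public static void main(String[] args) {
        String gi = "Stra\u00DFe"; // "Strasse" spelled with U+00DF (sharp s)

        System.out.println(representable(gi, StandardCharsets.US_ASCII));   // prints false
        System.out.println(representable(gi, StandardCharsets.ISO_8859_1)); // prints true
        System.out.println(representable(gi, StandardCharsets.UTF_8));      // prints true

        // Six characters, but seven bytes in UTF-8: U+00DF takes two bytes,
        // both with the high bit set; the ASCII letters are unchanged.
        System.out.println(gi.getBytes(StandardCharsets.UTF_8).length);     // prints 7
    }
}
```

The ASCII case is the only one that fails, which is why saving in an encoding that covers the whole repertoire sidesteps the renaming question altogether.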
Support for UTF-8 is required for XML processors, so that an "8-bit" tool can always be fed something that it can understand, even though some strings may look funny in some editors. Since XML parsers do _not_ perform any kind of character format normalization (e.g. of diacritical marks) each element name will be a constant string, even if that string is not readable.

[[ Note for anyone who may be puzzled: UTF-8 is a clever little encoding trick that uses variable length character codes to represent the larger space of Unicode (and 10646) codes in 8-bit chunks. Codes < 128 represent USASCII, and codes of 128 and above are concatenated together to represent larger values. The details (and sample code in C) can be found at http://www.unicode.org/ So a plain ASCII file in UTF-8 looks the same, but other characters show up as strings with leading chars >= 128. One detail is that Latin-1 etc., are _not_ valid UTF-8 because they use the eighth-bit high codes for single characters.]]

The core of your problem is a very good, and very real point: writers of XML processors need to remember that the Unicode basis of XML is fundamental -- so conversion to another character set may fail because the characters in a document may simply not exist in the target code. Of course, for many documents, the markup will allow transcoding to Latin-1 (and other local processing codes), but this does depend on the document. Text can be modified to use numeric character references but this is probably too horrible, especially for the Asian ideographic scripts.

So, you can keep your 8-bit tools, but you may need UTF-8 display code to make them maximally usable.

-- David _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr.
Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________

From dgd at cs.bu.edu Tue Nov 4 18:23:37 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:58:49 2004
Subject: Ampersand in URLs (was: RE: </> as end tag)
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID: <v03007806b08511d8ef83@[205.181.197.107]>

At 8:27 PM -0500 11/2/97, Chris Lovett wrote:
>As for the ampersands, this is a real problem. We found with our experience
>with CDF that customers just can't handle putting &amp; inside their URL's.
>We want to comply with XML standards, but we also want XML to be successful
>in the marketplace. One area that we didn't compromise is with case
>sensitivity. The new parser is fully case sensitive - but with a switch
>that sets it back to case insensitive for those people that are reading XML
>that was generated before case sensitivity was decided. You have to make
>some tough compromizes sometimes.

There was a query on the XML-SIG about HTML and the ampersand rule (XML agrees with the HTML standard, but not all HTML implementations). I thought that my answer fits well in this discussion as well.

Internet Explorer, ironically, already insists on the escaping of ampersand in some circumstances. All that I've tested, actually, but I won't assume that it follows the standard -- if they do, "some" should be changed to "all".
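
For readers who have not run into the rule in question: a literal ampersand in an attribute value (a URL with a query string, say) has to be written as &amp;. A minimal sketch of what an authoring tool would do follows; the class AttrEscape, the method escapeAttr, and the example.com URL are all made up for illustration and are not taken from MSXML or any other parser.

```java
// Hypothetical helper showing the escaping an authoring tool could apply.
public class AttrEscape {
    // Escapes the characters that may not appear literally in a
    // double-quoted attribute value: '&', '<' and '"'.
    public static String escapeAttr(String raw) {
        StringBuilder sb = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char ch = raw.charAt(i);
            switch (ch) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '"': sb.append("&quot;"); break;
                default:  sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.com/search?q=xml&lang=en"; // made-up URL
        // prints <A HREF="http://example.com/search?q=xml&amp;lang=en"/>
        System.out.println("<A HREF=\"" + escapeAttr(url) + "\"/>");
    }
}
```

The parser hands the application the original URL with a plain "&", so the escaping is visible only in the source text.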
I am not sure about the story with whitespace, but in fact, if they don't require &amp; before space, it matters little to me, since space isn't legal in a URL.

I don't see ampersand as a show stopper, especially once people realize how useful entities can be in modularizing long URLs. And, as Paul notes, we can fall back on the authoring tool argument. More important, since we have "draconian" error handling in XML, simple testing of the document will ensure that the error is detected (rather than the HTML case, where it depends on the browser that you test with).

One of the biggest problems with HTML has been that the standard and the implementations differ(ed) so widely and on so many points -- this is a primary reason that we should be very careful to implement XML exactly. Consistent parsing will go a long way to salve the wounds of slight differences from HTML. Divergent syntax in any software that purports to be XML-compliant will cause real problems for users, who may not be technical enough to read and understand the specification to judge correctness of implementations.

We're sure to have bugs, but implementors have a very real responsibility to conform in every way that they can, regardless of what design decisions they would rather have made differently. This truth is what makes standards such an object of (seemingly pointless) passion -- because you have to take them as they _actually are_ if they are to have the value that they promise (even when you feel that that value is unconscionably less than it could have been).

  -- David
_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From andrewl at microsoft.com Tue Nov 4 19:20:46 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:49 2004
Subject: How best to represent unrepresentable characters in NAMEtokens?
Message-ID: <7BB61B44F197D011892800805FD4F79201CD64AF@red-03-msg.dns.microsoft.com>

I'm left unclear by this response. Suppose that I have a Java program with an object class called "$Price" and I want to serialize this into XML. Something such as the following is not legal XML:

  <$Price>15.95</$Price>

What can I do? One thing I could do is to avoid such names when writing Java. But suppose that isn't an option. I could do the following:

  <OBJECT realtype="$Price">15.95</OBJECT>

But, as you say, this "obfuscates the markup, and makes it impossible to validate against the original DTD" (in the sense that the declaration for the OBJECT element type would be almost meaningless).

What is the recommended solution?

--Andrew Layman
AndrewL@microsoft.com

> -----Original Message-----
> From: dgd@cs.bu.edu [SMTP:dgd@cs.bu.edu]
> Sent: Tuesday, November 04, 1997 8:44 AM
> To: xml-dev@ic.ac.uk
> Subject: Re: How best to represent unrepresentable characters in
> NAMEtokens?
>
> <snip>

From ak117 at freenet.carleton.ca Tue Nov 4 19:29:37 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:58:49 2004
Subject: Useless XML Statistics
Message-ID: <199711041929.OAA03197@unready.microstar.com>

Here are some stats from Alta Vista:

Number of web pages mentioning:

  SGML and not XML.......................109,790
  XML and not SGML.........................5,083
  SGML and XML.............................8,409
  Neither.............................77,726,900

By this measurement, full SGML is more than three times as popular as Monty Python (40,546 pages) and slightly more popular than even the Spice Girls (105,228 pages).

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From scott at iguana.co.nz Tue Nov 4 23:43:22 1997
From: scott at iguana.co.nz (Scott Cooper)
Date: Mon Jun 7 16:58:49 2004
Subject: new msxml behaviour
Message-ID: <3.0.1.32.19971105124130.00a8d220@mail.iguana.co.nz>

the new msxml contains this code within getText of ElementImpl which changes the behaviour of entity expansions from the previous version:

    for (Enumeration en = children.elements(); en.hasMoreElements(); )
    {
        if (sb.length() > 0)
            sb.append(' ');
        sb.append(((Element)en.nextElement()).getText());
    }
    return sb.toString();

notice the appending of a space. is this appropriate? it means constructs like 'abc&SOME.ENTITY;def' expand to 'abc SOME.ENTITY.CONTENTS def' rather than 'abcSOME.ENTITY.CONTENTSdef' like it used to, which really stuffs my application.

the last time i tried to 'improve' msxml the damn thing proved incredibly difficult to recompile due to some ridiculous circular dependencies among the files - god knows how ms compiled it in the first place - anyway i now see this awful dll rubbish in there so before i attempt to make a makefile (*please* supply one next time ms :) ) is this in fact a problem? or should i change my approach?

i was also wondering whether defaults for attributes should appear to the application if the attribute isn't explicitly given in the markup. right now i've added a function to traverse the tree and insert all attribute defaults (if needed) before i start processing the document - what do you think of that?

the msxml api was awful for getting schema information such as default values. now it has a 'toSchema' function which returns an element with child elements for each attribute.
the child element's tag is 'ATTRIBUTE' and it contains attributes such as 'XML:ID' containing the attribute name and 'XML:DEFAULT' containing the default, for instance. this is an incredibly convoluted method for accessing such information - are there any other xml parsers out there that attach schema information to the markup element itself - like element.getAttribute("xyz").getDefaultValue() rather than document.getElementDecl("abc").getChild("xyz").getAttribute("XML:DEFAULT"). finally (i've been saving up questions) i'd like this construct to be parsed as a <bar> element... <!ENTITY foo '<![CDATA[ <bar>blah blah</bar> ]]>'> ... &foo; but instead, &foo; is processed as PCDATA (by msxml). is this correct behaviour? section 4.4 of the xml ref contains the following: '6.For an internal (text) entity, the processor must include the entity; that is, retrieve its replacement text and process it as a part of the document (i.e. as content or AttValue, whichever was being processed when the reference was recognized), passing the result to the application in place of the reference. The replacement text may contain both text and markup, which must be recognized in the usual way...' well i haven't received any messages from the list today (maybe you're all in bed on US time) so how about chewing on that for me 'cos i must say it's a pain to rewrite your code when you had to hack it in the first place to work. p.s. i'm using xml to define the syntax and byte data of a peer-to-peer network interaction over pacnet and there aren't any PIs. if anyone would like to check out what i've done i'd greatly appreciate any opinions. --------------------------------------------------------------------- Iguana Information Services Ph +64 4 499 9782 PO Box 10 609 Fax +64 4 499 4439 Wellington Email scott@iguana.co.nz New Zealand HTTP http://www.iguana.co.nz xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Ingo.Macherius at TU-Clausthal.de Wed Nov 5 04:16:08 1997 From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius) Date: Mon Jun 7 16:58:49 2004 Subject: announce: slides for talk on xml Message-ID: <199711050416.FAA07513@sinfonix.rz.tu-clausthal.de> Hello, there are PostScript and PowerPoint versions of a 23-slides talk on XML available at http://www.heim9.tu-clausthal.de/~inim/xml/dfn-bt-97/ I'm very interested in feedback (including spelling and translation errors in the english version). The URL is not yet permanent. Enjoy. ++im -- Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/ Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Wed Nov 5 08:35:32 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:49 2004 Subject: Entity processing (was new msxml behaviour) Message-ID: <3.0.32.19971104214605.00b621d8@pop.intergate.bc.ca> At 12:41 PM 05/11/97 +1300, Scott Cooper wrote: >the new msxml contains this code ... > if (sb.length() > 0) > sb.append(' '); >notice the appending of a space. is this appropriate? 
Recent WG decisions require that parameter entity expansions (outside of entity values) should be forced to match an even number of tokens simply by appending & prepending spaces to their expansion; I hypothesize that this is what the msxml code is doing. Of course, this can't be done when building the replacement text of an internal text entity.

To aid in sorting this out, I attach a couple of my test files; credit is due to Henry Thompson, Michael Sperberg-McQueen, and likely others for helping cook these up. I *think* that the behavior of Lark 0.97 on these is per the spec. But there's enough hair on this set of problems that there's lots of ways I could be wrong. -Tim

-------------- next part --------------
<?XML version='1.0'?>
<!DOCTYPE test [
<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '%zz;'>
<!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
%xx;
<!ENTITY example "<p>An ampersand (&#38;) may be escaped numerically (&#38;#38;) or with a general entity (&amp;).</p>" >
<!ENTITY % pub "Éditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus, © 1947 %pub;. &rights;" >
<!ENTITY rights " - all rights reserved - ">
]>
<test>
<e1>This sample shows a &tricky; method.</e1>
<e2>&example;</e2>
<e3>&book;</e3>
</test>
-------------- next part --------------
<!DOCTYPE FOO [
<!ENTITY lt "<">
<!ENTITY b "{Value of b}">
<!ENTITY weird 'foMo<bar'>
<!ENTITY % d '{Value of d}'>
<!ENTITY % defatt1 'DEF1 CDATA "{default 1}"'>
<!ENTITY % bazatt '<!ATTLIST BAZ BIFF CDATA "QuidNunc" BAR CDATA "xyxy">'>
<!ELEMENT FOO - - (#PCDATA|BAZ|BUM)*>
<!ELEMENT BAZ - - (#PCDATA)>
<!ELEMENT BUM - - (#PCDATA)>
<!ATTLIST FOO BAR CDATA "A&b;C%d;EMF">
<!ATTLIST FOO BAR CDATA "Should not appear">
%bazatt;
<!ATTLIST BAZ %defatt1;>
]>
<!--<FOO test='KLMNO'>hi <there-->
<FOO>
<BAZ BAR="A&b;C%d;EMF"></BAZ>
<BUM>&weird;</BUM>
</FOO>

From tbray at textuality.com Wed Nov 5 08:36:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:49 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971105003221.00b6529c@pop.intergate.bc.ca>

First off, thanks to James for some very thought-provoking work.

At 07:03 PM 04/11/97 +0700, James Clark wrote:
>If all you want to do is be able to
>correctly parse well-formed XML, and you don't care about detecting
>whether or not it is well-formed, how much code does it take and is it
>significantly faster than using an XML parser ...
>Lark: 10.5 seconds .. MSXML: 24 .. nsgmlsu: 8 .. sgcount:11 ..
>xmlec (C): 0.5 seconds .. (Java): 1.5 seconds.

[BTW, when I got Lark to run "almost as fast as SP", I decided that was qualitatively fast enough for now].

>I was quite surprised that there was such a big performance difference

No kidding. Discussions here are a bit dangerous, since in the Java domain, we are kind of operating in the dark; we don't have profiling tools with really good granularity. This is my excuse for engaging in performance analysis based on intuition, something for which I have personally fried more than one junior programmer.
Let's look at James' code eating up a "-quoted literal, where
characters are in the byte array buf[], start and end being integer
indices therein:

      case (byte)'"': {
        for (++start; start != end; ++start) {
          if (buf[start] == (byte)'"') {
            nextTokenIndex = start + 1;
            return TOK_LITERAL;
          }

The following are candidates for why a program like Lark or MSXML
might run slower.

- works with Java char rather than byte variables
- does a method dispatch (or at least a few conditionals) per character
  processed for at least two reasons: to manage the entity stack, and
  to have a place to put the different character encoding processing
  modules. [Note: A look at James' code makes me wonder if this is
  *really* as necessary as I thought]
- does quite a bit more work upon recognizing some markup constructs;
  in particular for a start tag it pulls apart the attribute list and
  packages up the element type & attributes in a nice structure
  convenient for an API user

I went and looked at Lark's main loop, and for a 'typical' character
processing mode, i.e. it's not the begin or end of a tag or attribute
or something and no buffers run out but the text is being saved, it
ends up executing 25 lines of Java including one getXmlCharacter()
method dispatch; none of them are monster conditionals or anything.
James' code above, in the equivalent case, is executing 3, I think.
So while lines-of-code is a very shaky yardstick indeed, the difference
is 8 or 9 to 1, which is not out of line with the observed
performance difference.

My intuition is that what's holding Lark back is
(a) the per-char dispatching, and
(b) turning the DFA crank, which requires a 2D array reference, then
    a shift & mask

I have some ideas on how to fix both, but first I have to make Lark
do conditional sections and validate (neither should slow it down
significantly).

One other experiment would be useful, that might shed light from a
different angle. James, how about doing element counts per type; i.e.
actually *using* some of the info coming back from the tokenizer, nothing
fancy, just use a java.util.Hashtable or some such; should be able
to run very similar code on Lark and your TokenStream thing; I
wonder if it would change the numbers. I'll get around to this
sometime if nobody else does, but not for the next 2-3 weeks. -Tim

From crism at ora.com Wed Nov 5 15:37:15 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:49 2004
Subject: new msxml behaviour
In-Reply-To: <3.0.1.32.19971105124130.00a8d220@mail.iguana.co.nz> (message from Scott Cooper on Wed, 05 Nov 1997 12:41:30 +1300)
Message-ID: <199711051541.KAA00205@geode.ora.com>

> notice the appending of a space. is this appropriate? it means
> constructs like 'abc&SOME.ENTITY;def' expand to 'abc
> SOME.ENTITY.CONTENTS def' rather than 'abcSOME.ENTITY.CONTENTSdef'
> like it used to which really stuffs my application.

That's absolutely uncool. My résumé is NOT a r é sum é .
This will, I hope, be fixed before the actual release.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
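The space-appending behaviour quoted above can be reproduced in isolation. This is a minimal sketch, assuming that an entity reference ends up as a separate child whose text is concatenated with that of its siblings (an assumption about the tree shape, not a statement of how MSXML is actually built); the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demonstration of joining child text with and without a separator.
final class GetTextJoin {
    /** Inserts a space before every chunk once the buffer is non-empty. */
    static String joinWithSpaces(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(chunk);
        }
        return sb.toString();
    }

    /** Concatenates the chunks exactly as they are, with no separator. */
    static String joinPlain(List<String> chunks) {
        StringBuilder sb = new StringBuilder();
        for (String chunk : chunks) {
            sb.append(chunk);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Text, expanded entity content, text: three sibling chunks.
        List<String> chunks = Arrays.asList("abc", "SOME.ENTITY.CONTENTS", "def");
        System.out.println(joinWithSpaces(chunks)); // abc SOME.ENTITY.CONTENTS def
        System.out.println(joinPlain(chunks));      // abcSOME.ENTITY.CONTENTSdef
    }
}
```

Under the same assumption, the chunks "r", "é", "sum", "é" come out as "r é sum é" from the first method and "résumé" from the second, which is the complaint in a nutshell.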
From jimg at digitalthink.com Wed Nov 5 17:28:49 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun 7 16:58:49 2004
Subject: Entity processing (was new msxml behaviour)
Message-ID: <01BCE9CD.0E694A00.jimg@digitalthink.com>

Hi all,

Could somebody post the result of parsing the files Tim Bray posted
according to spec since there seems to be some question as to whether
or not msxml is doing it properly.

Thanks in advance.

Jim

On Wednesday, November 05, 1997 12:36 AM, Tim Bray
[SMTP:tbray@textuality.com] wrote:
> At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
> >the new msxml contains this code ...
> >  if (sb.length() > 0)
> >    sb.append(' ');
> >notice the appending of a space. is this appropriate?
>
> Recent WG decisions require that parameter entity expansions (outside
> of entity values) should be forced to match an even number of tokens
> simply by appending & prepending spaces to their expansion; I
> hypothesize that this is what the msxml code is doing.
>
> Of course, this can't be done when building the replacement text
> of an internal text entity. To aid in sorting this out, I attach
> a couple of my test files; credit is due to Henry Thomson, Michael
> Sperberg-McQueen, and likely others for helping cook these up.
>
> I *think* that the behavior of Lark 0.97 on these is per the spec.
> But there's enough hair on this set of problems that there's
> lots of ways I could be wrong. -Tim << File: EntVal.xml >> << File:
> EntVal2.xml >>

From richard at cogsci.ed.ac.uk Wed Nov 5 17:39:07 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:50 2004
Subject: Entity processing (was new msxml behaviour)
In-Reply-To: Jim Gindling's message of Wed, 5 Nov 1997 09:27:36 -0800
Message-ID: <199711051738.RAA04357@stevenson.cogsci.ed.ac.uk>

>Could somebody post the result of parsing the files Tim Bray posted according
>to spec since there seems to be some question as to whether or not msxml is
>doing it properly.

Well here's what my XML parser makes of them. It's *intended* to parse
according to spec :-)

-- Richard

EntVal.xml:

<?XML VERSION="1.0" ENCODING="ISO-8859-1" RMD="ALL"?>
<!doctype TEST [
<!ELEMENT test (#PCDATA) >
<!ENTITY % xx '%zz;'>
<!ENTITY % zz '<!ENTITY tricky "error-prone" >' >
%xx;
<!ENTITY example "<p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity (&amp;).</p>" >
<!ENTITY % pub "Éditions Gallimard" >
<!ENTITY rights "All rights reserved" >
<!ENTITY book "La Peste: Albert Camus,
© 1947 %pub;. &rights;" >
<!ENTITY rights " - all rights reserved - ">
]>
<TEST>
<E1>This sample shows a error-prone method.</E1>
<E2><P>An ampersand (&) may be escaped
numerically (&) or with a general entity (&).</P></E2>
<E3>La Peste: Albert Camus,
© 1947 Éditions Gallimard. All rights reserved</E3>
</TEST>

It also produces the following warnings:

Warning: Ignoring redefinition of entity rights
 in unnamed entity at line 13 char 47 of file:/home/richard/XML/EntVal.xml
Warning: start tag for undeclared element E1; declaring it to have content ANY
 in unnamed entity at line 16 char 4 of file:/home/richard/XML/EntVal.xml
Warning: start tag for undeclared element E2; declaring it to have content ANY
 in unnamed entity at line 17 char 4 of file:/home/richard/XML/EntVal.xml
Warning: start tag for undeclared element P; declaring it to have content ANY
 in entity "example" defined at line 7 char 1 of file:/home/richard/XML/EntVal.xml
 in unnamed entity at line 17 char 14 of file:/home/richard/XML/EntVal.xml
Warning: start tag for undeclared element E3; declaring it to have content ANY
 in unnamed entity at line 18 char 4 of file:/home/richard/XML/EntVal.xml

EntVal2.xml:

<!doctype FOO [
<!ENTITY lt "<">
<!ENTITY b "{Value of b}">
<!ENTITY weird 'foMo<bar'>
<!ENTITY % d '{Value of d}'>
<!ENTITY % defatt1 'DEF1 CDATA "{default 1}"'>
<!ENTITY % bazatt '<!ATTLIST BAZ BIFF CDATA "QuidNunc" BAR CDATA "xyxy">'>
<!ELEMENT FOO - - (#PCDATA|BAZ|BUM)*>
<!ELEMENT BAZ - - (#PCDATA)>
<!ELEMENT BUM - - (#PCDATA)>
<!ATTLIST FOO BAR CDATA "A&b;C%d;EMF">
<!ATTLIST FOO BAR CDATA "Should not appear">
%bazatt;
<!ATTLIST BAZ %defatt1;>
]>
<!--<FOO test='KLMNO'>hi <there--><FOO BAR="A{Value of b}C%d;EMF">
<BAZ BAR="A{Value of b}C%d;EMF" DEF1="{default 1}" BIFF="QuidNunc"></BAZ>
<BUM>foMo<bar</BUM>
</FOO>

Warnings:

Warning: Ignoring redeclaration of attribute BAR
 in unnamed entity at line 14 char 45 of file:/home/richard/XML/EntVal2.xml

From andrewl at microsoft.com Wed Nov 5 18:14:54 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:50 2004
Subject: Entity processing (was new msxml behaviour)
Message-ID: <7BB61B44F197D011892800805FD4F79201CD64C7@red-03-msg.dns.microsoft.com>

Yes, the MSXML code is attempting to match the XML specs as recently
revised by the WG decision. (Thanks for the test files, Tim, Henry and
Michael: We'll use these in testing.)

--Andrew Layman
AndrewL@microsoft.com

> -----Original Message-----
> From: Tim Bray [SMTP:tbray@textuality.com]
> Sent: Wednesday, November 05, 1997 12:36 AM
> To: xml-dev@ic.ac.uk
> Subject: Entity processing (was new msxml behaviour)
>
> At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
> >the new msxml contains this code ...
> >  if (sb.length() > 0)
> >    sb.append(' ');
> >notice the appending of a space. is this appropriate?
>
> Recent WG decisions require that parameter entity expansions (outside
> of entity values) should be forced to match an even number of tokens
> simply by appending & prepending spaces to their expansion; I
> hypothesize that this is what the msxml code is doing.
>
> Of course, this can't be done when building the replacement text
> of an internal text entity. To aid in sorting this out, I attach
> a couple of my test files; credit is due to Henry Thomson, Michael
> Sperberg-McQueen, and likely others for helping cook these up.
>
> I *think* that the behavior of Lark 0.97 on these is per the spec.
> But there's enough hair on this set of problems that there's
> lots of ways I could be wrong.
> -Tim << File: EntVal.txt >> << File:
> EntVal2.txt >>

From clovett at microsoft.com Wed Nov 5 19:35:38 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:50 2004
Subject: Entity processing (was new msxml behaviour)
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCEB@red-17-msg.dns.microsoft.com>

The code:

    if (sb.length() > 0)
        sb.append(' ');

is a bug and will be removed from the next version, along with several
other things :-)

> -----Original Message-----
> From: Jim Gindling [SMTP:jimg@digitalthink.com]
> Sent: Wednesday, November 05, 1997 9:28 AM
> To: xml-dev@ic.ac.uk
> Subject: RE: Entity processing (was new msxml behaviour)
>
> Hi all,
>
> Could somebody post the result of parsing the files Tim Bray posted
> according to spec since there seems to be some question as to whether
> or not msxml is doing it properly.
>
> Thanks in advance.
>
> Jim
>
> On Wednesday, November 05, 1997 12:36 AM, Tim Bray
> [SMTP:tbray@textuality.com] wrote:
> > At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
> > >the new msxml contains this code ...
> > >  if (sb.length() > 0)
> > >    sb.append(' ');
> > >notice the appending of a space. is this appropriate?
> >
> > Recent WG decisions require that parameter entity expansions (outside
> > of entity values) should be forced to match an even number of tokens
> > simply by appending & prepending spaces to their expansion; I
> > hypothesize that this is what the msxml code is doing.
> >
> > Of course, this can't be done when building the replacement text
> > of an internal text entity. To aid in sorting this out, I attach
> > a couple of my test files; credit is due to Henry Thomson, Michael
> > Sperberg-McQueen, and likely others for helping cook these up.
> >
> > I *think* that the behavior of Lark 0.97 on these is per the spec.
> > But there's enough hair on this set of problems that there's
> > lots of ways I could be wrong. -Tim << File: EntVal.xml >> << File:
> > EntVal2.xml >>

From clovett at microsoft.com Wed Nov 5 19:45:25 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:50 2004
Subject: new msxml behaviour
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCEC@red-17-msg.dns.microsoft.com>

> the new msxml contains this code within getText of ElementImpl which
> changes the behaviour of entity expansions from the previous version:
>
>     for (Enumeration en = children.elements(); en.hasMoreElements(); )
>     {
>         if (sb.length() > 0)
>             sb.append(' ');
>         sb.append(((Element)en.nextElement()).getText());
>     }
>
>     return sb.toString();
>
> notice the appending of a space. is this appropriate?
> it means constructs
> like 'abc&SOME.ENTITY;def' expand to 'abc SOME.ENTITY.CONTENTS def' rather
> than 'abcSOME.ENTITY.CONTENTSdef' like it used to which really stuffs my
> application.
>
[Chris Lovett] This was bogus - and will be removed.

> the last time i tried to 'improve' msxml the damn thing proved incredibly
> difficult to recompile due to some ridiculous circular dependencies among
> the files - god knows how ms compiled it in the first place - anyway i now
> see this awful dll rubbish in there so before i attempt to make a makefile
> (*please* supply one next time ms :) ) is this in fact a problem? or
> should i change my approach.
>
[Chris Lovett] Sorry about that. The wonders of Visual J++ I guess.

> i was also wondering whether defaults for attributes should appear to the
> application if the attribute isn't explicitly given in the markup. right
> now i've added a function to traverse the tree and insert all attribute
> defaults (if needed) before i start processing the document - what do you
> think of that?
>
[Chris Lovett] Yes, we've talked about this in the DOM group.

> the msxml api was awful for getting schema information such as default
> values. now it has a 'toSchema' function which returns an element with
> child elements for each attribute. the child element's tag is 'ATTRIBUTE'
> and it contains attributes such as 'XML:ID' containing the attribute name
> and 'XML:DEFAULT' containing the default, for instance. this is an
> incredibly convoluted method for accessing such information - are there
> any other xml parsers out there that attach schema information to the
> markup element itself - like
> element.getAttribute("xyz").getDefaultValue() rather than
> document.getElementDecl("abc").getChild("xyz").getAttribute("XML:DEFAULT").
>
[Chris Lovett] Careful how much you rail against this schema format.
This is just plain XML you know.
If you are having a hard time navigating plain XML then perhaps the
Object Model needs richer navigational methods...

> finally (i've been saving up questions) i'd like this construct to be
> parsed as a <bar> element...
>
> <!ENTITY foo '<![CDATA[
> <bar>blah blah</bar>
> ]]>'>
>
> ...
>
> &foo;
>
> but instead, &foo; is processed as PCDATA (by msxml). is this correct
> behaviour? section 4.4 of the xml ref contains the following: '6.For an
> internal (text) entity, the processor must include the entity; that is,
> retrieve its replacement text and process it as a part of the document
> (i.e. as content or AttValue, whichever was being processed when the
> reference was recognized), passing the result to the application in place
> of the reference. The replacement text may contain both text and markup,
> which must be recognized in the usual way...'

[Chris Lovett] You'll get a <BAR> element in the object model if you
drop the CDATA section. So your entity would be:

<!ENTITY foo '<bar>blah blah blah</bar>'>

From scott at iguana.co.nz Wed Nov 5 21:06:05 1997
From: scott at iguana.co.nz (Scott Cooper)
Date: Mon Jun 7 16:58:50 2004
Subject: conflicting answers
Message-ID: <3.0.1.32.19971106100325.00ae6d50@mail.iguana.co.nz>

thanks for all the responses - sorry to chris for any 'railing' but i was
having an extra annoying day.
just to summarize:

tim says:
Recent WG decisions require that parameter entity expansions (outside
of entity values) should be forced to match an even number of tokens
simply by appending & prepending spaces to their expansion; I
hypothesize that this is what the msxml code is doing.

chris maden says:
That's absolutely uncool. My résumé is NOT a r é sum é .
This will, I hope, be fixed before the actual release.

andrew says:
Yes, the MSXML code is attempting to match the XML specs as recently
revised by the WG decision. (Thanks for the test files, Tim, Henry and
Michael: We'll use these in testing.)

chris says:
The code; if (sb.length() > 0) sb.append(' '); is a bug and will be
removed from next version, along with several other things :-)

so could someone please explain the WG decision and hasn't chris maden got
a pretty good point?

chris (lovett): i wanted to use CDATA in an entity definition 'cos i might
stick a lot of crazy characters in there - like some apostrophes for
instance - but reading tim's test files i see it's common to escape tricky
characters with their hex equivalents - it's just a pain if you've got a
large entity definition. am i using entities incorrectly? - i think of
them as macros and i expect them to be expanded as normal text and then
the expanded text parsed so sticking a CDATA in there shouldn't really
matter.

finally (msxml question): in my dtd i want to use entities to set attribute
defaults...

<!ENTITY % foo 'abc'>
<!ENTITY % bar '"def"'>
<!ELEMENT baz EMPTY>
<!ATTLIST baz
  arg1 CDATA '%foo;'
  arg2 CDATA %bar;>

under msxml, arg1 gets a default of '%foo;' and arg2 gets a default of
'def', suggesting parameter entity references aren't expanded within the
default string in an attdef(?). good, bad and/or ugly?
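To make the two possible outcomes concrete, here is a toy sketch of purely textual parameter-entity substitution in a default value. It is not how MSXML or any other parser mentioned in this thread works internally, and it takes no position on which behaviour the spec requires; the class PeExpand and its expand method are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: textual %name; substitution, nothing more.
final class PeExpand {
    // Matches %name; where name starts with a letter or an underscore.
    private static final Pattern PE_REF =
            Pattern.compile("%([A-Za-z_][A-Za-z0-9._-]*);");

    /** Replaces each %name; with its declared value; unknown names are left alone. */
    static String expand(String text, Map<String, String> entities) {
        Matcher m = PE_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = entities.get(m.group(1));
            String replacement = (value != null) ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> entities = new HashMap<>();
        entities.put("foo", "abc");
        entities.put("d", "{Value of d}");
        // If references in the default string are expanded:
        System.out.println(expand("%foo;", entities));                // abc
        System.out.println(expand("A{Value of b}C%d;EMF", entities)); // A{Value of b}C{Value of d}EMF
        // If they are not expanded, the strings "%foo;" and "A{Value of b}C%d;EMF" are used as written.
    }
}
```

A default of '%foo;' is what one sees when no such substitution is applied to the literal, and 'abc' is what one sees when it is.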
it means tim's entval2.xml has this line in its expansion under msxml:

BAR CDATA "A{Value of b}C%d;EMF"

thanks heaps for those test files - i was trying every which way of getting
the defaults to expand properly within the dtd - and now i can!

---------------------------------------------------------------------
Iguana Information Services     Ph    +64 4 499 9782
PO Box 10 609                   Fax   +64 4 499 4439
Wellington                      Email scott@iguana.co.nz
New Zealand                     HTTP  http://www.iguana.co.nz

From bernerd.anderson at exchange.pnl.gov Wed Nov 5 21:11:17 1997
From: bernerd.anderson at exchange.pnl.gov (Anderson, Bernerd J)
Date: Mon Jun 7 16:58:50 2004
Subject: HELP - XML to Oracle (and back)!
Message-ID: <7A8CF1DC6A9DD0118EA400A024BF29DA0121EFDE@pnlmse2.pnl.gov>

Folks -

I'm trying to find some information about loading data in an XML format
into an Oracle 7.3 database and am close to 'scraping the bottom of the
barrel' without finding anything useful!

I have a customer requirement to export data from any of several
relational databases (Oracle, Sybase, Informix, etc.) into an XML
formatted data file, transport the file across the internet, and load it
into an Oracle 7.3 database. SQL*Loader doesn't support this format and
I'm not aware of any other means of accomplishing this task.

I've submitted an Oracle TAR, tried the MetaLink (Oracle Support) forum,
emailed questions to several of my colleagues, and cruised various web
sites, following threads - all without success.
Questions that I need answers for are:

1) Is there a utility that will translate XML formatted files into a
   format readable by SQL*Loader (binary or character)?
2) Are there any known means of loading an XML formatted file into
   Oracle, besides SQL*Loader?
3) Has anyone had any experience loading XML formatted data files into
   Oracle 7.3?
4) Can XML formatted files be created from Oracle 7.3?

If you can answer 'yes' to any of the above questions, details would be
gratefully appreciated!!

Thanks in advance,

Bern Anderson
(509) 375-2483
Email: bj.anderson@pnl.gov
Battelle Pacific Northwest National Laboratory
P.O. Box 999 MSIN: K7-63, Richland, WA 99352

From ak117 at freenet.carleton.ca Wed Nov 5 21:48:08 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:58:50 2004
Subject: conflicting answers
In-Reply-To: <3.0.1.32.19971106100325.00ae6d50@mail.iguana.co.nz>
References: <3.0.1.32.19971106100325.00ae6d50@mail.iguana.co.nz>
Message-ID: <199711052145.QAA07247@unready.microstar.com>

Scott Cooper writes:

> chris maden says:
> That's absolutely uncool. My résumé is NOT a r é sum é .
> This will, I hope, be fixed before the actual release.
>
> so could someone please explain the WG decision and hasn't chris maden got
> a pretty good point?

Chris has a résumé, not a r%eacute;sum%eacute;. If he did have a
"r%eacute;sum%eacute;" as part of an attribute value literal, the
spaces would not be affixed.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd.
dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From clovett at microsoft.com Thu Nov 6 02:24:14 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:50 2004
Subject: XML processing experiments
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCF5@red-17-msg.dns.microsoft.com>

> At 07:03 PM 04/11/97 +0700, James Clark wrote:
> >If all you want to do is be able to
> >correctly parse well-formed XML, and you don't care about detecting
> >whether or not it is well-formed, how much code does it take and is it
> >significantly faster than using an XML parser ...
> >Lark: 10.5 seconds .. MSXML: 24 .. nsgmlsu: 8 .. sgcount:11 ..
> >xmlec (C): 0.5 seconds .. (Java): 1.5 seconds.
>
[Chris Lovett] Tree building is a killer for large documents because of
the heap activity. Initial experiments that stub out tree building in
MSXML show about a 100% improvement on my machine.

From tbray at textuality.com Thu Nov 6 07:05:46 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:50 2004
Subject: attribute defaults
Message-ID: <3.0.32.19971105230218.00b566ac@pop.intergate.bc.ca>

At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
>i was also wondering whether defaults for attributes should appear to the
>application if the attribute isn't explicitly given in the markup.

Yes. Unambiguously. Check the spec, on attribute defaults. -Tim

From tbray at textuality.com Thu Nov 6 07:06:12 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:50 2004
Subject: conflicting answers
Message-ID: <3.0.32.19971105230502.00b6e240@pop.intergate.bc.ca>

At 10:03 AM 06/11/97 +1300, Scott Cooper wrote:
>chris maden says:
>That's absolutely uncool. My résumé is NOT a r é sum é .
>This will, I hope, be fixed before the actual release.

Spaces are attached (a) only to parameter entity expansions, and
(b) only when not in entity replacement string declarations. So
Chris has no problem here.

>suggesting parameter entity references aren't expanded within the
>default string in an attdef(?). good, bad and/or ugly?

Wrong. They are supposed to be expanded. -T.

From Arjan.Loeffen at let.ruu.nl Thu Nov 6 12:38:00 1997
From: Arjan.Loeffen at let.ruu.nl (Arjan Loeffen)
Date: Mon Jun 7 16:58:50 2004
Subject: RMD: required versus optional parsing?
Message-ID: <3461BB81.16B40EAF@let.ruu.nl>

Dear reader,

should I interpret the RMD='NONE' to mean:

1) parsing the DTD is optional; no need to parse it in order to process
   the instance
2) the DTD must not be parsed, as
   - the instance holds all information you need for a correct
     interpretation
   - it may affect the interpretation of the instance
   - it may contain errors
   - it may not exist

Section 2.10 of the aug. 7, 1997 draft I have before me doesn't make this
clear to me.

Related to this: does 'parsing' in this sense mean building up an
auxiliary information structure (DOM-like), or just determining the
validity of (syntax of) the declaration sets?

cf.:

<?XML VERSION="1.0" RMD="NONE">
<!DOCTYPE ABC [hello]>
<ABC>...</ABC>

Should this 'parse'?

A reference to a previous XML discussion or any other explanatory text is
welcome.

Thanks in advance,
Arjan Loeffen

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcard.vcf
Type: text/x-vcard
Size: 208 bytes
Desc: Card for Arjan Loeffen
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971106/914ccfe6/vcard.vcf
-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcard.vcf
Type: text/x-vcard
Size: 207 bytes
Desc: Card for Arjan Loeffen
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971106/914ccfe6/vcard-0001.vcf

From aray at q2.net Thu Nov 6 17:22:53 1997
From: aray at q2.net (Arjun Ray)
Date: Mon Jun 7 16:58:50 2004
Subject: conflicting answers
In-Reply-To: <3.0.32.19971105230502.00b6e240@pop.intergate.bc.ca>
Message-ID: <Pine.LNX.3.95.971106121613.17494M-100000@mail.q2.net>

On Wed, 5 Nov 1997, Tim Bray wrote:
> At 10:03 AM 06/11/97 +1300, Scott Cooper wrote:
>
> >suggesting parameter entity references aren't expanded within the
> >default string in an attdef(?). good, bad and/or ugly?
>
> Wrong. They are supposed to be expanded. -T.

Really? It's not that way in SGML. Clause 11.3.4 "Default Value"
(p. 424-5 in the Handbook) has this

[147] default value = ( ( RNI, "FIXED", ps+ ) ?,
                        attribute value specification ) | ...

which basically means that a string literal for the a.v.s will not be
replaceable parameter data. FWIW, nsgmls doesn't expand PEs here either.

Arjun

From peter at ursus.demon.co.uk Thu Nov 6 20:47:03 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:50 2004
Subject: JUMBO and CML1.2
In-Reply-To: <3.0.32.19971105230502.00b6e240@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971106212837.2ce7601a@pop3.demon.co.uk>

The latest snapshot of JUMBO (Java Universal (Molecular | Markup) Browser
for Objects) and CML1.2 (Chemical Markup Language) is available at:

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/cml12

JUMBO and CML are independent, but co-evolve.
At present they don't have versions, but have deadline-determined
snapshots. The latest has been for distribution on CDROM for the Royal
Society of Chemistry as part of Henry Rzepa's electronic conferences.
(Henry and I are convinced that XML is the right way for technical
publishing to go - evangelism from members of this list is taken for
granted :-).

Amongst the distribution are:
- a large number of CML examples
- copious (if occasionally diffuse) HTML commentaries, tutorials, etc
- JUMBO, aligned to act as an applet under a Java-enabled browser. I have
  not tested it with MSIE4, but it works under NS4 (much better than 3).
  It also works standalone with a Java interpreter.

The applets *can* be viewed over the net but downloads may take time.
Downloading the *.zip/tar.gz is recommended. I *think* there are two
up-to-date mirrors. (Anyone prepared to mirror this distribution? :-)

JUMBO is NOT just for molecules, but reads PLAY and has (with some effort)
eaten its way through Macbeth. There are also examples of taxonomies and a
catalog of engineering materials data. The showpiece is a complete
scientific paper marked up with HTML, images, molecules, spectra,
crystallography, bibliography and XML:links.

JUMBO has been tracking the recent specs and now has proof of concept for:
- interoperating with Lark and NXP (but not yet Xapi-J - has any parser
  writer yet implemented this?)
- recognising namespaces, and linking to schema files
- using the schema files for adding machine-readable semantics (Java
  classes) on a per-element basis (thus <MOL> loads MOL.class at run-time).
- displaying and editing:
    attributes
    contents
    tree hierarchy
- displaying (but not editing) mixed content as HTML.
- implementing almost all XML:link="SIMPLE". ('EMBED' is difficult for
  tree-based display at present.)
- proof of concept (pre-XML:link) of EXTENDED links
- saving files in standalone mode
- Almost full implementation of TEI Xpointers (what does 'SPAN' mean in a
  tree?)
- implementation of proof-of-concept for resolution of semantics by
  linking to Virtual HyperGlossaries. (Soon to be drastically modified
  for the better with XML:link and XSL.)
- on-the-fly conversion of legacy files into trees (and hence to XML).
  About 15 legacy types from molecular science are covered. Others can
  be hacked (if the legacy files are easy to read :-)

The later snapshot (not yet distributed) includes more use of schema
files and first steps in XSL.

[Note: JUMBO is still JDK1.02 - I was waiting for full browser support.
Some windows do not always display gracefully and the scrolling is
horrible. I am not alone in these problems :-). JUMBO is slow for large
documents because it (a) creates fully subclassed objects for each node
at display time and (b) some of these objects have many data members.
One large todo is to devise a lazier model for processing and display
of nodes.]

CML is a fully XML-compliant application with a minimal tagset designed
for maximum flexibility in prototyping molecular applications. It
includes generic support for technical data (not just chemistry)
especially numeric quantities with SI units ('metric') and others.
Those in technical disciplines may find it useful.

Several people have asked about the future of JUMBO and some have
offered to contribute :-). Ideally I would like JUMBO to evolve along
the lines of TeX/LaTeX or tcl/tk. The basis of these is a tightly
controlled core with extensions supplied by volunteers. The results are
freely available, but not public domain (i.e. a GNU-like or slightly
more restrictive license). I was very impressed with the way that the
tcl/tk project ran - equalled only in my experience by the XML
decision-making process.

Because JUMBO tries to track the XML standards and because it is
critical not to have mutant implementations I am minded not to release
source code except to those actively involved in the development of the
core.
With Java this is an attractive option as it is a 'run-anywhere' option
(Jumbo is 100% pure). Moreover the discipline of writing for
extensibility (i.e. through interfaces and subclasses) is an extremely
good one for both the developers and the extenders.

Note that the key aims of JUMBO do NOT compete with what other members
of this list are doing. JUMBO currently has goals like:
- provide a sound core for building prototypes
- be developed for pedagogy rather than performance
- act as a demonstrator for the XML effort
- help to explore problems in drafts of standards at an early stage.

It will not compete either with commercial browser/editor/transformers
(which are optimised for additional criteria such as performance,
interoperability with legacy systems, etc.) or with Amaya (the W3C's
reference browser). Hopefully JUMBO will interoperate with all of these
so that extensions developed for JUMBO are transportable to more
efficient environments.

I'd be grateful for feedback on these ideas. If there is interest,
please let me know - it may take a little while to pull things together
(Warning, some of the code reflects the evolution of the specs, some my
'learning curve' :-). If not, JUMBO will plod ahead when I have a few
midnights free and my laptop works.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences,
domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From clovett at microsoft.com Thu Nov 6 21:12:01 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:50 2004
Subject: conflicting answers
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FD18@red-17-msg.dns.microsoft.com>

> >suggesting parameter entity references aren't expanded within the
> >default string in an attdef(?). good, bad and/or ugly?
>
> Wrong. They are supposed to be expanded. -T.
>
[Chris Lovett] I think the spec says normal references are ok, but
parameter entities are not:

  AttValue ::= '"' ([^<&"] | Reference)* '"'
             | "'" ([^<&'] | Reference)* "'"

and Reference doesn't include PEReference:

  Reference ::= EntityRef | CharRef

From bernerd.anderson at exchange.pnl.gov Thu Nov 6 21:53:19 1997
From: bernerd.anderson at exchange.pnl.gov (Anderson, Bernerd J)
Date: Mon Jun 7 16:58:50 2004
Subject: Resources/Glossary Available?
Message-ID: <7A8CF1DC6A9DD0118EA400A024BF29DA0121EFFC@pnlmse2.pnl.gov>

This may be a simple question, but since I'm a 'newbie' to the Web
world, is there a comprehensive glossary to XML (also SGML, and HTML)
available via the web? I'm trying to overcome the learning curve as
rapidly as possible.

Also, any good books, papers, or other references to XML in particular,
and SGML/HTML in general, that you would recommend?

Many TIA,

Bern Anderson
bj.anderson@pnl.gov

From tbray at textuality.com Thu Nov 6 22:11:38 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:50 2004
Subject: Oops, I screwed up
Message-ID: <3.0.32.19971106093334.00d7c918@pop.intergate.bc.ca>

At 11:06 PM 05/11/97 -0800, Tim Bray wrote:
>>suggesting parameter entity references aren't expanded within the
>>default string in an attdef(?). good, bad and/or ugly?
>
>Wrong. They are supposed to be expanded. -T.

Wrong again. They are supposed to be left as-is. Somebody wrote me
pointing out that Lark doesn't do this, and I gasped and went and found
the code, and there's this comment saying "Don't expand PE's in
<!ATTLIST default strings!". Sigh; my apologies. -Tim

From andrewl at microsoft.com Thu Nov 6 22:33:18 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:51 2004
Subject: Resources/Glossary Available?
Message-ID: <7BB61B44F197D011892800805FD4F79201CD64D8@red-03-msg.dns.microsoft.com>

You might start with http://www.sil.org/sgml/xml.html

--Andrew Layman
AndrewL@microsoft.com

> -----Original Message-----
> From: Anderson, Bernerd J [SMTP:bernerd.anderson@exchange.pnl.gov]
> Sent: Thursday, November 06, 1997 1:53 PM
> To: 'xml-dev@ic.ac.uk'
> Subject: Resources/Glossary Available?
>
> This may be a simple question, but since I'm a 'newbie' to the Web
> world, is there a comprehensive glossary to XML (also SGML, and HTML)
> available via the web? I'm trying to overcome the learning curve as
> rapidly as possible.
>
> Also, any good books, papers, or other references to XML in particular,
> and SGML/HTML in general, that you would recommend?
>
> Many TIA,
>
> Bern Anderson
> bj.anderson@pnl.gov

From papresco at technologist.com Thu Nov 6 22:44:30 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:51 2004
Subject: HELP - XML to Oracle (and back)!
References: <7A8CF1DC6A9DD0118EA400A024BF29DA0121EFDE@pnlmse2.pnl.gov>
Message-ID: <3462265D.51AF0765@technologist.com>

There is quite a bit of code out there for parsing XML documents.
It should be quite easy to convert that to some other format for
loading using a simple Python or Perl script, or to read it in your
favourite programming language and then issue the SQL statements to
populate the database.

Paul Prescod

From Ingo.Macherius at TU-Clausthal.de Fri Nov 7 04:12:58 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:51 2004
Subject: Towards msxml Makefile
Message-ID: <199711070412.FAA22899@sinfonix.rz.tu-clausthal.de>

Hi,

I figured out a way to produce a "Makefile" for msxml. I'm not finished
yet, just want to point out the idea. Maybe someone wants to go on
(simply too tired now).

The real work, that is finding the dependencies, is done by
"JavaDepend", which can be downloaded from
http://pluto.njcc.com/~slinky/land_of_science.html

Step by step:

1) Get JavaDepend and the necessary library, install them
2) Go to the directory where msxml was unpacked (tar.gz version)
3) Put ./classes into your $CLASSPATH, don't remove *.class files !
4) Do something like this:

---snip---
#!/bin/bash
# Tested with Linux only
export WORKFILE=yin
export TEMPFILE=yang
rm -f $WORKFILE $TEMPFILE

# Collect every source file, one per line
ls classes/com/ms/xml/parser/*.java >> $WORKFILE
ls classes/com/ms/xml/util/*.java >> $WORKFILE
ls classes/com/ms/xml/om/*.java >> $WORKFILE
ls msxml.java >> $WORKFILE
ls viewer/*.java >> $WORKFILE

# Append " \" to each line so the list can continue across lines
sed -e 's/$/ \\/g' < $WORKFILE > $TEMPFILE
mv $TEMPFILE $WORKFILE

# Build a one-off script that hands the file list to JavaDepend
echo "java -Dfiles=\"" > $TEMPFILE
cat $WORKFILE >> $TEMPFILE
echo \" WARREN.tools.JavaDepend >> $TEMPFILE

# Run it; the dependency list ends up in $WORKFILE
sh $TEMPFILE > $WORKFILE
rm -f $TEMPFILE
--snip--

5) Now in $WORKFILE there is a list of dependencies usable for a
   Makefile
6) ... TODO: Add code-snippets to make this a real Makefile ...

For the impatient I put a copy of what I received after step 4 at
http://www.heim9.tu-clausthal.de/~inim/xml/Makefile.msxml

It should not be far from there to a working Makefile.

++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)

From krawling at vervet.com Fri Nov 7 04:49:05 1997
From: krawling at vervet.com (Ken Rawlings)
Date: Mon Jun 7 16:58:51 2004
Subject: Incremental Parsers
Message-ID: <Pine.SOL.3.91.971106010209.5652D-100000@logic>

Is there any work being done on XML parsers (preferably in Java) that
will allow incremental parsing with validation? I'd like to have the
ability to add elements to a loaded document tree knowing immediately
whether that addition is valid.
It appears that the Microsoft parser might be able to accomplish this,
but not via the public API. I'd implement this myself, but I'm a little
hesitant about the vagaries of on the fly DTD validation. Are there any
resources out there that discuss this sort of thing?

Thanks,
Ken Rawlings (krawling@vervet.com, www.vervet.com)

From clovett at microsoft.com Fri Nov 7 05:10:31 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:51 2004
Subject: Incremental Parsers
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FD27@red-17-msg.dns.microsoft.com>

> Is there any work being done on XML parsers (preferably in Java) that
> will allow incremental parsing with validation? I'd like to have the
> ability to add elements to a loaded document tree knowing immediately
> whether that addition is valid. It appears that the Microsoft parser
> might be able to accomplish this, but not via the public API. I'd
> implement this myself, but I'm a little hesitant about the vagaries of
> on the fly DTD validation. Are there any resources out there that
> discuss this sort of thing?

I would call this a "validating XML Object Model", not an incremental
parser, but anyway, it would be relatively simple to modify the MSXML
Object Model to do this since the Document class holds onto the DTD
object and therefore all the state information you need for validation.
First you would have to find the Element Declaration for the parent you
are adding children to, using
Document.getDTD().findElementDecl(e.getParent().getTagName()), then you
need to create a Context object so you can call checkContent to
validate the addition.
The Context is initialized using initContent(), then you would have to
walk through all the children (including the new child) calling
checkContent to make sure the entire content of the parent is still
valid. It has to be done this way because checkContent is optimized for
validating while you parse. Similar things would be done for
removeChild.

The attribute validation is a little harder to split out since
currently this is done in AttDef.parseAttValue. You'd have to split
this into parseAttValue and validateAttValue so you can call the
validation without having to parse anything. Once you've done that,
validation for attributes is easier because it doesn't require any
context.

Good luck, let us know how it goes !

From Patrice.Bonhomme at loria.fr Fri Nov 7 07:18:48 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:58:51 2004
Subject: Towards msxml Makefile
In-Reply-To: Your message of "Fri, 07 Nov 1997 05:06:13 GMT." <199711070412.FAA22899@sinfonix.rz.tu-clausthal.de>
Message-ID: <199711070718.IAA04106@chimay.loria.fr>

I've got a set of Unix Makefiles for msxml too. They are closer to the
unix makefile "philosophy". If you are interested I can make an archive
with the whole set of recursive makefiles.

Pat.
--
==============================================================
bonhomme@loria.fr              |   Office : B.228
http://www.loria.fr/~bonhomme  |   Phone  : 03 83 59 20 37
--------------------------------------------------------------
 * Projet Aquarelle : http://aqua.inria.fr
 * Serveur Silfide  : http://www.loria.fr/Projet/Silfide
==============================================================

From jjc at jclark.com Fri Nov 7 07:29:17 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
References: <3.0.32.19971105003221.00b6529c@pop.intergate.bc.ca>
Message-ID: <3462C158.8A1C1336@jclark.com>

Tim Bray wrote:

> The following are candidates for why a program like Lark or MSXML
> might run slower.
> - works with Java char rather than byte variables
> - does a method dispatch (or at least a few conditionals) per
> character processed for at least two reasons: to manage the entity
> stack,

Given XML's requirements that entity references in the instance are
synchronous, I would have thought that the overhead of an entity stack
could be avoided for parsing the instance. The parser passes the
application an entity reference event, and the application can then, if
it chooses, recursively invoke the parser to parse the referenced
entity.

> and to have a place to put the different character encoding
> processing modules.

The issue of how to deal with multiple encodings is an interesting one.
The straightforward approach is to abstract an encoding as a process
that converts from bytes to characters (and vice-versa) and perform
this conversion process before parsing.
This involves a significant performance hit. This is particularly the
case if you want to get correct byte offsets when using a variable
width encoding (such as UTF-8); it's hard to do this without a method
call per character.

I think it ought to be possible to abstract an encoding in a way that
avoids this. Instead of having a two-stage process -- convert a stream
of bytes to a stream of characters and then divide the stream of
characters into tokens -- there would be a one stage process that
converted a stream of bytes into a stream of characters already split
up into tokens. I haven't worked this through yet.

> - does quite a bit more work upon recognizing some markup
> constructs; in particular for a start tag it pulls
> apart the attribute list and packages up the element type
> & attributes in a nice structure convenient for an API user

One ought to be able to do at least some of this work lazily. For
example, your API can say here's the start-tag, and can then provide a
separate function that pulls apart the start-tag if the user needs it.
This could be useful in reducing the heap usage when building trees for
large documents (MSXML used about 25Mb on the ot.xml file): instead of
building some complex data structure representing the element type and
attributes for each element, you simply store a pointer to the
element's start tag, and then parse the start-tag to extract the
element type and attributes if and when the user requests them. This
would work particularly well with file mapping
(CreateFileMapping/mmap). Is there any way to get at this sort of OS
functionality in Java?

> One other experiment would be useful, that might shed light from
> a different angle. James, how about doing element counts per type;
> i.e.
> actually *using* some of the info come back from the tokenizer,
> nothing fancy, just use a java.util.Hashtable or some such; should be
> able to run very similar code on Lark and your TokenStream thing; I
> wonder if it would change the numbers.

I've done this. Here are the timings I got:

[0. Using bytes, counting total number of elements: 1.5s]

1. Using chars instead of bytes and Reader instead of InputStream; use
   the standard InputStreamReader: 2.3s

2. Same as 1, but roll my own XmlInputStreamReader that simply throws
   an exception on non-ASCII characters: 1.9s

3. Same as 2, but extract element type name on end tag as a String:
   2.3s

4. Same as 3, but count element types using a Hashtable: 2.8s

5. Same as 4, but use custom hash table to avoid allocating String when
   element type name is already in the table: 2.4s

James

From richard at light.demon.co.uk Fri Nov 7 07:39:38 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:51 2004
Subject: Resources/Glossary Available?
In-Reply-To: <199711062227.OAA01902@mehitabel.eng.sun.com>
Message-ID: <nlYrLFA+BsY0EwOJ@light.demon.co.uk>

In message <199711062227.OAA01902@mehitabel.eng.sun.com>, Murray
Altheim <altheim@mehitabel.eng.Sun.COM> writes
>I recently picked up a copy of Richard Light's 'Presenting XML', published
>recently by Sams.net Publishing (ISBN 1-57521-334-6), and while *already*
>a bit dated and inaccurate (due to changes we've made since the book went
>to print late this Spring), it provides a fairly comprehensive explanation
>of XML and plenty of help in understanding the key concepts.
>I can certainly recommend it.

It was a bit later than that, actually. The original text was delivered
by 17 July (engraved on my heart, that date!). This meant that the 30
June updates to XML-Lang and XLL are reflected in the book - I had to
do major work on the linking chapter to achieve this!

>I couldn't find it in the book, but perhaps Richard could provide an online
>errata document for the book, and list that in the book itself.

We are putting an errata list onto the book's accompanying web site.
However, it's unlikely that I will have time to track all changes to
the XML spec. This is just a list of things that I got wrong (or to be
precise, the things I got wrong which Tim didn't spot!).

Let me know of any actual errors and I will happily post them to the
web site.

Richard Light.

Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From donpark at quake.net Fri Nov 7 10:41:57 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:58:51 2004
Subject: MIME types for XML and gangs?
Message-ID: <01bceb69$5ca6f3f0$0100007f@localhost>

Hello,

I have some questions regarding MIME types for XML resources. I cannot
believe there have not been discussions regarding this issue but I
could not find them in the xml-dev archive.

Have MIME types been assigned for XML resources and XML-based resources
(i.e. CDF, RDF, etc.)? Since HTML is "text/html", I guess "text/xml"
would make sense. What about the XML-based resource formats? Should
they have separate MIME types?

Please enlighten me,

Don "JStud" Park
Java/MFC Consultant
donpark@quake.net

From murata at apsdc.ksp.fujixerox.co.jp Fri Nov 7 11:09:32 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:58:51 2004
Subject: MIME types for XML and gangs?
In-Reply-To: <01bceb69$5ca6f3f0$0100007f@localhost>
Message-ID: <9711071110.AA02495@lute.apsdc.ksp.fujixerox.co.jp>

Don Park writes:
>
>Have MIME types been assigned for XML resources and XML-based resources (i.e.
>CDF, RDF, etc.)? Since HTML is "text/html", I guess "text/xml" would make
>sense. What about the XML-based resource formats? Should they have
>separate MIME types?

Yes, the plan of the WG is to register text/xml and application/xml. We
should use text/xml usually. application/xml is for sending XML
documents in UTF-16 or UCS-2 via e-mail.

#I haven't done my homework, which is to write an RFC draft ....

MURATA Makoto (FAMILY Given)

Fuji Xerox Information Systems

Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

From richard at cogsci.ed.ac.uk Fri Nov 7 15:16:16 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
In-Reply-To: James Clark's message of Fri, 07 Nov 1997 14:20:56 +0700
Message-ID: <199711071515.PAA05664@stevenson.cogsci.ed.ac.uk>

> Given XML's requirements that entity references in the instance are
> synchronous, I would have thought that the overhead of an entity stack
> could be avoided for parsing the instance. The parser passes the
> application an entity reference event, and the application can then, if
> it chooses, recursively invoke the parser to parse the referenced
> entity.

A pedant might note that the XML standard requires that for internal
entities "the processor must ... retrieve its replacement text ...
passing the result to the application in place of the reference". No
doubt the same pedant could draw the line between processor and
application such that this was satisfied.

This scheme seems reasonable for a parser that works in terms of events
implemented by callbacks. Our parser on the other hand returns "bits"
(these are essentially start tags, end tags, and pcdata)
*sequentially*, following the model of reading a plain text file.
Entity references are expanded, and a bit may end in a different entity
from the one it started in (suppose foo is defined as "a<b/>c"; then
the first bit returned from "x&foo;y" is "xa" - as far as I can tell
this is quite legal XML).
In a language with threads, it's easy to implement this on top of a
callback interface (in a sense the procedure stack in the parsing stack
would replace the entity stack), but it's much messier in plain C.

Partly the reason for using the sequential model is historical: this
parser is used in the LT-NSL system, which already worked like that.
But it's also for simplicity: I want this parser to be easily usable
with existing C applications (for example, someone here wants to be
able to read XML-marked-up text into his speech synthesizer).

> [...]
> This is particularly the case if you want to get
> correct byte offsets when using a variable width encoding (such as
> UTF-8); it's hard to do this without a method call per character.

Misha Wolf tells me that my earlier comment about the non-invertibility
of UTF-8 is wrong: the Unicode standard requires that the shortest
encoding be used. So, for example, if you know the byte offset of the
start of the line then you can find the byte offset of a character in
the line by calculating the encoded length of the preceding characters.

On the other hand I note that low-end current machines can do about 10
million trivial non-leaf procedure calls per second, so maybe the
overhead of a call per character is not unacceptable (in C I would be
doing something like parser->source->get_translated_char(); there would
probably be more overhead in an object-oriented language).

> [...]
> there would be a one stage process that
> converted a stream of bytes into a stream of characters already split up
> into tokens.

Yes - I have been thinking about that too. Outside the dtd the
tokenisation is relatively trivial, and the speed of dtd processing is
unimportant in many applications so it can just use character-at-a-time
translation.

-- Richard
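
The byte-offset calculation described above (known byte offset of the
line start, plus the encoded length of the preceding characters) might
be sketched in Python roughly as follows. This is only an illustration:
the function names utf8_len and byte_offset are made up for the
example, and it assumes the line has already been decoded and that
every character was stored in its shortest UTF-8 form.

```python
def utf8_len(ch):
    """Number of bytes in the shortest UTF-8 encoding of one character."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def byte_offset(line_start, line, char_index):
    """Byte offset of line[char_index], given the byte offset of the line start."""
    # Sum the encoded lengths of the characters that precede char_index.
    return line_start + sum(utf8_len(ch) for ch in line[:char_index])


# Hypothetical sample line containing a 2-byte and a 3-byte character.
line = "na\u00efve \u20ac5"
print(byte_offset(100, line, 0))  # 100: the line start itself
print(byte_offset(100, line, 3))  # 104: 'v' follows 'n', 'a' and a 2-byte character
print(byte_offset(100, line, 7))  # 110: '5' follows the 3-byte euro sign
```

No per-character method call is needed while parsing; the offset is
only worked out for the positions that are actually asked for.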

From jarle.stabell at dokpro.uio.no Fri Nov 7 16:29:00 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107173254.0084b610@hedvig.uio.no>

James Clark wrote:
>> Given XML's requirements that entity references in the instance are
>> synchronous, I would have thought that the overhead of an entity stack
>> could be avoided for parsing the instance. The parser passes the
>> application an entity reference event, and the application can then, if
>> it chooses, recursively invoke the parser to parse the referenced
>> entity.

Richard Tobin wrote:
>Entity references are expanded, and a bit may end in a different
>entity from the one it started in (suppose foo is defined as "a<b/>c";
>then the first bit returned from "x&foo;y" is "xa" - as far as I can
>tell this is quite legal XML).

I don't think this is legal. The working draft (sec. 4.1) says:

"The logical and physical structures (elements and entities) in an XML
document must be synchronous. Tags and elements must each begin and end
in the same entity, but may refer to other entities internally;
comments, processing instructions, character references, and entity
references must each be contained entirely within a single entity"

It seems to me that with the current whitespace handling, one could
nearly (?) parse the entities locally, and build a subtree of it if the
tree is wanted. (This could maybe result in easier error-reporting, and
would probably have a positive impact on parsing speed (but could mean
a bit more complexity in the implementation?))

As Mr.
Clark indicates, a parser doesn't need to take much of a performance
hit when entities are not present; the entity stack has no influence
(is kept constant) when parsing f.i. a start-tag. (If entity references
are present in the attribute values, these can be expanded afterwards
if wanted. Authoring tools etc often don't want this expansion to
happen.)

I (currently!) think it is possible to design a 'real' parser looking
locally much the same as Mr. Clark's "quick and dirty" parser. (I'm in
the startup implementing one)

BTW: Anyone having an example of where the immediate expansion of
character references within internal entities actually comes in handy?
To me this seems to make the parser use more memory and perhaps be
slower, but more importantly: it ruins the copy-paste semantics of
entity expansion.

What will "normal" people think about such things as the example from
the draft:

<!ENTITY example "<p>An ampersand (&#38;) may be escaped
numerically (&#38;#38;) or with a general entity
(&amp;).</p>" >

I think most people will regard this as a bug/design flaw. I would feel
better if I knew an example where this behaviour actually comes in
handy... :-)

Cheers,
Jarle Stabell
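
A small Python sketch of the two-step behaviour being questioned:
character references in an entity's literal value are expanded once
when the declaration is read, and the resulting replacement text is
parsed again when the entity is referenced. The helper names and the
regular expression are assumptions made for illustration, and only
decimal and hexadecimal character references are handled (no named
entities, no error checking).

```python
import re

# Matches &#38; and &#x26; style character references only.
CHAR_REF = re.compile(r"&#(x[0-9A-Fa-f]+|[0-9]+);")


def expand_char_refs(text):
    """Replace each character reference by the character it names (one pass)."""
    def repl(match):
        body = match.group(1)
        return chr(int(body[1:], 16) if body.startswith("x") else int(body))
    return CHAR_REF.sub(repl, text)


def replacement_text(literal):
    """Step 1: applied once, when the <!ENTITY ...> declaration is read."""
    return expand_char_refs(literal)


def parse_content(text):
    """Step 2: applied whenever text is parsed as document content."""
    return expand_char_refs(text)


literal = "A &#38;#38; B"            # hypothetical literal entity value
stored = replacement_text(literal)   # what the parser keeps for the entity
seen = parse_content(stored)         # what the application gets via the reference

print(stored)                  # A &#38; B
print(seen)                    # A & B
print(parse_content(literal))  # A &#38; B  (same text pasted straight into content)
```

The last two lines differ, which is the copy-paste problem: the same
characters give different results depending on whether they sit in an
entity declaration or directly in the document.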

From richard at cogsci.ed.ac.uk Fri Nov 7 16:48:35 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
In-Reply-To: Jarle Stabell's message of Fri, 07 Nov 1997 17:32:55 +0100
Message-ID: <199711071648.QAA07983@stevenson.cogsci.ed.ac.uk>

> >(suppose foo is defined as "a<b/>c";
> >then the first bit returned from "x&foo;y" is "xa" - as far as I can
> >tell this is quite legal XML).

> I don't think this is legal. The working draft (sec. 4.1) says:

> "The logical and physical structures (elements and entities) in an XML
> document must be synchronous. Tags and elements must each begin and end in
> the same entity, but may refer to other entities internally; comments,
> processing instructions, character references, and entity references must
> each be contained entirely within a single entity"

I don't see how that excludes my example. The tags and elements *do*
begin and end in the same entity. There are no comments, PIs, or
character references. The entity reference is contained within a single
entity.

The point is that the draft says nothing about *pcdata* starting and
ending in the same entity. If it did, it would have to be careful to
define exactly what it meant by "ending", since in something like

  "<!ENTITY name 'richard'> ... <p>my name is &name;</p>"

(which we certainly want to be legal) the last character of the pcdata
is in a different entity from the first.

-- Richard
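
The sequential "bits" model under discussion, where references are
expanded first and pcdata may run across entity boundaries, could be
sketched in Python along these lines. The ENTITIES table and the
function names are hypothetical, and the tokenising is deliberately
naive (undeclared entities, comments, PIs and attribute values
containing ">" are not handled).

```python
import re

# Hypothetical declaration: <!ENTITY foo "a<b/>c">
ENTITIES = {"foo": "a<b/>c"}


def expand(text):
    """Textually replace each &name; reference by its replacement text."""
    return re.sub(r"&(\w+);", lambda m: expand(ENTITIES[m.group(1)]), text)


def bits(text):
    """Yield ('tag', ...) and ('pcdata', ...) bits in document order."""
    # Splitting on a captured group keeps the tags in the result list.
    for piece in re.split(r"(<[^>]*>)", expand(text)):
        if piece:
            yield ("tag" if piece.startswith("<") else "pcdata", piece)


print(list(bits("x&foo;y")))
# [('pcdata', 'xa'), ('tag', '<b/>'), ('pcdata', 'cy')]
```

The first bit "xa" starts in the document entity and ends inside foo,
while the tag itself lies wholly within foo.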
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jarle.stabell at dokpro.uio.no Fri Nov 7 17:08:51 1997 From: jarle.stabell at dokpro.uio.no (Jarle Stabell) Date: Mon Jun 7 16:58:51 2004 Subject: XML processing experiments Message-ID: <3.0.32.19971107181156.00849e80@hedvig.uio.no> Richard Tobin wrote: >I don't see how that excludes my example. The tags and elements *do* >begin and end in the same entity. Sorry, the sentence "it may end in a different entity from the one it started in" "tricked" me into not reading your example fully, I thought I saw "a</b>c" and not "a<b/>c" (as you stated). >(suppose foo is defined as "a<b/>c"; >then the first bit returned from "x&foo;y" is "xa". Ok. My current design will first return PCData="x", then entity ref="foo", and (if the client want entities expanded: PCData="a" followed by EmptyElement="b" and then PCData="c".) ie it may return two consecutive PCData's, with perhaps some EntityExpansionStart and -End signals between them. (Is this design flawed?) Cheers, Jarle xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dmck at cogsci.ed.ac.uk Fri Nov 7 17:09:33 1997 From: dmck at cogsci.ed.ac.uk (David McKelvie) Date: Mon Jun 7 16:58:51 2004 Subject: XML processing experiments In-Reply-To: <199711071648.QAA07983@stevenson.cogsci.ed.ac.uk> (message from Richard Tobin on Fri, 7 Nov 1997 16:48:16 GMT) Message-ID: <4468.199711071708@scotus.cogsci.ed.ac.uk> >> "<!ENTITY name 'richard'> ... <p>my name is &name;</p>" It's worth pointing out that Richard wants ALL of the PCDATA of the <p> element to be returned as one string of characters "my name is Richard", rather than as two strings "my name is " and "Richard". David xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at cogsci.ed.ac.uk Fri Nov 7 17:24:14 1997 From: richard at cogsci.ed.ac.uk (Richard Tobin) Date: Mon Jun 7 16:58:51 2004 Subject: XML processing experiments In-Reply-To: Jarle Stabell's message of Fri, 07 Nov 1997 18:11:57 +0100 Message-ID: <199711071724.RAA08821@stevenson.cogsci.ed.ac.uk> > Ok. My current design will first return PCData="x", then entity ref="foo", > and (if the client want entities expanded: PCData="a" followed by > EmptyElement="b" and then PCData="c".) > ie it may return two consecutive PCData's, with perhaps some > EntityExpansionStart and -End signals between them. > (Is this design flawed?) 
This is reasonable, it's just not what we wanted to do, because we have existing programs (which previously processed "normalised SGML") which (a) expect to see all entities fully expanded and (b) expect to see pcdata including references put together into a single bit. For example, our grep-like program should match the example when searching for the text "xa".

-- Richard

From jarle.stabell at dokpro.uio.no Fri Nov 7 17:39:16 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107184336.00844660@hedvig.uio.no>

David McKelvie wrote:
>>> "<!ENTITY name 'richard'> ... <p>my name is &name;</p>"
>
>It's worth pointing out that Richard wants ALL of the PCDATA of the
><p> element to be returned as one string of characters "my name is
>Richard", rather than as two strings "my name is " and "Richard".

Yes. But this requires one to copy (at least the first string) and a concatenation.

Some applications may be more interested in the speedup which may result from not doing this copying/concatenation, and happily accept the small increase in complexity handling it.

I'm playing with a design involving two pluggable "ESIS-handlers", one "low-level", where GI's, attribute names, attribute values, comments etc points directly into the source (typically via a filemapping or an in-memory-buffer). The "low-level" ESIS-handler may copy the data into "real" strings, concatenate the consecutive PCDATA sections, build the tree, do validation etc and pass the events to an optional "higher-level" ESIS-handler.
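The coalescing step described above could be sketched roughly as follows. This is only an illustration in Python, not code from any parser mentioned in this thread; the event kinds (`PCDATA`, `ENTITY_START`, `ENTITY_END`, `EMPTY_ELEMENT`) and the `(kind, value)` tuple layout are assumptions made for the sketch.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (kind, value). None of these names come
# from a real parser API; they are assumptions for this sketch.
Event = Tuple[str, str]

PCDATA = "PCDATA"
ENTITY_START = "ENTITY_START"
ENTITY_END = "ENTITY_END"
EMPTY_ELEMENT = "EMPTY_ELEMENT"


def coalesce_pcdata(events: Iterable[Event]) -> Iterator[Event]:
    """Merge runs of PCDATA events, dropping entity boundary signals."""
    pending: List[str] = []  # text fragments seen since the last markup event
    for kind, value in events:
        if kind == PCDATA:
            pending.append(value)
        elif kind in (ENTITY_START, ENTITY_END):
            continue  # an entity boundary does not interrupt a run of text
        else:
            if pending:
                yield (PCDATA, "".join(pending))
                pending = []
            yield (kind, value)
    if pending:
        yield (PCDATA, "".join(pending))


# Low-level events for "x&foo;y" with foo defined as "a<b/>c".
low_level = [
    (PCDATA, "x"),
    (ENTITY_START, "foo"),
    (PCDATA, "a"),
    (EMPTY_ELEMENT, "b"),
    (PCDATA, "c"),
    (ENTITY_END, "foo"),
    (PCDATA, "y"),
]
merged = list(coalesce_pcdata(low_level))
print(merged)
```

A higher-level handler fed from `coalesce_pcdata` sees the first text chunk as "xa", which suits a grep-like client, while a client that cares about entity boundaries can consume the low-level events directly.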
I think/hope the layer which triggers the low-level events won't be very different from Mr Clark's "quick and dirty" parser.

(Not sure yet whether the low-level handler should just receive events, or whether it should query for the next event/token.)

Cheers,
Jarle

From tbray at textuality.com Fri Nov 7 17:40:10 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107093943.00a52858@pop.intergate.bc.ca>

At 06:11 PM 07/11/97 +0100, Jarle Stabell wrote:
>Ok. My current design will first return PCData="x", then entity ref="foo",
>and (if the client want entities expanded: PCData="a" followed by
>EmptyElement="b" and then PCData="c".)
>ie it may return two consecutive PCData's, with perhaps some
>EntityExpansionStart and -End signals between them.
>(Is this design flawed?)

If "foo" is an *internal* entity, the spec clearly requires your parser to expand it for the application. But letting the app know that the ref was encountered is also fine.

However, the spec says nothing that would require you to merge the text from a variety of entities. For example, Lark's event-stream API will generate a series of Text object events in just this situation. On the other hand, once you've seen the end of the element, Lark has an API just to get all the text.

This is strictly a matter of a design choice; as Richard points out, if you want to support a "grep" application, you'd probably like to have entity replacements merged for you.
On the other hand, if you're building a full-text index, you probably need to have the separate chunks made visible so that you know what to point at from the index.

As James has pointed out more than once, there is no universal document API that meets everybody's application needs. One of the nice things about XML is that if you can't find a parser that has the API you need, you can go build your own without excessive pain.

-Tim

From dmck at cogsci.ed.ac.uk Fri Nov 7 18:04:28 1997
From: dmck at cogsci.ed.ac.uk (David McKelvie)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
In-Reply-To: <3.0.32.19971107184336.00844660@hedvig.uio.no> (message from Jarle Stabell on Fri, 07 Nov 1997 18:43:37 +0100)
Message-ID: <4610.199711071804@scotus.cogsci.ed.ac.uk>

>> Some applications may be more interested in the speedup which may result
>> from not doing this copying/concatenation, and happily accept the small
>> increase in complexity handling it.

As Tim Bray says that is another fine way to do it.

>> I'm playing with a design involving two pluggable "ESIS-handlers", one
>> "low-level", where GI's, attribute names, attribute values, comments etc
>> points directly into the source. (typically via a filemapping or an
>> in-memory-buffer)

We started off doing something like this in LTNSL, but stopped doing filemapping (a) because it wasn't very portable and (b) either you do some tricky decisions about when you free these pointers into the source or it makes reading huge corpora like the 2 gigabyte BNC corpus impossible which we wanted to be able to do.
David

From clovett at microsoft.com Fri Nov 7 21:46:49 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FD2F@red-17-msg.dns.microsoft.com>

The Object Model in MSXML handles this by providing a convenience getText() function on all Element nodes that returns the concatenated text. If someone really wants to see the entity ref nodes, they can enumerate the child nodes and find them. This way the client decides what they want.

> -----Original Message-----
> From: Jarle Stabell [SMTP:jarle.stabell@dokpro.uio.no]
> Sent: Friday, November 07, 1997 9:44 AM
> To: xml-dev@ic.ac.uk
> Subject: Re: XML processing experiments
>
> David McKelvie wrote:
> >>> "<!ENTITY name 'richard'> ... <p>my name is &name;</p>"
> >
> >It's worth pointing out that Richard wants ALL of the PCDATA of the
> ><p> element to be returned as one string of characters "my name is
> >Richard", rather than as two strings "my name is " and "Richard".
>
> Yes. But this requires one to copy (at least the first string) and a
> concatenation.
>
> Some applications may be more interested in the speedup which may result
> from not doing this copying/concatenation, and happily accept the small
> increase in complexity handling it.
>
> I'm playing with a design involving two pluggable "ESIS-handlers", one
> "low-level", where GI's, attribute names, attribute values, comments etc
> points directly into the source. (typically via a filemapping or an
> in-memory-buffer)
> The "low-level" ESIS-handler may copy the data into "real" strings,
> concatenate the consecutive PCDATA sections , build the tree, do
> validation
> etc and pass the events to an optional "higher-level" ESIS-handler.
>
> I think/hope the layer which triggers the low-level events won't be very
> different from Mr Clark's "quick and dirty" parser.
>
> (Not sure yet whether the low-level handler should just receive events, or
> whether it should query for the next event/token.)
>
>
> Cheers,
> Jarle

From jjc at jclark.com Sat Nov 8 06:03:35 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
References: <4610.199711071804@scotus.cogsci.ed.ac.uk>
Message-ID: <3463F857.31040CF6@jclark.com>

David McKelvie wrote:

> We started off doing something like this in LTNSL, but stopped doing
> filemapping (a) because it wasn't very portable and

What systems did you have problems with? Win32 supports it and I thought most modern Unix systems now did.
> (b) either you do
> some tricky decisions about when you free these pointers into the
> source or it makes reading huge corpora like the 2 gigabyte BNC corpus
> impossible which we wanted to be able to do.

Yes, I can see that's a problem. How common do people think XML files bigger than 1 gigabyte or so are going to be? How hard would it be to use external entity references to split it up into files smaller than 1 gigabyte?

James

From jjc at jclark.com Sat Nov 8 12:22:47 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
References: <3.0.32.19971107093943.00a52858@pop.intergate.bc.ca>
Message-ID: <346456E7.88135CDA@jclark.com>

Tim Bray wrote:
>
> At 06:11 PM 07/11/97 +0100, Jarle Stabell wrote:
> >Ok. My current design will first return PCData="x", then entity ref="foo",
> >and (if the client want entities expanded: PCData="a" followed by
> >EmptyElement="b" and then PCData="c".)
> >ie it may return two consecutive PCData's, with perhaps some
> >EntityExpansionStart and -End signals between them.
> >(Is this design flawed?)
>
> If "foo" is an *internal* entity, the spec clearly requires your
> parser to expand it for the application. But letting the app know
> that the ref was encountered is also fine.

I think it's also fine to give the app control over when the parser performs the expansion. One reason to do this is that the internal entity may be defined in an external parameter entity or external DTD subset. An app may not want to wait to retrieve this when it could be continuing to parse the entity in which the reference occurs.
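One way to picture application-controlled expansion is a scanner that hands the app a reference object and only looks up the replacement text when the app asks for it. The sketch below is in Python and is purely illustrative: `Text`, `EntityRef`, `scan`, and the `resolve` callback are hypothetical names, and the scanner handles general entity references only (character references such as `&#38;` are left in the text untouched).

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Union


@dataclass
class Text:
    data: str


@dataclass
class EntityRef:
    name: str
    resolve: Callable[[str], str]  # looks up replacement text on demand

    def expand(self) -> str:
        # The lookup (possibly a slow fetch of an external DTD subset)
        # happens only when the application calls expand().
        return self.resolve(self.name)


# Simplified pattern for "&name;"; an assumption, not the full XML Name rule.
_REF = re.compile(r"&([A-Za-z_][\w.-]*);")


def scan(content: str, resolve: Callable[[str], str]) -> Iterator[Union[Text, EntityRef]]:
    """Yield text chunks and unexpanded entity references in document order."""
    pos = 0
    for match in _REF.finditer(content):
        if match.start() > pos:
            yield Text(content[pos:match.start()])
        yield EntityRef(match.group(1), resolve)
        pos = match.end()
    if pos < len(content):
        yield Text(content[pos:])


declarations: Dict[str, str] = {"name": "richard"}  # stands in for a DTD read later
lookups: List[str] = []


def resolve(name: str) -> str:
    lookups.append(name)  # record when a lookup really happens
    return declarations[name]


events = list(scan("my name is &name;", resolve))
lookups_before = list(lookups)  # still empty: nothing has been expanded yet
text = "".join(e.expand() if isinstance(e, EntityRef) else e.data for e in events)
print(lookups_before, text)
```

The point of the design is that scanning the referencing entity never blocks on the declaration; whether a conforming processor may defer expansion this way is exactly the question being discussed in the thread.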
James

From tbray at textuality.com Sat Nov 8 16:25:31 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971108082338.00a2105c@pop.intergate.bc.ca>

At 07:11 PM 08/11/97 +0700, James Clark wrote:
>> If "foo" is an *internal* entity, the spec clearly requires your
>> parser to expand it for the application. ...
>
>I think it's also fine to give the app control over when the parser
>performs the expansion.

This may be the case, but it's not what the spec says today. From 4.4 in the 970807 version:

  For an internal (text) entity, the processor must include the entity; that is, retrieve its replacement text and process it as a part of the document (i.e. as content or AttValue, whichever was being processed when the reference was recognized), passing the result to the application in place of the reference.

>One reason to do this is that the internal
>entity may be defined in an external parameter entity or external DTD
>subset. An app may not want to wait to retrieve this when it could be
>continuing to parse the entity in which the reference occurs.

I think we're OK on this one. I think we voted that entities whose declarations are not available because they were in an external part of the DTD and the processor skipped that part (as it's allowed to) are treated as external entity refs and may be skipped even if they happened to be internal entities.

-T.

From jarle.stabell at dokpro.uio.no Sun Nov 9 22:53:07 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <01BCED6A.8AD2D340@xyplex04.uio.no>

<Tim_Bray>
At 07:11 PM 08/11/97 +0700, James Clark wrote:
>> If "foo" is an *internal* entity, the spec clearly requires your
>> parser to expand it for the application. ...
>
>I think it's also fine to give the app control over when the parser
>performs the expansion.

This may be the case, but it's not what the spec says today. From 4.4 in the 970807 version:

  For an internal (text) entity, the processor must include the entity; that is, retrieve its replacement text and process it as a part of the document (i.e. as content or AttValue, whichever was being processed when the reference was recognized), passing the result to the application in place of the reference.
</Tim_Bray>

[JS] Some apps need entity expansion *not* to happen, so I think the spec shouldn't forbid the processor to let the app decide upon this. (I can't see any harm in this, just a *very* useful feature for those apps which need it.)

E.g. authoring tools which load the documents into some sort of "structured editor" shouldn't "flatten" the document if the user doesn't want this. The same applies to tools which update documents, e.g. synchronizing documents with respect to other data (data in a database etc). (Converters may also want entity expansion not to happen.)

Of course, one has to add some special logic to the processor in order to fully validate/check the document in this case (or just validate it up front with a "normal" parse (if validation is necessary at all), followed by "semi-parsing" it with "no-expansion").

Cheers,
Jarle

From thoki at csi.com Mon Nov 10 11:53:55 1997
From: thoki at csi.com (Thorsten Kitz)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
Message-ID: <01bcedcf$08216230$0100007f@potter>

Hey,

I have a really simple problem: I would like to define a choice list in a DTD, e.g. for element <Weekdays> a list of possible values like "Monday", "Tuesday", etc.

How can I do this? As far as I know, it can't be done with an entity declaration, because that is just like a text replacement, and it can't be done with an element declaration either (maybe I have overlooked something).

Thanks for any help,
Thorsten.

From Ingo.Macherius at TU-Clausthal.de Mon Nov 10 12:19:52 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
In-Reply-To: <01bcedcf$08216230$0100007f@potter>
Message-ID: <199711101219.NAA03392@sinfonix.rz.tu-clausthal.de>

> From: "Thorsten Kitz" <thoki@csi.com>
> Subject: List of possible choices in a DTD

> I have a really simple problem: I like to define a choice list in a
> DTD, eg for element <Weekdays> a list of possible values like
> "Monday", "Tuesday", etc.

So the name of the weekday is data content, not structural information. Consider how this is done with HTML 4:

<!ELEMENT SELECT - - (OPTION)+ -- option selector -->
<!ELEMENT OPTION - O (#PCDATA) -- selectable choice -->

so a typical instance would be

<select>
  <option>Monday</option>
  [...]
  <option>Sunday</option>
</select>

Display of the list is a processing semantic that can't be expressed in the DTD. It's up to your application to make this a choice list, either like a pulldown menu, an item list, etc.

Of course you could use specific names for your list, like

<week>
  <day>Monday</day>
  [...]
  <day>Sunday</day>
</week>

or even

<week>
  <monday>[...]</monday> <!-- or <monday/> if no content is necessary -->
  [...]
  <sunday>[...]</sunday>
</week>

but the more tags you have, the more complicated your stylesheets etc. become. To me names of days are data, not structure. Anyway, the last example is different, as it may contain information *about* the weekdays, e.g. an hourly schedule.

Alternatively, weekdays could be attributes

<week>
  <day name="monday">[...]</day>
  [..]
  <day name="sunday">[...]</day>
</week>

Choice is up to you, but nowhere can I see a need for entities.

++im

BTW: XSL does not say anything about forms! Should there be a standard forms set, just like there are CALS tables and MathML?

--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)

From pam.gennusa at DPSL.CO.UK Mon Nov 10 14:04:33 1997
From: pam.gennusa at DPSL.CO.UK (Pam Gennusa)
Date: Mon Jun 7 16:58:52 2004
Subject: XML product survey
Message-ID: <TFSLBVAJ@DPSL.CO.UK>>

On 18 November 1997, Technology Appraisals will again be hosting a one-day seminar on XML in the UK (the first one was last April). I have been asked to present a survey of the work to date on XML products and tools.

The presentation will not include any evaluation of the products or tools mentioned. However, I would like to be able to give the following information:

  Vendor or independent developer's name
  Contact information for vendor or developer (if desired)
  Name of tool or product
  General category of tool or product
  Status (released, in beta, etc.)
  Commercial details (price, public domain, etc.)
  Brief description of product or tool highlighting distinguishing characteristics.
If you have not got an XML offering yet, can you please let me know if your company:

  a) has taken a position on XML product support and if so what
  b) has made any announcements about XML product support and if so what (also any caveats that apply)
  c) is planning any XML product support that you are comfortable talking about at this time

I appreciate any information you can supply. Ideally, I would like to get the information by Thursday 13 November (earlier would be delightful). If you intend to respond, but cannot by that date, please let me know as well.

Kind regards,
Pam

P.S. On another topic, please note that the SGML/XML Europe '98 Call for Papers is out. It is available on the GCA website at www.gca.org or you can contact their office at +1 703 519 8167 to send a copy of the brochure. The closing date for abstract submission is 19 December 1997. We are planning a very high profile for XML at this conference including a new technologies track.

*************************************************************************************
Pamela L. Gennusa, Managing Director, email: Pam.Gennusa@dpsl.co.uk
Database Publishing Systems Ltd, 608 Delta Business Park
Great Western Way, Swindon, Wiltshire SN5 7XF, UK
Tel:+44 1793 512 515;fax +44 1793 512 516 URL:www.dpsl.co.uk
*************************************************************************************

From ricko at allette.com.au Mon Nov 10 14:12:44 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
Message-ID: <199711101408.BAA19941@jawa.chilli.net.au>

Is this what you are asking?

USING ELEMENT TYPES
-------------------

In standard SGML you can select days using parameter entities like this:

<!ENTITY % days
  " ( sunday | monday | tuesday | wednesday | thursday | friday | saturday ) " >
<!ELEMENT day-choice %days; >
<!ELEMENT %days; EMPTY >

In XML, I think you may have to give a different ELEMENT declaration for each day (I cannot remember what was decided, sorry).

USING ATTRIBUTES
----------------

<!ENTITY % days
  " ( sunday | monday | tuesday | wednesday | thursday | friday | saturday ) " >
<!ELEMENT day-choice EMPTY>
<!ATTLIST day-choice
  day %days; #REQUIRED >

Again, in XML I think you may have to dereference the entity yourself. (Even if you don't, it is probably good practice, since parameter entities will not be the first things implemented in beta XML parsers.)

<!ELEMENT day-choice EMPTY>
<!ATTLIST day-choice
  day ( sunday | monday | tuesday | wednesday | thursday | friday | saturday ) #REQUIRED >

Rick Jelliffe

From thoki at csi.com Mon Nov 10 15:18:32 1997
From: thoki at csi.com (Thorsten Kitz)
Date: Mon Jun 7 16:58:52 2004
Subject: Escaping in entities
Message-ID: <01bcedeb$99037790$0100007f@potter>

Hello,

I have another question concerning entities. My problem is that I would like to generate an HTML file out of an XML document with German "Umlaute". Normally, an "ü" (ue) in HTML is written as &uuml;. I tried the following entity declaration in my DTD,

<!ENTITY ? '&uuml;'>

The result was just "&uuml;". Then I tried

<!ENTITY ? 'ü'>

and

<!ENTITY ? '&uuml;'>

Both resulted in a Jade error, arguing that no entity "uuml" is defined.

How do I define an entity so that the correct HTML code is used?

Thanks,
Thorsten.

From papresco at technologist.com Mon Nov 10 16:12:54 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
References: <01bcedcf$08216230$0100007f@potter>
Message-ID: <346732D7.9E1EA710@technologist.com>

Thorsten Kitz wrote:
>
> Hey,
>
> I have a really simple problem: I like to define a choice list in a DTD, eg
> for element <Weekdays> a list of possible values like "Monday", "Tuesday",
> etc.
Try this: <!DOCTYPE WEEKDAYS[ <!ELEMENT WEEKDAYS (WEEKDAY+)> <!ELEMENT WEEKDAY EMPTY> <!ATTLIST WEEKDAY DAY ("SUNDAY"|"MONDAY"|"TUESDAY"|"WEDNESDAY"|"THURSDAY"|"FRIDAY"|"SATURDAY") #IMPLIED> ]> <WEEKDAYS> <WEEKDAY DAY="SUNDAY"/> <WEEKDAY DAY="FRIDAY"/> </WEEKDAYS> Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dalapeyre at mulberrytech.com Mon Nov 10 19:57:57 1997 From: dalapeyre at mulberrytech.com (Deborah Aleyne Lapeyre) Date: Mon Jun 7 16:58:52 2004 Subject: Please do an XML Poster at SGML/XML'97!!!! Message-ID: <v03020907b08d16b05ee8@DialupEudora> Dear Developer's List, Posters are a great way to advertise your product at the SGML/XML'97 Conference! (This year in Washington D.C. USA on December 8-11) If you are coming to the conference anyway, it's FREE advertising. So is the New Technology Nursery in the Exhibit Hall! (If you aren't registered, I can sneak you in for ONE day only to do a poster.) I would also really like a few XML case studies and a few XML technical posters, Please! The average SGMLer is very curious as to what is going on in actual XML development, and is also very afraid that XML is a dream and not real. This is your chance to tell them. Don't know what a poster is? Drop me a private email and I'll tell you. Know all about them? The technical details are given below. --Debbie (Co-chair of SGML/XML'97) <dalapeyre@mulberrytech.com> USA Phone: 301/315-9633 ****** SGML/XML'97 POSTER GUIDELINES ****** ---------------------------------------------------------------------- WHAT YOU SEND for the POSTER PROGRAM (Deadline November 24, 1997) (E-mail to : Melanie Yunk <mel@cfi.org>) 1. 
Title of your poster presentation 2. Poster Abstract (1-3 short paragraphs) 3. Your name(s) and address(s) (including email) ---------------------------------------------------------------------- WHAT YOU BRING TO SGML/XML'97 (or ship) (Deadline December 7/8, 1997) (To post on a 4 foot by 8 foot cork board) 1. Poster(s) -- Text big enough to read from 4 or 5 feet away. Size approximately 22 x 28" (56 by 71 cm). (22 x 26" is fine.) Thin paper, not foam core. 2. Handouts (Optional) ---------------------------------------------------------------------- POSTER CATEGORIES 1. Technical poster (case study or technical topic) 2. Vendor posters (free advertising)) ---------------------------------------------------------------------- *** FREE ENLARGING *** (Deadline: Received BEFORE November 12, 1997) Send 8 1/2 x 11 or A4 paper to GCA and they will enlarge to poster size for free. GCA's address: Graphic Communications Association; Poster Submission; ATTN: Tanya Bose; 100 Daingerfield Road; Alexandria, VA USA 22314-2888 ---------------------------------------------------------------------- Don't know what a poster is, want to know if there is a reward in all this, other questions, comments or for sending title/abstract/name by email: Melanie Yunk <mel@cfi.org> ---- end ---- ====================================================================== Deborah A. Lapeyre Phone: 301/315-9631 Mulberry Technologies, Inc. Fax: 301/315-8285 17 West Jefferson Street, Suite 207 E-mail: dalapeyre@mulberrytech.com Rockville, MD 20850 WWW: http://www.mulberrytech.com ====================================================================== xml-dev: A list for W3C XML Developers. 
From peter.bergstrom at eurostep.se Wed Nov 12 13:19:30 1997
From: peter.bergstrom at eurostep.se (Peter Bergstrom)
Date: Mon Jun 7 16:58:52 2004
Subject: Software for MathML?
Message-ID: <01BCF1D1.ACAFEA00@WIN95.swipnet.se>

I'm trying to find software that works with MathML, especially browsers for the display part of the language. Can someone please point me at something?

Peter
--
Peter Bergstrom
EuroSTEP AB                  mobile phone: +46 708 111 966
Drottninggatan 71 D          mobile fax: +46 708 111 965
S-111 36 Stockholm Sweden    http://www.eurostep.se/
Open solutions for open organisations and people

From crism at ora.com Wed Nov 12 15:18:28 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:52 2004
Subject: Software for MathML?
In-Reply-To: <01BCF1D1.ACAFEA00@WIN95.swipnet.se> (message from Peter Bergstrom on Sat, 15 Nov 1997 14:05:43 +-100)
Message-ID: <199711121522.KAA24649@geode.ora.com>

[Peter Bergstrom]
> I'm trying to find software that works with MathML, especially
> browsers for the display part of the language. Can someone please
> point me at something?

From _World Wide Web Journal_, Volume 2, Issue 4, "XML: Principles, Tools, and Techniques", p. 85 "HTML-Math":

There are already two early rendering prototypes:

o WebEQ, a Java development, from the Geometry Center at the University of Minnesota
o An inclusion in the Techexplorer product from the Interactive Document labs of IBM

HTH,
Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

From jlapp at acm.org Fri Nov 14 03:47:31 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
Message-ID: <3.0.5.32.19971113224751.00937100@pop.access.digex.net>

What query languages are under development for use with XML documents? I'm talking about data management query languages akin to SQL and OQL.

I can envision immense repositories of highly dynamic content. It seems to me that such a query language will eventually become necessary to access or change XML documents that are shared among many users. The language would also need to be a standard to ensure that the clients and servers of different vendors will interoperate.

Designing a standard that queries a single repository might be relatively straightforward. Designing a standard that allows queries across multiple repositories might be a bit more of a challenge. (Think of a future internet in which documents are related by extended links that assign roles to everything, and imagine performing a read-only query across the whole mesh of globally distributed documents.)
I am aware of the SgmlQL and SDQL languages, although I know only what can be gleaned from an hour's browsing on the web. (See http://www.lpl.univ-aix.fr/projects/SgmlQL/ for info on SgmlQL.) I'd rather see something more object-oriented, like ODMG's OQL, or something that uses XML to specify queries.

BTW, Microsoft's XML-Data would be quite a boon for such a large XML repository. Clients could use XML to specify new document types or to change existing document types, and the whole DTD schema could itself reside in the repository. One query language could be used to maintain both the data and the DTDs. If the query language were in XML, it itself would be extensible.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From Ingo.Macherius at TU-Clausthal.de Fri Nov 14 04:19:10 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <3.0.5.32.19971113224751.00937100@pop.access.digex.net>
Message-ID: <199711140418.FAA01837@sinfonix.rz.tu-clausthal.de>

> Date: Thu, 13 Nov 1997 22:47:51 -0500
> From: Joe Lapp <jlapp@acm.org>
> Subject: Query Languages for XML

Joe asks many questions I've asked myself, let me add some more.

> I am aware of the SgmlQL and SDQL languages, although I know
> only what can be gleaned from an hour's browsing on the web.

IMO there are three query languages, one for each part of XML:

1) In XLL there are XLinks
2) In XSL there are the pattern parts of a rule
3) In DOM there are navigation functions that query parts of the grove

To me all those are similar in a high degree. So why was the DSSSL approach to have a single SDQL abandoned? Why isn't there an "XML-query" draft, which is mapped to a concrete syntax by XLL, XSL and DOM? There is much redundancy in this.

> BTW, Microsoft's XML-Data would be quite a boon for such a
> large XML repository.

Aren't XML-Data and MCF superseded by RDF (Resource Description Framework)? Are there features in XML-Data and MCF that are not to become part of RDF?

> If the query language were in XML, it itself would be extensible.

Agreed. This approach was taken by XSL. This is a strong feature, as one may use the same tools on the document and meta levels. There should be a query language in XML syntax, and it should be modularized. This query module should be imported by XSL, XLL and DOM.

The main obstacle is the fact that XLinks and DOM API functions don't use XML syntax, for obvious reasons. But this feature is closely related to namespace (or architectural forms) questions, because ideally names need to be changed to fit the conventions of the importing language. This ain't easy, because DOM is a programming language. In XML terseness matters, and so do characters that have to be escaped in URLs.

How can the functionality and/or semantics of XML languages be mapped to non-XML languages? Do architectural forms offer such functionality?

Clueless,
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From Patrice.Bonhomme at loria.fr Fri Nov 14 08:17:27 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:58:53 2004
Subject: case insensitive and ID
Message-ID: <199711140817.JAA09810@chimay.loria.fr>

Hi,

A very short question.

Are "P1S1" and "p1s1" the same ID attributes within an XML document?

Thanks

Pat.
--
==============================================================
bonhomme@loria.fr              | Office : B.228
http://www.loria.fr/~bonhomme  | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================

From jlapp at acm.org Fri Nov 14 14:59:43 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <199711141437.OAA15051@nathaniel.eps.inso.com>
References: <199711140418.FAA01837@sinfonix.rz.tu-clausthal.de>
Message-ID: <3.0.5.32.19971114095951.0093b2e0@pop.access.digex.net>

At 02:37 PM 11/14/1997 GMT, you wrote:
>>To me all those are similar in a high degree. So why was the DSSSL
>>approach to have a single SDQL abandoned? Why isn't there an
>>"XML-query" draft, which is mapped to a concrete syntax by XLL, XSL
>>and DOM? There is much redundancy in this.
>
>I have been trying to get the DOM WG to realise this, and for us
>to work on a standard DOM API to queries (including syntax, and return
>result).

Is there any interest in putting a draft together to present to the WG? Having a draft to work with could jump-start things.

I am currently on a self-funded sabbatical, and I have a lot of time to devote to such an effort. I wouldn't mind coordinating the process so long as we can gather a pool of people who are willing to put some time into thinking about the issues.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From wendling at ganymede.isdn.uiuc.edu Fri Nov 14 16:50:51 1997
From: wendling at ganymede.isdn.uiuc.edu (Bill Wendling)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <3.0.5.32.19971114095951.0093b2e0@pop.access.digex.net>
Message-ID: <Pine.LNX.3.95.971114104856.9047F-100000@ganymede.isdn.uiuc.edu>

On Fri, 14 Nov 1997, Joe Lapp wrote:
}
}Is there any interest in putting a draft together to present to
}the WG? Having a draft to work with could jump-start things.
}
}I am currently on a self-funded sabbatical, and I have a lot of
}time to devote to such an effort. I wouldn't mind coordinating
}the process so long as we can gather a pool of people who are
}willing to put some time into thinking about the issues.
}

Excuse me for jumping in in the middle of a thread, but could someone tell me what kind of query language is being asked for? I'm working in a group which is trying to develop just such a thing in XML...

|| Bill Wendling
wendling@ncsa.uiuc.edu

From richard at cogsci.ed.ac.uk Fri Nov 14 17:21:23 1997
From: richard at cogsci.ed.ac.uk (richard@cogsci.ed.ac.uk)
Date: Mon Jun 7 16:58:53 2004
Subject: Alpha-test release of RXP
Message-ID: <23145.199711141721@pitcairn.cogsci.ed.ac.uk>

To celebrate XML's first birthday, I am releasing an alpha-test version of RXP, an XML parser in C. RXP will be the parser in the next release of the LT XML system.

RXP goes some way to addressing the concerns about XML processing speed raised on this mailing list. It can parse ot.xml in 0.8 seconds on a 233MHz Pentium II.

This is not a public release, so please don't redistribute the system.

RXP is available (in source form only) at ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.tar.gz

-- Richard

From eliot at isogen.com Fri Nov 14 18:01:02 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
References: <Pine.LNX.3.95.971114104856.9047F-100000@ganymede.isdn.uiuc.edu>
Message-ID: <346C90F2.A4566219@isogen.com>

Any query language for XML would, by necessity, be a syntax for accessing the properties of XML objects (where the object schema would come from either the SGML property set, the DOM, or something derived from one or both). Certainly SDQL provides this. Any new language would, I think, mostly be an exercise in syntax definition, most of which is already inherent in the development of XSL (which is nothing more than a language for applying processes to the results of queries on XML groves).

From another point of view, it's not possible to have *an* XML query language because there are too many different ways that you might want to access XML data: as nodes in groves a la SDQL, as full text using some full-text index, as semantic-specific objects using some domain-specific query mechanism, etc.

A language like SDQL coupled with an XML property set (that is, the subset of the SGML property set needed to represent XML documents) provides a complete set of operations for querying XML documents represented as groves. These operations can be used either as primitives from which more specialized languages are created or as a design spec that drives the development of a new syntax for expressing the equivalent queries.

So my question is: is what is desired only a new *syntax* or is there a requirement for a fundamentally different query mechanism? Or have I entirely missed the point of the original question?

Cheers,

Eliot
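
Eliot's point about there being several ways to access the same XML data can be illustrated with a small sketch. This is not SDQL and not a grove API; it assumes Python's standard xml.etree.ElementTree, an invented sample document, and a hypothetical helper named sections_mentioning. It only contrasts node-oriented access with a crude full-text search over the same document:

```python
import xml.etree.ElementTree as ET

# Invented sample document, for illustration only.
doc = """<report>
  <section id="s1"><title>Parsers</title><para>RXP is written in C.</para></section>
  <section id="s2"><title>Queries</title><para>SDQL queries groves.</para></section>
</report>"""
root = ET.fromstring(doc)

# Node-oriented access: address elements by their position in the tree.
titles = [t.text for t in root.findall("./section/title")]

# Full-text access: ignore the markup and search the character data.
def sections_mentioning(word):
    """Return ids of sections whose text contains `word` (case-insensitive)."""
    return [s.get("id") for s in root.iter("section")
            if word.lower() in "".join(s.itertext()).lower()]

print(titles)
print(sections_mentioning("groves"))
```

Both queries run against one parsed tree, but they rest on different models of the data (a tree of nodes versus a stream of text), which is roughly the distinction the message draws.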
From jlapp at acm.org Fri Nov 14 19:01:30 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>

I've been looking at Microsoft's MSXML. Although it is written in Java, it is tied to the Windows platform. Instead of using Java's URL facilities in java.net.*, it provides an ActiveX control to do the job. Class com.ms.xml.util.XMLInputStream relies on COM interface IXMLStream, which MSXML provides as a DLL written in C++ (see com\ms\xml\XMLStream\XMLURLStream).

I'm looking for a 100% pure Java XML parser that is being actively maintained. I've got a few projects up my sleeve, and I want to be sure that the code I write is cross-platform. If I write to MSXML, I tie myself into Microsoft's API. Given that the only implementation of that API works only on Windows, to write to MSXML would be to tie my Java tool to Windows.

It seems that Microsoft has the most complete implementation of an XML parser, so Microsoft is doing a very good job of trying to get me to write Java that works only on Windows.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From donpark at quake.net Fri Nov 14 19:18:56 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <01bcf131$bdda8ec0$0100007f@localhost>

Joe,

The Windows dependency of MSXML is minimal. All you have to do is the following:

1. Remove the com.ms.xml.dso package.

Delete the class files from the jar and/or comment it out of the makefile. DSO is accessed by some of the samples but none of the other MSXML packages.

2. Remove the dependency on the com.ms.xml.xmlstream package.

The latest version of MSXML includes an alternate XMLInputStream class located inside the 'make' directory. Replace com.ms.xml.util.XMLInputStream with the alternate version to remove the dependency on the com.ms.xml.xmlstream package.

With the above two changes, you will end up with a pure-Java version of MSXML. MSXML is the most complete XML parser available right now and you get the source code on top of it. I would be smiling by now if I were you :-)

Don Park

From andrewl at microsoft.com Fri Nov 14 19:47:36 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6571@red-03-msg.dns.microsoft.com>

I believe that the dependency on the XMLInputStream interface is to avoid some bugs in the JDK 1.1 libraries that do not handle byte ordering correctly on Apple platforms. That is my memory; you could alter the code to use the JDK packages and test on Apple if you like.

The packages com.ms.xml.xmlstream and the alternate version are functionally equivalent, but the Windows-specific one has much higher performance. Choose portable or fast depending on your needs.

--Andrew Layman
AndrewL@microsoft.com
From Per-Ake.Ling at uab.ericsson.se Fri Nov 14 20:03:34 1997
From: Per-Ake.Ling at uab.ericsson.se (Per-Ake Ling)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <199711142001.VAA15294@uabs19c26.eua.ericsson.se>

> From: Andrew Layman <andrewl@microsoft.com>
...[snip]
> I believe that the dependency on the XMLInputStream interface is to avoid
> some bugs in the JDK 1.1 libraries that do not handle byte ordering
> correctly on Apple platforms. That is my memory; you could alter the code to
> use the JDK packages and test on Apple if you like.
>
> The packages com.ms.xml.xmlstream and the alternate version are functionally
> equivalent, but the Windows-specific one has much higher performance. Choose
> portable or fast depending on your needs.

I can accept the second paragraph but the first one is very confusing: if the portable code may have problems on Apple, use the Windows-specific code so it can't run on Apple or on any other platform?

Per-Åke
--
Per-Åke Ling (note: Per-Åke, transliteration Per-Ake)
email: Per-Ake.Ling@uab.ericsson.se    phone: +46 8 727 5674
Ericsson Utvecklings AB                mobile: +46 70 790 2446
AXE Research and Development           fax: +46 8 727 3463
From jlapp at acm.org Fri Nov 14 20:05:17 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
In-Reply-To: <01bcf131$bdda8ec0$0100007f@localhost>
Message-ID: <3.0.5.32.19971114150447.0093ec10@pop.access.digex.net>

At 11:15 AM 11/14/1997 -0800, you wrote:
>Windows dependency of MSXML is minimal. All you have to do is following:
>[...]
>With above two changes, you will end up with a pure-Java version of MSXML.
>MSXML is the most complete XML parser available right now and you get the
>source code on top of it. I would be smiling by now if I were you :-)

How 'bout that! Microsoft's EULA even grants us the right to redistribute such modified code. Quite generous of them, I must say. Microsoft just went up a point in my rating system.

I am indeed smiling now. :-) My apologies to the MSXML team.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From andrewl at microsoft.com Fri Nov 14 21:39:16 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <7BB61B44F197D011892800805FD4F79201CD657B@red-03-msg.dns.microsoft.com>

Maybe I should have been more clear.
The parser uses a newly-defined interface to a stream library that is specific to XML. The parser does not use the implementations of streams provided in the JDK 1.1 packages for the internet. I believe that this has to do with byte-ordering problems in those implementations. I have not checked this for myself.

The interface per se has no platform dependencies. It is shipped with two implementations. One implementation is specific to Windows, the other is generic Java using JDK packages. Neither has the byte-order flaw. You may use whichever one you prefer. Both work. The generic one has lower performance.

--Andrew Layman
AndrewL@microsoft.com

From andrewl at microsoft.com Fri Nov 14 22:06:05 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6581@red-03-msg.dns.microsoft.com>

Thank you all for the suggestions you have made to me (many privately) regarding this question. Here is the policy I intend to follow and to recommend:

Sometimes you will want to use a character in a name, but that character is not an XML NameChar. In that case, encode it, using a sequence such as "_#xHHHH_" where "HHHH" is a hexadecimal rendition of the Unicode character. For example "Two Words" would encode as "Two_#x0020_Words". Such encoding (and subsequent decoding) is an application function, not part of the XML specification per se.

(This is the closest mapping I could make to using character entities in names.)

--Andrew Layman
AndrewL@microsoft.com
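
The "_#xHHHH_" policy described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names encode_name and decode_name are hypothetical, the set of allowed characters is a simplified ASCII stand-in for the real XML NameChar production, and the extra rule that a name must not start with a digit, '.' or '-' is ignored.

```python
import re

# Assumption: a simplified ASCII approximation of XML NameChar.
# The real production also admits many non-ASCII letters and digits.
_ALLOWED = re.compile(r"[A-Za-z0-9.\-_:]")

# Escape sequences as described in the message: _#xHHHH_
# (4 to 6 hex digits are accepted so characters above U+FFFF round-trip).
_ESCAPE = re.compile(r"_#x([0-9A-Fa-f]{4,6})_")

def encode_name(text):
    """Replace every character outside the allowed set with _#xHHHH_."""
    out = []
    for ch in text:
        if _ALLOWED.fullmatch(ch):
            out.append(ch)
        else:
            out.append("_#x%04X_" % ord(ch))
    return "".join(out)

def decode_name(name):
    """Reverse encode_name by expanding each _#xHHHH_ sequence."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)

print(encode_name("Two Words"))  # Two_#x0020_Words
```

Because '#' is itself outside the allowed set, it is always escaped, so text that already looks like an escape sequence still survives a round trip through encode_name and decode_name.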
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at light.demon.co.uk Sat Nov 15 08:08:28 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:58:53 2004 Subject: Query Languages for XML In-Reply-To: <346C90F2.A4566219@isogen.com> Message-ID: <Ymq81JA+dVb0EwA5@light.demon.co.uk> In message <346C90F2.A4566219@isogen.com>, "W. Eliot Kimber" <eliot@isogen.com> writes > >From another point of view, it's not possible to have *an* XML query >language because there are too many different ways that you might want >to access XML data: as nodes in groves ala SDQL, as full text using some >full-text index, as semantic-specific objects using some domain-specific >query mechanism, etc. > >So my question is: is what is desired only a new *syntax* or is there a >requirement for a fundamentally different query mechanism? Or have I >entirely missed the point of the original question? One important thing about "Standard Query Language" is that it doesn't just query. It is actually a complete language for "defining, accessing and otherwise managing relational databases". I think that anyone coming from an SQL background would find SDQL very restricted, _in the sense that_ it provides a set of 'read-only' functions that you can carry out on SGML documents which are, magically, already there. Unlike SQL, SDQL provides you with no means to: - create a schema; - create a new document; - edit an existing document; - delete a document; - manage access to documents; - etc. If you want to use SDQL as the basis for document management, you would have a very hard time of it. 
And yet, surely that is what someone looking to create and manage XML repositories is going to be interested in having?

Obviously this is not just an XML problem: it applies equally to SGML, which is in effect a "read only" standard. One example of this is in the style language's (DSSSL or XSL) online support. As far as I am aware, there is no support for any features of forms. Yet, if SDQL had primitives such as (insert-node), (replace-node) and (add-text-to-node) it wouldn't be too hard to add an "input-line" flow object type. And the implications of even that simple addition would be pretty far-reaching.

Richard.

Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From tbray at textuality.com Sat Nov 15 09:59:12 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:53 2004
Subject: case insensitive and ID
Message-ID: <3.0.32.19971114113638.00a9156c@pop.intergate.bc.ca>

At 09:17 AM 14/11/97 +0100, Patrice Bonhomme wrote:
>Are "P1S1" and "p1s1" the same ID attributes within an XML document?

No. -Tim
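Tim's answer can be checked with a small Python sketch. It assumes, for illustration only, that an attribute literally named `ID` plays the role of an ID attribute (no DTD is read), and the function name `collect_ids` and the sample document are hypothetical. It uses the expat parser from the Python standard library, which follows the later XML 1.0 rules.

```python
import xml.parsers.expat


def collect_ids(document, id_attr="ID"):
    """Return the values of every attribute named id_attr, in document order."""
    found = []

    def on_start(name, attrs):
        # Assumption: the attribute named id_attr is treated as the ID;
        # a validating processor would learn this from the DTD instead.
        if id_attr in attrs:
            found.append(attrs[id_attr])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return found


SAMPLE = '<DOC><P ID="P1S1"/><P ID="p1s1"/></DOC>'
ids = collect_ids(SAMPLE)
distinct = set(ids)  # compared character by character, with no case folding
```

Because the two values differ in case, they remain two distinct identifiers in `distinct`.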
From peter at ursus.demon.co.uk Sat Nov 15 17:55:21 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:54 2004
Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
Message-ID: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk>

In this posting I make a proposal for the treatment of a certain class of XML files. I offer this in the belief that there is a subset of the XML community who will find this proposal useful and may wish to refine it gently. There is also a meta-proposal for the use of a PI-target associated in a general way with this list. This PI-target can in principle be used for a wide class of applications.

If there are people who believe that the *meta-proposal* is harmful or beneficial to the XML community, I'd be grateful for their views posted to the list. If they believe that the meta-proposal is acceptable, but they don't like the proposal, then they can delete subsequent correspondence on the proposal before reading. Constructive and Destructive criticism of the meta-proposal is appropriate; destructive criticism of the proposal is pointless.

<BACKGROUND>
A considerable amount of decision-making in XML is left 'to the application' (i.e. some or all of the processing software after the document has been parsed). In some cases the whole authoring/distribution/parsing/application process is under the effective control of some 'organisation'. They will develop their applications to be consistent with the authoring tools and the document instances; this need not concern us.
A number of groups and individuals are, however, proposing XML 'applications' where there is unlikely to be a single 'application' for processing. Moreover, many of these may be DTD-less in some way and may also not use style sheets. There is often an implied need for these 'applications' to set constraints on the processing software in ways that are not covered, and not likely to be covered, by the formal specs. In other cases the specs provide syntax, but no semantics, for certain important operations. I believe that there may be cases where many people want a particular generic behaviour where a broad consensus can be obtained and which need not affect the formal spec development.
</BACKGROUND>

<LIST>
<AXIOM>In any of these cases there is no general solution acceptable to everyone </AXIOM>
<AXIOM> If no attempt is made to address these problems we shall either end up with a Babel of incompatible solutions, or wait feebly for some powerful autonomous entities to dictate a limited set of actions. </AXIOM>
<AXIOM> We have to be careful to avoid the 'only processable with software X' syndrome</AXIOM>
<AXIOM> There is a critical mass of readers of this list who feel the need to address the problem. </AXIOM>
<AXIOM> Anyone can use any PIs they like in their documents for whatever purposes they like without breaking the spirit of XML. </AXIOM>
<AXIOM> That processing software need not (and so far won't) take any notice of these (or perhaps any) PIs </AXIOM>
<AXIOM> If a few people find a way of doing something that works for them, and isn't against the spirit of the XML specs, then flaming their ideas is pointless.</AXIOM>
</LIST>

<NOTE>The proposal I really want to address is, like Monty Python's joke, so potentially dangerous that I dare not reveal it yet. The proposal here is also important to me - perhaps to others - and I hope serves as a useful example.
It is NOT in a finalised form, but as can be seen from the meta-proposal, there is a method for referring to a 'pseudo-final' form that is, at least, usable.
</NOTE>

<META-PROPOSAL>
That a PI of the form <?XDEV?> is 'reserved' by members of this list for PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly reserved.]
That anyone can post a proposal to this list for the use of this PI.
That any author can include an instance of such a proposed PI in their document.
That any writer of application software can write software to process such a PI.
That both of these should refer to an appropriate URL on this list's archive discussing or outlining the use of this PI.
That if someone doesn't approve of a proposal they ignore it rather than flaming it. The fittest ideas will survive.
</META-PROPOSAL>

<EXAMPLE>
<![CDATA[
Assume a porridge cooker (a real piece of equipment) is controlled by an XML-document. (This is not governed by the current XSL proposal.)

FatherBear proposes a particular use for XLL's ROLE attribute when used to link to <PORRIDGE> elements. There is much discussion on XML-DEV. FatherBear's views are too hot and do not find favour.

MotherBear proposes an alternative. There is much more discussion but it doesn't get much further. MotherBear is cool, but too cool.

BabyBear makes a third proposal. This is 'just right' for many people (but not everyone, of course). Various people suggest that they could work along with BabyBear's proposal. BabyBear and others hack it into shape. A set of suggested guidelines is posted to XML-DEV.

Goldilocks E-pubCorp says that its userAgent will now support BabyBear's proposal, and bolts it into their authoring tool.

[None of this need come anywhere near the XML-WG, XML-SIG, W3C or anything else.]
The documents authored according to the BabyBear proposal might look like:

<MEAL>
  <PORRIDGE>
    <?XDEV HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" PORRIDGE-COOKER="BabyBear"?>
    <OATMEAL>
      <UNITS>kilograms</UNITS>
      <VALUE>100</VALUE>
    </OATMEAL>
    <TIME>
      <UNITS>minutes</UNITS>
      <VALUE>20</VALUE>
    </TIME>
    <NACL>NULL</NACL>
    <RECIPE XML-LINK="SIMPLE" HREF="bears.org/recipes/porridge.xml" ACTUATE="AUTO"/>
  </PORRIDGE>
  <?XDEV PORRIDGE-COOKER="NULL"?>
</MEAL>

The JUMBO porridge-cooker (from all good e-marts) is XML-aware and recognises the XDEV PI. The author of PORRIDGE.java makes sure that the software is compliant with the proposal in BabyBear's posting to XML-DEV. The JUMBO corp publishes some amendments to the cooking process (e.g. <FIRE-WARNING>).

That's all, folks. Nobody *has* to do any of this. FatherBear teams up with the BigBadWolf corp. Their ideas do not flourish. They simply miss out on porridge.
]]>
</EXAMPLE>

Now a real proposal.

<BACKGROUND>
I wish to display objects on the screen in a way not supported by XLL and XSL. Specifically I have an element (object) which may be displayable on its own ('standalone') or may be displayable in the context of another object (perhaps a parent container). An example might be a PERSON in an ORGCHART. (Actually I want to display ATOMS in MOLecules, of course). I might wish to create an XML-LINK to a PERSON which displayed that PERSON. Alternatively I might wish to create a link to that PERSON for display in the context of the org-chart (i.e. when I actuate that link, the org-chart is displayed and all other linked PERSONs).

I wish to use the BEHAVIOR attribute of XLL for this. No values for this attribute are defined at present, and some XML-SIGers have suggested that there will never be a definitive list. If values are chosen at random by the community, then in a year's time we shall have chaos on the behaviour attribute.
So this proposal can be seen as suggesting a wider discussion of possible values for BEHAVIOR. Note that we don't all have to agree :-). It is perfectly possible that two incompatible proposals appear. Both can use XDEV, but point to different URLs (and hopefully have different mnemonics). No problem. A user agent can implement one, both or neither. What I want to avoid is nine-and-sixty ways that BEHAVIOR is used with no public specification of any semantics.
</BACKGROUND>

<PROPOSAL>
That two attribute values for XML-LINK's BEHAVIOR attribute be recognised through an XDEV PI:
  BEHAVIOR="DisplayStandAlone"
  BEHAVIOR="DisplayInContext"
That for the second option an additional attribute CONTEXTREF is required, whose value is a valid URL and points to the XML element providing the display context of the current element.
The actual details of display are application (and possibly stylesheet) dependent.
</PROPOSAL>

<NOTE>
*This* proposal is identifiable through
  HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/xxxx.html"
(where xxxx represents the actual address of *this* hypermail (e.g. 6789 :-)
A user agent can be given the option of operating these semantics by a PI of the form:
  <?XDEV HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" XML-LINK-BEHAVIOR="ContextDisplay"?>
and can revert to the default or previous semantics by:
  <?XDEV HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" XML-LINK-BEHAVIOR="Default"?>
  <!-- or "Off", "Previous", etc. -->
</NOTE>

<EXAMPLE>
Note that the BEHAVIOR attribute is 'inherited' by all the PERSONs (see XLL spec).
<![CDATA[
<PERSON XML-LINK="SIMPLE" HREF="boss.xml" BEHAVIOR="DisplayProminently">
  <?XDEV HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" XML-LINK-BEHAVIOR="ContextDisplay"?>
  <PROJECT XML-LINK="EXTENDED" BEHAVIOR="DisplayInContext">
    <PERSON XML-LINK="LOCATOR" HREF="fred.xml" CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"/>
    <PERSON XML-LINK="LOCATOR" HREF="wilma.xml" CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"/>
    <PERSON XML-LINK="LOCATOR" HREF="sally.xml" CONTEXTREF="galleys.xml#DESCENDANT(1,ORGCHART)"/>
    <PERSON XML-LINK="LOCATOR" HREF="sue.xml" BEHAVIOR="DisplayStandAlone"/>
  </PROJECT>
  <?XDEV HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" XML-LINK-BEHAVIOR="Off"?>
</PERSON>
]]>

The display attribute for boss.xml is not activated by this proposal (there may be a default local protocol for displaying bosses.) The PI switches on a display protocol whereby the team members Fred, Wilma and Sally are displayed in the context of their org-charts (in this case different). Sue is displayed standalone.

[The user agent knows how to display objects standalone and in the context of other objects. Note that this can be a fairly generic mechanism - JUMBO acts on any objects which provide a display() method - not just PERSON. It also has/will_have a highlightInContext(Node n) method for displaying *this* in the context of Node n.]
</EXAMPLE>

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
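The way an XDEV-aware processor might follow these PIs can be sketched in Python. This is a rough illustration, not the behaviour of JUMBO or any other tool: the function name `xdev_states`, the `TEAM` wrapper element and the starting value "Default" are assumptions made for the example, and the pseudo-attribute syntax inside the PI is parsed with a simple regular expression because XML itself assigns no structure to PI content. The expat parser is part of the Python standard library.

```python
import re
import xml.parsers.expat

# Pseudo-attributes of the form NAME="value" inside the PI data.
PSEUDO_ATTR = re.compile(r'([\w.:-]+)\s*=\s*"([^"]*)"')

PROPOSAL_URL = "http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html"


def xdev_states(document):
    """Pair each start tag with the XML-LINK-BEHAVIOR value active at that point."""
    active = {"value": "Default"}  # assumption: semantics before any XDEV PI
    seen = []

    def on_pi(target, data):
        # Only PIs whose target is XDEV are interpreted; all others are ignored.
        if target == "XDEV":
            attrs = dict(PSEUDO_ATTR.findall(data))
            if "XML-LINK-BEHAVIOR" in attrs:
                active["value"] = attrs["XML-LINK-BEHAVIOR"]

    def on_start(name, attrs):
        seen.append((name, active["value"]))

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


SAMPLE = (
    "<TEAM>"
    '<?XDEV HREF="%s" XML-LINK-BEHAVIOR="ContextDisplay"?>'
    '<PERSON HREF="fred.xml"/>'
    '<?XDEV HREF="%s" XML-LINK-BEHAVIOR="Off"?>'
    '<PERSON HREF="sue.xml"/>'
    "</TEAM>"
) % (PROPOSAL_URL, PROPOSAL_URL)

states = xdev_states(SAMPLE)
```

A processor that does not know the XDEV target simply never installs such a handler, which matches the axiom that software need not take any notice of these PIs.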
From papresco at technologist.com Sat Nov 15 19:58:35 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
References: <Ymq81JA+dVb0EwA5@light.demon.co.uk>
Message-ID: <346DFFA1.8C0F1ECD@technologist.com>

Richard Light wrote:
> Obviously this is not just an XML problem: it applies equally to SGML,
> which is in effect a "read only" standard. One example of this is in
> the style language's (DSSSL or XSL) online support. As far as I am
> aware, there is no support for any features of forms.

I do not believe that this is true. As I understand it, you can create a "HTML form element" flow object and an "HTML input element" flow object within it.

> Yet, if SDQL had
> primitives such as (insert-node), (replace-node) and (add-text-to-node)
> it wouldn't be too hard to add an "input-line" flow object type.

DSSSL has no provisions for adding flow object types in DSSSL code. So we are essentially talking about the DSSSL implementation language (Java or C, probably). It is as easy in these languages to define and implement an "input field" flow object as it is to make a "hyperlink" flow object. There is no need to extend SDQL. The input text does not have to be part of the flow object tree. The "input text" flow object can handle the interactivity just as the "hyperlink" flow object does.

Paul Prescod
From papresco at technologist.com Sat Nov 15 21:17:44 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
References: <Pine.LNX.3.96.971115153253.15181C-100000@dub>
Message-ID: <346E1234.D549A734@technologist.com>

Graydon Hoare wrote:
> If you run jade in sgml-to-sgml mode, you can make HTML out of arbitrary
> XML, but I don't think there are flow-objects representing forms.

Sorry, I was talking about XSL. As I understand it, XSL will allow you to create any HTML element.

> What would it mean to take a form
> flow object and render it through a TeX backend? The "interactive" nature
> is gone. What happens to a combo-box?

About the same as the printed rendition of a link or scroll flow object. It would be completely useless. Stylesheets are tied to a particular medium. Online stylesheets should have elements (link, input, scroll) that allow interactivity and print-oriented stylesheet languages should have elements that describe pages etc.

> I think the question being asked is whether you could make an input-text
> flow object which had a clearly defined semantics in altering your XML
> grove, not your flow object tree. For this, you would need an abstraction
> for the form submission/editing cycle, and such SDQL primitives as richard
> was mentioning.

Only if you take the approach that DSSSL code must manage the form interactivity process. I don't see why it must. It seems simplest to me that it should do the moral equivalent of "put a button here" and leave the processing of the button click to JavaScript, Java or C++.
> It makes sense -- he's basically talking about getting
> full grove-manipulation into the query language so you can consider the
> grove a simple object database. ODMG OQL is probably worth looking over.

I can see that, but I don't think it necessarily has anything to do with form input. The SQL model (I'm not familiar with OQL) is that a host language (COBOL, PowerScript, JavaScript, Java, whatever) handles the interactivity and issues data model update instructions. SQL does not handle the user interface itself.

In other words, the vast majority of forms will have nothing to do with the document grove itself. They may be forms designed to talk to relational databases or object databases or CGI or whatever. We can create these forms immediately, without touching SDQL. Yes, it would be cool if SDQL allowed grove updates, and of course we expect that if it did, you would be able to call it from the code that handles your button, just as you could call SQL or OQL etc.

Anyhow, I think that the DOM allows updates, so if you use DOM functions as your "query language" and the DOM model as your grove, then you will have a read/write document query language.

Paul Prescod
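The closing point, DOM functions serving as a read/write "query language", can be illustrated with a short Python sketch. It uses `xml.dom.minidom` from the Python standard library, an implementation of the later W3C DOM that postdates this thread, so it shows the general idea rather than anything the DOM group had specified at the time. The `ORGCHART` document and its contents are made up for the example.

```python
from xml.dom.minidom import parseString

# Hypothetical document standing in for the "grove".
doc = parseString(
    "<ORGCHART><PERSON>Fred</PERSON><PERSON>Wilma</PERSON></ORGCHART>"
)

# "Query": select nodes by element type name.
people = doc.getElementsByTagName("PERSON")
names_before = [person.firstChild.data for person in people]

# "Update": edit the text of an existing node ...
people[0].firstChild.data = "Frederick"

# ... and insert a new node under the document element.
new_person = doc.createElement("PERSON")
new_person.appendChild(doc.createTextNode("Sally"))
doc.documentElement.appendChild(new_person)

result = doc.documentElement.toxml()
```

The host language (Python here) supplies the control flow, which is the division of labour the message describes for SQL and its host languages.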
From ricko at allette.com.au Sun Nov 16 03:08:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:54 2004
Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal
Message-ID: <199711160305.OAA25372@jawa.chilli.net.au>

> From: Peter Murray-Rust <peter@ursus.demon.co.uk>

> <PROPOSAL>
> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
> through an XDEV PI:
> BEHAVIOR="DisplayStandAlone"
> BEHAVIOR="DisplayInContext"
> That for the second option an additional attribute CONTEXTREF is required,
> whose value is a valid URL and points to the XML element providing the
> display context of the current element.
> The actual details of display are application (and possibly stylesheet)
> dependent.
> </PROPOSAL>

Another approach might be to use the name prefix XDEV: on attribute values, e.g.

  BEHAVIOUR="XDEV:DisplayStandAlone"

and the contextref attribute you suggest, e.g.

  BEHAVIOUR="XDEV:DisplayInContext"
  XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"

Rick Jelliffe
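One way a processor might recognise the XDEV: prefix on attribute values is sketched below in Python. The function name `split_prefixed` and the `KNOWN_PREFIXES` registry are hypothetical; the sketch assumes a processor only treats a leading token as a prefix when it already knows that token, so that values containing a colon for other reasons (a time of day, for instance) are left untouched.

```python
# Assumption: the processor keeps a registry of prefixes it understands.
KNOWN_PREFIXES = {"XDEV"}


def split_prefixed(value, known=KNOWN_PREFIXES):
    """Split 'PREFIX:local' into (prefix, local) only for registered prefixes.

    Any other value is returned unchanged with a prefix of None.
    """
    prefix, sep, local = value.partition(":")
    if sep and prefix in known:
        return prefix, local
    return None, value


prefixed = split_prefixed("XDEV:DisplayStandAlone")
plain = split_prefixed("DisplayStandAlone")
time_of_day = split_prefixed("12:00")
```

The cost of this design is that every cooperating processor has to inspect attribute values, not only names, for the registered prefixes.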
From peter at ursus.demon.co.uk Sun Nov 16 10:51:21 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:54 2004
Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <199711160305.OAA25372@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971116114140.2d8f7cae@pop3.demon.co.uk>

At 14:09 16/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust <peter@ursus.demon.co.uk>
>
>> <PROPOSAL>
>> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
>> through an XDEV PI:
>> BEHAVIOR="DisplayStandAlone"
>> BEHAVIOR="DisplayInContext"
>> That for the second option an additional attribute CONTEXTREF is required,
>> whose value is a valid URL and points to the XML element providing the
>> display context of the current element.
>> The actual details of display are application (and possibly stylesheet)
>> dependent.
>> </PROPOSAL>
>
>Another approach might be to use the name prefix XDEV: on attribute
>values, e.g.
>
> BEHAVIOUR="XDEV:DisplayStandAlone"

I hadn't thought of these possibilities, thanks Rick. This one is fine and legal, but requires the processor (all friendly processors) to look for namespaces in attribute values. Since attributes can have colons for many other reasons I suspect this approach will cause problems. For example:

  WAKE-UP-TIME="12:00"

>
>and the contextref attribute you suggest, e.g.
>
> BEHAVIOUR="XDEV:DisplayInContext"
> XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"

This relies on the namespace proposal being adopted for attribute names. I don't know where that has got to, and it's probably confidential to XML-SIG.
I can see its attraction in cases like this. The namespace allows the pre-colon prefix to be mapped to a schema file, which could - in turn - contain a reference to the XML-DEV posting(s). Something like XML:BEHAVIOR would *not* be a good idea because it would be a different attribute from BEHAVIOR. So if only the BEHAVIOR attribute were altered (BEHAVIOR="XDEV:BLINK") there would be no formal method of picking it up.

I cannot remember whether PIs can be linked to schema files, i.e. something like:

  <?xml:namespace HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/6789.html" AS="XDEV"?>
  <?XDEV XML-LINK-:BEHAVIOR="ContextDisplay"?>

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From lauren at sqwest.bc.ca Sun Nov 16 18:48:36 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346E1234.D549A734@technologist.com>
Message-ID: <m0xX9jh-0009XMC@sqailor.sqwest.bc.ca>

> From: Paul Prescod <papresco@technologist.com>

> Anyhow, I think that the DOM allows updates, so if you use DOM
> functions as your "query language" and the DOM model as your grove,
> then you will have a read/write document query language.

And the DOM group itself does not wish to use a different syntax for the generalized queries if we can find one already developed that meets our needs.
So we will be looking at XLL XPointer syntax as well as whatever XSL does, and watching what the RDF group chooses since they are also talking about query interfaces. I personally would be happy if everyone could use the same syntax, as long as it meets the DOM needs.

cheers,

Lauren
--
Lauren Wood, SoftQuad, Inc
Chair, W3C DOM Activity

From jlapp at acm.org Sun Nov 16 19:35:59 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346C90F2.A4566219@isogen.com>
References: <Pine.LNX.3.95.971114104856.9047F-100000@ganymede.isdn.uiuc.edu>
Message-ID: <3.0.3.32.19971116132158.00953c30@pop.access.digex.net>

At 09:57 AM 11/14/1997 -0800, W. Eliot Kimber <eliot@isogen.com> wrote:
>[...] Any new language
>would, I think, mostly be an exercise in syntax definition, most of
>which is already inherent in the development of XSL (which is nothing
>more than a language for applying processes to the results of queries on
>XML groves).

The language would also consist of semantics -- a standard interpretation of the syntax. Some application will have to make sense of the query based on its semantics. The syntax and the semantics of that syntax would have to be standardized so that the applications that are developed to interpret the queries all interpret the queries identically.

XSL may be thought of as a query-only tool or as a translation tool. As Richard Light explained, it doesn't provide a mechanism for modifying the originally queried XML document.
XSL could be used to convert portions (or all) of an XML document from one XML representation to another, provided that flow object types were available for all element types in the XML document. However, there is no standard mechanism in place to update the original XML document. You might use XSL to create a replacement document and then upload the replacement, but this is not conducive to having many users concurrently querying and updating the document (you'd have to lock the whole document).

>From another point of view, it's not possible to have *an* XML query
>language because there are too many different ways that you might want
>to access XML data: as nodes in groves ala SDQL, as full text using some
>full-text index, as semantic-specific objects using some domain-specific
>query mechanism, etc.

I agree that there will be many different ways to query a document and that it is not possible to anticipate them all in advance. One (read-only) query might be analogous to XSL's pattern rules, which query based on the physical structure of the document. Another query might be analogous to an AltaVista-style word-based search. Still another might operate by traversing XML's linking facilities or by traversing RDF's associations.

I think this is where XML's extensibility comes into play. We would define a standard query language for the most common querying activities, such as those in XSL patterns (XSL patterns might be the basis of the language). If a user wishes to query an engine that handles extensions, it is likely that the user will want to mix standard query operations with the extended queries. Each query (even each extended query) is likely to return a result set that takes the form of an XML document (a virtual one). Such result sets would be amenable to additional querying via the XML-Query standard (perhaps all within the same complex query statement).

Furthermore, every XML query engine would be able to parse every query.
Each would be able to identify the constructs that are not available to the engine. There might be a way for the engine to delegate the queries or operations associated with those constructs to other (perhaps specialized) query engines. If not, the engine could return a meaningful error message to the user (e.g. "Element type 'full-word-index' not supported.").

>A language like SDQL coupled with an XML property set (that is, the
>subset of the SGML property set needed to represent XML documents)
>provides a complete set of operations for querying XML documents
>represented as groves. [...]

I don't know a thing about SDQL, and I'm having trouble finding useful material on the net. Could someone please point me to something that might be accessible to someone having no DSSSL experience and having only a very rudimentary knowledge of LISP?

From jlapp at acm.org Sun Nov 16 19:36:14 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346DFFA1.8C0F1ECD@technologist.com>
References: <Ymq81JA+dVb0EwA5@light.demon.co.uk>
Message-ID: <3.0.3.32.19971116143342.0095e950@pop.access.digex.net>

Paul Prescod <papresco@technologist.com> wrote:
>[...] As I understand it, you can create a
>"HTML form element" flow object and an "HTML input element" flow object
>within it. [...]
>DSSSL has no provisions for adding flow object types in DSSSL code. So
>we are essentially talking about the DSSSL implementation language (Java
>or C, probably) [...]
I'm not sure that this approach addresses the need to have a standard mechanism by which (server-side) XML documents are updated. We'd simply be relegating the standard to being defined by OMG IDL interfaces, as is done in DOM.

We could rely on (future) DOM-defined query mechanisms, except that the DOM approach does not provide the kind of flexibility that an XML-based language would provide. For example, in the DOM approach, our queries must be programs, whether they are written in Java, C, C++, VB, or some script language. But then we'd wish we had defined the script language. Users wouldn't have to learn a different language for generating queries on each platform, and the queries themselves would be transportable between platforms.

From jlapp at acm.org Sun Nov 16 19:36:33 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To: <346E1234.D549A734@technologist.com>
References: <Pine.LNX.3.96.971115153253.15181C-100000@dub>
Message-ID: <3.0.3.32.19971116143400.0095e210@pop.access.digex.net>

At 04:20 PM 11/15/1997 -0500, you wrote:
>[...]
>> I think the question being asked is whether you could make an input-text
>> flow object which had a clearly defined semantics in altering your XML
>> grove, not your flow object tree. For this, you would need an abstraction
>> for the form submission/editing cycle, and such SDQL primitives as richard
>> was mentioning.
>
>Only if you take the approach that DSSSL code must manage the form
>interactivity process. I don't see why it must.
>It seems simplest to me that it should do the moral equivalent of
>"put a button here" and leave the processing of the button click to
>JavaScript, Java or C++.

A standard, platform-independent query language has its merits:

(1) Many people use SQL without knowing a thing about programming. It's easier to learn a tiny language than to have to learn a big language in order to make use of a tiny library. SQL is very useful as a filter-specification language. This allows database administrators to manage a database by specifying complex filters that the database tool uses to select the elements to process. The user does not have to know the language in which processes are defined. Imagine trying to manage a database by having to write a different program (or plug-in) for each query operation you wished to perform.

(2) If the query language were defined as APIs (interfaces or IDLs) for use in an existing programming language, a person versed in manipulating a database from one language may find his skills of less value when he's asked to manipulate a database in a language he does not yet know. An administrator's skills (or DB developer's skills) are much more valuable if they are directly usable in many different environments.

(3) If a query must be expressed in a particular programming language, that query will not be directly usable in other programming languages or other environments. It is very likely that the query would have to be embedded in a plug-in module (or COM component or JavaBean), and that module will not be directly usable in any other environment -- perhaps not even outside the original application for which it was intended. If a standard language were used, applications could share queries, queries could be stored away for future retrieval, and users could share each other's queries just by handing each other files.

>[...]
The SQL model (I'm not familiar with OQL) is that a host >language (COBOL, PowerScript, JavaScript, Java, whatever) handles the >interactivity and issues data model update instructions. SQL does not >handle the user interface itself. OQL looks very much like SQL, except that it has extensions for accessing object-oriented databases, and except that it throws out the non-object-oriented update mechanisms of SQL. It still uses the SELECT ... FROM ... WHERE syntax. However, both SELECT statements and object methods can return result sets that can be further operated on. OQL does not have the UPDATE or INSERT statements. To perform equivalent actions you must use methods on objects. Such objects might be individuals or the objects in collections retrieved via the query semantics. >In other words, the vast majority of forms will have nothing to do with >the document grove itself. They may be forms designed to talk to >relational databases or object databases or CGI or whatever. We can >create these forms immediately, without touching SDQL. Yes, it would be >cool if SDQL allowed grove updates, and of course we expect that if it >did, you would be able to call it from the code that handles your >button, just as you could call SQL or OQL etc. I think there is a whole class of applications that could arise from being able to manage XML documents from clients. Consider knowledge repositories that retain data in a semantic form (in XML). Users could perform semantic-based queries and updates and all participate together in generating a semantic model and information warehouse. It may be that existing applications won't have much use for the kind of query language I'm proposing -- I'm looking toward the future. >Anyhow, I think that the DOM allows updates, so if you use DOM functions >as your "query language" and the DOM model as your grove, then you will >have a read/write document query language. This is a very significant point. 
I expect that DOM will define query operations on its objects, so that via IDLs, programs will be able to remotely manage persistent XML databases. However, for reasons I've given in other posts, I think an XML-based query language is necessary. The form of that query language might mirror the form defined by DOM, but the query language will necessarily provide constructs not named by DOM. DOM assumes the existence of a Turing-complete programming language. Just as SQL has, we would need to have mechanisms for piping filters through each other and for performing operations on the result sets. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sun Nov 16 22:07:39 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:58:55 2004 Subject: XML and case-folding Message-ID: <199711162207.RAA00749@unready.microstar.com> I remember some postings recently wondering about the implications of making elements (etc.) case-sensitive in XML. I remember only Tim's followup about IDs -- apologies if I'm going over well-worn ground here. As I understand it, if you set both NAMECASE GENERAL and NAMECASE ENTITY to "NO" in full SGML, then there will be no case substitution anywhere. Since XML is an SGML application profile, that means that you may use <!ELEMENT ...>, <!ATTLIST ...>, <!NOTATION ...>, and <!ENTITY ...> but NOT <!Element ...>, <!Attlist ...>, <!Notation ...>, and <!Entity ...> or <!element ...>, <!attlist ...>, <!notation ...>, and <!entity ...> Furthermore, all element type names, attribute names, notation names, entity names, _and_ attribute values (of any type) are also case sensitive. 
As a result, if you had this in your XML DTD:

  <!ATTLIST doc security (unclassified|secret) #REQUIRED>

and this in your XML document:

  <doc security="SECRET">

the parser should report an error. It also means that something like this is legal (though pathologically weird):

  <!ATTLIST question value (yes|Yes|yEs|yeS|YEs|YeS|yES|YES)>

The contents of processing instructions are never subject to case substitution anyway, though the validation of their contents is also mostly beyond (full) SGML's mandate; for consistency, however, it would make sense to require everything there to be in upper-case as well. In other words,

  <?XML VERSION="1.0" ENCODING="ISO-LATIN-1"?>

would be acceptable, but not

  <?Xml version="1.0" encoding="Iso-Latin-1"?>

Any comment on this last point?

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From mtbryan at sgml.u-net.com Mon Nov 17 08:50:46 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <E0xXMx9-0007eb-00@oveja>

Joe Lapp wrote:
>You might use XSL to create a replacement
>document and then upload the replacement, but this is not
>conducive to having many users concurrently querying and updating
>the document (you'd have to lock the whole document).

There is no need to lock the whole document, just that part of the document that constitutes an updatable record for the database it is being used to update or being updated from.
Such a record could consist of a number of contiguous fields, a set of discrete fields taken from appropriate parts of a document, or even a single field. There is no reason why fields not likely to be affected by change need to be locked in any way.

Martin Bryan

From mecom-gmbh at mixx.de Mon Nov 17 11:17:25 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:58:55 2004
Subject: what's the reason for mixed data?
Message-ID: <199711171117.MAA19602@hermes.mixx.net>

greetings,

we're trying to understand why the XML spec (wrt "http://www.w3.org/TR/WD-xml-970807.html") specifies a special status for elements which contain mixed data. to make it specific, why is

  [43] cp ::= (Name | choice | seq) ('?' | '*' | '+')?

not

  [43] cp ::= (#PCDATA | CDATA | Name | choice | seq) ('?' | '*' | '+')?

? what's the reason to specify a form (mixed data) which <EM>must</EM> permit repetition and arbitrary order as soon as PCDATA is allowed?

to give an example of the problem, assume the following CLOS declarations:

  (defClass class-1 () ((slot-1 :type string)))
  (defClass class-2 () ((slot-2 :type (or string class-1))))

how would this be declared?

  <!ELEMENT class-1 (slot-1)>
  <!ELEMENT class-2 (slot-2)>
  <!ELEMENT slot-1 #PCDATA>
  <!ELEMENT slot-2 (#PCDATA | class-1)>

makes sense, but would seem to be disallowed by

  [50] Mixed ::= '(' S? %( %'#PCDATA' (S? '|' S? %Mtoks)* ) S? ')*'
               | '(' S? %('#PCDATA') S? ')'

which would appear to stipulate the repetition as soon as elements and PCDATA appear together.
on the other hand,

  <!ELEMENT slot-2 (#PCDATA | class-1)*>

would not be a correct translation, since that is the equivalent of

  (defClass class-2 () ((slot-2 :type (list (or string class-1)))))

can anyone explain the ')*' requirement in [50]?

thanks,

james.

From jlapp at acm.org Mon Nov 17 14:45:05 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To: <E0xXMx9-0007eb-00@oveja>
Message-ID: <3.0.3.32.19971117094533.0094fce0@pop.access.digex.net>

"Martin Bryan" <mtbryan@sgml.u-net.com> wrote:
>Joe Lapp wrote:
>>You might use XSL to create a replacement
>>document and then upload the replacement, but this is not
>>conducive to having many users concurrently querying and updating
>>the document (you'd have to lock the whole document).
>
>There is no need to lock the whole document, just that part of the document
>that constitutes an updatable record for the database it is being used to
>update or being updated from. Such a record could consist of a number of
>contiguous fields, a set of discrete fields taken from appropriate parts of
>a document, or even a single field. There is no reason why fields not likely
>to be affected by change need to be locked in any way.

I agree that under the appropriate circumstances you wouldn't have to lock the whole document. However, were you to do the trick with what is currently XSL, it seems to me that you would have to create a _replacement_ document and then replace the original document.
If in the time between reading the original and generating the replacement another user reads the original, and if the other user posts his replacement after you post your replacement, then your changes do not take.

Or maybe you are suggesting there is no need to replace the whole document using an XSL approach. XSL or some other XML standard would need to define a standard mechanism for identifying and modifying a portion of a document. I am aware of some sort of 'chunking' initiative, but I don't know exactly what the scope of the effort is.

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From mtbryan at sgml.u-net.com Mon Nov 17 16:06:07 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <E0xXTks-0001SF-00@oveja>

Joe Lapp wrote:
>I agree that under the appropriate circumstances you wouldn't have
>to lock the whole document. However, were you to do the trick with
>what is currently XSL, it seems to me that you would have to create
>a _replacement_ document and then replace the original document.

This presumes that the "document" is the thing you want to remove. What if:

a) the document was built from a set of entities?
b) only part of the document consisted of updatable data fields?

The key factor is "what proportion of the data needs to be modified?"
>If in the time between reading the original and generating the >replacement another user reads the original, and if you the other >user posts his replacement after you post your replacement, then >your changes do not take. Always a problem with databases, but fields that are "temporarily locked" can always be assigned an attribute that the presentation software can use to indicate that the data is in a state of flux to read-only users of the data during the update period. >Or maybe you are suggesting there is no need to replace the whole >document using an XSL approach. XSL or some other XML standard would >need to define a standard mechanism for identifying and modifying a >portion of a document. The XML/EDI crew will be looking into this problem as it is key to running an electronic business using XML. > I am aware of some sort of 'chunking' >initiative, but I don't know exactly what the scope of the effort is. Why not join the XML/EDI research teams (see http://www.xmledi.net for details) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From simeons at allaire.com Mon Nov 17 16:07:51 1997 From: simeons at allaire.com (Simeon Simeonov) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML Message-ID: <01bcf373$8dcc8230$4a15b5cd@sim.allaire.com> Joe Lapp wrote: >This is a very significant point. I expect that DOM will define >query operations on its objects, so that via IDLs, programs will be >able to remotely manage persistent XML databases. However, for >reasons I've given in other posts, I think an XML-based query >language is necessary. 
The form of that query language might >mirror the form defined by DOM, but the query language will >necessarily provide constructs not named by DOM. DOM assumes the >existence of a Turing-complete programming language. Just as SQL >has, we would need to have mechanisms for piping filters through >each other and for performing operations on the result sets. The simpler operations of an XML-based query language can have profound impact on the usability of data on the Web. The high demand for web applications is drawing individuals with little to no programming experience to web development. They may find it quite difficult to write a script-based traversal algorithm using the XML DOM to extract some piece of information from a document. However, experience from the client-server world tells us that most people can easily learn how to formulate simple SELECT statements in SQL. I would speculate that, if a standard does not emerge by the next browser releases, vendors will move to provide their own query mechanisms. Why? Because they would like to make the consumption of arbitrary XML from within HTML as easy as possible. (IE4 DSOs are a move in the right direction.) Simeon Simeonov Allaire xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Mon Nov 17 17:16:39 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML In-Reply-To: <E0xXTks-0001SF-00@oveja> Message-ID: <3.0.3.32.19971117121706.0094fc90@pop.access.digex.net> "Martin Bryan" <mtbryan@sgml.u-net.com> wrote: >This presumes that the "document" is the thing you want to remove. 
What if: > >a) the document was built from a set of entities? >b) only part of the document consisted of updatable data fields? > >The key factor is "what proportion of the data needs to be modified?" I think I've just discovered that we are both arguing for the same thing. My point is exactly that the _document_ is not the smallest unit we care to change. I just meant to point out that because we care for finer granularity, and because currently no standard exists for updating at arbitrary granularity, we need a standard. I was giving an example with XSL only to demonstrate that XSL does not itself provide us with a way to work at that granularity. Currently, using XSL alone, we'd be replacing the entire document -- which is exactly what we _do_not_ want to do. >The XML/EDI crew will be looking into this problem as it is key to running >an electronic business using XML. I am preparing a report on what I believe is the fundamental issue, and I hope to post it before the day is out. >Why not join the XML/EDI research teams (see http://www.xmledi.net for >details) I'll look into it as soon as I finish this report. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dgd at cs.bu.edu Mon Nov 17 17:28:35 1997 From: dgd at cs.bu.edu (David G. 
Durand)
Date: Mon Jun 7 16:58:55 2004
Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk>
References: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
Message-ID: <v03007801b09685590445@[205.181.197.113]>

I want to respond to the "meta-proposal" a bit, because I disagree with some of the axioms, and the proposed procedures. I don't have time or energy right now to respond to the specific proposal, though I may well do so later (based on my own, somewhat divergent, axioms).

At 6:45 PM -0000 11/15/97, Peter Murray-Rust wrote:
><LIST>
> <AXIOM>In any of these cases there is no general solution acceptable to
>everyone
> </AXIOM>
> <AXIOM> If no attempt is made to address these problems we shall either
>end up with a Babel of incompatible solutions, or wait feebly for some
>powerful autonomous entities to dictate a limited set of actions.
> </AXIOM>

Not necessarily. In fact, for many problems the correct response is to ensure that the stylesheet and processing specification languages can _implement_ each of the specific solutions desired, _without_ forcing the specific solutions on which divergence of opinion may exist. More on this with the "PI" axioms.

> <AXIOM> We have to be careful to avoid the 'only processable with
>software X' syndrome</AXIOM>

Yes. The way to do this is to _avoid_ PIs as much as possible. PIs that are required to interpret a document correctly are _inherently_ anti-portability, since the rule for PIs is that _any application_ should be free to ignore them without changing the meaning of the document.

The use of SGML's PI syntax in XML is not a good model for the use of PIs in general, since they are being used in XML as a syntactic "escape hatch" for compatibility with SGML.
It would not be necessary (or desirable) if XML were not (to some very small extent) changing SGML facilities (as with specifying the character encoding of entities in PIs, rather than an SGML declaration). If XML had been able to add declarations to SGML, that would have been done instead of using the PI syntax.

> <AXIOM> There is a critical mass of readers of this list who feel the
>need to address the problem. </AXIOM>

Without a problem statement I'm not sure how to judge this, but it may well be true.

> <AXIOM> Anyone can use any PIs they like in their documents for whatever
>purposes they like without breaking the spirit of XML. </AXIOM>

This is assuredly incorrect. PIs are intended for use in the case where a practical _use_ of a document with _particular software_ requires additional information that _should not_ have been indicated in a structural description of the content. A paradigmatic example is the occasional need to insert a page or column break in order to get acceptable formatting in a particular processing situation (including: software, stylesheet, output device). This is not information that _should_ be encoded in the abstract representation of a document, but _may be essential_ for "getting the thing to print right".

> <AXIOM> That processing software need not (and so far won't) take any
>notice of these (or perhaps any) PIs
> </AXIOM>

This is certainly essential. If you are saying something about your document that you can imagine being useful to some software that you aren't using right now -- then it should probably be in the markup. PIs are for things that can be ignored without changing the interpretation of a document.
> <AXIOM> If a few people find a way of doing something that works for
>them, and isn't against the spirit of the XML specs, then flaming their
>ideas is pointless.</AXIOM>

Even this is not necessarily true -- attacking the dissemination of false or bad ideas is _never_ pointless, in that dissemination of bad information (even if it serves a local purpose adequately well) can seriously mislead people. For instance the use of styles in word-processing programs is usually a very good idea. The fact that in some instances direct formatting may work out, or even work better, should not stop people from quarreling with public assertions about the utility of stylesheets based on those situations.

To the extent that these axioms seem to be intended to rule out disagreement on the merits of future proposals, I must take immediate and strong exception to them. It's not possible for a responsible discussant who disagrees with a public proposal of working practice to remain silent on the topic. "Flaming" is usually not responsible discussion, but principled disagreements should be expressed so that the issues are clear to all.

></LIST>
><NOTE>The proposal I really want to address is, like Monty Python's joke,
>so potentially dangerous that I dare not reveal it yet. The proposal here
>is also important to me - perhaps to others - and I hope serves as a
>useful example. It is NOT in a finalised form, but as can be seen from the
>meta-proposal, there is a method for referring to a 'pseudo-final' form
>that is, at least, usable.
></NOTE>

This makes me nervous.

><META-PROPOSAL>
>That a PI of the form <?XDEV?> is 'reserved' by members of this list for
>PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly
>reserved.]

We can certainly do this -- but as I said above, there are good reasons to oppose the use of PIs for _any_ use that affects the semantics of documents.
For example, even the proposed namespace PI would be vulnerable on this account, except for the facts that:

1. It's intended for use in _experiment_ with a proposed _extension_ of XML. (In other words, the PI, should it be generally accepted for use with all interested XML applications, would become part of XML).

2. The prefix can be processed (and thus, the semantic information accessed) _without_ software having to be aware of the namespace PI. In other words, the PI can be treated as equivalent to a comment describing the proposed intent of the tags that share a prefix. (In other words, you can ignore the namespace PI, and still detect the semantic distinctions in the document)

>
>That anyone can post a proposal to this list for the use of this PI.

Anyone can post anything anywhere.

>That any author can include an instance of such a proposed PI in their
>document.

Again, any author can put anything they want anywhere, good idea or not.

>That any writer of application software can write software to process such
>a PI.

Again, how could anyone stop them?

>That both of these should refer to an appropriate URL on this list's
>archive discussing outlining the use of this PI.

Certainly not a bad idea.

>That if someone doesn't approve of a proposal they ignore it rather than
>flaming it. The fittest ideas will survive.

In the long run this may (or for a number of reasons may not) be true. However, bad ideas that are initially plausible but unworkable in the long term (e.g., from a related, but different domain, the creation and management of large structured information corpora in raw HTML) would get an artificial (and community-harmful) boost if an effective social convention forbidding disagreement were in effect. I agree that polite, reasoned disagreement is better than flaming (impolite, ad-hominem disagreement) but in the intellectual world the unfit perish faster under the lash of criticism.
></META-PROPOSAL> _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Mon Nov 17 17:48:41 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971117112909.00bc757c@swbell.net> At 08:04 AM 11/15/97 +0000, Richard Light wrote: >One important thing about "Standard Query Language" is that it doesn't >just query. It is actually a complete language for "defining, accessing >and otherwise managing relational databases". In other words, SQL, in addition to enabling *queries* (that is, request for information about tables) is *also* an editor scripting language where the documents are relational tables. The SGML/XML world view can be thought of as a place where there are two fundamental types of activity: query and edit. A query is always read-only. An edit results in a new document. This also suggests that there is no fundamental difference between editors and document management systems that manage abstractions of documents (like Crystal's Astorial or Texcels Information Manager). In other words, a document management system is just a very beefy editor with a poor user interface or editors are weak document management systems with poor persistence but good interfaces. 
Thus, SDQL is a "pure" query language in that its only purpose is to return the results of queries on the properties of nodes in groves. However, the DSSSL transformation language can be thought of as an editing scripting language because the result of applying a DSSSL transformation to a document is a new document.

Note that it doesn't matter how the creation of the new document is *implemented*. Whether you literally generate an entirely new grove from scratch or simply add and remove nodes and properties from the one you have, the result is the same: a new grove, which means a new document. DSSSL simplifies its abstract processing model by making groves static *in the abstract*. However, implementations are free to make groves dynamic *under the covers*.

Remember also that unless you're talking about SED scripts or Perl hacks, it's not meaningful to talk about operations on XML documents--it's only meaningful to talk about operations on abstractions of XML documents, i.e., groves. This is why both the DSSSL and HyTime standards are defined in terms of operations on groves, not operations on SGML documents.

If we define "editing" as the process by which the abstraction of a document is modified and a new document is created (here using the term "document" as it's defined by SGML and XML, that is, a character string conforming to the syntax defined by the standard), then *any process* that creates a new document is an editor. The only question then is whether or not the editor is interactive or batch, which is really a question of user interface, not functionality.

All editing languages must include a query language because you must be able to examine the properties of the objects the editor is manipulating, but I think that it is confusing to call an editing language a query language just because SQL is incorrectly called a query language.
Or said another way: given a robust query mechanism, such as SDQL, it is possible to create an infinite number of editing languages that provide the appropriate interaction and convenience characteristics needed for a particular editing application. When the tasks of querying and editing are kept separate, it becomes clear that it is not necessary to bind them together (although doing so may have advantages in some environments). Thus, the argument that SDQL is insufficient for complete XML processing and is thus not useful misses the point that what was asked for was not a query language at all, but an editing scripting language, which SDQL is not. However, SDQL could be of service to any number of scripting languages by providing a ready-made syntax and set of semantics that can be used directly. I can easily imagine creating a simple set of DSSSL expression language functions that provide the grove manipulation actions needed: delete node, add node, set property, delete property. Implementing these would be easy enough to do once you had code that managed groves (i.e., a DOM-based read-write browser), which we have in both Netscape and IE4 and will likely have in SGML/XML editors in the near future. Cheers, Eliot -- <Address HyTime=bibloc> W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com </Address> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Mon Nov 17 17:49:19 1997 From: eliot at isogen.com (W. 
Eliot Kimber)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971117113804.00bd0010@swbell.net>

At 04:20 PM 11/15/97 -0500, Paul Prescod wrote:
>Graydon Hoare wrote:
>> What would it mean to take a form
>> flow object and render it through a TeX backend? The "interactive" nature
>> is gone. What happens to a combo-box?
>
>About the same as the printed rendition of a link or scroll flow object.
>It would be completely useless. Stylesheets are tied to a particular
>medium. Online stylesheets should have elements (link, input, scroll)
>that allow interactivity and print-oriented stylesheet languages should
>have elements that describe pages etc.

There are many very useful static representations of forms, not least of which is to document the design thereof. My first exposure to SGML was writing a process to generate printed specifications for an online application of several hundred (if not thousands of) interactive panels, all created in SGML using a now-defunct language IBM developed for use in OS/2 (it may still live in CICS, I'm not sure--it was also used there for a while). Because the documents that defined the panels included references to variables, described branching and control structures, and so on, I was able to generate both pictures of the panels (using character-based graphics, no less) and generate lots of information about the panels. By doing this, we eliminated the need to do screen snaps to document the panels, which we estimated saved a minimum of two calendar weeks per rev of the spec (that being the amount of time it would take to make the snaps and assemble the document).

Likewise, hyperlinks can be represented in print in any number of ways (witness the SGML handbook). The interactivity of hyperlinks is not what distinguishes them; it is the relationship they represent.
There are many ways to present and make useful such relationships, of which interactive traversal is only one (and not necessarily the most useful). Cheers, E. -- <Address HyTime=bibloc> W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com </Address> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Mon Nov 17 18:08:36 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971117115001.00bc8a74@swbell.net> At 12:17 PM 11/17/97 -0500, Joe Lapp wrote: >"Martin Bryan" <mtbryan@sgml.u-net.com> wrote: >>This presumes that the "document" is the thing you want to remove. What if: >> >>a) the document was built from a set of entities? >>b) only part of the document consisted of updatable data fields? >> >>The key factor is "what proportion of the data needs to be modified?" >I think I've just discovered that we are both arguing for the same >thing. My point is exactly that the _document_ is not the smallest >unit we care to change. I just meant to point out that because we >care for finer granularity, and because currently no standard exists >for updating at arbitrary granularity, we need a standard. A standard *does* exist for defining the objects you might want to update: the SGML property set (possibly reflected through the DOM). Given this definition, defining operations on it is a simple matter of programming. 
Or said another way, you don't need a standard for the control language (although it's useful to have one) if you have a standard for the data model to be controlled. Cheers, E. -- <Address HyTime=bibloc> W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com </Address> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Mon Nov 17 19:04:04 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML In-Reply-To: <01bcf373$8dcc8230$4a15b5cd@sim.allaire.com> Message-ID: <3.0.3.32.19971117140421.0095ab80@pop.access.digex.net> "Simeon Simeonov" <simeons@allaire.com> wrote: >[...] >I would speculate that, if a standard does not emerge by the next browser >releases, vendors will move to provide their own query mechanisms. Why? >Because they would like to make the consumption of arbitrary XML from within >HTML as easy as possible. (IE4 DSOs are a move in the right direction.) I think there is another reason why we wouldn't need to wait for the DOM spec to complete. I'm preparing an argument for why DOM cannot be extended to do the job given the way it is currently architected. Look for my upcoming post on the subject (I'm not done yet). -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Mon Nov 17 21:13:02 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML References: <3.0.32.19971117115001.00bc8a74@swbell.net> Message-ID: <3470B42D.3A1FB5A9@technologist.com> W. Eliot Kimber wrote: > Or said another way, you don't need a standard for the control language > (although it's useful to have one) if you have a standard for the data > model to be controlled. I would think that there are major optimization benefits to having a standard query language. Each database vendor can take a complete query describing a node and choose the quickest way to find the node, vs. passively waiting for each query component (e.g. get this node-list....now reverse it...now find the first node of type element....now check its GI etc. etc.). Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From bmhughes at ozemail.com.au Mon Nov 17 21:58:01 1997 From: bmhughes at ozemail.com.au (Baden Hughes) Date: Mon Jun 7 16:58:55 2004 Subject: ot.xml Message-ID: <3.0.1.32.19971118083440.006cc664@ozemail.com.au> Can someone tell me where I can pick up ot.xml ? Thanks Baden xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Mon Nov 17 22:00:42 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML Message-ID: <1.5.4.32.19971117220012.00a49368@pop.mindspring.com> I'm not sure whether I understand the range of things that will be queried. I would think that we would want to be able to do queries of at least the following kinds: 1. Queries of non-markup data, with the goal of creating mark-up from databases, e.g. relational or object-oriented databases. The end result is to return a grove, but there's a lot that has to be defined in-between. 2. Full-text searches which return groves. 3. Structured document queries which return groves. There's an interesting discussion of queries that need to be supported here: "http://www.ceth.rutgers.edu/programs/TEI97/SESSIONS/GREGORY/search.sgm.html" Standard database query languages like OQL and SQL are not very useful for queries of type 3 unless we know the actual names of the data structures used in a particular implementation. For instance, in a relational database, what are the names of the tables and columns that must be used to create a query for a given document structure? Standard database query languages like OQL and SQL do not have full-text search operators to allow them to do queries of type 2, though some people have defined full-text operators as extensions of such languages. When it comes to the return type for such a query, we have the same problem mentioned in the previous paragraph. I don't know much about SDQL. It is part of the DSSSL standard - is it scheme based? Is it procedural? 
Is it based on SGML/XML document structure? Can it be used for queries of types 1 and 2? Jonathan xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Mon Nov 17 22:40:57 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:55 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971117163812.00bd28c0@swbell.net> At 05:00 PM 11/17/97 -0500, Jonathan Robie wrote: >I don't know much about SDQL. It is part of the DSSSL standard - is it >scheme based? Is it procedural? Is it based on SGML/XML document structure? >Can it be used for queries of types 1 and 2? SDQL is simply that part of the larger DSSSL expression language that enables the accessing of properties of nodes in groves and the navigation of groves. It uses the same syntax as the rest of DSSSL, that is a Scheme variant. It is based on the basic grove data model (nodes and their properties) but has some built-in functions related to SGML (e.g., "gi", "att-string", etc.). All the built-in functions are or can be defined in terms of primitives (e.g., node-property). It includes some basic string-matching functions but does not attempt to provide any sort of complete full-text facility (which would be outside the stated scope of DSSSL in any case). Note, however, that the syntax is largely arbitrary: what's important are the semantics of grove access. Thus, you can expect XSL to include the functional equivalent (more or less) of SDQL even though it may provide an alternative syntax. Cheers, E. -- <Address HyTime=bibloc> W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. 
Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com </Address> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From bmhughes at ozemail.com.au Tue Nov 18 00:48:28 1997 From: bmhughes at ozemail.com.au (Baden Hughes) Date: Mon Jun 7 16:58:56 2004 Subject: ot.xml In-Reply-To: <199711172204.OAA01491@mehitabel.eng.sun.com> Message-ID: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au> >I moved a copy over into a directory where you can get it: ... >Perhaps someone can mirror a copy down in your part of the world. Thanks for Murray Altheim for his reply ... For those interested, the ot.xml file is now also online at: http://fdnet.com.au/bmhughes/otxml.zip Baden xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Tue Nov 18 01:37:37 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:56 2004 Subject: ot.xml In-Reply-To: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au> (message from Baden Hughes on Tue, 18 Nov 1997 11:46:08 +1100) Message-ID: <199711180136.RAA14414@boethius.eng.sun.com> [Baden Hughes:] | >I moved a copy over into a directory where you can get it: | ... | >Perhaps someone can mirror a copy down in your part of the world. | | Thanks for Murray Altheim for his reply ... 
| | For those interested, the ot.xml file is now also online at: | | http://fdnet.com.au/bmhughes/otxml.zip I released that file into the world a long time ago, so I have no legal claim over it, but as a courtesy I would appreciate it if people would keep the set together. I went to the trouble of marking up the Old Testament, the New Testament, the Book of Mormon, and the Quran because I did not wish to be associated with a project that preferred the scriptures of any particular religion over those of any other. At the time (1992) I could not find any other scriptures in electronic form, or I would have included them as well. I still feel this way. If you think that I have contributed something useful, you would be doing me a favor if you distributed only the entire set, which can be found at http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip along with its mate, http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip Thanks. Jon xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From bmhughes at ozemail.com.au Tue Nov 18 04:57:29 1997 From: bmhughes at ozemail.com.au (Baden Hughes) Date: Mon Jun 7 16:58:56 2004 Subject: [2] ot.xml In-Reply-To: <199711180136.RAA14414@boethius.eng.sun.com> References: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au> Message-ID: <3.0.1.32.19971118142939.00690024@ozemail.com.au> Thanks to Jon for his contribution with regard to text markup and his recent followup note ... As per Jon's request, the entire file set of religion.1.02.xml.zip can be found at: http://fdnet.com.au/bmhughes/religion.1.02.xml.zip Baden xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at light.demon.co.uk Tue Nov 18 08:18:02 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:58:56 2004 Subject: Data manipulation languages for XML (was Query Languages ...) In-Reply-To: <3.0.32.19971117115001.00bc8a74@swbell.net> Message-ID: <WWAl1CAS5Uc0Ewr7@light.demon.co.uk> In message <3.0.32.19971117115001.00bc8a74@swbell.net>, "W. Eliot Kimber" <eliot@isogen.com> writes >>[Joe Lapp:] >>I think I've just discovered that we are both arguing for the same >>thing. My point is exactly that the _document_ is not the smallest >>unit we care to change. I just meant to point out that because we >>care for finer granularity, and because currently no standard exists >>for updating at arbitrary granularity, we need a standard. > >A standard *does* exist for defining the objects you might want to update: >the SGML property set (possibly reflected through the DOM). Given this >definition, defining operations on it is a simple matter of programming. > >Or said another way, you don't need a standard for the control language >(although it's useful to have one) if you have a standard for the data >model to be controlled. Said another way again: since we have a good, conceptually clear standard for describing the objects we want to update, we are well-placed to 'go the extra mile' and define a standard for updating those objects. May we return to SQL, as a precedent for the type of language Joe was originally asking about? SQL's primary purpose is to support the use and updating of distributed database information, by multiple users, in real time.
Surely that is a reasonable expectation for XML information, too? If so, we need mechanisms to specify changes to existing documents. I don't really buy the model that says that every change to an XML document produces a completely new document. You will certainly have a hard time selling that idea to an end-user who changes one word in a document, or to a database vendor who has to take back the complete document and work out for themselves what (if anything) has changed, in order to update the relevant nodes. Also, in the real world you need access control (c.f. GRANT in SQL). The very nature of XML documents means that this control needs to be at the node rather than the document level, if only to deal with entities. Also, you need to know which parts of the document you are allowed to change as you start editing - it is not good enough to be told some time afterwards that certain changes should not have been made! I agree that you can perfectly well define changes to an XML document via its representation as a grove, but this grove needs to be linked back to the physical objects that gave rise to it. For example, if you edit a phrase that happens to be within an entity that is referenced more than once within the document you are editing, then perform an UPDATE, in principle _all_ references to that entity should be updated. Richard Light. Richard Light SGML/XML and Museum Information Consultancy richard@light.demon.co.uk xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Tue Nov 18 11:44:11 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:58:56 2004 Subject: Query Languages for XML References: <Ymq81JA+dVb0EwA5@light.demon.co.uk> <3.0.3.32.19971116143342.0095e950@pop.access.digex.net> Message-ID: <3471801E.6F148BD6@technologist.com> Joe Lapp wrote: > > Paul Prescod <papresco@technologist.com> wrote: > >[...] As I understand it, you can create a > >"HTML form element" flow object and an "HTML input element" flow object > >within it. [...] > >DSSSL has no provisions for adding flow object types in DSSSL code. So > >we are essentially talking about the DSSSL implementation language (Java > >or C, probably) [...] > > I'm not sure that this approach addresses the need to have a standard > mechanism by which (server-side) XML documents are updated. It certainly does not. It wasn't intended to. My point was merely that there is no real relationship between the need to be able to make form elements and other interactive elements (hyperlinks, collapsible trees) and the need to be able to make modifications to a document through a query language. They are both good ideas -- they are just not necessarily related ideas. We already have forms, and it doesn't require an updatable SDQL. Paul Prescod xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Tue Nov 18 14:55:55 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:56 2004 Subject: Data manipulation languages for XML (was Query Languages ...) In-Reply-To: <WWAl1CAS5Uc0Ewr7@light.demon.co.uk> References: <3.0.32.19971117115001.00bc8a74@swbell.net> Message-ID: <3.0.3.32.19971118095544.00955100@pop.access.digex.net> Richard Light suggests that we use the term "Data Manipulation Language" when talking about this query/edit language in order to avoid further confusion. In the computer security industry ("rainbow series" books), we use the term "access" to denote any kind of interaction with information objects. For example, we say, "Read access" or "Write access." The term "DAC" ("Discretionary Access Control") uses "access" in this sense to describe the security policy that may be in place to protect information objects. I like the term "Data Access Language" or just "Access Language" a bit more. This is partly because of my security background and partly because it is quite shorter. Besides, in my mind the term "manipulation" conjures images of editing and not querying. Just a suggestion. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Tue Nov 18 15:22:16 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:56 2004 Subject: Data manipulation languages for XML (was Query Languages ...) Message-ID: <3.0.32.19971118091934.009f7260@swbell.net> At 08:14 AM 11/18/97 +0000, Richard Light wrote: >If so, we need mechanisms to specify changes to existing documents. I >don't really buy the model that says that every change to an XML >document produces a completely new document. You will certainly have a >hard time selling that idea to an end-user who changes one word in a >document, or to a database vendor who has to take back the complete >document and work out for themselves what (if anything) has changed, in >order to update the relevant nodes. I think you're misunderstanding my use of the term "document" and my reference to the *abstract* processing model of DSSSL and groves, as opposed to how an implementation might work or how a user might perceive the result. By "document" I mean what SGML and XML mean by document: a character string conforming to the rules of the standard. Identity for documents is defined by no differences in the character string. If I change one character *I have a new document*. However, when using the term "document" to mean "an abstraction of a container for information", which is the usual everyday meaning of "document", then the document is not a new document, unless the user considers it to be one. Note the difference: I'm talking about the mechanics of data manipulation as related to the formal definition of SGML and XML, users are thinking about the abstractions of information creation. 
These are two different domains. For the purpose of thinking about standards for defining document processing, it is a very useful simplification to think of every change as creating a new *grove* (which, if used to generate an SGML or XML character string, would result in a new SGML or XML document). Obviously, in an implementation, you would probably not literally create an entirely new grove, but would simply modify the one you have and, presumably, remember the actions that transformed grove[0] to grove[1]. But that implementation approach doesn't change the truth of the abstract model, which is that grove[1] *is a different grove* from grove[0]. That's all I'm getting at. >Also, in the real world you need access control (c.f. GRANT in SQL). >The very nature of XML documents means that this control needs to be at >the node rather than the document level, if only to deal with entities. Not a problem. Remember that we're talking about *editing* here, which *can only happen* on groves, which consist of nodes, which can therefore be individually locked if your editor provides that function. There is nothing in the definition of groves or the DSSSL expression language that precludes node-level access control within an editor. That's an editing issue, which is outside the scope of SGML, XML Lang, or DSSSL (as they are only data representation languages, not editor specifications). >Also, you need to know which parts of the document you are allowed to >change as you start editing - it is not good enough to be told some time >afterwards that certain changes should not have been made! Again, not a problem as long as your editor provides some system for associating access policies with nodes, either directly (by addressing individual nodes) or by algorithm (e.g., elements in context). Again, this is an editor design issue, not a data representation issue. 
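The abstract model above (every edit yields a new grove, with access policies attached to nodes by context) can be sketched roughly as follows. This is only an illustration in Python, not DSSSL and not any real grove API: the names Node, set_text and may_edit are invented here, and the "policy by element context" rule is an assumed stand-in for whatever an actual editor would provide.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Node:
    """A hypothetical, immutable grove node: a GI, some text, child nodes."""
    gi: str
    text: str = ""
    children: Tuple["Node", ...] = ()


# A policy receives the chain of GIs from the root down to the node
# and says whether that node may be edited ("elements in context").
Policy = Callable[[Tuple[str, ...]], bool]


def set_text(root: Node, path: Tuple[int, ...], new_text: str,
             may_edit: Policy, _context: Tuple[str, ...] = ()) -> Node:
    """Return a NEW grove in which the node at `path` carries `new_text`.

    The original grove is never modified; untouched subtrees are shared.
    The policy is checked before the change is made, not afterwards.
    """
    context = _context + (root.gi,)
    if not path:
        if not may_edit(context):
            raise PermissionError("no write access in " + "/".join(context))
        return replace(root, text=new_text)
    index, rest = path[0], path[1:]
    child = set_text(root.children[index], rest, new_text, may_edit, context)
    children = root.children[:index] + (child,) + root.children[index + 1:]
    return replace(root, children=children)


# grove0 is the grove before the edit, grove1 the grove after it.
grove0 = Node("doc", children=(Node("title", "Draft"),
                               Node("legal", "Fixed text")))
policy = lambda context: "legal" not in context  # "legal" is locked

grove1 = set_text(grove0, (0,), "Final", policy)
```

Here grove1 is a different grove from grove0 even though the unchanged "legal" node is physically shared between them, which is the distinction between the abstract model and what an implementation actually does.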
>I agree that you can perfectly well define changes to an XML document >via its representation as a grove, but this grove needs to be linked >back to the physical objects that gave rise to it. For example, if you >edit a phrase that happens to be within an entity that is referenced >more than once within the document you are editing, then perform an >UPDATE, in principle _all_ references to that entity should be updated. What's your point? A grove that includes information about the text entities used to organize it has enough information to correlate references to entities to their content. How could it be otherwise? A grove has to enable *complete* representation of the original document. In a complete grove (one that includes all the properties defined in the property set), the original document can be recreated byte for byte because the original document string is stored as part of the grove (using the so-called "markup" properties). I'm afraid I don't see how using groves as the fundamental abstraction for editing is inconsistent with satisfaction of any of the requirements. All that's needed on top of what DSSSL provides are functions that represent the editing actions needed (as opposed to modeling editing as a transform, which is probably not a useful approach). If SQL provides a useful model for defining such functions, we should use it. Cheers, Eliot -- <Address HyTime=bibloc> W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com </Address> xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Tue Nov 18 20:15:42 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:56 2004 Subject: Three Access Language Paradigms Message-ID: <3.0.3.32.19971118151445.0093beb0@pop.access.digex.net> I have been thinking intensely about several issues these past few days, and I've been trying to put them all together into a coherent whole. So far I'm not succeeding, so I'm initiating a series of discussions to help me make sense of things. Here's the first... We would like clients to be able to remotely manage documents residing on servers. Clients need to be able to both query and edit those documents. This might be done via OMG CORBA interfaces, or it might be done via a human-readable query language. Whatever the mechanism, I'd like to call the mechanism a "document access language" or just an "access language" for purposes of this discussion. In this posting I explore three different access language paradigms. It seems to me that so far the W3C has focused on using DOM as the language by which clients remotely access documents. Under DOM, clients view documents through CORBA interfaces that make the document look like a tree of XML objects. Once the W3C has established all of the necessary interfaces, a client will have full control over a document's contents, subject to DTD and access control constraints. More recently, we have discussed possibly supplementing the DOM approach with a human-readable access language. A streamable access expression would be shipped to the server, and the server would provide a streamed response. 
Document content would have to transfer between client and server, and the form of the content would be constrained by the DTD that defines the document. The syntax of the human-readable language is undecided. It might be OQL or it might be SDQL with extensions or it might be XML with embedded content. I'd like to present still another form of access language. This approach is based on a different way of thinking about documents. Instead of asking document repositories to look like XML documents to the external world, we only ask that the repositories speak XML with the external world. DTDs would be defined for the protocols that repositories might care to speak. The DTDs would define the structure of the protocol messages rather than the structures of documents. One repository might speak several protocols (e.g. 'Patient Records Protocol V.152' or 'Bank Transaction Protocol 2A'). If the repository were capable of containing arbitrary XML documents, the repository might speak a specific protocol called 'XML Document Protocol V.1.0'. Under the third approach, XML documents would appear less often as persistent repositories and more often as transient messages between clients and servers. It would still be necessary to define the base DTD for all of these protocols since one server port must be able to parse them all well enough to identify the protocol. It may even be possible to define the syntax for queries, insertions, and updates, so that the individual protocols have less inventing to do. Briefly consider the benefits of the third approach. The most significant benefit is that it completely frees the repository from having to conform to an XML object model. We could expose a legacy database to the world through one of the protocols with only a thin wrapper around the database. New databases could restrict the protocols they support and specialize their structures according to the kind of data they care to represent.
They could be based on custom object-oriented schemas or relational schemas. This approach also lowers the entry level into the data repository server world. We could think of servers more as information warehouses than as virtual documents. The most significant drawback of this approach is that it doesn't give us a single access language. It probably gives us a different access language for each protocol. (Somebody please let me know whether this need not be so.) One of those access languages would be defined in the 'XML Document Protocol,' and this is the language that we have been looking for so far. Ideally, the access languages for all of the protocols would have the same syntactic substrate, so that the only new additions to each protocol would be elements that are specific to the information being represented. However, it is not immediately apparent to me that this will be possible. Yet, there are so many ways to represent data in XML and in other formats such as relational and persistent OO. The database vendor should not be constrained to use an architecture that will export the repository as something that looks like XML (such as DOM). For example, many different DTDs can be invented to represent a given set of data, and no standard should constrain a vendor to use a specific DTD for organizing the information. A standard should exist for how to query and update information and for how to represent the data of concern (e.g. patient records or transactions) -- that's what the DTDs should define. Hence, I came to the protocol proposal. Now it's time to talk about SQL and OQL. To a large degree these languages expose the representation underlying the database. SQL exposes tables and columns, while OQL exposes the persistent classes and their methods. These access languages are defined based on the schemas, so that once the schemas are defined, voila, so are the access languages. We save ourselves a lot of time. 
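To make the contrast with the protocol approach concrete, here is a minimal sketch in Python. Everything in it is assumed for illustration: the get-patient and patient message elements, the RelationalStore and ObjectStore classes, and the handle function are hypothetical and do not come from any existing protocol or DTD. The point is only that the client sees the XML messages, while the table and column names stay private to one backend.

```python
import sqlite3
import xml.etree.ElementTree as ET


class RelationalStore:
    """Hypothetical relational backend; its schema is never shown to clients."""

    def __init__(self):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE patients (id TEXT PRIMARY KEY, name TEXT)")
        self.db.execute("INSERT INTO patients VALUES ('42', 'A. Example')")

    def patient_name(self, patient_id):
        row = self.db.execute(
            "SELECT name FROM patients WHERE id = ?", (patient_id,)
        ).fetchone()
        return row[0] if row else None


class ObjectStore:
    """Hypothetical non-relational backend holding the same information."""

    def __init__(self):
        self.patients = {"42": {"name": "A. Example"}}

    def patient_name(self, patient_id):
        record = self.patients.get(patient_id)
        return record["name"] if record else None


def handle(message, store):
    """Thin wrapper: parse an XML request, answer with an XML reply."""
    request = ET.fromstring(message)
    if request.tag != "get-patient":
        return '<error reason="unknown-request"/>'
    name = store.patient_name(request.get("id"))
    if name is None:
        return '<error reason="not-found"/>'
    reply = ET.Element("patient", id=request.get("id"))
    ET.SubElement(reply, "name").text = name
    return ET.tostring(reply, encoding="unicode")


# The same request works against either backend and yields the same reply.
request = '<get-patient id="42"/>'
reply_relational = handle(request, RelationalStore())
reply_object = handle(request, ObjectStore())
```

Swapping RelationalStore for ObjectStore requires no change on the client side, because the client depends only on the message structure and not on the schema behind it.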
The SQL and OQL approach has one extremely significant drawback: compatible databases have identical schemas. Where are the clients that speak 'Patient Record Schema V.2.1,' and where are all the databases that are compliant with this schema standard? Everybody uses generic database backends, and no little guys can come in to compete by specializing for a given standard. If we had based these older query languages on protocols, it wouldn't have been much of a problem for object-oriented vendor X to come in and replace relational vendor Y's server implementation of a standard; there would have been no need to replace the clients. Shouldn't we be building that sort of flexibility into our new XML-compliant databases now, so that we will be able to accommodate tomorrow's unexpected architectures? I do not believe that it is necessary for an access language to expose the database's architecture. In our case, I do not believe an access language must assume that the database is architected in a way that allows it to appear externally as an XML document. It might be desirable to do this, since it could keep us from having to extend the query language for each protocol, but I do not think that it is necessary. It is only necessary that the client and the server agree on the structure and the meanings of messages sent between them. We ought not place constraints on our servers that need not be there. I think DTDs for persistent documents are going to be over-constraining. I have more issues to discuss regarding DOM and the required nature of an XML-document query language. Everything seems related to everything else, but I'll end this topic here just to get things started. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From lauren at sqwest.bc.ca Tue Nov 18 20:29:43 1997 From: lauren at sqwest.bc.ca (Lauren Wood) Date: Mon Jun 7 16:58:56 2004 Subject: Three Access Language Paradigms In-Reply-To: <3.0.32.19971116194534.0068a6b0@pophost.arbortext.com>; from "lauren" at Tue Nov 18 12:29:05 1997 Message-ID: <m0xXuGg-0009WiC@sqailor.sqwest.bc.ca> % From: Joe Lapp <jlapp@acm.org> % % % It seems to me that so far the W3C has focused on using DOM % as the language by which clients remotely access documents. % Under DOM, clients view documents through CORBA interfaces % that make the document look like a tree of XML objects. % Once the W3C has established all of the necessary interfaces, % a client will have full control over a document's contents, % subject to DTD and access control constraints. You should not confuse the use of OMG IDL to describe interfaces with requiring implementations to use CORBA interfaces. The DOM specification is quite clear that CORBA is not needed. OMG IDL is simply used as a language. From the DOM spec: "The Object Management Group Interface Definition Language (OMG IDL) was chosen as it was designed for specifying language and implementation-neutral interfaces. Various other IDLs could be used; the use of OMG IDL does not imply a requirement to use a specific object binding runtime." From the DOM FAQ, at http://www.w3.org/DOM/faq.html "We expect that the DOM can be implemented using CORBA, COM, or Java Virtual Machine runtime bindings." Lauren -- Lauren Wood, SoftQuad, Inc. Chair, W3C DOM Activity Lauren xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From gannon at commerce.net Tue Nov 18 21:17:17 1997 From: gannon at commerce.net (Patrick Gannon) Date: Mon Jun 7 16:58:56 2004 Subject: Three Access Language Paradigms Message-ID: <01BCF423.4B86F620@arrow-d23.sierra.net> Joe, A very interesting view you present. Let me comment from the Internet commerce view of similar research efforts that may overlap with some of your suggestions. ---------- From: Joe Lapp[SMTP:jlapp@acm.org] Sent: Tuesday, November 18, 1997 12:14 PM To: xml-dev@ic.ac.uk Subject: Three Access Language Paradigms Whatever the mechanism, I'd like to call the mechanism a "document access language" or just an "access language" for purposes of this discussion. In this posting I explore three different access language paradigms. Within the Information Access Portfolio of CommerceNet, we are also exploring methods to provide a common "language" and common "protocol" for web-based entities (browsers, agents, directories, registries, other catalogs) can access and exchange product information, whether that information is stored in a web "document" (HTML or XML) or in a database. I'd like to present still another form of access language. This approach is based on a different way of thinking about documents. Instead of asking document repositories to look like XML documents to the external world, we only ask that the repositories speak XML with the external world. DTDs would be defined for the protocols that repositories might care to speak. The DTDs would define the structure of the protocol messages rather than the structures of documents. One repository might speak several protocols (e.g. 
'Patient Records Protocol V.152' or 'Bank Transaction Protocol 2A'). If the repository were capable of containing arbitrary XML documents, the repository might speak a specific protocol called 'XML Document Protocol V.1.0'. Under CommerceNet's eCo architecture, we see the use of marketplace "registries" that "know" about web objects (who they are, what products they make/sell, what data structure they employ) and have access to the business rules and data mapping (possibly through DTDs) to provide "seamless" access to the source "documents" (i.e. product catalogs). A developing Common Business Language would define some of the protocols you are suggesting. For instance we are defining a Product Information eXchange (PIX) platform as a framework for how some of these protocols can be easily developed in an open, interoperable way. Your following suggestions are quite interesting and make some valid points in terms of learning from past efforts of developing query languages and underlying data structures. Under the third approach, XML documents would appear less often as persistent repositories and more often as transient messages between clients and servers. It would still be necessary to define the base DTD for all of these protocols since one server port must be able to parse them all well enough to identify the protocol. It may even be possible to define the syntax for queries, insertions, and updates, so that the individual protocols have less inventing to do. Briefly consider the benefits of the third approach. The most significant benefit is that it completely frees the repository from having to conform to an XML object model. We could expose a legacy database to the world through one of the protocols with only a thin wrapper around the database. New databases could restrict the protocols they support and specialize their structures according to the kind of data they care to represent. They could be based on custom object-oriented schemas or relational schemas.
This approach also lowers the entry level into the data repository server world. We could think of servers more as information warehouses than as virtual documents. The most significant drawback of this approach is that it doesn't give us a single access language. It probably gives us a different access language for each protocol. (Somebody please let me know whether this need not be so.) One of those access languages would be defined in the 'XML Document Protocol,' and this is the language that we have been looking for so far. Ideally, the access languages for all of the protocols would have the same syntactic substrate, so that the only new additions to each protocol would be elements that are specific to the information being represented. However, it is not immediately apparent to me that this will be possible. Yet, there are so many ways to represent data in XML and in other formats such as relational and persistent OO. The database vendor should not be constrained to use an architecture that will export the repository as something that looks like XML (such as DOM). For example, many different DTDs can be invented to represent a given set of data, and no standard should constrain a vendor to use a specific DTD for organizing the information. A standard should exist for how to query and update information and for how to represent the data of concern (e.g. patient records or transactions) -- that's what the DTDs should define. Hence, I came to the protocol proposal. Now it's time to talk about SQL and OQL. To a large degree these languages expose the representation underlying the database. SQL exposes tables and columns, while OQL exposes the persistent classes and their methods. These access languages are defined based on the schemas, so that once the schemas are defined, voila, so are the access languages. We save ourselves a lot of time. The SQL and OQL approach has one extremely significant drawback: compatible databases have identical schemas. 
Where are the clients that speak 'Patient Record Schema V.2.1,' and where are all the databases that are compliant with this schema standard? Everybody uses generic database backends, and no little guys can come in to compete by specializing for a given standard. If we had based these older query languages on protocols, it wouldn't have been much of a problem for object-oriented vendor X to come in and replace relational vendor Y's server implementation of a standard; there would have been no need to replace the clients. Shouldn't we be building that sort of flexibility into our new XML-compliant databases now, so that we will be able to accommodate tomorrow's unexpected architectures? I do not believe that it is necessary for an access language to expose the database's architecture. In our case, I do not believe an access language must assume that the database is architected in a way that allows it to appear externally as an XML document. It might be desirable to do this, since it could keep us from having to extend the query language for each protocol, but I do not think that it is necessary. It is only necessary that the client and the server agree on the structure and the meanings of messages sent between them. We ought not place constraints on our servers that need not be there. I think DTDs for persistent documents are going to be over-constraining. I have more issues to discuss regarding DOM and the required nature of an XML-document query language. Everything seems related to everything else, but I'll end this topic here just to get things started. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org Patrick Gannon, Executive Director Information Access Portfolio, CommerceNet mailto:gannon@commerce.net http://www.commerce.net/services/portfolios/ ------------------------------------------------------ 865 Tahoe Blvd., Suite 211, Incline Village, NV 89451 702-831-2251 702-831-3925 (Fax)
From jwrobie at mindspring.com Tue Nov 18 21:40:43 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:56 2004 Subject: Query Languages for XML Message-ID: <1.5.4.32.19971118214011.00a4b2b4@pop.mindspring.com> At 04:38 PM 11/17/97 -0600, W. Eliot Kimber wrote: >SDQL is simply that part of the larger DSSSL expression language that >enables the accessing of properties of nodes in groves and the navigation >of groves. It uses the same syntax as the rest of DSSSL, that is a Scheme >variant. It is based on the basic grove data model (nodes and their >properties) but has some built-in functions related to SGML (e.g., "gi", >"att-string", etc.). All the built-in functions are or can be defined in >terms of primitives (e.g., node-property). It includes some basic >string-matching functions but does not attempt to provide any sort of >complete full-text facility (which would be outside the stated scope of >DSSSL in any case).
In the database world, what you describe would not be called a query language; at least, not if I understand you correctly. Certainly, something like SDQL is useful, but it doesn't seem to be a query language, nor does it seem to eliminate the need for a query language. I think we can learn something from the history of databases - and if we do, we will not be condemned to repeat this history! 1. Navigational databases (hierarchical and network) allowed complex data structures, including hierarchical structures, and used navigation to retrieve data. Indexes on certain fields could allow a kind of random access to records. Advantages: complex data structures possible, records always express their relationships to other records, good run-time efficiency. Disadvantages: dependent on physical format of records, dependent on the exact way that records are threaded together, minor changes in the database produced significant changes to the algorithms used to process them, difficult to write code for general-purpose queries, queries are dependent on the programming language used to implement them, query optimization is virtually impossible. Your description of SDQL makes me think that it is analogous to navigational databases, and would probably have these disadvantages: (A) query optimization is very difficult, because the query is procedural and tells precisely how the data is to be retrieved - even if a particular repository or database has a faster way of retrieving the data, it cannot use it, because the query tells how to find the data, not what to find; (B) language dependence - there is no way to formulate a query string that will work for any implementation of SDQL, regardless of language (and for now, you have to formulate SDQL in Scheme); (C) physical dependence - if the manner in which the data is structured changes, the algorithms no longer work. I'm not saying that SDQL isn't useful, I'm saying merely that it doesn't do what query languages do. 2.
Relational databases introduced the concept of real query languages, and of logical independence - the operation of a database should not be dependent on its physical layout. Advantages: significantly easier to change and maintain databases, queries can be formulated as simple strings, query language is independent of implementation language, logical independence. Another, non-technical advantage is that an awful lot of the data we want to retrieve from databases is currently stored in relational databases. Disadvantages: logical independence only works as long as you *think* that everything is a two dimensional table, complex data structures can not be expressed (and SGML documents can not be managed efficiently using two dimensional tables), relationships are not supported directly and must be reestablished at run-time via primary/foreign key pairs, the results of a query do not always maintain the original relationships among data. Relational databases won't be a useful way to store structured documents, but they do contain lots of data that we might want to import into our structured documents. If we ignore relational databases, we are leaving out a lot of important functionality. 3. Object-relational and object oriented databases are fairly diverse, so I have to make some qualifications before I can say anything. The fundamental difference between object-relational and object-oriented databases has to do with persistence, a way of automatically storing programming-language objects; this is something that object-oriented databases do, and object-relational databases don't. More relevant for us is the underlying data model, which is very similar for SQL 3, object-relational databases like Illustra and UniSQL, or object-oriented databases like POET, O2, Versant, and the ODMG standard for object databases (I am intentionally omitting ObjectStore, which is largely a navigational database with object persistence). 
These databases combine the rich data structures of navigational databases with the logical independence and query languages of relational databases. Objects can have complex relationships or complex structure, and both the structure and relationships can be used as the basis for queries. Because hierarchical structures and their relationships are easily used in queries, this makes a lot of sense for SGML and XML documents. For instance, here is an OQL query that finds all SECT1 elements that have an ID attribute and at least one PARA sub-element:
    select e
    from e in SGMLElement, a in e.attributes, s in e.subElements
    where e.tagName = "SECT1" and a.tagName = "ID" and s.tagName = "PARA";
This kind of query is very useful - it can be understood fairly easily, the system that performs the query can make its own decisions about the most efficient way to perform such a query, and the query can explicitly reference subelements, reflecting the hierarchical structure of SGML and XML. And fortunately, the major relational database vendors are also moving towards object-relational databases; soon, we will be able to do this kind of query in SQL-3. One SGML repository vendor has also added a fulltext operator to allow fulltext queries to be formulated as part of a structured OQL query - this is really cool because structured queries and fulltext queries can be combined in the conditions of a query. Another advantage of object databases is that the results are presented as a grove - when it is returned as part of a query, each element maintains its relationships to the other elements of the grove. Cool, eh? But there are also some problems here: a. There is no support for hierarchical queries or for transitive closure, a fancy term for "if you keep going this way, you get there eventually".
It is nice to be able to say that you want SECT1 elements that have at least one PARA element somewhere below them, or ask for those elements which have ID attributes and which are somewhere below some particular element. Some research database systems like semantic network databases have supported these kinds of operations, but they are not widespread. b. The form of the query depends on the data structures used to implement the database. I modified the names for my query to make them friendly - no real repository would allow you to use exactly those names. On the other hand, it might not be unreasonable to create standard names to describe the grove structure, specify how queries can be created using those names, and have individual vendors map this abstraction onto their own implementations. 4. Some SGML databases have an SGML aware query syntax that is non-procedural. I am thinking particularly of Texcel and LT-XML, which have similar query languages. For instance, here is a Texcel query that finds title elements with a parent of section with an ancestor of appendix whose type attribute is "informational" and that has a descendant of introduction: title { -- section { -* appendix { type = "informational" && +* introduction }}} This query language, like LT-XML's, directly supports hierarchical queries and transitive closure, and is designed to support queries on SGML and XML documents. It is non-procedural, setting no constraints on the system that will implement the query or the language to be used to carry it out. It would be interesting to add fulltext operators to a language like this. As I understand it, DSSSL/SDQL could be used fairly easily to implement queries designed in a query language like this. I would think that solutions like this might be useful for queries on SGML/XML documents, fulltext searches, and queries that combine the two. This does *not* address the need to use data from non-document databases to create markup, e.g. 
to bring data from relational or object-oriented databases into a dynamic document. I apologize for the length of this document - I hope it contains enough useful information to be worth reading. Jonathan ________________________________ Jonathan Robie Email: jonathan@texcel.no Texcel Research, Inc. ("http://www.texcel.no")
From jwrobie at mindspring.com Tue Nov 18 21:57:56 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:57 2004 Subject: Three Access Language Paradigms Message-ID: <1.5.4.32.19971118215738.00aaefdc@pop.mindspring.com> At 03:14 PM 11/18/97 -0500, Joe Lapp wrote: >I'd like to present still another form of access language. >This approach is based on a different way of thinking about >documents. Instead of asking document repositories to look >like XML documents to the external world, we only ask that >the repositories speak XML with the external world. DTDs >would be defined for the protocols that repositories might >care to speak. The DTDs would define the structure of the >protocol messages rather than the structures of documents. >One repository might speak several protocols (e.g. 'Patient >Records Protocol V.152' or 'Bank Transaction Protocol 2A'). >If the repository were capable of containing arbitrary XML >documents, the repository might speak a specific protocol >called 'XML Document Protocol V.1.0'. This is an interesting idea, and would allow queries to be defined in an SGML/XML-aware syntax.
For instance, if we want to get "billables" from a patient record system, we could ask an external system like a relational database for this information using a query defined in an SGML-aware language: billable { patient_id = 7537053 } The external system would have to have a mapping between the DTD structure that defines the abstract model for this protocol and the internal data structures used on that particular system. In this case, it would have to know what a "billable" is, where to find it, and how to find those "billables" that belong to the patient with this particular patient id. Offhand, this seems like a reasonable amount of effort to ask people to do in order to interface their databases to document management systems. On the repository side, one query could be used to support any external system that uses this particular DTD, and general-purpose techniques could be used to manage any virtual document. On the database / external system side, each DTD abstraction would be a separate programming project, but I don't really see any way around that. I'll have to think about it, but at first blush, I like it. Jonathan ________________________________ Jonathan Robie Email: jonathan@texcel.no Texcel Research, Inc. ("http://www.texcel.no")
From bernerd.anderson at exchange.pnl.gov Tue Nov 18 22:05:49 1997 From: bernerd.anderson at exchange.pnl.gov (Anderson, Bernerd J) Date: Mon Jun 7 16:58:57 2004 Subject: HELP - XML to Oracle (and back)!; Thanks!
Message-ID: <7A8CF1DC6A9DD0118EA400A024BF29DA0121F047@pnlmse2.pnl.gov> All - Just wanted to say 'Thank You' for all of the response that you all have collectively provided to the questions that I originally posted on this server. I haven't found exactly what I'm looking for yet, but at least feel that "I'm in the ballpark". Thanks also for suggesting resources to help me with the XML learning curve! Best regards, Bern Anderson (509) 375-2483 * bj.anderson@pnl.gov Battelle Pacific Northwest National Laboratory P.O. Box 999, MSIN: K7-63, Richland WA 99352
From ddb at criinc.com Tue Nov 18 22:37:21 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:58:57 2004 Subject: Three Access Language Paradigms Message-ID: <3.0.32.19971118143751.00a76eb0@mailhost.criinc.com> At 03:14 PM 11/18/97 -0500, Joe Lapp wrote: >We would like clients to be able to remotely manage documents >residing on servers. Clients need to be able to both query >and edit those documents. and later, >This approach is based on a different way of thinking about >documents. Instead of asking document repositories to look >like XML documents to the external world, we only ask that >the repositories speak XML with the external world. DTDs >would be defined for the protocols that repositories might >care to speak. The DTDs would define the structure of the >protocol messages rather than the structures of documents. >One repository might speak several protocols (e.g. 'Patient >Records Protocol V.152' or 'Bank Transaction Protocol 2A').
>If the repository were capable of containing arbitrary XML >documents, the repository might speak a specific protocol >called 'XML Document Protocol V.1.0'. I am not sure that the term "document" is clearly defined for your usage. The problem I see is that a user might change a "Document" which then affects a number of other "Documents" because they are all just abstracted views of a database. This breaks my own intuitive definition of "document" which I would normally use to interpret your first statement. There are no "Documents residing on servers", but only documents which are generated as part of an interchange protocol. Or do you mean that there are documents (A) (which may not be XML) and then there are XML "documents" (B) which are generated as part of the protocol to interchange the documents (A)? I am in complete agreement about the use of XML for information interchange. XML helps solve a number of the problems which CORBA users are facing currently, esp. in situations demanding high levels of information content in each query. CORBA is great for a simple (to formulate & express) query which a server has to think hard about and can eventually deliver a simple (to express) answer. XML is excellent for situations where either the query or the response is not so easily simplified, and structured data interchange is desired. An example I have worked on is that we have a Java applet which presents an expandable tree view of some data. The full tree is _very_ large, and the network connection may not be fast, so we deliver segments of the tree, using XML, to the applet, as requested. Thus the user does not need to wait for information they do not need. In a production system the server could analyse the network connection to determine approximately optimal packet sizes.
CORBA is horrible for this type of thing, relative to our implementation, since we can deliver N nodes of a tree-graph in one network transaction, while CORBA would require one transaction for each node. (Yes, there are work-arounds, but the XML solution is the simplest and most versatile I have seen yet.) Another thing which XML solves when used as a protocol is the problem of adding information to an existing protocol without breaking existing implementations. This is a serious concern. Try to load a Word97 document into Word95 and you will have a number of problems. Same with different versions of PDF. With NAMESPACES, or some careful DTD and implementation design, it is possible to use XML so that this is no longer a problem. For example, you have a NAME field, which is currently interchanged via a NAME element like this:
    <!ELEMENT NAME - - #PCDATA>
If the implementations are designed to ignore unknown element tags, then you might have (in the next version of the protocol)
    <!ELEMENT NAME - - ((FNAME, LNAME) | #PCDATA)>
    <!ELEMENT FNAME - - #PCDATA>
    <!ELEMENT LNAME - - #PCDATA>
which can handle both the old protocol format and a new format with more "meta-info". This strongly appeals to me, since this applies not only to protocols but also to configuration info, etc.; the applications are virtually unlimited. Suddenly I can share information amongst tools without having to succumb to the least-common-denominator problem. With regard to the protocol issue, we now have a MIME-ish thing with extensibility! So the point of my response is that some of the ideas in your original post strike a significant chord with my own ideas, but that the language for a discussion of these topics is not clear. There is also a problem that the real requirements for what you (Joe) are trying to do are extremely fuzzy at this point. A clearer language for talking about this is needed (clarify some terms) and the requirements of what you are trying to do need to be specified more clearly.
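A minimal sketch of the "ignore unknown element tags" idea above, written in Python with the standard library's xml.etree.ElementTree. The RECORD wrapper element, the EXTRA element, the sample name, and both function names are hypothetical and chosen only for illustration; the declarations in the message are SGML-style, while the sample messages here are plain well-formed XML. An older reader that knows NAME only as text collects the text and skips over tags it does not recognise, while a newer reader prefers FNAME and LNAME when both are present:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample messages: the old format carries NAME as plain text,
# the new format splits it into FNAME and LNAME and adds an unrelated element.
OLD_MESSAGE = "<RECORD><NAME>Ada Lovelace</NAME></RECORD>"
NEW_MESSAGE = (
    "<RECORD><NAME><FNAME>Ada</FNAME><LNAME>Lovelace</LNAME></NAME>"
    "<EXTRA>ignored</EXTRA></RECORD>"
)


def read_name_v1(message: str) -> str:
    """Old-protocol reader: treats NAME as text and ignores unknown child tags."""
    name = ET.fromstring(message).find("NAME")
    if name is None:
        raise ValueError("message has no NAME element")
    # itertext() yields all nested text, so markup such as FNAME is skipped over
    # and only the character data is kept.
    return " ".join(piece.strip() for piece in name.itertext() if piece.strip())


def read_name_v2(message: str) -> dict:
    """New-protocol reader: uses FNAME/LNAME if present, else falls back to text."""
    name = ET.fromstring(message).find("NAME")
    if name is None:
        raise ValueError("message has no NAME element")
    first, last = name.findtext("FNAME"), name.findtext("LNAME")
    if first is not None and last is not None:
        return {"first": first.strip(), "last": last.strip()}
    return {"full": (name.text or "").strip()}


if __name__ == "__main__":
    for message in (OLD_MESSAGE, NEW_MESSAGE):
        print(read_name_v1(message), read_name_v2(message))
```

Both readers accept both messages, which is the property being argued for: the newer message carries more structure without breaking the older reader.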
-derek Derek E. Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From peter at ursus.demon.co.uk Tue Nov 18 23:00:36 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:57 2004 Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal In-Reply-To: <v03007801b09685590445@[205.181.197.113]> References: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk> <3.0.5.32.19971114094153.00917d30@pop.access.digex.net> Message-ID: <3.0.1.16.19971118234002.332fb736@pop3.demon.co.uk> At 12:28 17/11/97 -0500, David G. Durand wrote: >I want to respond to the "meta-proposal" a bit, because I disagree with >some of the axioms, and the proposed procedures. I don't have time or >energy right now to respond to the specific proposal, though I may well do >so later (based on my own, somewhat divergent, axioms). ^^^^^^^^Good!^^^^^^^^ They may be preferable to mine :-) > >At 6:45 PM -0000 11/15/97, Peter Murray-Rust wrote: >><LIST> >> <AXIOM>In any of these cases there is no general solution acceptable to >>everyone >> </AXIOM> >> <AXIOM> If no attempt is made to address these problems we shall either >>end up with a Babel of incompatible solutions, or wait feebly for some >>powerful autonomous entities to dictate a limited set of actions. >> </AXIOM> > >Not necessarily.
In fact, for many problems the correct response is to >ensure that the stylesheet and processing specification languages can >_implement_ each of the specific solutions desired, _without_ forcing the >specific solutions on which divergence of opinion may exist. More on this >with the "PI" axioms. I have thought about this and have taken it to heart. I agree that stylesheets are usually preferable to PIs. I shall therefore mentally look for stylesheet-based solutions (or attribute-based solutions) before PI-based solutions. The XSL stylesheet proposal is still very young. I have read the proposal carefully and tried to understand how to implement it - I've got about halfway. The problem (for me) is that it is very paper-oriented (or paper-like screen displays) and doesn't easily have a mechanism for implementing BEHAVIOR (which an XLL processor should have). It also doesn't specify anything about transformation (i.e. XML2XML). It will also be a little while before it's out fully. > >> <AXIOM> We have to be careful to avoid the 'only processable with >>software X' syndrome</AXIOM> > >Yes. The way to do this is to _avoid_ PIs as much as possible. PIs that are >required to interpret a document correctly are _inherently_ >anti-portability, since the rule for PIs is that _any application_ should >be free to ignore them without changing the meaning of the document. The use >of SGML's PI syntax in XML is not a good model for the use of PIs in >general, since they are being used in XML as a syntactic "escape hatch" for >compatibility with SGML. It would not be necessary (or desirable) if XML >were not (to some very small extent) changing SGML facilities (as with >specifying the character encoding of entities in PIs, rather than an SGML >declaration). > >If XML had been able to add declarations to SGML, that would have been done >instead of using the PI syntax. If I understand this, you are saying that PIs are required to get XML to work (e.g. <?XML?>, namespace etc.)
but they are too dangerous for normal mortals. I can go along with that view, but the spec-authors should restrict the PI-targets to XML. The message the current spec gives is: - Here are PIs. Use them if you want. What (perhaps) they should say is: - PIs should be reserved for things we (the XML-WG) can't do in XML any other way. Using them otherwise can seriously damage your readers' health. > >> <AXIOM> There is a critical mass of readers of this list who feel the >>need to address the problem. </AXIOM> > >Without a problem statement I'm not sure how to judge this, but it may well >be true. > >> <AXIOM> Anyone can use any PIs they like in their documents for whatever >>purposes they like without breaking the spirit of XML. </AXIOM> > >This is assuredly incorrect. PIs are intended for use in the case where a >practical _use_ of a document with _particular software_ requires >additional information that _should not_ have been indicated in a structural >description of the content. A paradigmatic example is the occasional need >to insert a page or column break in order to get acceptable formatting in a >particular processing situation (including: software, stylesheet, output >device). This is not information that _should_ be encoded in the abstract >representation of a document, but _may be essential_ for "getting the thing >to print right". Understood. Even in TeX you have to fudge it occasionally. But quite a lot of XML applications won't have any pages. IMO XML is not yet prepared for the non-document-oriented applications. We shall want to do other things with XML documents than read them. :-) Stylesheets are very highly oriented to typesetting on paper. > >> <AXIOM> That processing software need not (and so far won't) take any >>notice of these (or perhaps any) PIs >> </AXIOM> > >This is certainly essential.
If you are saying something about your document
>that you can imagine being useful to some software that you aren't using
>right now -- then it should probably be in the markup. PIs are for things
>that can be ignored without changing the interpretation of a document.

Yes. Actually that was true of my proposals as well. The PI was modifying the production of porridge. If the document had gone to a Postscript formatter instead, it wouldn't have changed the meaning of the document, just not cooked any porridge.

>
>> <AXIOM> If a few people find a way of doing something that works for
>>them, and isn't against the spirit of the XML specs, then flaming their
>>ideas is pointless.</AXIOM>
>
>Even this is not necessarily true -- attacking the dissemination of false
>or bad ideas is _never_ pointless, in that dissemination of bad information
>(even if it serves a local purpose adequately well) can seriously mislead
>people. For instance the use of styles in word-processing programs is
>usually a very good idea. The fact that in some instances direct formatting
>may work out, or even work better, should not stop people from quarreling
>with public assertions about the utility of stylesheets based on those
>situations.
>
>To the extent that these axioms seem to be intended to rule out
>disagreement on the merits of future proposals, I must take immediate and
>strong exception to them. It's not possible for a responsible discussant
>who disagrees with a public proposal of working practice to remain silent
>on the topic. "Flaming" is usually not responsible discussion, but
>principled disagreements should be expressed so that the issues are clear
>to all.

Good. The axiom might benefit from revision or deletion - we'll see...

>
>></LIST>
>><NOTE>The proposal I really want to address is, like Monty Python's joke,
>>so potentially dangerous that I dare not reveal it yet. The proposal here
>>is also important to me - perhaps to others - and I hope serves as a
>>useful example.
It is NOT in a finalised form, but as can be seen from the
>>meta-proposal, there is a method for referring to a 'pseudo-final' form
>>that is, at least, usable.
>></NOTE>
>
>This makes me nervous

Wasn't meant to. I am more nervous of implementations which take place without any discussion at all.

>
>><META-PROPOSAL>
>>That a PI of the form <?XDEV?> is 'reserved' by members of this list for
>>PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly
>>reserved.]
>
>We can certainly do this -- but as I said above, there are good reasons to
>oppose the use of PIs for _any_ use that affects the semantics of
>documents.

<REVISION> That the characters XDEV be used in places such as Attribute names, values, elements, namespaces and (in the last resort) PIs where they serve to clarify the semantics by referring to discussions on this list </REVISION>

>
>For example, even the proposed namespace PI would be vulnerable on this
>account, except for the facts that:
>
> 1. It's intended for use in _experiment_ with a proposed _extension_ of
>XML. (In other words, the PI, should it be generally accepted for use with
>all interested XML applications, would become part of XML).

Understood. And I am experimenting with it. It does great things for me and JUMBO.

>
> 2. The prefix can be processed (and thus, the semantic information
>accessed) _without_ software having to be aware of the namespace PI. In
>other words, the PI can be treated as equivalent to a comment describing
>the proposed intent of the tags that share a prefix. (In other words, you
>can ignore the namespace PI, and still detect the semantic distinctions in
>the document)

I don't understand this. My understanding of the namespace proposal is that:

<?xml:namespace HREF="foo.org/bar.xml" AS="FOO"?>

identifies a namespace FOO used as FOO:xyz in certain names, etc. The HREF points to a 'schema' for some undefined purpose.
When a processor (not a parser) finds an element of type <FOO:plugh/> it can:
 - treat it as semantically void
 - realise from the PI that bar.xml might say something useful about it (this is what JUMBO does)
 - realise that it knows privately about the FOO namespace and look up FOO:plugh
 - match the action with FOO:plugh in a stylesheet

If you are treating the PI as a comment, but relying on a stylesheet, why use the PI at all? (JUMBO uses the PI, because it can't use stylesheets for some of the things it wants to do.)

>In the long run this may (or for a number of reasons may not) be true.
>However, bad ideas that are initially plausible but unworkable in the long
>term (e.g., from a related, but different domain, the creation and
>management of large structured information corpora in raw HTML) would get
>an artificial (and community-harmful) boost if an effective social
>convention forbidding disagreement were in effect.

Perhaps. The idea was to create spaces on this list where people with a common vision can devise approaches to which they can make semantic reference. At present I'm asking whether that's a good idea. If enough people think it is, then I would hope the discussion would be ignored by those not interested. Those who object to it can start their own parallel discussion - no harm in that.

As you can see - and I'll elaborate later - I think there is virtue in trying out new ideas in public, even if they have potential flaws or limitations. HTML is a good example; it was designed to be tolerant of broken systems, and implemented to be even more tolerant. Even in XML, not everything will be gold plated.

>
>I agree that polite, reasoned disagreement is better than flaming
>(impolite, ad-hominem disagreement) but in the intellectual world the unfit
>perish faster under the lash of criticism.

We'll find somewhere in the middle :-)

P.
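
[Editorial sketch, not part of the original message.] The options listed above for handling <FOO:plugh/> can be illustrated roughly as follows. This is only a minimal sketch, assuming the draft-era PI syntax quoted in the message (HREF and AS pseudo-attributes); the function names and the `private_knowledge` table are hypothetical and are not taken from JUMBO or from any parser.

```python
import re

# Draft-era namespace PI as quoted in the message, e.g.
# <?xml:namespace HREF="foo.org/bar.xml" AS="FOO"?>
_NAMESPACE_PI = re.compile(r'<\?xml:namespace\s+(.*?)\?>', re.DOTALL)
_PSEUDO_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def namespace_declarations(document_text):
    """Return {prefix: href} for every namespace PI found in the text."""
    declared = {}
    for pi in _NAMESPACE_PI.finditer(document_text):
        pseudo_attrs = dict(_PSEUDO_ATTR.findall(pi.group(1)))
        if "AS" in pseudo_attrs and "HREF" in pseudo_attrs:
            declared[pseudo_attrs["AS"]] = pseudo_attrs["HREF"]
    return declared


def describe(element_name, declared, private_knowledge=None):
    """Pick one of the options from the message for a (possibly prefixed) name.

    private_knowledge is a hypothetical table of names the processor
    already understands without consulting any schema.
    """
    private_knowledge = private_knowledge or {}
    prefix, colon, _local = element_name.partition(":")
    if not colon:
        return "no prefix: ordinary element"
    if element_name in private_knowledge:
        return "known privately: " + private_knowledge[element_name]
    if prefix in declared:
        return "consult schema at " + declared[prefix]
    return "semantically void"


doc = '<?xml:namespace HREF="foo.org/bar.xml" AS="FOO"?><FOO:plugh/>'
print(describe("FOO:plugh", namespace_declarations(doc)))
```

A processor that ignores the PI entirely always ends up in the last branch unless it has private knowledge of the prefix, which is the case the question in the message is probing.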
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Tue Nov 18 23:01:22 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:57 2004
Subject: <?XDEV?> and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <199711160305.OAA25372@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971118222745.3c0f82e0@pop3.demon.co.uk>

At 14:09 16/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust <peter@ursus.demon.co.uk>
>
>> <PROPOSAL>
>> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
>> through an XDEV PI:
>> BEHAVIOR="DisplayStandAlone"
>> BEHAVIOR="DisplayInContext"
>> That for the second option an additional attribute CONTEXTREF is required,
>> whose value is a valid URL and points to the XML element providing the
>> display context of the current element.
>> The actual details of display are application (and possibly stylesheet)
>> dependent.
>> </PROPOSAL>
>
>Another approach might be to use the name prefix XDEV: on attribute
>values, e.g.
>
> BEHAVIOUR="XDEV:DisplayStandAlone"
>
>and the contextref attribute you suggest, e.g.
>
> BEHAVIOUR="XDEV:DisplayInContext"
> XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"
>

Rick,

I have now realised (forgive my slow thinking) that this provides exactly what is needed and I was too hasty in my earlier reply. As you say, the attribute value simply needs to be unique and the XDEV: mechanism provides that (to a certain extent).
It can even be linked to a namespace if that is allowed when the namespace proposal is finalised.

So, in the first proposal there is no need for PIs. Rick's suggestion meets my needs, so it can be bolted in very easily. The result is that an XML-LINK-aware processor may, but need not, recognise BEHAVIOR attribute values prefixed by XDEV, and one or more additional attributes with names prefixed by XDEV. An XDEV-unaware processor will give a graceful message saying it doesn't understand the XDEV: attribute and the BEHAVIOR value. [At present it will say it doesn't understand *any* BEHAVIOR values except by private negotiation, because none have been suggested. I'll write more later...]

This could be a good time for those more expert than me to suggest BEHAVIOR values. [I have asked at regular intervals whether the XML-LINK attributes would have suggested values (i.e. for ROLE, BEHAVIOR and more guidance on CONTENT-ROLE, etc.). I think the current idea is to keep it semantically neutral. That's why I'm raising it here...]

An XDEV-aware processor will be able to do lots of wonderful things with the BEHAVIOR values... especially when coupled to equipment...

David,

You are rightly concerned about the meta-proposal - I'll reply in more detail, but for now I'll say that PIs are no longer an essential part of the meta-proposal (though they may be required sometimes). Your comments are very useful and I will certainly make sure that I stress standard mechanisms (stylesheets, for example) where possible. [I am trying to code them into JUMBO, but am still trying to work out how closely they are coupled to a page-like output or whether they can be used more generally.] I do not think that stylesheets can do everything, although if XSL included a transformation language that might help in some places.

I shall not unleash the Monty Python proposal until we have addressed the meta-proposal a bit more. :-)

Cheers,

P.
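
[Editorial sketch, not part of the original message.] The processor behaviour described above (recognise XDEV:-prefixed BEHAVIOR values if you can, otherwise report gracefully) might look roughly like this. It is a minimal sketch under stated assumptions: the attribute spelling BEHAVIOR follows the proposal rather than the quoted reply, and the function name, message strings and `KNOWN_BEHAVIORS` table are hypothetical.

```python
XDEV_PREFIX = "XDEV:"

# Hypothetical set of XDEV behaviours this particular processor implements.
KNOWN_BEHAVIORS = {"DisplayStandAlone", "DisplayInContext"}


def interpret_link(attributes):
    """Return (action, messages) for one linking element's attributes.

    attributes maps attribute names to values. action is None whenever
    the processor cannot act; messages explains why, so that an unaware
    or partly aware processor degrades gracefully instead of failing.
    """
    messages = []
    action = None
    behavior = attributes.get("BEHAVIOR", "")
    if behavior.startswith(XDEV_PREFIX):
        name = behavior[len(XDEV_PREFIX):]
        if name not in KNOWN_BEHAVIORS:
            messages.append("unrecognised XDEV behaviour: " + name)
        elif name == "DisplayInContext":
            # The proposal requires a context reference for this value.
            context = attributes.get("XDEV:CONTEXTREF")
            if context is None:
                messages.append("DisplayInContext requires XDEV:CONTEXTREF")
            else:
                action = {"behavior": name, "contextref": context}
        else:
            action = {"behavior": name}
    elif behavior:
        messages.append("BEHAVIOR value not understood: " + behavior)
    return action, messages


print(interpret_link({
    "BEHAVIOR": "XDEV:DisplayInContext",
    "XDEV:CONTEXTREF": "saltmines.xml#DESCENDANT(1,ORGCHART)",
}))
```

Returning messages rather than raising an error is one way to get the "graceful message" behaviour; whether a real processor should warn, log or stay silent is left open by the message.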
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ht at cogsci.ed.ac.uk Tue Nov 18 23:04:47 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: Jonathan Robie's message of Tue, 18 Nov 1997 16:57:38 -0500
References: <1.5.4.32.19971118215738.00aaefdc@pop.mindspring.com>
Message-ID: <f5bpvnxztvf.fsf@cogsci.ed.ac.uk>

Um, why doesn't XLL address all the goals of this thread and then some?

ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/

From jlapp at acm.org Wed Nov 19 01:49:26 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: <m0xXuGg-0009WiC@sqailor.sqwest.bc.ca>
References: <3.0.32.19971116194534.0068a6b0@pophost.arbortext.com>
Message-ID: <3.0.3.32.19971118204952.00968c30@pop.access.digex.net>

lauren@sqwest.bc.ca (Lauren Wood) wrote:
>You should not confuse the use of OMG IDL to describe interfaces
>with requiring implementations to use CORBA interfaces. The DOM
>specification is quite clear that CORBA is not needed. OMG IDL
>is simply used as a language.

Thanks for the correction. Shows you how much I know about CORBA.

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From jwrobie at mindspring.com Wed Nov 19 02:04:02 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119020338.00ab71dc@pop.mindspring.com>

At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>Um, why doesn't XLL address all the goals of this thread and then
>some?

If I remember what I learned in high school rhetoric, I think the burden of proof is on the affirmative!
Jonathan

From jlapp at acm.org Wed Nov 19 03:04:43 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971118143751.00a76eb0@mailhost.criinc.com>
Message-ID: <3.0.3.32.19971118220503.0095a4f0@pop.access.digex.net>

Derek Denny-Brown <ddb@criinc.com> wrote:
>I am not sure that the term "document" is clearly defined for your usage.

I guess I did overuse and under-define the word 'document.' I'll try to convey what I intended it to mean. I understand that pretty much any well-formed construct can serve as a document in XML. I also understand that we might want to talk about a single document that consists of multiple documents that are linked together. However, in my post I intended the word 'document' to mean a single XML file or any system that makes itself appear as if it were analogous to an XML file, such as a database that exposes DOM IDL :-) interfaces.

That's the meaning I was using, though I realize that it's probably not the best definition to work with. In light of your response, I see that this is kind of a constraining definition. However, I think the language of my posting can be amended so that it still has general applicability. The word 'document' might be taken in its most general sense, so that it applies to anything you might think of. Next, everywhere I talk about the DTD of the document, we'd have to modify that to talk about the set of DTDs and the structure of links by which documents of those DTDs are intended to be linked.

>[...]
There are no
>"Documents residing on servers", but only documents which are generated as
>part of an interchange protocol. Or do you mean that there are documents
>(A) (which may not be XML) and then there are XML "documents" (B) which are
>generated as part of the protocol to interchange the documents (A)?

Boy, I really was being quite inconsistent. When I talked about the protocol messages being documents I was talking about a single serializable stream of well-formed XML. I guess I really was quite confusing.

>[...] CORBA is great for a simple (to
>formulate & express) query which a server has to think hard about and can
>eventually deliver a simple (to express) answer. XML is excellent for
>situations where either the query or the response is not so easily
>simplified, and structured data interchange is desired.

XML seems to remove the client's responsibility for constructing complex objects from primitive ones. The object arrives complex already. I agree. Another side-benefit is that complex requests and responses can be batched and transmitted over single short-lived connections.

>[...]
>Another thing which XML solves when used as a protocol is the problem of
>adding information to an existing protocol without breaking existing
>implementations. This is a serious concern.

I didn't even think of that.

>[...] With regard to the
>protocol issue, we now have a MIME-ish thing with extensibility!

Nor did I think of that, but this may be because I'm more familiar with mimes that play charades than mail-protocol MIME.

>So the point of my response is that some of the ideas in your original
>post strike a significant chord with my own ideas, but that the language for
>a discussion of these topics is not clear.

The language does need to be cleaned up, and I'd certainly appreciate any help I can get. Let me know whether this post clears things up any or whether it further muddies the waters.
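
[Editorial sketch, not part of the original message.] The point quoted above, that an XML-based protocol can gain new information without breaking existing implementations, can be illustrated with a reader that looks up only the elements it knows and ignores the rest. This is a minimal sketch only; the `order`, `item`, `quantity` and `giftwrap` element names are invented for illustration and come from nowhere in the thread.

```python
import xml.etree.ElementTree as ET


def read_order(message_text):
    """Read only the fields this (older) client knows about.

    Elements added by newer senders are simply never looked up,
    so they cannot break this reader.
    """
    root = ET.fromstring(message_text)
    return {
        "item": root.findtext("item"),
        "quantity": int(root.findtext("quantity", "1")),
    }


old_message = "<order><item>widget</item><quantity>2</quantity></order>"
# A newer sender adds an element that the old client has never heard of.
new_message = ("<order><item>widget</item><quantity>2</quantity>"
               "<giftwrap>yes</giftwrap></order>")

print(read_order(old_message) == read_order(new_message))
```

The same tolerance does not come for free with a validating reader and a closed DTD, which would reject the extra element; the sketch assumes well-formedness checking only.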
>There is also a problem that the real requirements for what you (Joe) are
>trying to do are extremely fuzzy at this point. A clearer language for
>talking about this is needed (clarify some terms) and the requirements of
>what you are trying to do need to be specified more clearly.

I know. They are fuzzy in my brain too. I'm working on that one. I've got something up there, but it is proving to be a very slippery beast (with fangs and horns and a ferocious roar!).

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From ricko at allette.com.au Wed Nov 19 06:05:16 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <199711190602.RAA03174@jawa.chilli.net.au>

> From: Joe Lapp <jlapp@acm.org>
> Derek Denny-Brown <ddb@criinc.com> wrote:
> >I am not sure that the term "document" is clearly defined for your usage.
>
> I guess I did overuse and under-define the word 'document.'

Another very useful terminological distinction is between "document" and "publication". A publication is one or more documents rendered for some medium.

After the XML document has been parsed and groved, and auto links embedded, and transformations and stylesheets applied, and then sent to some output device, that is the publication.

Rick Jelliffe

From tbray at textuality.com Wed Nov 19 07:54:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca>

At 09:03 PM 18/11/97 -0500, Jonathan Robie wrote:
>At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>>Um, why doesn't XLL address all the goals of this thread and then
>>some?
>If I remember what I learned in high school rhetoric, I think the burden of
>proof is on the affirmative!

Let me rephrase Henry's comment: I suggest that those who are proposing brave new query language worlds go have a look at XLL. It *may* be the case that XLL xpointers hit a good 80-20 point in terms of what we'd like in a query language and in ease of implementation.

-Tim

From peter at ursus.demon.co.uk Wed Nov 19 08:24:10 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971119092048.2c6f8b6c@pop3.demon.co.uk>

At 23:54 18/11/97 -0800, Tim Bray wrote:
[...]
>
>Let me rephrase Henry's comment: I suggest that those who are proposing
>brave new query language worlds go have a look at XLL.
It *may* be the
>case that XLL xpointers hit a good 80-20 point in terms of what we'd
>like in a query language and in ease of implementation. -Tim
>

I support this. I have found TEI Xpointers in XLL *extremely* useful - they have revolutionised my thinking about XML documents. Essentially, in many cases, the 'document is the database' (for smallish applications). I also use them inside JUMBO for navigating within known structures (e.g. seeing whether an element has certain relatives and, if so, taking some action.)

For certain purposes the TEI Xpointer is limited. Initial discussions suggested:
 - SPACE (for coordinate systems such as images, tables)
 - some sort of regexp
 - FOREIGN for adding your own methods on.
 - and I'd be happy to see something for numeric and other typed values

I was in favour of these (I have to use them somehow in JUMBO), but it was made clear that Xpointers were intended as an addressing scheme and not a query language. I respect this distinction, but it would be very nice to be able to extend TEI syntax to allow this.

As I understand it, TEI syntax (a la XLL) is confined to use in HREFs within elements with XML-LINK attributes. Any of the proposed extensions is (rightly) illegal there. But it would be possible to extend TEI for use *elsewhere* (e.g. in querying documents) and I would be very happy to see keywords of the sort above added *for query purposes*.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From mtbryan at sgml.u-net.com Wed Nov 19 09:30:03 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:57 2004
Subject: Query Languages for XML
Message-ID: <E0xY6WT-0000EG-00@oveja>

Paul Prescod wrote
> We already have forms, and it doesn't require
>an updatable SDQL.

Where in XML do we have forms, or any statement that tells anyone what will happen to data placed into an editable field?

-----------------------------------------------------------------
Martin Bryan, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK
Phone/Fax: +44 1452 714029 E-mail: mtbryan@sgml.u-net.com
For more information about The SGML Centre contact http://www.sgml.u-net.com
For more information about the European Commission's Open Information Interchange (OII) initiative contact http://www.echo.lu/oii/en/oiistand.html

From papresco at technologist.com Wed Nov 19 10:34:12 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:58 2004
Subject: Query Languages for XML
References: <E0xY6WT-0000EG-00@oveja>
Message-ID: <3472C157.BE6EE6FD@technologist.com>

Martin Bryan wrote:
>
> Where in XML do we have forms,

"To reduce the initial barriers to adoption, a core set of HTML flow objects is recommended in addition to the core DSSSL flow objects.
The HTML/CSS formatting model is somewhat different from the DSSSL model, and the inclusion of the HTML/CSS flow objects will make it possible to use XSL with HTML and CSS. It simplifies the targeting of HTML as the output format, and retains consistency of the object model and dynamic behaviors."
 - http://www.w3.org/TR/NOTE-XSL.html

Included in the list are:

"FORM
 INPUT
 SELECT
 TEXTAREA"

> or any statement that tells anyone what will
> happen to data placed into an editable field?

This is specified in the HTML 4.0 proposed recommendation which has provisions for interactive processing on either the client or server sides. If and when someone standardizes an updateable document data manipulation language, it can be accessed from these forms just as SQL and ODQL are today.

Paul Prescod

From mtbryan at sgml.u-net.com Wed Nov 19 11:40:20 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:58 2004
Subject: Query Languages for XML
Message-ID: <E0xY8Ym-0000tK-00@oveja>

Paul Prescod wrote
>> Where in XML do we have forms,
>
>"To reduce the initial barriers to adoption, a core set of HTML flow
>objects is recommended in addition to the core DSSSL flow objects. The
>HTML/CSS formatting model is somewhat different from the DSSSL model,
>and the inclusion of the HTML/CSS flow objects will make it possible to
>use XSL with HTML and CSS. It simplifies the targeting of HTML as the
>output format, and retains consistency of the object model and dynamic
>behaviors."
> - http://www.w3.org/TR/NOTE-XSL.html
>
>Included in the list are:
>
>"FORM
> INPUT
> SELECT
> TEXTAREA"
>
>
>> or any statement that tells anyone what will
>> happen to data placed into an editable field?
>
>This is specified in the HTML 4.0 proposed recommendation which has
>provisions for interactive processing on either the client or server
>sides. If and when someone standardizes an updateable document data
>manipulation language, it can be accessed from these forms just as SQL
>and ODQL are today.
>

So we are constrained to using the types of form objects defined in HTML using the processes defined in HTML 4.0, and can add no new functionality via XSL?

Martin Bryan

From jwrobie at mindspring.com Wed Nov 19 12:03:57 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119120312.00ac60a0@pop.mindspring.com>

At 11:54 PM 11/18/97 -0800, Tim Bray wrote:
>At 09:03 PM 18/11/97 -0500, Jonathan Robie wrote:
>>At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>>>Um, why doesn't XLL address all the goals of this thread and then
>>>some?
>>
>>If I remember what I learned in high school rhetoric, I think the burden of
>>proof is on the affirmative!
>
>Let me rephrase Henry's comment: I suggest that those who are proposing
>brave new query language worlds go have a look at XLL. It *may* be the
>case that XLL xpointers hit a good 80-20 point in terms of what we'd
>like in a query language and in ease of implementation.
-Tim

I agree - XLL pointers may be a good starting point for a query language, and this would have the advantage of reducing the number of things that people have to learn. It really *is* a nonprocedural query language, independent of the implementation language, etc., and it is easy to read.

I am not sure, however, that it "addresses all the goals of this thread and then some". I'll have to take a closer look at it, and ask myself what it would take if, at some point, the other 20% needed to be added to it.

Jonathan

From richard at light.demon.co.uk Wed Nov 19 12:53:32 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:58 2004
Subject: Data manipulation languages for XML (was Query Languages ...)
In-Reply-To: <3.0.32.19971118091934.009f7260@swbell.net>
Message-ID: <T9VwfKAbutc0EwZ3@light.demon.co.uk>

In message <3.0.32.19971118091934.009f7260@swbell.net>, "W. Eliot Kimber" <eliot@isogen.com> writes
>
>I'm afraid I don't see how using groves as the fundamental abstraction for
>editing is inconsistent with satisfaction of any of the requirements. All
>that's needed on top of what DSSSL provides are functions that represent
>the editing actions needed (as opposed to modeling editing as a transform,
>which is probably not a useful approach). If SQL provides a useful model
>for defining such functions, we should use it.

I'm perfectly happy with this idea too, and agree that we wouldn't need to add much to DSSSL/SDQL to allow the abstract representation of an editing process.
SQL can act as a touchstone for us to check the completeness of the set of additional functions - I'm not sure it is a useful model as such. However, what I am really arguing is that once we have done this, there is still a case for going on to define a more user-friendly SQL-like syntax for specifying data manipulations. This syntax would have exactly the same relationship to SDQL as XSL does: it would be a simple front-end into a subset of SDQL's functionality. Richard. Richard Light SGML/XML and Museum Information Consultancy richard@light.demon.co.uk xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Wed Nov 19 14:48:01 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms In-Reply-To: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca> Message-ID: <3.0.3.32.19971119094813.0095d780@pop.access.digex.net> Tim Bray <tbray@textuality.com> wrote: >Let me rephrase Henry's comment: I suggest that those who are proposing >brave new query language worlds go have a look at XLL. It *may* be the >case that XLL xpointers hit a good 80-20 point in terms of what we'd >like in a query language and in ease of implementation. -Tim I assume that XLL is what the XML-LINK document describes. If so, then for starters, what sort of editing mechanisms does XLL have? -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Patrice.Bonhomme at loria.fr Wed Nov 19 14:54:33 1997 From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme) Date: Mon Jun 7 16:58:58 2004 Subject: msxml 1.6 : ID without DTD Declaration Message-ID: <199711191453.PAA04490@chimay.loria.fr> <?HI?> I am developping an XLL package using the msxml parser. But i wondered if we can use ID attribute without any DTD declararion ? MSXML use the method DTD.findID(Name name) to retrieve an Element with the attribute ID=name, but without a DTD declaration i cant call DTD.findID(Name name) ! Is there a way to get round this ? A kind of : <!ATTLIST ANY ID ID #IMPLIED> Pat. -- ============================================================== bonhomme@loria.fr | Office : B.228 http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37 -------------------------------------------------------------- * Projet Aquarelle : http://aqua.inria.fr * Serveur Silfide : http://www.loria.fr/Projet/Silfide ============================================================== xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Wed Nov 19 15:52:58 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com> At 11:54 PM 11/18/97 -0800, Tim Bray wrote: >It *may* be the case that XLL xpointers hit a good 80-20 point >in terms of what we'd like in a query language and in ease of >implementation. -Tim Tim, In XLL, is there a way to combine conditions with boolean operators? Say I am using XL7, and I need to do a query for those billable items for a particular patient number AND for a particular physician. Can I do this with XLL? If there are boolean operators, is there a way to specify precedence? Jonathan From ricko at allette.com.au Wed Nov 19 16:20:15 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:58:58 2004 Subject: msxml 1.6 : ID without DTD Declaration Message-ID: <199711191617.DAA14394@jawa.chilli.net.au> >From: Patrice Bonhomme <Patrice.Bonhomme@loria.fr> >I am developing an XLL package using the msxml parser. But I wondered if we >can use ID attribute without any DTD declaration ?
MSXML uses the method >DTD.findID(Name name) to retrieve an Element with the attribute ID=name, but >without a DTD declaration I can't call DTD.findID(Name name) ! >Is there a way to get round this ? A kind of : <!ATTLIST ANY ID ID >#IMPLIED> The current enhancements to SGML allow pretty much exactly what you suggest. <!ATTLIST #ALL id ID #IMPLIED> I am not sure when this will be added into XML. If it is not in XML 1.0 then you should lobby for it to go into XML 1.1 (if such a thing comes). Rick Jelliffe From richard at light.demon.co.uk Wed Nov 19 17:11:10 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms In-Reply-To: <3.0.3.32.19971119094813.0095d780@pop.access.digex.net> Message-ID: <soq9IEA$axc0Ewe+@light.demon.co.uk> In message <3.0.3.32.19971119094813.0095d780@pop.access.digex.net>, Joe Lapp <jlapp@acm.org> writes >I assume that XLL is what the XML-LINK document describes. If so, then >for starters, what sort of editing mechanisms does XLL have? None - we are still talking read-only access here. Richard Light SGML/XML and Museum Information Consultancy richard@light.demon.co.uk xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at light.demon.co.uk Wed Nov 19 17:43:56 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms In-Reply-To: <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com> Message-ID: <t4598AAyaxc0Ewfj@light.demon.co.uk> In message <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>, Jonathan Robie <jwrobie@mindspring.com> writes > >In XLL, is there a way to combine conditions with boolean operators? Say I >am using XL7, and I need to do a query for those billable items for a >particular patient number AND for a particular physician. Can I do this with >XLL? If there are boolean operators, is there a way to specify precedence? No. An XLL expression supports a chain of locators, each of which starts from the last place you got to in the target document's structure. You can have a second chain, pointing to somewhere else, in which case the XPointer is deemed to point to the span within the document whose end-points are the two elements or characters you have specified by your locators. I've just checked over the original TEI Extended Pointer mechanism on which XPointers are based, and there is nothing in that to support boolean logic either. Richard Light SGML/XML and Museum Information Consultancy richard@light.demon.co.uk xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Wed Nov 19 18:18:08 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <1.5.4.32.19971119181728.00a3f010@pop.mindspring.com> At 04:41 PM 11/19/97 +0000, Richard Light wrote: >In message <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>, >Jonathan Robie <jwrobie@mindspring.com> writes >> >>In XLL, is there a way to combine conditions with boolean operators? Say I >>am using XL7, and I need to do a query for those billable items for a >>particular patient number AND for a particular physician. Can I do this with >>XLL? If there are boolean operators, is there a way to specify precedence? > >No. An XLL expression supports a chain of locators, each of which >starts from the last place you got to in the target document's >structure. You can have a second chain, pointing to somewhere else, in >which case the XPointer is deemed to point to the span within the >document whose end-points are the two elements or characters you have >specified by your locators. That's pretty much what I had thought when I read the XLL spec. Personally, in evaluating the 80/20 mix for a query language, I would think that boolean operators, boolean functions, and precedence would be pretty important. Another significant limitation of XPointers as a query language is that each term specifies *one* location, if I understand the spec correctly. It doesn't seem to be set up to allow result sets, e.g. the set of patient records that satisfy a particular requirement, the set of catalog entries that specify a particular requirement, etc.
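As a rough illustration of the two features discussed here (boolean combination with explicit precedence, and a result set rather than a single location), here is a minimal Python sketch. It is not XLL or XPointer syntax, and it is not taken from any spec: the sample document, the element and attribute names (billing, item, patient, physician) and the helper functions are all assumptions invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical billing document; every name in it is invented for illustration.
DOC = """
<billing>
  <item patient="1001" physician="smith"><desc>X-ray</desc></item>
  <item patient="1001" physician="jones"><desc>Consult</desc></item>
  <item patient="2002" physician="smith"><desc>Cast</desc></item>
  <item patient="1001" physician="smith"><desc>Follow-up</desc></item>
</billing>
"""


def select(root, predicate):
    """Return every <item> that satisfies predicate: a result set, not one location."""
    return [item for item in root.iter("item") if predicate(item)]


def by_patient(number):
    """Condition: the item belongs to the given patient number."""
    return lambda item: item.get("patient") == number


def by_physician(name):
    """Condition: the item was billed by the given physician."""
    return lambda item: item.get("physician") == name


def both(p, q):
    """Boolean AND of two conditions."""
    return lambda item: p(item) and q(item)


def either(p, q):
    """Boolean OR of two conditions."""
    return lambda item: p(item) or q(item)


root = ET.fromstring(DOC)

# Precedence is made explicit by nesting the combinators.
matches = select(root, both(by_patient("1001"), by_physician("smith")))
descriptions = [item.findtext("desc") for item in matches]
print(descriptions)
```

Nesting calls such as either(both(a, b), c) is one way to express precedence without operator rules; a real query language would more likely offer parentheses.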
I would think that result sets are pretty important for query languages. I really like the simplicity, readability, and design cohesiveness of XLL, and I do think that the functionality it contains should be present in a query language for SGML/XML documents. It is not clear to me whether there is a good, orthogonal way to add in some of this other functionality to XLL; if so, XLL could be used as the basis for a query language. Using the same primitives would be nice, since anybody working with XML is going to have to learn XLL, and we don't want every poor schmo to have to learn 50 different ways to do a query. Jonathan ________________________________ Jonathan Robie Email: jonathan@texcel.no Texcel Research, Inc. ("http://www.texcel.no") From ddb at criinc.com Wed Nov 19 19:15:17 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <3.0.32.19971119111601.00a73eb0@mailhost.criinc.com> At 10:05 PM 11/18/97 -0500, Joe Lapp wrote: >Derek Denny-Brown <ddb@criinc.com> wrote: >>I am not sure that the term "document" is clearly defined for your usage. > >... in my post I intended the >word 'document' to mean a single XML file or any system that makes itself >appear as if it were analogous to an XML file, such as a database that >exposes DOM IDL :-) interfaces. That's the meaning I was using, though >I realize that it's probably not the best definition to work with. That helps. >>[...] There are no >>"Documents residing on servers", but only documents which are generated as >>part of an interchange protocol.
Or do you mean that there are documents >>(A) (which may not be XML) and then there are XML "documents" (B) which are >>generated as part of the protocol to interchange the documents (A)? > >Boy I really was being quite inconsistent. When I talk about the protocol >messages being documents I was talking about a single serializable stream >of well-formed XML. I guess I really was quite confusing. This really helps (for me at least). So this gives us two separate issues which you are talking about: 1) XML used as an interchange abstraction (your discussion of protocols) 2) XML used as a data modeling abstraction (your XML on the server and XML document query discussions) Regarding (1), my last post had a number of my comments on using XML as a foundation for interchange (i.e. as a layer in a protocol implementation), so I won't go into it much more, other than to say that I see this as one of XML's greatest potentials. One real need though is some good, simple, free software that people can use to make this easy. LT-XML is a good step, but what I think is needed is a GPL version of something similar. One really good way to get XML into regular use beyond HTML-NextGeneration would be to get it into some GNU projects... just my 2 cents... If I had more time that I could devote to freeware projects, I would already be working on this. Regarding (2), XML with XLL provides all the pieces, but is almost too flexible to be used as a general purpose data-modelling abstraction. I think what is needed is something somewhere between XML, RDF, and Tim Bray's typed data extensions to XML. The problem is that XML is all about marking up text. For it to be used as a general data-modeling tool, you need some further mechanisms to constrain the actual data/document instances. That could be done with some basic work to add more typing information to XML, and to place some limits on element content models for the parts of a document which are not really just text streams.
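To make the idea of constraining instances a little more concrete, here is a minimal Python sketch of the kind of check such extra typing information might drive. Everything in it is an assumption made for illustration: the RULES table, the element names (RECORD, NAME, EMAIL, AGE) and the check_record helper are hypothetical, and this is not a feature of XML, RDF, or any proposal mentioned in this thread.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical constraints: child name -> (min occurrences, max occurrences, value pattern).
RULES = {
    "NAME": (1, 1, r".+"),
    "EMAIL": (0, 1, r"[^@\s]+@[^@\s]+"),
    "AGE": (0, 1, r"\d+"),
}


def check_record(xml_text, rules=RULES):
    """Return a list of constraint violations for one record-style element."""
    record = ET.fromstring(xml_text)
    problems = []
    # Children that the rules do not mention are rejected outright.
    for child in record:
        if child.tag not in rules:
            problems.append(f"unexpected element {child.tag}")
    # Occurrence counts and simple value typing for each declared child.
    for name, (lo, hi, pattern) in rules.items():
        found = record.findall(name)
        if not lo <= len(found) <= hi:
            problems.append(f"{name}: expected {lo}..{hi}, found {len(found)}")
        for el in found:
            if not re.fullmatch(pattern, (el.text or "").strip()):
                problems.append(f"{name}: bad value {el.text!r}")
    # A data record, unlike marked-up prose, allows no free text between children.
    if (record.text or "").strip() or any((c.tail or "").strip() for c in record):
        problems.append("stray text between elements")
    return problems


good = "<RECORD><NAME>Derek</NAME><EMAIL>someone@example.org</EMAIL></RECORD>"
bad = "<RECORD>hello <NAME>A</NAME><NAME>B</NAME><AGE>ten</AGE></RECORD>"
print(check_record(good))
print(check_record(bad))
```

The last check is the one that separates a data record from a text stream: mixed content is fine for prose but is treated as an error here.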
At least in my mind there is a significant difference between:

<PERSON>
  <NOMEN><FNAME>Derek</FNAME> <LNAME>Denny-Brown</LNAME></NOMEN>
  <CONTACT><EMAIL>ddb@criinc.com</EMAIL> <POSTAL>blah.. blah .blah</POSTAL></CONTACT>
</PERSON>

and

<P><PERSON-INFO refid="ddb"><FNAME>Derek</FNAME> <LNAME>Denny-Brown</LNAME></PERSON-INFO>
is contactable via email at <PERSON-INFO refid="ddb"><EMAIL>ddb@criinc.com</EMAIL></PERSON-INFO>
or via the more traditional postal services at
<PERSON-INFO refid="ddb"><POSTAL>blah.. blah .blah</POSTAL></PERSON-INFO></P>

They contain the same info, but the first is a very tightly constrained structure which enforces some nice rules (like you can have only one current NOMEN, though it might provide for alternate (non-preferred) NOMENs etc.), while the second is good for pulling the information to build the first from a free-form document. The second would be much better if it included the first and all the PERSON-INFO blocks were just references to the PERSON block to pull the appropriate structures. My general point being that XML is _too_ flexible for use as a general purpose data modelling tool, without some additional information. If I really wanted to use XML as a data modeling tool, I would require all sorts of data-type meta-info and content modeling constraints to allow XML to be used as a sort of snapshot of a data-set which straddled the relational and object oriented data modeling worlds. Used this way it provides a kind of object oriented (with some relational capabilities) database view, with strong support for dynamic queries. Then again if what you are really after is a marked up text stream, then XML is a better tool than most, if only because so many people seem to like it. Java and Microsoft (independently) have helped show the world that mass marketing and the "boardroom sell" can take something a lot farther than it might ever have gotten on its own. -derek Derek E.
Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++ From ddb at criinc.com Wed Nov 19 19:17:45 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <3.0.32.19971119111803.00932b70@mailhost.criinc.com> At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote: >After the XML document has been parsed and groved, and auto links embedded, and transformations and stylesheets applied, and then >sent to some output device, that is the publication. What if the output device is a network interface for sending it to a client for interpretation? i.e. it is never intended to be rendered on screen or paper? One place I would like to use XML is for application configuration files. When does that become a publication? -derek Derek E. Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++ xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From neil at bradley.co.uk Wed Nov 19 20:36:59 1997 From: neil at bradley.co.uk (Neil Bradley) Date: Mon Jun 7 16:58:58 2004 Subject: CSS2 and XML tables Message-ID: <199711192036.UAA02763@andromeda.ndirect.co.uk> The CSS2 proposal mentions XML (thank goodness) several times, and even CSS1 had the capability to specify in-line and block styles, and list and list item styles for arbitrary XML elements. When I saw that CSS2 had additional features for handling tables, I immediately thought there would be property types for use with XML. Maybe they are there, and I cannot find them. If not, can they be added to the 'display' property, as in 'table', 'head-row', 'body-row' and 'cell' shown below:

Property name: 'display'
Value: block | inline | list-item | run-in | compact | none | table | head-row | body-row | cell
Initial: block
Applies to: all elements

If something like this is not done, I fear that rendering XML tables will only be achievable if HTML element names are used, or some other nasty technique is adopted. Neil. ----------------------------------------------- Neil Bradley - Author of The Concise SGML Companion. neil@bradley.co.uk www.bradley.co.uk xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From paul at arbortext.com Wed Nov 19 21:04:17 1997 From: paul at arbortext.com (Paul Grosso) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <3.0.32.19971119144056.0068dddc@pophost.arbortext.com> At 13:17 1997 11 19 -0500, Jonathan Robie wrote: > Using the same >primitives would be nice, since anybody working with XML is going to have to >learn XLL. . . I sure hope it is not the case that anybody working with XML is going to have to learn XLL. paul From jwrobie at mindspring.com Wed Nov 19 21:23:26 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:58 2004 Subject: Three Access Language Paradigms Message-ID: <1.5.4.32.19971119212307.00a70970@pop.mindspring.com> At 04:03 PM 11/19/97 -0500, Paul Grosso wrote: >At 13:17 1997 11 19 -0500, Jonathan Robie wrote: >> Using the same >>primitives would be nice, since anybody working with XML is going to have to >>learn XLL. . . > >I sure hope it is not the case that anybody working with XML is going >to have to learn XLL. Oops! I guess that was a bit of an overstatement, wasn't it ;-> Jonathan xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From markb at iosphere.net Wed Nov 19 21:56:32 1997 From: markb at iosphere.net (Mark Baker) Date: Mon Jun 7 16:58:58 2004 Subject: XML and Bean serialization Message-ID: <Pine.BSD/.3.91.971119160127.25480K-100000@mrburns.iosphere.net> I've recently proposed to Javasoft, via their public RMI-USERS mailing list, that they adopt XML as the serialization format for Beans and JARs. I see this is a critical move in unifying the "web" and "object" implementations of the distributed future (their respective *visions* are already practically identical). Interest, what little there has been so far, has been very positive. But unfortunately, Javasoft themselves have not yet responded. I'm trying to drum up public interest so that we might be able to push a little harder on this, perhaps even constructing a prototype two-way Bean/XML serializer to demonstrate our case (somewhat similar to Netscape's JavaScript Beans). Thanks. MB -- Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069 Will distribute business objects for food. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From norbert at datachannel.com Wed Nov 19 22:08:18 1997 From: norbert at datachannel.com (Norbert Mikula) Date: Mon Jun 7 16:58:59 2004 Subject: XML and Bean serialization References: <Pine.BSD/.3.91.971119160127.25480K-100000@mrburns.iosphere.net> Message-ID: <34736334.65CAE8CB@datachannel.com> Mark Baker wrote: > I see this is a critical move in unifying the "web" and > "object" implementations of the distributed future (their respective > *visions* are already practically identical). I think you might also be interested in some interesting thoughts by John Tigue. "XML Enabled Mechanisms for Distributed Computing on the Web" http://www.datachannel.com/channelworld/feature.htm -- Norbert H. Mikula Sr. Online Information Architect Norbert@DataChannel.com DataChannel, 155 108th Avenue NE Ste 400, Bellevue, WA 98004 Phone: 425.455.5450 Fax: 425.637.1192 http://www.datachannel.com -------------- next part -------------- A non-text attachment was scrubbed... Name: vcard.vcf Type: text/x-vcard Size: 428 bytes Desc: Card for Norbert Mikula Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971119/7b21fae1/vcard.vcf From peter at ursus.demon.co.uk Thu Nov 20 00:39:09 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:59 2004 Subject: Query Languages for XML In-Reply-To: <E0xY8Ym-0000tK-00@oveja> Message-ID: <3.0.1.16.19971120000132.09f72e3c@pop3.demon.co.uk> At 11:02 19/11/97 -0000, Martin Bryan wrote: >Paul Prescod wrote >>> Where in XML do we have forms, >> >>"To reduce the initial barriers to adoption, a core set of HTML flow >>objects is recommended in addition to the core DSSSL flow objects. 
The >>HTML/CSS formatting model is somewhat different from the DSSSL model, >>and the inclusion of the HTML/CSS flow objects will make it possible to >>use XSL with HTML and CSS. It simplifies the targeting of HTML as the >>output format, and retains consistency of the object model and dynamic >>behaviors." >> - http://www.w3.org/TR/NOTE-XSL.html >> >>Included in the list are: >> >>"FORM >> INPUT >> SELECT >> TEXTAREA" >> >> >>> or any statement that tells anyone what will >>> happen to data placed into an editable field? >> >>This is specified in the HTML 4.0 proposed recommendation which has >>provisions for interactive processing on either the client or server >>sides. If and when someone standardizes an updateable document data >>manipulation language, it can be accessed from these forms just as SQL >>and ODQL are today. >> >So we are constrained to using the types of form objects defined in HTML >using the processes defined in HTML 4.0, and can add no new functionality >via XSL? The XSL/HTML4.0 looks an exciting place to start from (which I had overlooked). It would seem to be the most appropriate way to think about forms in XML (rather than developing them from scratch) Currently XSL (IMO) seems to derive almost entirely from a paper based metaphor. Although 'screen' is mentioned (just) under SCROLL flowobjects, these are little more than inanimate chunks of pixels. XML does not address how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but it starts to look a bit kludgy. I am much more concerned with the potential interactive properties of XSL than laying out text to the nearest micron. I am not disparaging that - it's very important - but it seems to be the main philosophy behind XSL. I'd like to see an interactive component built in. P. > >Martin Bryan > > > >xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From ddb at criinc.com Thu Nov 20 02:28:16 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:58:59 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971119182940.00a66b80@mailhost.criinc.com> At 12:01 AM 11/20/97, Peter Murray-Rust wrote: >The XSL/HTML4.0 looks an exciting place to start from (which I had >overlooked). It would seem to be the most appropriate way to think about >forms in XML (rather than developing them from scratch) > >Currently XSL (IMO) seems to derive almost entirely from a paper based >metaphor. Although 'screen' is mentioned (just) under SCROLL flowobjects, >these are little more than inanimate chunks of pixels. XML does not address >how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but >it starts to look a bit kludgy. > >I am much more concerned with the potential interactive properties of XSL >than laying out text to the nearest micron. I am not disparaging that - >it's very important - but it seems to be the main philosophy behind XSL. >I'd like to see an interactive component built in.
I think there is some real potential for an extension to XSL to allow something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it should necessarily be in XSL 1.0, and it could be really hairy if people are using an XSL-grove interface to the XML and a DOM interface to the output. I have not quite figured out how to factor in DOM into XSL without making things really confusing... But, I tend to agree, XSL allows people to get to Netscape 3.0/IE 3.0 level from XML, but not the full "4.0" range that people are (justifiably) going wild over. -derek Derek E. Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++ From lauren at sqwest.bc.ca Thu Nov 20 02:39:32 1997 From: lauren at sqwest.bc.ca (Lauren Wood) Date: Mon Jun 7 16:58:59 2004 Subject: Query Languages for XML In-Reply-To: <3.0.32.19971119182940.00a66b80@mailhost.criinc.com>; from "lauren" at Wed Nov 19 18:38:51 1997 Message-ID: <m0xYMW5-0009WiC@sqailor.sqwest.bc.ca> Derek Denny-Brown wrote: % I think there is some real potential for an extension to XSL to allow % something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it % should necessarily be in XSL 1.0, and it could be really hairy if people % are using XSL-grove interface to the XML and DOM interface to the output. % I have not quite figured out how to factor in DOM into XSL without making % things really confusing... I'm confused by this.
The idea of the DOM is to standardize the object model part of "dynamic HTML" (whatever that might mean; the definition seems to change with the application that supports it, the person talking about it, and probably the phase of the moon as well). So what sort of extension to XSL do you mean? I also don't understand why the XML would have an XSL-grove interface, and the "output" (what does output mean?) would have a DOM interface, when the DOM should be an interface to an XML document... cheers, Lauren From ricko at allette.com.au Thu Nov 20 05:20:25 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:58:59 2004 Subject: Three Access Language Paradigms Message-ID: <199711200517.QAA03826@jawa.chilli.net.au> > From: Derek Denny-Brown <ddb@criinc.com> > At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote: > >After the XML document has been parsed and groved, and auto links > embedded, and transformations and stylesheets applied, and then > >sent to some output device, that is the publication. > > what if the output device is a network interface for sending it to a client > for interpretation? i.e. it is never intended to be rendered on screen or > paper? If you need a second word for this, then "publication" is available. If you don't, "document" is fine. But usually "publication" refers to (the result of) the processing chains that end at some computer interaction medium (e.g. a printer or screen). Rick Jelliffe xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Thu Nov 20 14:28:02 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:58:59 2004 Subject: Access Languages are Tied to Schemas Message-ID: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net> I have been searching for the properties that a repository access language must have. Here I present an argument for why an access language must be tied to a repository's architecture in the manner analogous to how SQL and OQL are tied to database schemas. I infer what this implies for XML DTDs and then ask a question whose answer I think has important repercussions. Let's say that a "repository" is any software object that contains information and that provides a way for clients to read, write, and modify that information. A client must know how to talk to the repository in order to get the repository to do anything. We'll call the language that the client must speak the "access language." The client uses this language to submit requests and to understand responses. The server uses this language to make sense of requests and to submit responses. Both the client and the repository must house knowledge of this access language. (The access language may use distinct subset languages for requests and responses, but both software objects would still have to contain knowledge of both subset languages. For simplicity, I assume that requests and responses use the same language, but my argument should hold even if they are different.) The access language must convey information in two directions. In order for the information to be comprehensible, it must be conveyed in recognizable units. 
Both the client and the repository must know how to generate and parse these units. Hence, a standard must exist to which both sides conform. This standard says what kind of information units there are and what they look like. Information units usually have relationships with one another. A client often cares about accessing units that have a particular relationship with some other unit. For example, a client might care to retrieve all liens on a particular property. The access language must allow a client to select units according to their relationships with other units. In particular, a client must be able to identify the relationships of concern. Both the client and the repository must now be in agreement about the kinds of relationships that may exist among information units. We find we also need a standard that says what kinds of relationships there are and what kinds of information units participate in them. It seems that the standard has quite a bit to say. It says what kinds of information units there are, what kinds of information they contain, what kinds of relationships there are, and what information units participate in those relationships. What we have is an object model. This is the kind of thing that OMT and UML are very good at expressing. We have learned that both the client and the repository must have knowledge of the same object model. Moreover, in the spirit of object-oriented design, each side should harbor some representation of this model. That is, both sides have components that share a common architecture. In retrospect, this makes sense. Were the two sides working with different models we'd have a case of the infamous impedance mismatch. We normally think of impedance mismatch as occurring between an object-oriented application and a relational database, but it can also occur between two object-oriented applications. 
One organization may decide that liens are not useful entities in themselves and so bottle them up with their associated properties (i.e. properties would be aggregates containing liens, and liens would not be classes of the schema). Another organization may want to store liens separately so that they can select all liens that meet a given criterion (i.e. properties would be associated with liens, and liens would be classes of the schema). When the second organization decides to hook its client up to the first organization's database, the client can neither select among liens nor properly interpret property objects. Okay, so we've established the need for industries to standardize on object models. These standard object models would only say what the repositories need to look like through an access language. Any given repository is free to transparently translate that model into a more suitable internal one. We've also established the need for access languages to reflect these object models. SQL and OQL conform to this requirement by having clients use the language of the database's persistent storage schema. XML introduces another way to model information, a way that is distinct from the relational approach but somewhat similar to the object-oriented approach. XML repositories have schemas too, and these schemas are defined by the DTDs. Before concluding I'd like to ask a question whose answer may have significant repercussions. It seems that by asking an XML repository to manage information for a particular industry, we are asking ourselves to create DTDs that model the industry. The question is this: to what extent are DTDs to specify the object model of a given industry? 
More specifically, do we intend for the following capabilities to fully implement an object model: (1) the ability of a repository to ensure that the information it contains is always in conformance with the DTDs, and (2) the ability of the clients to properly interpret the informational units and the relationships that the DTDs declare? In conclusion, it seems that an access language must impose architectural constraints on at least a component of a repository and that these architectural constraints will apply to all repositories that conform to a particular industry standard. In particular, it does not seem possible to create individual access language protocols that won't to some degree constrain the architectures of the repositories. Such languages are probably feasible only when we can think of a repository as a flat file of unrelated information units. Since an object model will have to be developed for each industry, we might as well standardize on a way to access object models in general. This way we won't be asking industries to perform the additional work of inventing an access language for each object model. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Thu Nov 20 14:41:09 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:58:59 2004 Subject: Query Languages for XML References: <E0xY8Ym-0000tK-00@oveja> Message-ID: <34744CC7.E233E27C@technologist.com> > So we are constrained to using the types of form objects defined in HTML > using the processes defined in HTML 4.0, and can add no new functionality > via XSL? We can add functionality in XSL, but I think that it should be in the spirit of these basic form elements, in other words XSL should leave the interactive processing of user interface elements to languages that are explicitly designed to do it, such as ECMA(Java)script, TCL and Python. Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Thu Nov 20 14:42:48 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:58:59 2004 Subject: Query Languages for XML References: <3.0.1.16.19971120000132.09f72e3c@pop3.demon.co.uk> Message-ID: <34744D2A.3609B868@technologist.com> > XML does not address > how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but > it starts to look a bit kludgy. I don't think that it is a generic markup language's role to address how to add buttons. 
ECMAScript is a good language for creating scriptable code components. XSL and DSSSL are good languages for specifying which scriptable code component should be used to represent which XML object. > I am much more concerned with the potential interactive properties of XSL > than laying out text to the nearest micron. I am not disparaging that - > it's very important - but it seems to be the main philosophy behind XSL. > I'd like to see an interactive component built in. I don't think that that is its job. XSL specifies a mapping from structured document nodes to (perhaps interactive) graphical components. I think it is going too far to ask it to also script those components. I would expect to make a tree control in DSSSL like this: (make component system-id: "http://www.controls.are.us.com/tree.js" parameters: '(()) ) Of course if a huge number of stylesheets needed a tree control, then it would be a good idea to make a tree control flow object: (make tree-control width: height: ...) Then the behaviour would be implicit in the flow object. Putting the code for the control inside the stylesheet would be, in my mind, rather ugly and confusing. Perhaps it wouldn't be too bad if the code snippet is very short: (make button onClick: "doit()") Paul Prescod xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Thu Nov 20 16:03:52 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:59 2004 Subject: Access Languages are Tied to Schemas Message-ID: <1.5.4.32.19971120160239.00ab6714@pop.mindspring.com> At 09:26 AM 11/20/97 -0500, Joe Lapp wrote: >I have been searching for the properties that a repository access >language must have. Here I present an argument for why an access >language must be tied to a repository's architecture in the manner >analogous to how SQL and OQL are tied to database schemas. Ideally, the logical model exposed by an SGML repository should be the structure of the document itself, not the implementation details used for a particular repository architecture. An SGML DTD defines structures in the same way that the table declarations do for SQL, and in the same way that the class declarations do for object databases that use OQL. This is in keeping with the fundamental idea behind object persistence in object oriented databases: if you use an object oriented database with C++, your C++ class declarations are your schema. In the same way, if you use a repository with SGML or XML, the logical model is declared by the DTD. >A client must know how to talk to >the repository in order to get the repository to do anything. >We'll call the language that the client must speak the "access >language." The client uses this language to submit requests and >to understand responses. The server uses this language to make >sense of requests and to submit responses. Both the client and >the repository must house knowledge of this access language. 
If we're talking traditional databases, that means that both sides must know SQL, or both sides must know OQL, or whatever. Since we are talking SGML or XML repositories, that means that both sides must know SGML or both sides must know XML. >The access language must convey information in two directions. In >order for the information to be comprehensible, it must be conveyed >in recognizable units. Both the client and the repository must >know how to generate and parse these units. Hence, a standard must >exist to which both sides conform. This standard says what kind of >information units there are and what they look like. For an SGML repository, these recognizable units are SGML elements. Of course, for any particular SGML application, there would also be a DTD that defines the schema for the applications, and the clients may well have knowledge of this schema. The server might not need to have this knowledge in some cases, as long as it knows how to manage SGML in general. And there may be some clients that do not need this knowledge, either - e.g. a general purpose querying and browsing client should be written to work for any DTD, as should a formatting and printing engine, etc. In order to make general-purpose clients possible, clients must have some way of asking the repository for the schema - either the DTD schema or the structure of a particular document. >Information units usually have relationships with one another. A >client often cares about accessing units that have a particular >relationship with some other unit. For example, a client might >care to retrieve all liens on a particular property. The access >language must allow a client to select units according to their >relationships with other units. In particular, a client must be >able to identify the relationships of concern. The relationships among objects often express much of the semantics of any system - "it's not what you know, it's who you know". 
SGML/XML has two kinds of relationships: containment and links. Queries should be able to handle both. This has proven invaluable in OQL and SQL-3. >We find we >also need a standard that says what kinds of relationships there >are and what kinds of information units participate in them. But this can be quite general, e.g. the definition of SGML/XML. Again, this is analogous to using C++ or Java to define schemas in object oriented databases. >It seems that the standard has quite a bit to say. It says what >kinds of information units there are, what kinds of information >they contain, what kinds of relationships there are, and what >information units participate in those relationships. What we >have is an object model. An object model of the kind you discuss here seems like the object model of a particular application. >Moreover, in the spirit of object-oriented design, each >side should harbor some representation of this model. That is, >both sides have components that share a common architecture. In the spirit of object oriented systems, metadata is the way one system finds out about another system, unless they belong to the same application, in which case they share class declarations. The same should hold for SGML/XML repositories: programs that are part of the same application may have knowledge of the DTD, but metadata is the way to write general purpose programs, and writing general purpose software as much as possible is usually a big win. >We normally think of impedance mismatch as occurring >between an object-oriented application and a relational database, >but it can also occur between two object-oriented applications. >One organization may decide that liens are not useful entities in >themselves and so bottle them up with their associated properties >(i.e. properties would be aggregates containing liens, and liens >would not be classes of the schema). 
Another organization may >want to store liens separately so that they can select all liens >that meet a given criterion (i.e. properties would be associated >with liens, and liens would be classes of the schema). When the >second organization decides to hook its client up to the first >organization's database, the client can neither select among >liens nor properly interpret property objects. That depends, of course, on how the programs function. As long as I have access, I can log into anybody's database, browse it, formulate queries to find information, etc., because I use a general-purpose browsing and query facility. If I have programs dependent on the classes defined in a particular schema, then my programs do need to know the schema, e.g. the DTD. One of the great advantages of architectural forms is that they make it possible to write programs that work only on an agreed-upon abstract representation of the schema, and each individual organization can build on that abstraction to build documents that meet their own needs. This is a real strength of the HL7 Kona proposal for medical record attachments, which would allow parties to interchange information based on a set of well-defined architectural forms, yet allow freedom for each party to implement their own DTDs based on these architectural forms in order to accommodate their own needs. This is, of course, analogous to the "design patterns" approach of object oriented design, which strongly encourages writing programs that use the abstract base classes which define the interfaces rather than writing programs that use the concrete classes that implement them. Jonathan ________________________________ Jonathan Robie Email: jonathan@texcel.no Texcel Research, Inc. ("http://www.texcel.no") xml-dev: A list for W3C XML Developers. 
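The lien example that runs through this exchange can be made concrete with a small sketch. It assumes two hypothetical encodings of the same facts: one where liens are contained in their property element, and one where liens are stored separately and point at the property through a `property` attribute acting as a link. The element and attribute names are illustrative only and come from neither message. Python's standard `xml.etree.ElementTree` is used as a stand-in for a general-purpose query facility:

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding 1: liens are aggregated inside their property (containment).
CONTAINED = """
<registry>
  <property id="p1" address="12 Elm St">
    <lien holder="First Bank" amount="5000"/>
    <lien holder="City Tax Office" amount="1200"/>
  </property>
  <property id="p2" address="9 Oak Ave"/>
</registry>
"""

# Hypothetical encoding 2: liens are first-class elements linked to a property by id.
LINKED = """
<registry>
  <property id="p1" address="12 Elm St"/>
  <property id="p2" address="9 Oak Ave"/>
  <lien property="p1" holder="First Bank" amount="5000"/>
  <lien property="p1" holder="City Tax Office" amount="1200"/>
</registry>
"""


def liens_by_containment(xml_text, property_id):
    """Follow the containment relationship: liens are children of the property."""
    root = ET.fromstring(xml_text)
    prop = root.find(f"./property[@id='{property_id}']")
    if prop is None:
        return []
    return [lien.get("holder") for lien in prop.findall("lien")]


def liens_by_link(xml_text, property_id):
    """Follow the link relationship: liens refer to the property by its id."""
    root = ET.fromstring(xml_text)
    return [lien.get("holder")
            for lien in root.findall(f"./lien[@property='{property_id}']")]


def liens_over(xml_text, minimum):
    """Select liens by a criterion of their own, wherever they sit in the tree."""
    root = ET.fromstring(xml_text)
    return [lien.get("holder") for lien in root.iter("lien")
            if int(lien.get("amount")) >= minimum]
```

Calling `liens_by_containment(LINKED, "p1")` returns an empty list, which is a small version of the mismatch Joe describes: the client follows a relationship the other schema does not use. `liens_over` works on both encodings because it depends only on element names, which is closer to the general-purpose client Jonathan has in mind.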
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From lauren at sqwest.bc.ca Thu Nov 20 16:19:23 1997 From: lauren at sqwest.bc.ca (Lauren Wood) Date: Mon Jun 7 16:58:59 2004 Subject: Access Languages are Tied to Schemas In-Reply-To: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net> Message-ID: <m0xYZJJ-0009X7C@sqailor.sqwest.bc.ca> > From: Joe Lapp <jlapp@acm.org> > Okay, so we've established the need for industries to standardize on > object models. These standard object models would only say what the > repositories need to look like through an access language. Any > given repository is free to transparently translate that model into > a more suitable internal one. We've also established the need for > access languages to reflect these object models. A nice summary of what the principles of the DOM are all about - defining an interface in a language-independent way that clients and hosts can implement without necessarily implementing any given underlying representation of the information. So the DOM is not really properly named, since it's really the specification of the interface rather than the object model that we are concerned with. > Before concluding I'd like to ask a question whose answer may > have significant repercussions. It seems that by asking an XML > repository to manage information for a particular industry, we are > asking ourselves to create DTDs that model the industry. The > question is this: to what extent are DTDs to specify the object > model of a given industry? 
More specifically, do we intend for the > following capabilities to fully implement an object model: (1) the > ability of a repository to ensure that the information it contains > is always in conformance with the DTDs, and (2) the ability of the > clients to properly interpret the informational units and the > relationships that the DTDs declare? One example (though not the only possible) is in the DOM work, which has three parts. 1) core - this contains the general methods, functions, definitions which are applicable to HTML and XML documents, e.g., what is a Node, how is an element represented, how does an attribute relate to the element it is attached to, etc. 2) HTML -this knows the HTML DTD and therefore can build on top of the DOM core with functions specific to that DTD 3) XML - this contains the stuff that HTML doesn't need that is in XML, such as CDATA sections I could imagine industry-specific versions of part 2), that build on the DOM core to add DTD-specific functionality for that industry. > In conclusion, it seems that that an access language must impose > architectural constraints on at least a component of a repository > and that these architectural constraints will apply to all > repositories that conform to a particular industry standard. In > particular, it does not seem possible to create individual access > language protocols that won't to some degree constrain the > architectures of the repositories. Such languages are probably > feasible only when we can think of a repository as a flat file of > unrelated information units. Since an object model will have to be > developed for each industry, we might as well standardize on a way > to access object models in general. This way we won't be asking > industries to perform the additional work of inventing an access > language for each object model. 
I think it is possible to build a general API for XML documents, so if one of your imposed requirements on a repository is that it be in XML, and a general solution would not require that, then I agree. I do not agree that an object model must be developed for each industry - if the access method is standard, then whichever underlying model of the information a given tool uses doesn't really matter. It will have implications in performance etc, but it should be possible to implement the interfaces if they have been reasonably defined. cheers, Lauren xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Thu Nov 20 16:56:49 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:59:00 2004 Subject: Access Languages are Tied to Schemas In-Reply-To: <m0xYZJJ-0009X7C@sqailor.sqwest.bc.ca> References: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net> Message-ID: <3.0.3.32.19971120115219.0093b740@pop.access.digex.net> "Lauren Wood" <lauren@sqwest.bc.ca> wrote: >[...] >I think it is possible to build a general API for XML documents, so >if one of your imposed requirements on a repository is that it be in >XML, and a general solution would not require that, then I agree. I >do not agree that an object model must be developed for each industry >- if the access method is standard, then whichever underlying model >of the information a given tool uses doesn't really matter. It will >have implications in performance etc, but it should be possible to >implement the interfaces if they have been reasonably defined. From reading Jonathan's and Lauren's responses, it looks like I need to throw in a quick clarification. 
I agree that a repository need not have any knowledge of the semantics of a particular industry. We could use a general SGML repository to store any kind of document, where the repository's only knowledge of the document is its DTD. Relational databases (for example) give us this sort of approach, since they need not understand what is meant by the schemas that are stored within them. Elements are the informational units of an SGML/XML repository in the same way that tables and columns and rows are the informational units of relational databases. However, each domain does have information units that are specific to that domain, and they exist as units regardless of the more fundamental units from which they are constructed. An RDBMS's schema specifies these domain-specific units, as does an XML document's DTD. Hence, the DTD does intend to capture the object model of a particular domain, even if this object model is expressed in the language of a more general object model. I'm asking a question about what we expect our DTD schemas to accomplish for these domain-specific object models. Do we expect general SGML/XML repositories to be powerful enough to allow them to represent almost any domain-specific object model? BTW, I agree that IDL interfaces are another kind of access language to a repository and that DOM in particular satisfies the property of access languages I was arguing for. It provides fundamental constructs from which domain-specific information units can be built. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
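One way to picture domain-specific units being built from more fundamental ones is a thin wrapper over a generic element tree. The sketch below is an illustration under stated assumptions, not the DOM itself: Python's standard `xml.etree.ElementTree` stands in for a generic element API, and the `Property` and `Lien` classes, along with the element and attribute names, are hypothetical.

```python
import xml.etree.ElementTree as ET


class Lien:
    """Hypothetical domain unit backed by one generic element."""

    def __init__(self, element):
        self._element = element  # the generic unit a repository would store

    @property
    def holder(self):
        return self._element.get("holder")

    @property
    def amount(self):
        return int(self._element.get("amount"))


class Property:
    """Hypothetical domain unit built from a generic element and its children."""

    def __init__(self, element):
        self._element = element

    @property
    def address(self):
        return self._element.get("address")

    @property
    def liens(self):
        # The domain relationship "property has liens" rides on generic containment.
        return [Lien(child) for child in self._element.findall("lien")]

    @property
    def total_owed(self):
        return sum(lien.amount for lien in self.liens)


def load_properties(xml_text):
    """The generic layer sees only elements; the wrappers add the domain meaning."""
    root = ET.fromstring(xml_text)
    return [Property(element) for element in root.iter("property")]


SAMPLE = """
<registry>
  <property address="12 Elm St">
    <lien holder="First Bank" amount="5000"/>
    <lien holder="City Tax Office" amount="1200"/>
  </property>
  <property address="9 Oak Ave"/>
</registry>
"""
```

The repository side of this sketch never needs to know what a lien is; only the wrapper classes, which a client would share by agreement on the DTD, carry that knowledge.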
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Thu Nov 20 17:11:37 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:59:00 2004 Subject: Access Languages are Tied to Schemas Message-ID: <1.5.4.32.19971120171103.00a78064@pop.mindspring.com> At 11:52 AM 11/20/97 -0500, Joe Lapp wrote: >Do we expect general SGML/XML repositories to be powerful >enough to allow them to represent almost any domain-specific >object model? Yes. There are at least three SGML/XML repositories that claim to be able to import any SGML document, and which also support XML. To my knowledge, none of them currently supports queries that take advantage of the relationships expressed in links, but at least two of them support queries that combine structure and content in at least some form, and which support queries based on containment relationships. Jonathan jonathan@texcel.no Texcel - http://www.texcel.no xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Thu Nov 20 17:20:24 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:59:00 2004 Subject: Access Languages are Tied to Schemas In-Reply-To: <3.0.3.32.19971120115219.0093b740@pop.access.digex.net> References: <m0xYZJJ-0009X7C@sqailor.sqwest.bc.ca> <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net> Message-ID: <3.0.3.32.19971120122026.0096c830@pop.access.digex.net> Joe Lapp <jlapp@acm.org> wrote: >[...] Do we >expect general SGML/XML repositories to be powerful enough to allow >them to represent almost any domain-specific object model? I don't like how I worded the question here. Let's try again: What sorts of object models do we want to be able to represent in SGML/XML? An answer that says "whatever SGML/XML can represent as it is currently defined" doesn't help me here. I care about what we intend to do with these future repositories and what it's going to take to do it. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ddb at criinc.com Thu Nov 20 17:57:07 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:59:00 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971120095847.00a8ce80@mailhost.criinc.com> At 06:38 PM 11/19/97 -0800, Lauren Wood wrote: > The idea of the DOM is to standardize the object >model part of "dynamic HTML" (whatever that might mean; the definition >seems to change with the application that supports it, the person talking >about it, and probably the phase of the moon as well). So what sort of >extension to XSL do you mean? I also don't understand why the XML >would have an XSL-grove interface, and the "output" (what does >output mean?) would have a DOM interface, when the DOM should >be an interface to an XML document... I am not necessarily saying that DOM = dynamic HTML, but rather it is my expectation that dynamic HTML will depend on the DOM model, which from what little I have glimpsed (admission of a failure to properly look into it on my part), is quite different from the SGML/XML Grove model. I envision that a number of the initial XSL implementations which use the HTML/CSS flow objects will be based on existing HTML display engines. These engines, assuming they have any real "dynamic" HTML potential, will be using javascript/jscript/vbscript and something at least DOMish to provide the "dynamic" part of the dynamic HTML. Thus I would expect that an XSL implementation that did more than build a static page would need to work with these engines using a DOMish interface. 
This means in the case of some XSL document, that the 'input' is an XML document (and an XSL stylesheet) and the output is the screen via this HTML-based display engine which allows some "dynamic" behaviour via a DOMish interface. That means that the XSL stylesheet (assuming it is using some XSL extensions to talk DOMishness with the display engine) is talking Grove-speak to the original XML document (because that is how XSL was defined, at least in how I read the spec) and DOMishness to the display engine (beyond the initial flow-object creation). Having only limited experience with DSSSL, I really don't have a complete picture of how XSL/DSSSL could work in a "dynamic" output media environment. What I mean by "dynamic" in the above paragraphs is that the display engine has some means to change the (existing) rendering, on the fly. I click the "Verify" button and all the text fields which have invalid entries become some nuclear-neon pink, so that I know where my errors are, as an example. Or even better, I can insert some new flow-objects or remove existing flow objects from the displayed flow-object stream. My classic example of what I want from a "dynamic" HTML rendering engine is that I can build a "tree" using the built-in list/list-item flow-objects, where I can expand/collapse portions of that tree at runtime, without reloading the document. I hope this explains a bit. I realize my original post was a (wee-bit) cryptic, and I left out some of my in-between thought processes (as an exercise to the reader of course. <grin>) -derek Derek E. Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ddb at criinc.com Thu Nov 20 18:06:01 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:59:00 2004 Subject: Query Languages for XML Message-ID: <3.0.32.19971120100709.00a8c5b0@mailhost.criinc.com> At 09:46 AM 11/20/97 -0500, Paul Prescod wrote: >Of course if a huge number of stylesheets needed a tree control, then >it would be a good idea to make a tree control flow object: One of the things that I see as a potential problem is that HTML etc as it is used now has 2 (as I count them this side of the morning) relatively distinct uses. 1) as an alternate form of (relatively) static information. 2) as a (very-basic) cross-platform (g)ui. XSL and DSSSL are focusing rather hard on (1), but not on (2). That may not be a bad thing if it is made clear that from the designers' point of view (2) is better left to Java, which it would be if the browser people could better integrate Java into their browsers. The problem is that (2) often spends a lot of time trying to do a lot of the stuff that the display engine for (1) already has figured out. hmm... so maybe what I am looking for is a "standard" way to extend an XSL processing/display engine with new flow-object types at run-time. Paul, was it you who talked about this some months ago? Someone did, so it isn't a new idea. -derek Derek E. Denny-Brown II || ddb@criinc.com "Reality is that which, || Seattle, WA USA when you stop believing in it, || WWW/SGML/HyTime/XML doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Thu Nov 20 18:15:23 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:59:00 2004 Subject: Access Languages are Tied to Schemas Message-ID: <1.5.4.32.19971120181453.00ad6a4c@pop.mindspring.com> At 12:20 PM 11/20/97 -0500, Joe Lapp wrote: >Joe Lapp <jlapp@acm.org> wrote: >>[...] Do we >>expect general SGML/XML repositories to be powerful enough to allow >>them to represent almost any domain-specific object model? > >I don't like how I worded the question here. Let's try again: What >sorts of object models do we want to be able to represent in SGML/XML? Any object model, but with some limitations on the extent of the representation. The following properties of object models are easily represented in SGML/XML: o Identity o State o Type These properties are not easily represented: o Behavior (except for in languages that allow methods to be represented as data, e.g. Java) o Encapsulation constraints There are indirect methods for describing inheritance in SGML/XML, but they are different from the inheritance mechanisms in OO languages. SGML/XML can represent the data and identity portion of any object model expressed in C++, Java, CORBA, etc., including the reference network. Jonathan xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From fussellm at alumni.caltech.edu Thu Nov 20 18:56:32 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:00 2004 Subject: Generalizing the SGML/XML information model and Releasing MONDO Message-ID: <Pine.SOL.3.91.971120105331.24890A-100000@alumnae> [This is a long email so I will also put it online at "http://www.chimu.com/projects/mondo/" ] Some recent discussions on xml-dev and c.t.sgml have included query languages, encoding complex information (trees, graphs, etc.), object serialization, and extended metamodeling. I recommend enlarging the scope of these discussions and thinking about aligning SGML/XML with other disciplines that can help accomplish these tasks. This aligning would take advantage of the tools and techniques that are already available in other industries: not just by duplication of design but by actually merging with more general capabilities. Although alignment has been successfully done in some areas of SGML/XML I think it is conspicuously lacking in a crucial place: SGML's information model. By improving this particular weakness in SGML by taking advantage of well-established industries, an abundance of other needs become much more easily satisfied. Generalizing the SGML/XML information model ------------------------------------------- The desired applications of SGML/XML have grown beyond the original focus on documents towards working with much more general information and processing. SGML is a combination of encoding technology and an information modeling language. 
But that modeling language (DTDs and Groves) is very weak and is constrained by being focused on document-oriented information. It is also esoteric and not equivalent to any of the mainstream information modeling approaches.

I recommend considering modeling separately from encoding technology. For modeling I think object-oriented information models can subsume SGML's document-oriented models and provide the ability to handle much more advanced models. Object-oriented information models can be very general, expressive, and understandable. This allows them to model many types of information equally well: both document-oriented and more general information. The strength of object-oriented information modeling has resulted in an abundance of good analysis, patterns, and specific models being built using it.

This last point is the most important. If SGML/XML aligns with the information modeling industry, many more tools will immediately become available. For describing models you can use the Unified Modeling Language (UML) and tools such as Rational Rose (and several other techniques and tools). Implementing models can be done very easily with most OO languages (with or without generic frameworks), and the resulting implementation can be far more knowledgeable about the semantics of the information it is working with. There are many products that provide persistence and UI presentation that are designed to work with OO DomainModels. There are standard query languages (OQL/SQL) and interface languages (CORBA/IDL). The information modeling industry provides an extensive list of high-quality technologies, standards, and techniques.

There has been a lot of great work done with SGML/XML in both modeling (DTDs) and technologies (e.g. HyTime). If this quality work is integrated into the common environment of OO information modeling and OO technologies then it will be available to a larger audience.
It will also frequently become easier to understand and more capable because it can take advantage of the inherent abilities of OO models. For example, much of HyTime addressing is very easily and flexibly described in terms of object associations. HyTime becomes more powerful in the general object context.

This isn't to say everything is easy. There are still the issues of how to work with different information models on different technologies (e.g. how smart the objects are) and what additional technologies need to be provided to reproduce expected SGML functionality (e.g. like HyTime or extending (through object-methods) OQL with containment-closure abilities). And some tools would never be generalized because the SGML DTD and Grove models are sufficient for the task or the tool is of too high a quality to risk moving (e.g. Jade).

Overall, I think the benefits will be enormous.

MONDO
-----

I have been working on a project (called MONDO) to prove the benefits of this alignment and to provide an architecture and the frameworks to support it. MONDO is primarily an architecture: it describes the components (e.g. ObjectBuilder, DomainModel, ObjectEncoder), their responsibilities, and the interfaces among those components. It is meant to be open and language neutral. MONDO will also have a reference implementation in Java (prototypes were in Java, Perl, and Smalltalk). The current reference implementation includes frameworks and tools for the normal document-oriented tasks and also for some more general or object-oriented capabilities. As an example of the latter, MONDO can serialize and deserialize Java objects to human readable (XML or OML) encodings.

I have been working on MONDO for quite a while and been producing tangibles (i.e. designs, documentation, and code) off and on for a bit more than a year. This is the first time I am releasing them openly.
The WWW site currently has some FAQ's, some references (extracted from the design document), and placeholders and timelines for expected additions. The references may be especially useful because they provide a sampling of the integration from these multiple fields. I hope to have the design document (first pass is about 80 pages) up on the web site by early next week and will start putting up the reference code shortly thereafter.

The MONDO WWW site is at:
http://www.chimu.com/projects/mondo/

As an example (teaser ;-) of the MONDO design, I have included a couple (non-sequential but related) paragraphs below.

======
ObjectBuilder

The responsibility of the ObjectBuilder is to build all or part of the Objectbase from an external source. Generally this source will be a human-readable text file, but there are several stages to ObjectBuilding which can each have different approaches (e.g. we could read from a binary file instead). Assuming we have a textual file-based approach, ObjectBuilding would go through three stages:

o Read from the text file and produce a stream of text
o Parse the text and turn it into a recipe (what objects to build and what ingredients to use)
o Build the recipe and construct objects within the DomainModel

-------
Recipes for building objects

A recipe describes how to build a collection of associated objects. All the information that is placed into the DomainModel by MONDO is the result of building recipes. By formalizing recipes we separate the encoding of information (e.g. whether it is human readable and how to parse it) from what information is in the encoding. MONDO uses that information to construct the knowledge in a form we want to work with, the Objectbase.
======

Any feedback on MONDO or these concepts is appreciated and I hope they contribute to some of the topics that have been addressed recently. I will let people know when the main design document is online and when the code to work with is downloadable.
If you are interested in MONDO for your application or want to help with the project, let me know.

--Mark
mark.fussell@chimu.com

  i     ChiMu Corporation      Architectures for Information
 h M    info@chimu.com         Object-Oriented Information Systems
C   u   www.chimu.com          Architecture, Frameworks, and Mentoring

From elm at arbortext.com Thu Nov 20 19:16:30 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:00 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971120141133.00b17780@village.doctools.com>

At 12:21 AM 11/20/97 -0500, Rick Jelliffe wrote:
>
>
>> From: Derek Denny-Brown <ddb@criinc.com>
>
>> At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote:
>> >After the XML document has been parsed and groved, and auto links
>> embedded, and transformations and stylesheets applied, and then
>> >sent to some output device, that is the publication.
>>
>> what if the output device is a network interface for sending it to a client
>> for interpretation? i.e. it is never intended to be rendered on screen or
>> paper?
>
>If you need a second word for this, then "publication" is available. If you
>don't "document" is fine. But usually "publication" refers to (the result of)
>the processing chains that end at some computer interaction medium
>(e.g. a printer or screen).

Other names for this that I've heard (and sometimes used):

o Deliverable
o Presentation instance (not as good for non-rendered information)

Eve
From jlapp at acm.org Thu Nov 20 19:27:31 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <1.5.4.32.19971120181453.00ad6a4c@pop.mindspring.com>
Message-ID: <3.0.3.32.19971120142812.00972360@pop.access.digex.net>

Jonathan Robie <jwrobie@mindspring.com> wrote:
>These properties are not easily represented:
>
>o Behavior (except for in languages that allow methods to be
>represented as data, e.g. Java)
>o Encapsulation constraints

I'm not sure what you mean by "encapsulation constraints." OMT uses a variety of constraints, but none go by that name. Poring over the UML documentation I only see the term "constraint" being used in a general way.

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From jwrobie at mindspring.com Thu Nov 20 19:35:48 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120193521.00aa7580@pop.mindspring.com>

At 02:28 PM 11/20/97 -0500, Joe Lapp wrote:
>Jonathan Robie <jwrobie@mindspring.com> wrote:
>>These properties are not easily represented:
>>
>>o Behavior (except for in languages that allow methods to be
>>represented as data, e.g. Java)
>>o Encapsulation constraints
>
>I'm not sure what you mean by "encapsulation constraints." OMT uses
>a variety of constraints, but none go by that name. Poring over the
>UML documentation I only see the term "constraint" being used in a
>general way.

I'm thinking of public/protected/private access in languages like C++, i.e. the constraints on access to encapsulated data.

Jonathan

From fussellm at alumni.caltech.edu Thu Nov 20 20:40:39 1997
From: fussellm at alumni.caltech.edu (Mark L.
Fussell)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <Pine.SOL.3.91.971120123539.24890B-100000@alumnae>

Jonathan Robie <jwrobie@mindspring.com> wrote
> The following properties of object models are easily represented in
> SGML/XML:
> o Identity
> o State
> o Type

I would disagree that even these items can be easily represented in SGML/XML (for example, State is more complicated than a particular set of attribute values). I think it is more the other way around: SGML/XML has a particular model of Identity, State, and Type which an object model can easily represent.

But in any case, these items are (mostly) the core concept of OO (i.e. Objects) instead of being properties of object models. Objects have Identity, State, and Behavior where the implementation of both state and behavior is encapsulated. Object models describe the possible objects and structures that can exist in a system. This will include describing[1]:

Types: The interfaces (methods, associations, and abstract state) that objects can have.
Associations: The possible relationships between objects
Operations: The messages an object can respond to
State Models: The possible state transitions for an object
Attributes: The simple associations (to basic value types) of an object
Inheritance: The similarities/relationships among types

DTDs can describe some of this modeling information, but not particularly well and really only for a limited set of object models. Examples of weaknesses are: only one true association (content) which is a pure containment, all other attributes must be basic data types, limited cardinality control, likelihood of arbitrary ordering, inability (or difficulty) to express Type relationships, inability (or difficulty) for an Object to support more than one type. These are weaknesses compared to the most basic modeling abilities of common modeling techniques (UML, Booch, HOOD, Syntropy, OORAM).
Thought about another way, DTDs are good models for textual input of information (what rules must be satisfied by the encoding) but this should be considered only a view onto the true information model. SGML/XML describes a construction view of an information model and provides the front-end to instantiating an Objectbase from that model. Using SGML/XML to try to describe any information model (via DTDs) will be overextending its abilities into areas where other tools/techniques are much better qualified.

--Mark
mark.fussell@chimu.com

[1] An implementation of an object model (or an implementation model developed from a conceptual model) also uses classes, methods, and instance variables to satisfy the above descriptions within a particular system. I am trying to use the most established and mainstream definitions of all these terms, but you may also want to see the references at the MONDO site for possible different definitions (e.g. Dictionary of Object Technology [Fireside+E 95]).

  i     ChiMu Corporation      Architectures for Information
 h M    info@chimu.com         Object-Oriented Information Systems
C   u   www.chimu.com          Architecture, Frameworks, and Mentoring

From jwrobie at mindspring.com Thu Nov 20 21:31:42 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120213117.00ae2538@pop.mindspring.com>

At 12:40 PM 11/20/97 -0800, Mark L.
Fussell wrote:
>I would disagree that even these items can be easily represented in
>SGML/XML (for example, State is more complicated than a particular set of
>attribute values). I think it is more the other way around: SGML/XML has
>a particular model of Identity, State, and Type which an object model can
>easily represent.

Our basic difference here is that I am thinking primarily in terms of the network of objects available in object oriented systems at run-time, with their metadata (if available), and you seem to be thinking of abstractions used to create object oriented systems. For instance, the state of an object is precisely equivalent to the set of attribute values associated with that object. Either of these can be referred to as an object model, but they are not the same thing.

Also, you may be inferring that I am trying to say that SGML can be a replacement for CORBA or other distributed object architectures. No way. In fact, at this point I am not advocating anything concrete, except that I think there should be some kind of query language that SGML/XML systems can use to access data in foreign systems like relational or object oriented databases, and at present, it makes sense to me that such a query language should be defined in terms of SGML/XML structure.

And I think that SGML/XML is probably powerful enough for that - at least, it is if we are using it only for retrieval of information, and not for modification of information; for instance, everything that is stored in an object oriented database can be stored in SGML - the object ids can be turned into IDs, containers can be expressed either through containment or sets of IDREFs, etc. As long as access is read-only, you aren't losing much. However, you wouldn't want to modify it through such an interface, since you have lost encapsulation, polymorphic references, type safety of references, etc.
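A minimal sketch of the read-only mapping described above, written in Python only for brevity. The Person class, the element and attribute names, and the "p1", "p2" id scheme are all hypothetical and do not come from any system mentioned in this thread; the sketch also assumes every referenced object is itself in the list being exported.

```python
import xml.etree.ElementTree as ET


class Person:
    """Hypothetical domain class, used only for illustration."""

    def __init__(self, name):
        self.name = name
        self.friends = []  # references to other Person objects


def to_xml(people):
    """Build a read-only XML view of an object graph.

    Each object becomes one element; its identity becomes an ID-style
    "id" attribute, and its references become an IDREFS-style
    space-separated "friends" attribute.
    """
    # Assumption: every referenced object also appears in `people`.
    ids = {id(p): "p%d" % n for n, p in enumerate(people, start=1)}
    root = ET.Element("people")
    for p in people:
        el = ET.SubElement(root, "person", id=ids[id(p)])
        ET.SubElement(el, "name").text = p.name  # state as content
        if p.friends:
            el.set("friends", " ".join(ids[id(f)] for f in p.friends))
    return root


alice = Person("Alice")
bob = Person("Bob")
alice.friends.append(bob)
bob.friends.append(alice)  # a cycle, which pure containment cannot express

root = to_xml([alice, bob])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

Identity, state, and the reference network survive the trip, including the cycle. Nothing in the output says which operations a person supports or which fields were private, which is the loss described above.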
This is analogous, in some ways, to ODBC access for object oriented databases, which allows a view on the data in the model, but does not encompass the full semantics of the object database. Such interfaces are great for read-only access, but certainly do not replace the need for an object database, and are not really very good for write access.

Jonathan

From jlapp at acm.org Thu Nov 20 21:42:48 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <Pine.SOL.3.91.971120123539.24890B-100000@alumnae>
Message-ID: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>

"Mark L. Fussell" <fussellm@alumni.caltech.edu> wrote:
>[...] Object models describe the possible objects
>and structures that can exist in a system. [...]
>DTDs can describe some of this modeling information, but not particularly
>well and really only for a limited set of object models.

I do think that "object model" is too broad a term for what an SGML/XML repository can accomplish.

I think the SGML/XML repository offers a new way of looking at objects. Clients may process SGML/XML constructs as raw document information, or clients may process the constructs by interpreting them (adding semantic value not provided by the repository). I'm guessing that most clients that go about interpreting repository data will create objects that contain those data, and those objects will have behavior.

We have a single object's data living as sibling objects on many client machines.
I'm guessing that these objects (instantiated on client machines) will all have behavior and any other property we ascribe to objects of an object model. The object model of an SGML/XML repository is schizophrenic.

When we evaluate the capabilities of SGML/XML to support object models, I think we need to take client behavior into account. The repository is acting more like a file system for the state information of objects, and the clients are more like applications that use the file system.

This seems like a different model for designing systems, and I wonder how far we can take it.

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org

From peter at ursus.demon.co.uk Thu Nov 20 21:50:17 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:00 2004
Subject: Generalizing the SGML/XML information model and Releasing MONDO
In-Reply-To: <Pine.SOL.3.91.971120105331.24890A-100000@alumnae>
Message-ID: <3.0.1.16.19971120211323.1e97f602@pop3.demon.co.uk>

At 10:56 20/11/97 -0800, Mark L. Fussell wrote:
>
>[This is a long email so I will also put it online at
>"http://www.chimu.com/projects/mondo/" ]
>

Mark, thanks very much for this posting. One of the main goals in setting up XML-DEV was for the shared development of software and (although I'm posting this before visiting your site) this looks very valuable.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From markb at iosphere.net Thu Nov 20 22:12:58 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>
Message-ID: <Pine.BSD/.3.91.971120165951.27833P-100000@mrburns.iosphere.net>

On Thu, 20 Nov 1997, Joe Lapp wrote:
> When we evaluate the capabilities of SGML/XML to support object models,
> I think we need to take client behavior into account. The repository is
> acting more like a file system for the state information of objects, and
> the clients are more like applications that use the file system.

No, I think that's what we should be trying to stay away from.

XML is self-describing structured storage - for anything you want to shove in it. Implementation, state, properties, events, behavioural semantics, whatever.

Any object I have can be entirely serialized into an XML document and back again without information loss. The XML document *is* the object. All I need is a framework to transparently activate documents. Or in other words, reserialize it from XML into RAM.

So, there are no 'clients' per se. There are browsers, and then there are serialized objects streaming themselves into them.

My Javasoft proposal mentions Beans specifically, but for those of you not familiar with Beans, *every* Java object is automatically a Bean in JDK 1.1. So, my proposal to Javasoft isn't a niche idea - it's meant to apply to all objects.

Now it's also apparently implemented in MONDO. Bonus. Thanks Mark!

MB

--
Mark Baker, Ottawa Ontario CANADA.
Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.

From tbray at textuality.com Thu Nov 20 22:32:18 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:01 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971120074053.00bac08c@pop.intergate.bc.ca>

At 10:50 AM 19/11/97 -0500, Jonathan Robie wrote:
>In XLL, is there a way to combine conditions with boolean operators?

No; no booleans. -Tim

From jlapp at acm.org Thu Nov 20 22:44:59 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <Pine.BSD/.3.91.971120165951.27833P-100000@mrburns.iosphere.net>
References: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>
Message-ID: <3.0.3.32.19971120174538.0093b9e0@pop.access.digex.net>

Mark Baker <markb@iosphere.net> wrote:
>On Thu, 20 Nov 1997, Joe Lapp wrote:
>> When we evaluate the capabilities of SGML/XML to support object models,
>> I think we need to take client behavior into account. The repository is
>> acting more like a file system for the state information of objects, and
>> the clients are more like applications that use the file system.
>
>No, I think that's what we should be trying to stay away from.
>[...]
>Any object I have can be entirely serialized into an XML document and
>back again without information loss. The XML document *is* the object.
>All I need is a framework to transparently activate documents. Or in
>other words, reserialize it from XML into RAM.

I think we are in agreement (I disagree, we agree). An XML document is capable of representing any object and all aspects of that object. But an XML document isn't the object it represents. You have to deserialize that document back into an object before you have the fully featured object again. An XML repository could store those objects (in their XML document representation) and even keep the relationships among those objects, but it does not animate those objects. The objects are alive when they are deserialized on the clients. To get a repository to animate the objects you'd have to make the repository a bit more than just a repository. For one thing, you'd also need a JVM.

>[...]
>My Javasoft proposal mentions Beans specifically, but for those of you not
>familiar with Beans, *every* Java object is automatically a Bean in JDK
>1.1. So, my proposal to Javasoft isn't a niche idea - it's meant to
>apply to all objects.

Oh, I fully agree here, too. Actually, I was thinking of your proposal to JavaSoft when I wrote that previous post. I intended to mention that XML repositories could serve as databases for serialized Java objects. Your idea to use XML to represent serialized Java objects is intriguing.

As a side note, you mention that in JDK 1.1 every object is a bean. I thought beans had to be serializable. Are you saying that in JDK 1.1 every Java object that ever gets created is serializable?

--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From ricko at allette.com.au Fri Nov 21 01:58:13 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <199711210155.MAA03460@jawa.chilli.net.au>

> From: Jonathan Robie <jwrobie@mindspring.com>
> The following properties of object models are easily represented in SGML/XML:
>
> o Identity
> o State
> o Type
>
> These properties are not easily represented:
>
> o Behavior (except for in languages that allow methods to be
> represented as data, e.g. Java)
> o Encapsulation constraints

I think you miss what is perhaps *THE* most important thing that SGML content models represent: sequence.

This is one of the essential distinguishing features of SGML. If I have

<p>Refer also to <citation> <title>XML draft at http://www.w3c.org/TR for more info.

then the sequence of elements and data in the citation element is vitally critical. Sequence is not an artifact of formatting, in many cases, but is as intrinsic to the data as encapsulation and so on.

The problem I see with so many discussions of the virtues of object-oriented inheritance systems is that they fail to discuss how inheritance works with sequence. It seems to be an issue tucked aside.

For example, if the content model of the above is and I want to use the citation element type as a supertype, and derive a new element type with the following content model so I can say

Refer also to XML draft edited by McQueen, Bray, Paoli at http://www.w3c.org/TR for more info.

This kind of adding element types at particular points in sequences is, as I say, one of the most basic requirements for any real work. I am very interested in seeing inheritance-based models that address this issue: that would be great.

The best idea I have come up with is the following: to allow a new keyword #OTHER (or #ANY) in content models, to represent any one unambiguous element type. This gives the creator of the original content model the ability to declare points in content models which are publicly available for extension by derived element types (declared or undeclared).

I currently think that any inheritance-based declaration system must presuppose such explicit inheritance points. I think it is merely a matter of strong typing and interface control.

Rick Jelliffe

From markb at iosphere.net Fri Nov 21 02:05:43 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120174538.0093b9e0@pop.access.digex.net>
Message-ID:

On Thu, 20 Nov 1997, Joe Lapp wrote:
> I think we are in agreement (I disagree, we agree). An XML document is
> capable of representing any object and all aspects of that object. But
> an XML document isn't the object it represents. You have to deserialize
> that document back into an object before you have the fully featured
> object again.

That's one way of doing it of course, and very useful for some applications, such as dynamic binding of data to behaviour ala compound document frameworks (and the new beans activation framework).
Think of this as serializing a class.

But in *many* cases, you just want to make the *object* persist simply, perhaps even on the machine with the browser. This is especially suitable for agent systems; you bring the ability to persist along with you instead of attempting to store it "behind" you. It's a move away from TP-monitor style ACID transactions, and towards a more "make forward progress" means of distributed computing. Object groups are a good example of this.

Certainly though, both tools should be available to us. We shouldn't try to shoehorn everything into a single solution when that solution isn't general enough for all of our needs. But, I've got the feeling that we'll be doing a lot more of one than the other before too long. YMMV. 8-)

>An XML repository could store those objects (in their
> XML document representation) and even keep the relationships among those
> objects, but it does not animate those objects. The objects are alive
> when they are deserialized on the clients. To get a repository to
> animate the objects you'd have to make the repository a bit more than
> just a repository. For one thing, you'd also need a JVM.

Which isn't too difficult nowadays, especially when so much is being done with the browser (as it should). And a JVM is no different than requiring a script interpreter.

> As a side note, you mention that in JDK 1.1 every object is a bean. I
> thought beans had to be serializable. Are you saying that in JDK 1.1
> every Java object that ever gets created is serializable?

You're right of course. But you'll find that anything that "makes sense" to serialize, can be.

MB

--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From jwrobie at mindspring.com Fri Nov 21 02:07:10 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971121020612.00b21624@pop.mindspring.com>

At 12:53 PM 11/21/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Jonathan Robie
>
>> The following properties of object models are easily represented in SGML/XML:
>>
>> o Identity
>> o State
>> o Type
>>
>> These properties are not easily represented:
>>
>> o Behavior (except for in languages that allow methods to be
>> represented as data, e.g. Java)
>> o Encapsulation constraints
>
>I think you miss what is perhaps *THE* most important thing that SGML content
>models represent: sequence.
>
>This is one of the essential distinguishing features of SGML.

The purpose of my message was to describe what SGML/XML-based interfaces to object systems can represent, not to propose that SGML/XML should have the same inheritance mechanisms as object oriented systems. Whether or not they should, I think it is pretty clear that they don't.

Jonathan
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Fri Nov 21 02:20:19 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:01 2004 Subject: Access Languages are Tied to Schemas Message-ID: <199711210215.NAA04666@jawa.chilli.net.au> > From: Mark Baker > On Thu, 20 Nov 1997, Joe Lapp wrote: > > When we evaluate the capabilities of SGML/XML to support object models, > > I think we need to take client behavior into account. The repository is > > acting more like a file system for the state information of objects, and > > the clients are more like applications that use the file system. > > No, I think that's what we should be trying to stay away from. > > XML is self-describing structured storage - for anything you want to shove > in it. Implementation, state, properties, events, behavioural semantics, > whatever. I think this is a good point. XML/SGML is a markup language (it is concerned with the mechanics of constraining, labelling and pointing to user-defined hierarchical information) not a data modeling language. This neutrality is its weakness, in that it may well be suboptimal for any specific job, compared to what you might do if you have all the resources and brains to tailor a specific notation and train everyone up in it. However, most people can only learn a small handful of languages, so having a standard markup language frees people's brains to concentrate on the distinguishing specifics of their information, rather than juggling many different notations in their brains. This neutrality also explains why XML's content model system is so simple.
SGML has a more complex content model system (inherited inclusions and exclusions, and a "required anywhere" connector "&"), but they have been found in practice to complicate matters more than seems warranted. So I think it is useful to not think of the "poverty" of XML content models, but rather their "modesty" and "neutrality". Rick Jelliffe xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Fri Nov 21 10:37:11 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:59:01 2004 Subject: New release of xslj Message-ID: <3590.199711211036@grogan.cogsci.ed.ac.uk> Version 0.3 of xslj, my XSL-to-DSSSL translator, is now available. This version includes a number of bug fixes (thanks for reports) and much improved HTML output when the CSS/HTML flow objects are used. See http://www.ltg.ed.ac.uk/~ht/xslj.html for information on access, etc. ht ----------- Henry S. Thompson, Human Communication Research Centre, University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.cogsci.ed.ac.uk/~ht/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From fussellm at alumni.caltech.edu Fri Nov 21 12:39:07 1997 From: fussellm at alumni.caltech.edu (Mark L.
Fussell) Date: Mon Jun 7 16:59:01 2004 Subject: Access Languages are Tied to Schemas Message-ID: Jonathan Robie wrote: > In fact, at this point I am not advocating anything concrete, except that I > think there should be some kind of query language that SGML/XML systems can > use to access data in foreign systems like relational or object oriented > databases, and at present, it makes sense to me that such a query language > should be defined in terms of SGML/XML structure. And I think that SGML/XML > is probably powerful enough for that - at least, it is if we are using it > only for retrieval of information, and not for modification of information; > for instance, everything that is stored in an object oriented database can > be stored in SGML - the object ids can be turned into IDs, containers can be > expressed either through containment or sets of IDREFs, etc. As long as > access is read-only, you aren't losing much. Which would make SGML/XML a presentation model (i.e. similar to a reporting view) on more sophisticated information bases. This would inherently be worthwhile if it provided a very understandable model to the user: more understandable than the underlying database. One of the nice things about relational databases is the capability of defining "views" on the data. However simple SQL is compared to (say) C++, very few end-users can do anything more than a simple join. After that things get a bit murky and even if the query produces results the end-user may have no idea (or the wrong idea) of what the answer means[1]. There are many examples of this (see C.J. Date's writing especially). But views and reporting tools (and general UI applications) come to the rescue and provide a simple useful view of the complexity below them. 
SGML/XML could provide a very sophisticated version of this "reporting" but I think it could be trapped between the ultra-simple HTML and the more sophisticated information models and would rarely be used outside of niches (just use an HTML builder on top of a database). So I would rather see SGML/XML go upward and provide a more accessible interface to "complete" information models than stay in the middle. By going upward it immediately gains the rewards that you mentioned earlier in the week: benefiting from the history/mistakes/knowledge of the database community. Actually, I think in concrete terms I would like to be able to change your suggested OQL from: select e from e in SGMLElement, a in e.attributes, s in e.subElements where e.tagName = "SECT1" and a.tagName = "ID" and s.tagName = "PARA"; to something like: select section from section in Sections children in section.allChildren where section.level > 1 and section.title.beginsWith("MONDO") and children.text.contains("ChiMu") But still use SGML/XML/OML technology and be working from the same original encoding. --Mark mark.fussell@chimu.com [1] Part of the problem is because SQL is flawed compared to relational theory, but it would still be a problem with a better query language. i ChiMu Corporation Architectures for Information h M info@chimu.com Object-Oriented Information Systems C u www.chimu.com Architecture, Frameworks, and Mentoring xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Fri Nov 21 15:56:17 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:59:01 2004 Subject: Integrity in the Hands of the Client Message-ID: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net> In this posting I'm going to be a little bold and propose that both the XML and DOM specifications are flawed. The existence of these flaws rides on the assumption that we care to use SGML/XML to create domain models for data where the data evolves over time. I'm also assuming that it is unacceptable for the client objects of a document to maintain the integrity of the document. In order for me to most convincingly convey the point, I need you to bear with me as I explore an example of how we might use XML. I do not directly suggest how to correct the XML specification, but I think I end up implying a few different solutions. However, it seems that the correction to DOM is a bit more straightforward, so I make the obvious suggestions. Suppose we want to create a document that contains information about books and about the authors of those books, and suppose we require that whenever the document has a book, it also has information about the author of the book. The document will reside on a server, and one or more administrators will populate the document from their clients. Other users will be free to browse the document. We need to design the DTD for this document. Here is our first pass: ]> To get a better feel for what we've designed, we create a little sample document: Text goes here. Text goes here. Text goes here. Text goes here. Text goes here. This seems to work.
It stores information about books and authors, and it is not possible to add a book without associating it with the description of some author. But we can see that it breaks as soon as we add any other kind of element that has an ID. We know that every book will eventually have an ID, because we'll soon want to have an element whose content elements reference the New York Times Bestsellers. Once we do that, nothing prevents an administrator (or the client program he or she is using) from indicating that the author of a book is another book. This DTD will not suffice. It seems that we might have to use links, but let's look at other approaches first. We entertain the idea that an author's books belong to the content of the author. We quickly throw that one out when we realize that a book can have more than one author. Now we consider having authors belong to the content of a book, but we throw that idea out because authors may author many books. It is possible to put author information in the content of each book, but then we'd be duplicating the lengthy bio and wasting disk space as well as introducing the headache of managing duplicate copies. The same problem arises if we were to duplicate book information under each of the authors of the book, especially since each book has a lengthy book description. So now we ask whether links can do the job. Links allow us to use URLs and XPointers to reference other elements. For the moment, consider trying to accomplish our task using a single DTD, so that all element IDs have the same scope. In this case, the URL of any link references the document that contains the link, so all of our distinguishing information resides in the XPointers. The ID() location term looks useful, but this term cannot constrain the element type of the element that it references. Using ID() as the first locator term would not be sufficient to distinguish between books and authors. Suddenly a brilliant idea comes to mind.
We'll use a locator term to specify the element and then follow that with the ID() term to select the ID of the particular element. But this idea has a problem: when the ID() term appears, it must appear as the first locator term. Another idea comes to mind. We could use the following combination of locator terms: CHILD(1,authors)(1,author,id,'A3') Here 'A3' is the identifier of the author. We know that we cannot try to match the author's name, because more than one author may have the same name. IDs are guaranteed to be unique. That seems to work. Something similar could have been accomplished by separating books and authors into different documents and then using the URL portion of the href to specify the document that contains the target element. However, these link solutions all have one problem: nothing in the link specification allows a link element declaration to constrain the kind of resource to which a link links. WD-XML-LINK-970731 indicates that an href is an URL, and that when the URL references another XML document, XPointer locator terms may be appended to the URL. I do not see any mechanism by which a link element can constrain the kind of element that the link references. I have not been able to find a way to have the document server force clients to ensure that whenever they add a book, that book is associated with some author. Clients are given the responsibility of maintaining the integrity of the document. The problem grows more complicated when we also ask that no author exist in the document unless we also have at least one book associated with the author. A solution to the first problem would not be a sufficient change to specifications in order to guarantee a solution that handles this additional requirement. By having constraints operate in both directions we now require that every change to a document occur within a transaction, so that the document is validated against the DTD only at transaction boundaries.
(If every book had to have at least one author and every author had to have at least one book, then when it comes time to add a new book by a new author, the document will not validate against the DTD after we add one and before we add the other.) The example I have given here may seem trivial. Surely we can find a way to live with books that don't have associated author entries and authors that don't have associated book entries. However, in general, constraints between elements will be important. For example, it would not be acceptable to store away an account deduction entry without having an associated account entry or to have an account entry that does not have at least one associated account-owner entry. It seems to me that there are very few domains that can be represented without these kinds of constraints. I think the solution to this problem resides partly in the XML specification and partly in the document access language. A DTD needs to be able to express these kinds of constraints among elements, so that the document server can enforce the constraints. We would then not be relying on the proper behavior of all the clients that wish to add to or modify the document. (Let me know if you need an argument for why clients should not hold this responsibility; I'm assuming we agree on this point.) The access language also needs to reflect the solution because in order for a server to implement constraints, all document update operations must be couched in the language of transactions. That is, every document update operation must be associated with a transaction. The DOM model allows us to manage documents from a client, so long as clients assume part of the responsibility for maintaining object model constraints. However, if we decide that the document server is responsible for maintaining these constraints, then the DOM model as it is currently architected will not suffice, since its document-update operations are not architected around transactions. 
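The transaction argument above can be illustrated with a small sketch: edits accumulate on a private working copy, and the two-way constraint (every book has an author, every author has at least one book) is checked only at commit. This is Python, not DOM, and the `Library` and `Transaction` classes and their method names are hypothetical, made up purely for illustration.

```python
import copy


class Library:
    """Toy document state: a set of author ids and a book-to-author map."""

    def __init__(self):
        self.authors = set()   # author ids
        self.books = {}        # book id -> author id

    def violations(self):
        """Return a list of constraint violations (empty if consistent)."""
        errs = [f"book {b} has unknown author {a}"
                for b, a in self.books.items() if a not in self.authors]
        used = set(self.books.values())
        errs += [f"author {a} has no book" for a in sorted(self.authors - used)]
        return errs


class Transaction:
    """Apply edits to a private copy; publish it only if it validates."""

    def __init__(self, library):
        self.library = library
        self.draft = copy.deepcopy(library)

    def add_author(self, author_id):
        self.draft.authors.add(author_id)

    def add_book(self, book_id, author_id):
        self.draft.books[book_id] = author_id

    def commit(self):
        # Validation happens here, at the transaction boundary only.
        errs = self.draft.violations()
        if errs:
            raise ValueError("; ".join(errs))
        self.library.authors = self.draft.authors
        self.library.books = self.draft.books
```

Adding a new author and then that author's first book leaves the draft invalid in between, but the shared `Library` never sees the intermediate state because nothing is published until `commit()` succeeds.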
Moreover, I do not see a way to extend the current DOM design so that it can safely support transactions. One way to correct DOM is to redesign it so that it submits query/edit objects to the server, where each query/edit object is submitted via a transaction object. Another way to correct DOM is to add a transaction parameter to all document-update method signatures. I don't think of this latter approach as an extension to DOM, since the corrected DOM would not be backwards-compatible with the current DOM. I think the XML specification as it currently stands is extremely well-suited for describing data that does not change over time, but that it is lacking in specifying how documents are to evolve. -- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From paul at arbortext.com Fri Nov 21 16:15:50 1997 From: paul at arbortext.com (Paul Grosso) Date: Mon Jun 7 16:59:01 2004 Subject: Query Languages for XML Message-ID: <97Nov21.111319est.18823@thicket.arbortext.com> At 21:38 1997 11 19 -0500, Lauren Wood wrote: >Derek Denny-Brown wrote: > >% I think there is some real potential for an extension to XSL to allow >% something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it >% should necessarily be in XSL 1.0, and it could be really hairy if people >% are using XSL-grove interface to the XML and DOM interface to the output. >% I have not quite figured out how to factor in DOM into XSL without making >% things really confusing... > >I'm confused by this.
The idea of the DOM is to standardize the object >model part of "dynamic HTML" (whatever that might mean; the definition >seems to change with the application that supports it, the person talking >about it, and probably the phase of the moon as well). So what sort of >extension to XSL do you mean? I also don't understand why the XML >would have an XSL-grove interface, and the "output" (what does >output mean?) would have a DOM interface, when the DOM should >be an interface to an XML document... Not only do I share all of Lauren's confusion, but I'd like to add that all this discussion about extensions to XSL is quite premature. There is no XSL to extend. No one can know what XSL is at this point. If there is any discussion about XSL, it would be more appropriate to be one of requirements and goals, not one about details. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Fri Nov 21 16:42:20 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:59:01 2004 Subject: Access Languages are Tied to Schemas Message-ID: <1.5.4.32.19971121164151.00b2cd9c@pop.mindspring.com> At 04:38 AM 11/21/97 -0800, Mark L. Fussell wrote: >Which would make SGML/XML a presentation model (i.e. similar to a >reporting view) on more sophisticated information bases. This would >inherently be worthwhile if it provided a very understandable model to >the user: more understandable than the underlying database. Precisely. This would be equivalent to defining a "document view" for a database using a DTD. 
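A "document view" of this kind can be sketched as a function that renders database rows as an XML document. This is only an illustration in Python with an in-memory SQLite database; the `book` table, its columns, and the element names are assumptions invented for the example and do not come from any system discussed in the thread.

```python
import sqlite3
import xml.etree.ElementTree as ET


def document_view(conn):
    """Render the rows of the (hypothetical) book table as an XML string."""
    root = ET.Element("books")
    for book_id, title in conn.execute("SELECT id, title FROM book ORDER BY id"):
        # Each row becomes one element; the primary key becomes an ID-style attribute.
        book = ET.SubElement(root, "book", id=f"B{book_id}")
        ET.SubElement(book, "title").text = title
    return ET.tostring(root, encoding="unicode")


# Small in-memory database standing in for the "foreign" system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE book (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO book (title) VALUES (?)", [("First",), ("Second",)])
```

The view is read-only by construction: the XML is derived from the rows on each call, so nothing in the document can drift out of step with the database.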
>SGML/XML could provide a very sophisticated version of this "reporting" >but I think it could be trapped between the ultra-simple HTML and the >more sophisticated information models and would rarely be used outside of >niches (just use an HTML builder on top of a database). So I would >rather see SGML/XML go upward and provide a more accessible interface to >"complete" information models than stay in the middle. The ability to define "document views" for external systems is important whether or not anything more sophisticated is done. I'm not sure exactly what you mean by "a more accessible interface to 'complete' information models". Could you spell that out for me? I see a big difference between using SGML/XML to create information models and using SGML/XML to simulate information models that are actually defined in another paradigm. I think it is important to recognize that the object oriented model and the SGML/XML document model are significantly different. SGML/XML can be used as an exchange format or view model for object data, but it is not an object oriented system. Similarly, SGML/XML can be used as an exchange format or view model for other kinds of systems, such as relational databases. Of course, SGML/XML is a data model in its own right. The data defined by XL7, for instance, may be defined in documents, but it is the kind of data traditionally managed in databases, and complex relationships among this data are possible. I guess what I am saying is that (1) documents are not just substitutes for objects in object systems, (2) documents can be used to manage rich data, (3) SGML/XML does not need to be changed into an object oriented system to make this possible, (4) architectural forms allow great flexibility in this kind of system. 
>Actually, I think in concrete terms I would like to be able to change >your suggested OQL from: > > select e > from e in SGMLElement, > a in e.attributes, > s in e.subElements > where e.tagName = "SECT1" > and a.tagName = "ID" > and s.tagName = "PARA"; > >to something like: > select section > from section in Sections > children in section.allChildren > where section.level > 1 > and section.title.beginsWith("MONDO") > and children.text.contains("ChiMu") > >But still use SGML/XML/OML technology and be working from the same >original encoding. Let me give a little background: the first query is slightly modified from an actual query for an object oriented database that contains SGML data. The only modification that I made was to change some of the names of the classes used to store the data, which is basically like changing the names of the tables in a relational database. My query assumes that there is not a new database type or a new table for each element type, but that the data model for the relational or object oriented database is quite simple, representing elements, their children, and their attributes. In an object oriented database, your query would require that each element type be registered as a separate class in the class dictionary for the database. I think that it will probably be easier to implement queries of the first kind in existing object-oriented and object-relational databases. But I think we are pretty much in agreement that full-text and other text operators would be useful, that boolean operators are important (as well as precedence), that path expressions of some kind are important to allow queries to utilize the structure of SGML containment (and, if possible, references), etc. Jonathan Jonathan Robie jonathan@texcel.no Texcel Research http://www.texcel.no xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Fri Nov 21 16:47:41 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:01 2004 Subject: Query Languages for XML References: <3.0.32.19971120100709.00a8c5b0@mailhost.criinc.com> Message-ID: <3475BBE8.9676A18A@technologist.com> Derek Denny-Brown wrote: > > One of the things that I see as a potential problem is that HTML etc as it > is used now has 2 (as I count them this side of the morning) relatively > distinct uses. > 1) as an alternate form of (relatively) static information. > 2) as a (very-basic) cross-platform (g)ui. > > XSL and DSSSL are focusing rather hard on (1), but not on (2). I'm not sure what you mean by that. XSL as currently proposed has access to all of the form features of HTML, just as it has access to all of the static display features of HTML. It is correct to argue that we are spending more effort on *improving* HTML's static display features than improving its form features, but I think that that is probably appropriate considering the market's interest in better static pages, SGML's particular strengths in that area and Java's suitability for forms. > hmm... so maybe what I am looking for is a "standard" way to extend a XSL > processing/display engine with new flow-object types at run-time. Paul, > was it you who talked about this some months ago? Yes, I looked into this, and will talk about it at SGML/XML 97. I was more interested in compound "heavy weight" flow objects like "title", "section", "table of contents" and so forth. 
There are some tricky issues with even these simple compound objects and the issues get trickier when you want to talk about new primitives (how do they negotiate real estate? how much information do they need to negotiate properly? what about line breaking?). Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Fri Nov 21 17:01:52 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:01 2004 Subject: Integrity in the Hands of the Client Message-ID: <3.0.32.19971121085250.00bbce94@pop.intergate.bc.ca> At 10:56 AM 21/11/97 -0500, Joe Lapp wrote: >In this posting I'm going to be a little bold and propose that both >the XML and DOM specifications are flawed. Mr. Lapp has discovered one of the well-known shortcomings of SGML, inherited by XML; namely, the typing and constraint mechanisms supplied by DTDs are well-known to be insufficiently rich to allow their use for purposes which we have come to expect of database schemas. More obviously, if, in Mr. Lapp's example, I wanted to give prices for the books, I might want to be able to say that this has to be a number, with 2 digits right of the decimal point. SGML doesn't help you here either. Yes; we need a new and richer form of schema. No boldness is required. -Tim xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Fri Nov 21 17:07:13 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:01 2004 Subject: Integrity in the Hands of the Client References: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net> Message-ID: <3475C074.BEEB9A03@technologist.com> Joe Lapp wrote: > > Once we do that, nothing prevents an administrator > (or the client program he or she is using) from indicating that the > author of a book is another book. This DTD will not suffice. ... > However, these link solutions all have one problem: nothing in the > link specification allows a link element declaration to constrain > the kind of resource to which a link links. ... Neither SGML nor XML DTDs are meant to, nor will ever be able to express all interesting semantic constraints. SGML/XML cannot even express all interesting *syntactic* constraints (try to make an attribute that allows only valid DOS filenames). The question of what is the right balance of simplicity and constraint expression is an interesting one, and one that should be rethought from time to time. But the inability to express a *particular* constraint is not evidence that the language is fundamentally flawed. The only language that could express all interesting constraints would be a Turing-complete one. I've toyed with the idea of a DSSSL subset (DSSSL-Check?) that would return a list of error messages, or the empty list if the document was conforming. The DTD would express simpler constraints and the DSSSL-Check Spec would express the more complex ones.
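The checker idea above (a second pass that returns a list of error messages, or an empty list for a conforming document) can be sketched roughly as follows. This is Python rather than DSSSL, and the element and attribute names (`book`, `author`, `id`, `authorref`) are hypothetical, since the DTD from the original posting did not survive in the archive.

```python
import xml.etree.ElementTree as ET


def check_author_refs(xml_text):
    """Return a list of error messages; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    # Map every id in the document to the element type that carries it.
    kinds = {el.get("id"): el.tag for el in root.iter() if el.get("id")}
    errors = []
    for book in root.iter("book"):
        ref = book.get("authorref")
        if ref is None:
            errors.append(f"book {book.get('id')}: no author reference")
        elif ref not in kinds:
            errors.append(f"book {book.get('id')}: dangling reference {ref}")
        elif kinds[ref] != "author":
            # The case a plain IDREF cannot rule out: a book whose "author" is a book.
            errors.append(f"book {book.get('id')}: {ref} is a {kinds[ref]}, not an author")
    return errors


GOOD = '<lib><author id="A1"/><book id="B1" authorref="A1"/></lib>'
BAD = ('<lib><author id="A1"/><book id="B1" authorref="A1"/>'
       '<book id="B2" authorref="B1"/></lib>')
```

Here the DTD-level check (is the reference a valid ID at all?) and the typed check (does it point at an author?) are kept apart, which mirrors the proposed split between simple and complex constraints.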
In a graphical editor, the DTD constraints would probably be checked in real-time and the DSSSL-Check constraints would be checked periodically (since they could conceivably be quite slow). RDF may be a useful system in-between these two extremes. It is more concerned with semantics (and probably less with syntax) than SGML, but is not Turing complete. > However, in > general, constraints between elements will be important. For > example, it would not be acceptable to store away an account > deduction entry without having an associated account entry or to > have an account entry that does not have at least one associated > account-owner entry. It seems to me that there are very few domains > that can be represented without these kinds of constraints. It is worth noting that SQL does not provide a complete system for expressing all interesting constraints in relational databases. That's why "business logic" often resides in proprietary stored procedures or on completely separate application servers. > The access > language also needs to reflect the solution because in order for > a server to implement constraints, all document update operations > must be couched in the language of transactions. That is, every > document update operation must be associated with a transaction. Please explain this point. Paul Prescod xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From markb at iosphere.net Fri Nov 21 17:12:55 1997 From: markb at iosphere.net (Mark Baker) Date: Mon Jun 7 16:59:01 2004 Subject: Integrity in the Hands of the Client In-Reply-To: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net> Message-ID: On Fri, 21 Nov 1997, Joe Lapp wrote: > In this posting I'm going to be a little bold and propose that both > the XML and DOM specifications are flawed. Bold's good. I like bold. But I'm going to be just as bold and suggest that it is your use of XML/DOM that is giving you problems, not the specs themselves. >The existence of these > flaws ride on the assumption that we care to use SGML/XML to create > domain models for data where the data evolves over time. Okay, so let's investigate how XML (and a couple words on DOM) are, IMO, just fine for this. > I'm also > assuming that it is unacceptable for the client objects of a document > to maintain the integrity of the document. Amen. Once you've done encapsulation and data-hiding, there's no going back. > Suppose we want to create a document that contains information about > books and about the authors of those books, and suppose we require > that whenever the document has a book, it also has information about > the author of the book. The document will reside on a server, and > one or more administrators will populate the document from their > clients. Other users will be free to browse the document. > > We need to design the DTD for this document. Here is our first pass: Ok, let me stop you right there. A DTD is a fixed statement of structure. If you use one, you better be darned sure that that structure isn't going to change anytime soon. 
As we see from your example, you were struggling to define that structure (as anybody would have given the same task). So, what to do? Go finer-grained. Ask yourself what doesn't change over time. In this example, you know that you have books and authors. So why not give each of those their own document type? Furthermore, the relationship itself between a book and an author might also be treated as a document type. Sound too funky? Consider that that's exactly what is done in loosely coupled structural OO work, or before that, first-normal-form entity/relationship schemas. CORBA has the Relationship service for just this kind of functionality for objects. Objects can create, destroy, type, and navigate directed relationships at runtime. Maybe for this example, it's a bit heavy-weight. I'm not sure. But with just an author DTD, a book DTD, and XML-Links, you could get the same job done - perhaps not quite as flexibly (since dependencies are introduced within the documents themselves), but just as functionally capable. BTW, this is the same reason that a stream of serialized-to-XML Java objects won't have a DTD. The structure of a set of objects is only guaranteed to be known at runtime. But these streams will still be well-formed. > I have not been able to find a way to have the document server force > clients to ensure that whenever they add a book, that book is > associated with some author. Clients are given the responsibility > of maintaining the integrity of the document. The OMG's OMA has a place holder for a "Rules Facility" that does exactly this. It allows arbitrary rules (including structural) to be hung off the ORB as objects/documents, and the ORB is responsible for enforcing these rules. See, for example; http://www.jeffsutherland.org/oopsla97/rouvellou.html > The DOM model allows us to manage documents from a client, so long > as clients assume part of the responsibility for maintaining object > model constraints. That depends who the 'client' is.
If it's a traditional application, then yes, that's bad. But it might be something on another "level" (hopefully you'll understand what I mean by that by these examples), such as a Rules Facility or Persistence service, in which case it's ok - because their job is to maintain the internal integrity of the object. >However, if we decide that the document server > is responsible for maintaining these constraints, then the DOM > model as it is currently architected will not suffice, since its > document-update operations are not architected around transactions. I don't see the need for two reasons. First, I would never use DOM (or any other mechanism) to try and break the encapsulation of my documents. Second, as I stated in my last message, transactions are an overrated means of reasoning about distributed systems. They try and make distributed processing look like local processing, when we now know how impractical that view is. MB -- Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069 Will distribute business objects for food. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From howardk at paradigmdev.com Fri Nov 21 17:18:51 1997 From: howardk at paradigmdev.com (Howard Katz) Date: Mon Jun 7 16:59:02 2004 Subject: Integrity in the Hands of the Client Message-ID: <57B675B21506D1118BAB0060081C295D168029@VSERVER> Mark, would you mind expanding just a bit on the following paragraph? I'm not seeing what your point is: BTW, this is the same reason that a stream of serialized-to-XML Java objects won't have a DTD. 
The structure of a set of objects is only guaranteed to be known at runtime. But these streams will still be well-formed.

Thanks,
Howard Katz

> -----Original Message-----
> From: Mark Baker [SMTP:markb@iosphere.net]
> Sent: Friday, November 21, 1997 9:08 AM
> To: Joe Lapp
> Cc: xml-dev@ic.ac.uk
> Subject: Re: Integrity in the Hands of the Client
>
> [... full quote of the preceding message snipped ...]

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ser at javalab.uoregon.edu Fri Nov 21 17:42:18 1997 From: ser at javalab.uoregon.edu (Sean Russell) Date: Mon Jun 7 16:59:02 2004 Subject: XML on the web: docproc 2 Message-ID: <3475C95A.23524B2@javalab.uoregon.edu> Hiho, I'm just now getting around to announcing docproc 2, an XML + XSL document processor. This is a beta release, and I welcome feedback. docproc is currently installed on javalab.uoregon.edu and is functioning as a Servlet. The URL for the docproc documentation and distribution site is: http://javalab.uoregon.edu/ser/software/docproc_2/docs/index.xml There are several pages on Javalab which have been XML-ized, as test cases. I have spent most of my time working on the docproc package, and the style sheets for most of these pages are not particularly clever. The "document" style sheet is, however, rather complex, and it is this stylesheet which the docproc documentation page uses. One test page URL, which will lead you to other test pages, is: http://javalab.uoregon.edu/vlab/select.xml To retrieve and view the XML source of any given XML page on javalab, replace "javalab" in the URL with "jersey." Jersey is running Apache, without docproc, and has NFS access to the same documents as Javalab. Please be aware that Javalab is a testbed, and that you may experience delays or periods of downtime. In particular, the JavaWebServer on Javalab has been having problems processing delivering non-XML documents. This has nothing to do with docproc. Thank you, and again, please send me your feedback. --- SER xml-dev: A list for W3C XML Developers. 
From papresco at technologist.com Fri Nov 21 18:06:15 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:02 2004
Subject: Inheritance (was: Access Languages are Tied to Schemas)
References: <199711210155.MAA03460@jawa.chilli.net.au>
Message-ID: <3475CE52.2321C593@technologist.com>

Rick Jelliffe wrote:
> The best idea I have come up with is the following: to allow a new keyword
> #OTHER (or #ANY)
> to be allowed in content models, to represent any one unambiguous element type.
> This allows the creator of the original content model the ability to
> declare points in content models which are publicly available for extension
> by derived element types (declared or undeclared).
>
> I currently think that any inheritance-based declaration system must presuppose
> such explicit inheritance points. I think it is merely a matter of strong typing
> and interface control.

Strong typing and interface control are issues of subclassing, not inheritance. Inheritance is just a code reuse mechanism. Unlike subclassing, it does not allow more expressive DTDs to be created (which is, presumably, what you are talking about). I think that we must keep these ideas separate in our mind if we are to make progress on either front. Their conflation is (IMO) just a historical mistake driven by early compiler limitations and performance considerations that do not apply to SGML.

Both concepts are useful in SGML, but they should be separate, just as they are in most modern OO programming languages (C++, Java, CLOS, Python, etc.), even those which conflate them in the syntax.
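The separation described here can be sketched in Python. This is only an illustration under stated assumptions, not code from the thread: the names `Citable`, `TextHelpers`, `Url`, `Note`, and `render_all` are hypothetical. `Citable` stands for conformance to an interface (what this message calls subclassing) and `TextHelpers` for pure code reuse (what it calls inheritance).

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class Citable(ABC):
    """Interface only: anything allowed where a citation part is expected."""

    @abstractmethod
    def render(self) -> str: ...


class TextHelpers:
    """Implementation only: reusable code, no promise of substitutability."""

    def squeeze(self, s: str) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join(s.split())


class Url(Citable):
    """Conforms to the interface without borrowing any implementation."""

    def __init__(self, href: str) -> None:
        self.href = href

    def render(self) -> str:
        return f"<{self.href}>"


class Note(TextHelpers):
    """Reuses code, but is not declared usable where a Citable is expected."""

    def __init__(self, body: str) -> None:
        self.body = self.squeeze(body)


def render_all(parts: list[Citable]) -> str:
    """Accept anything that conforms to Citable, whatever code it reuses."""
    return " ".join(p.render() for p in parts)
```

`Url` can be passed to `render_all` because it conforms to the interface; `Note` shares helper code yet makes no such claim, which is the distinction the message asks to keep visible.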
I described the difference in:

http://www.lists.ic.ac.uk/hypermail/xml-dev/9710/0077.html

Anyhow, you can emulate OTHER using subclassing without a first class OTHER construct.

Now URLs can go in CITATIONS after the date.

Paul Prescod

From markb at iosphere.net Fri Nov 21 18:38:43 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <57B675B21506D1118BAB0060081C295D168029@VSERVER>
Message-ID:

On Fri, 21 Nov 1997, Howard Katz wrote:
> Mark, would you mind expanding just a bit on the following paragraph?

Of course not.

> I'm not seeing what your point is:
>
> BTW, this is the same reason that a stream of serialized-to-XML Java
> objects won't have a DTD. The structure of a set of objects is only
> guaranteed to be known at runtime. But these streams will still be
> well-formed.

Picture a container Bean (i.e. the GlasgowSpec - a BeanContext). When you design that container, you only know that it will hold other Beans - not necessarily which other Beans. Your container may publish services for use by contained Beans. It might, and likely will, contain Beans that were developed after it was developed. Some of those Beans might also be containers.

Now, imagine serializing that container at runtime. Can you tell me its structure *now* (I mean *right* now, as you're reading this - aka design time)? If not, then you can't use a DTD. The stream itself will be responsible for describing the structure implicitly, not some separate static DTD.

Isn't this what well-formed XML documents were meant to address?
That you could still create self-describing documents even when you didn't know the structure a priori?

Based on some of the discussions I've read on the list archives, I do get the impression that this capability of XML isn't being used to its fullest potential.

> Thanks,

My pleasure.

MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069

Will distribute business objects for food.

From ricko at allette.com.au Sat Nov 22 06:32:46 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711220630.RAA28483@jawa.chilli.net.au>

> From: Joe Lapp
> This seems to work. It stores information about books and authors,
> and it is not possible to add a book without associating it with
> the description of some author. But we can see that it breaks as
> soon as we add any other kind of element that has an ID. We know
> that every book will eventually have an ID, because we'll soon want
> to have an element whose content elements reference the New York
> Times Bestsellers. Once we do that, nothing prevents an administrator
> (or the client program he or she is using) from indicating that the
> author of a book is another book. This DTD will not suffice.

The SGML standard explicitly says that the SGML markup declarations only form part of the definition of a document type. So you are being no more bold than the SGML standard.
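The constraint in the quoted example, that a book's author reference must resolve to an author and not to another book, is the kind of rule that falls to application code. A minimal sketch, assuming hypothetical element names `catalog`, `book`, and `author`, an `id` attribute on both, and an `author` reference attribute on `book` (the DTD under discussion is not shown in this message, so these names are illustrative only):

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical document: book b2 wrongly names another book as its author.
DOC = """
<catalog>
  <author id="a1"><name>Jane Doe</name></author>
  <book id="b1" author="a1"><title>First</title></book>
  <book id="b2" author="b1"><title>Second</title></book>
</catalog>
"""


def bad_author_refs(xml_text: str) -> list[str]:
    """Return ids of books whose author reference is not an <author> element."""
    root = ET.fromstring(xml_text)
    # Index every element that carries an id, regardless of its type.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}
    bad = []
    for book in root.iter("book"):
        target = by_id.get(book.get("author"))
        # A missing target or a target of the wrong type both break the rule.
        if target is None or target.tag != "author":
            bad.append(book.get("id"))
    return bad


print(bad_author_refs(DOC))
```

The check runs outside the parser, so it works the same whether or not the document was validated against a DTD first.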
(The abbreviation DTD actually stands for "Document Type Definition", not "Document Type Declarations", by the way, as further evidence of this distinction.)

People expect XML/SGML to provide a way to do everything, then get surprised that it doesn't. It does not intend to. It is not a format for modeling data; it is a language for marking up data with enough information that your clever programs can make use of it. XML/SGML's validation only extends to very simple content models and to making sure that IDs are unique, just for this purpose.

The problem you describe above is very simply dealt with. Make an "application requirement" that all IDs for books start with one prefix, and that all IDs for authors start with another. This is very common practice in the industry. You can write simple external validating code to enforce it, and it only requires a single line of plain English to document it.

It is almost universal practice among experienced DTD writers to specify unique prefixes for IDs of different types. I recommend it to anyone writing XML systems. The simplest way is to just use a contracted form of the element type name (or the current element or its distinguishing container) as the prefix.

There is an ISO standard way (part of the SGML Extended Facilities of HyTime'97 which is on the WWW) to mark this up. The Lexical Definition annex lets you give (in one fixed attribute) a POSIX regular expression to constrain the format of another attribute. So you can specify that IDs and IDREFs have a common prefix, for particular element types. (Of course, your software then needs to implement this standard to be able to use the information, but that is no different from any other markup.)

It is just false that SGML (the family of technologies: ISO 8879, ISO 10774, ISO 9070, etc) does not provide a way to use regular expressions (or any other syntax you choose) to provide models for data. The lexical typing facilities have been on the books for 5(?)
years now, and have just been overhauled in HyTime '97 standard. However, because SGML systems do not have to provide it to be conforming, few have so far done so as part of their standard configuration. XML has taken exactly the same road as SGML and left more useful data validation to the application to take care of.

Rick Jelliffe

From fussellm at alumni.caltech.edu Sat Nov 22 09:06:36 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:02 2004
Subject: Recipes for Information
Message-ID:

This is somewhat related to the recent threads on Integrity and Inheritance. It is again a bit long so it will be duplicated at MONDO (www.chimu.com/projects/mondo).

========

I suggest that SGML/XML be perceived as a markup language to describe how to build information instead of describing (and modeling) the information itself. This may appear to be a subtle distinction but it has a lot of implications.

I will start with a recent concrete example from Rick Jelliffe:

This says a citation is composed of (through its content) a title, text, and url. But do not view that as the information model of a citation; consider it a recipe for a citation. We can build a citation if we supply the three (named) ingredients: title, text, and url. The detail of the resulting information (which I will call an object) is unknown. It is likely that the citation object will have these three attributes, but it could have more or it could even discard some of them (in which case the recipe included information that the model did not need).
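The recipe reading can be sketched in Python. This is a hedged illustration, not part of the original post: the `Citation` class, its `scheme` attribute, and `build_citation` are hypothetical, and the sample markup assumes child elements named `title`, `text`, and `url`. The built object keeps two ingredients, discards one, and adds an attribute the recipe never mentioned.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Citation:
    """Hypothetical domain object; its shape is chosen by the model, not the markup."""

    title: str
    url: str
    scheme: str  # derived by the model; not an ingredient of the recipe


def build_citation(xml_text: str) -> Citation:
    """Follow the recipe: take the named ingredients title, text, and url."""
    el = ET.fromstring(xml_text)
    ingredients = {child.tag: (child.text or "").strip() for child in el}
    url = ingredients["url"]
    # The 'text' ingredient is supplied, but this particular model discards it.
    return Citation(
        title=ingredients["title"],
        url=url,
        scheme=url.split(":", 1)[0],
    )


citation = build_citation(
    "<citation><title>XML draft</title><text>at</text>"
    "<url>http://www.w3c.org/TR</url></citation>"
)
print(citation)
```

A different model could keep all three ingredients, or more; the markup stays the same either way.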
If we have a different element that requires more information we could have a different recipe: The object that results from this recipe might be the same type as a citation object, a subtype of the citation object (i.e. treatable as a citation object but has more capabilities), or even an unrelated type of object. For the moment we will abstain on discussing anything about the objects resulting from the DetailedCitation and the Citation recipes [why I started capitalizing will be explained later too]. What about combining the two recipes into a single element? We could combine them as: This would be ambiguous (in SGML terms) for the first two but all of them are bad recipes. They are bad because we (or the computer) must look at all the content to know which version we are using. This is analogous to reading a whole recipe before we can be sure what we are trying to make. It would be better to more clearly separate the options from the requirements if you choose that option. Our original version separated the recipes through the elements: We could also do this with: or: In these forms it is explicit what we are trying to build (or at least the complexity is dramatically reduced). We do not have to look into the details of the information itself. RECIPES ======= Now I will ask for a leap of faith. Consider separating ELEMENTs between Recipes that build objects and Parameters that name the ingredients that are required for a particular recipe. As an architectural-form it would look like this: Although in the content model parameters are sequential, their order is insignificant semantically. Each parameter must have a unique name, so consider them to be and-ed together instead of seq-ed. Sort of like: or like required element attributes. As a convention I will capitalize the Recipes and keep parameters in lowercase. 
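The Recipe/Parameter split can be sketched in Python. This is an illustration under assumptions, not the author's architectural form: the `RECIPES` table and `read_parameters` are hypothetical names. Parameters are collected by name, so their order in the markup does not matter, and each name may appear only once.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# Hypothetical recipe table: recipe name -> names of its required parameters.
RECIPES = {"Citation": {"title", "text", "url"}}


def read_parameters(xml_text: str) -> dict[str, str]:
    """Collect a recipe's parameters by name, ignoring the order they arrive in."""
    el = ET.fromstring(xml_text)
    required = RECIPES[el.tag]
    params: dict[str, str] = {}
    for child in el:
        # Each parameter name must be unique within its recipe.
        if child.tag in params:
            raise ValueError(f"parameter {child.tag!r} given twice")
        params[child.tag] = (child.text or "").strip()
    missing = required - set(params)
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return params


in_order = read_parameters(
    "<Citation><title>XML draft</title><text>at</text>"
    "<url>http://www.w3c.org/TR</url></Citation>"
)
shuffled = read_parameters(
    "<Citation><url>http://www.w3c.org/TR</url>"
    "<title>XML draft</title><text>at</text></Citation>"
)
print(in_order == shuffled)
```

Using a dictionary keyed by parameter name is what makes the parameters behave as and-ed together instead of sequenced; this sketch does not reject unknown extra parameters.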
Now returning to our example, building a Citation required three parameters:

The original ordering of the parameters is irrelevant to the informational content because each parameter is uniquely named; it is only a presentation/encoding restriction to have them be sequential. Also, the parameters do not describe the Types of the ingredients, just the Role of them in building the recipe. All of 'title', 'text', and 'url' could be simple strings:

Or any of them could have a more complex type.

By separating the two types of elements we can:

- Be very explicit about what we are constructing
- Have a great deal of flexibility for reuse of elements
- Use very simple content models that produce complex structures

Note that although the '&' is considered complex to implement, this particular use of it has the same form as attributes: Parameters are unordered and possibly required.

Shortcuts
---------

You might have noticed that String cheats: a String does not follow the required Recipe pattern of having only parameters in content. This is a convenience shortcut Recipe [OK, and an insanity prevention device], which makes putting strings of text into this format easier. Similarly we will probably need to have a shortcut for Lists (sequences) of objects:

With these additions we have to modify our original description of the architectural-form of Recipes to:

Recipes, DTDs, and DomainModels
-------------------------------

Each Recipe builds an object. What is the type of this object and how does it relate to the ELEMENT content model? I propose (and agree with others proposing) that there should be no required connection between the rules of a recipe (the DTD) and the rules of the DomainModel objects built from that recipe. Objects can have far more complex relationship rules than DTDs can describe and the DTD will either over-constrain or under-constrain the built objects.

Instead consider the DTD as similar to a UI Form.
You may want to place things in a particular order and group them together:

    Person
        FirstName
        LastName
        SSN
        Children
            FirstName
            LastName

But this is a presentation of the (view independent) information model that has a person with several attributes and associations in no particular order (even children do not need to be explicitly ordered for orderings can be derived from [for example] the child's birthdate). The UI/DTD can place constraints (like a SSN has a 123-45-6789 format) but it should be very careful about these constraints (what about 99- SSNs) or really delegate the responsibility of validation to the DomainModel. But simplified views are still useful.

DTDs can still be used to produce an information model but it should be possible to unlink the information model and have it start a more robust life of its own (or the dependency reversed). The Recipes should still be useful because they encode the knowledge required to build the information independently of how precisely or extensively it is modeled (up to a point). The recipes can live on as the model grows.

And, in a strange circularity, information models are also (obviously) information so they can again be encoded as recipes in SGML/XML and used as metadata for the domain model. So although DTDs are not good information models, there is nothing stopping SGML/XML from being a good encoding for good information models.

--Mark
mark.fussell@chimu.com   ChiMu Corporation   Architectures for Information
info@chimu.com           Object-Oriented Information Systems
www.chimu.com            Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Sat Nov 22 09:34:02 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:02 2004
Subject: Sequence Access Languages ...
Message-ID:

Rick Jelliffe wrote:
> I think you miss what is perhaps *THE* most important thing that SGML
> content models represent: sequence.
> This is one of the essential distinguishing features of SGML.
> If I have
>
> Refer also to
> XML draft
> at
> http://www.w3c.org/TR
> for more info.
>
> then the sequence of elements and data in to citation element
> are vitally critical. Sequence is not an artifact of formatting,
> in many cases, but as intrinsic to the data as encapsulation
> and so on.
[SNIP to possible Content Model]
>

I think your example shows the opposite. There is no information change between any of the orderings within the citation:

    <title><text><url> vs.
    <title><url><text> vs.
    <url><title><text> etc.

You may consider the desired presentation and encoding order to be only the first but that would be a view onto the information and not a property of the information itself. You could alternatively define an attribute that says citations look good in English in that particular order. Or maybe the 'at' should be derived and the content model is simply:

    <!ELEMENT citation ( title & url)>

This works well with your next example too:

> <!ELEMENT citation ( title, text, name, text, url )>

becomes:

    <!ELEMENT citation ( title & editor? & url )>

Depending on whether the editor is included or not, different text would be generated at presentation. The generated text could still be encoded in SGML but as separate information:

    <CitationPresentationInfo>
        <urlPrefix>at</urlPrefix>
        <editorPrefix>edited by</editorPrefix>
    </CitationPresentationInfo>

I am not saying sequence is unimportant, but I think SGML is overly focused on it (from an IM perspective) because it comes from a paper/linear background. Information is rarely linear: it is only time that is, which has caused some media [and the humans who use them] to be (mostly) linear also. It can be difficult to break that linear assumption when it doesn't apply if your tools keep reinforcing it.

--Mark
mark.fussell@chimu.com   ChiMu Corporation   Architectures for Information
info@chimu.com           Object-Oriented Information Systems
www.chimu.com            Architecture, Frameworks, and Mentoring
From peter at ursus.demon.co.uk Sat Nov 22 09:50:25 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <199711220630.RAA28483@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk>

At 17:26 22/11/97 +1100, Rick Jelliffe wrote:
[...]
>It is just false that SGML (the family of technologies: ISO 8879, ISO 10774,
>ISO 9070, etc) does not provide a way to use regular expressions (or any
>other syntax you choose) to provide models for data. The lexical typing
>facilities have been on the books for 5(?) years now, and have just been
>overhauled in HyTime '97 standard. However, because SGML systems do not
>have to provide it to be conforming, few have, as part of their standard
>configuration, so far. XML has taken exactly the same road as SGML
>and left more useful data validation to the application to take care of.

We are at a very exciting, but critical, time in the development of XML and I am very heartened by the quality and amount of debate on this list. I sense that there is a steady influx of people who have had little or no exposure to 'traditional' SGML and are discovering its power and limitations in an empirical manner :-) [If so, I have particular empathy, as I come from outside the SGML community and have never created an SGML document for 'production' purposes.]

XML will be used by vastly more people than currently practise SGML. That is both liberating and a cause for concern.
It's certainly likely that useful methods already developed in SGML will often not be used simply because people don't know about them. Similarly there are often standards in other disciplines which map directly onto XML problems. Where possible they should be used.

In many cases the XML specs (including XLL and XSL) deliberately do not say how something should be done - only what syntax should be used. The WG has (often rightly) taken the view that it should not prescribe ways of doing things. But we are now at - or very near to - the time when people will start doing things and there is a danger that we shall end up with serious inconsistencies. For example, when Britain first invented and developed railways there were two gauges (4' 8.5", and 8') and Baker Street station in London had both. Australia had (?5) and I gather is only now rationalising them (Rick?). As an example, if we use DATEs in XML I think we need a good reason not to use ISO 8601.

It is clear that there is overwhelming demand for some datatyping in XML. For example, I am now extending JUMBO as an authoring tool and I want to be able to control the type and validity of both attribute values and PCDATA content. Obviously I can invent my own rules, but I'd prefer to use something that other people have already agreed on. I can't do this in a DTD, but I think I *can* do it consistently with (and in the spirit of) SGML.

[Very simply - I'll expand later - I am developing a per-element 'schema' in XML syntax which encapsulates the DTD approach and enhances it. As is my spirit, I'm keeping it simple - not adding the complexities of inheritance as in the XML-data approach.]

At present my datatypes are:

    STRING
    INTEGER
    FLOAT (or synonym)
    DATE
    URL
    MIMETYPE

and I'd value comments. [Any new items need code to be written, so they don't come free :-)]

This almost inevitably leads on to data validation and I'd like to know what syntax people already have for expressing this.
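A per-element datatype check over the six types listed above can be sketched in Python. This is a guess at the general shape and not JUMBO's actual mechanism: the `CHECKS` and `SCHEMA` tables, the element names `published`, `pages`, and `source`, and the MIME-type pattern are all hypothetical. DATE is checked as an ISO 8601 calendar date using the standard library.

```python
import re
from datetime import date
from urllib.parse import urlparse


def _parses(func, text: str) -> bool:
    """True if func accepts the text without raising ValueError."""
    try:
        func(text)
        return True
    except ValueError:
        return False


def _is_url(text: str) -> bool:
    # Deliberately loose: require only a scheme and a host part.
    parts = urlparse(text)
    return bool(parts.scheme and parts.netloc)


# One checker per datatype named in the list above.
CHECKS = {
    "STRING": lambda s: True,
    "INTEGER": lambda s: _parses(int, s),
    "FLOAT": lambda s: _parses(float, s),
    "DATE": lambda s: _parses(date.fromisoformat, s),  # ISO 8601 calendar date
    "URL": _is_url,
    "MIMETYPE": lambda s: re.fullmatch(r"[\w.+-]+/[\w.+-]+", s) is not None,
}

# Hypothetical per-element schema: element name -> datatype of its content.
SCHEMA = {"published": "DATE", "pages": "INTEGER", "source": "URL"}


def is_valid(element_name: str, content: str) -> bool:
    """Check an element's character content against its declared datatype."""
    return CHECKS[SCHEMA[element_name]](content.strip())


print(is_valid("published", "1997-11-22"), is_valid("published", "22/11/97"))
```

Adding a datatype means adding one entry to `CHECKS`, which matches the remark that new items need code to be written.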
Obviously it would be nice for it to be XML-compatible. P. I have had some positive feedback on the idea of XDEV and I shall try to reformulate my ideas. It's very clear that we need a way of discussing the 'land beyond syntax'. I liked the phrase 'when ontologies collide' which I saw recently (I think from a pointer from Robin Cover's page) and this seems to me an area where XML-DEV can play an important role. At least we may be able to identify the ontologies :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From fussellm at alumni.caltech.edu Sat Nov 22 10:37:54 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:02 2004 Subject: Inheritance Message-ID: <Pine.SOL.3.91.971122022916.17881E-100000@alumnae> Paul, No argument with your posting (I decided not to post a similar statement after rereading yours), but could you change your terms slightly? Although the OO terms themselves were definitely conflated during the eighties they have by now settled down to: Type: The declaration of the interface of any set of [objects] that conforms to this common protocol. Any set of objects or values with similar behavior... [Firesmith+E 95] Class: A class is the realization of a type. [UML] The idea of class is closely linked...with the description of implementation details of software objects [Cook+D 94]. Type vs. Class: Types classify objects according to a common interface; classes classify objects according to a common implementation. 
[Firesmith+E 95]

Subtyping: The incremental definition of a new type in terms of one or more existing types, whereby the subtype conforms to all of its supertypes [an is-kind-of relationship] [Firesmith+E 95]

And subclassing implies implementation-inheritance (i.e. code reuse), exactly what you were trying to avoid implying. So I would suggest rewriting your example to:

> > Anyhow, you can emulate OTHER using subtyping without an explicit
> > OTHER construct.
> >
> > <!ELEMENT CITATION (name, author, date, OTHER-CIT* )>
> > <!ELEMTYPE OTHER-CIT> <!-- no constraints on subtypes -->
> > <!ELEMTYPE ANOTHER-TYPE ISA ANY> <!-- Be explicit about the automatic root -->
> >
> > <!ELEMENT URL (#PCDATA) ISA (OTHER-CIT & ANOTHER-TYPE)>

Which makes it use the standard terminology. So, ELEMENTs would be the leaves of a tree/digraph of Types with ANY as the root.

Note that ISA should formally be IS-A-KIND-OF but that is an annoyingly long keyword. (My dog is-a Dog which is-a-kind-of Mammal vs: My dog is-a Dog which is-a Mammal).

--Mark
mark.fussell@chimu.com

ChiMu Corporation   Architectures for Information
info@chimu.com      Object-Oriented Information Systems
www.chimu.com       Architecture, Frameworks, and Mentoring

From peter at ursus.demon.co.uk Sat Nov 22 12:15:56 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:02 2004
Subject: XDEV proposals (was Re: Recipes for Information)
In-Reply-To: <Pine.SOL.3.91.971122004509.17881A-100000@alumnae>
Message-ID: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>

At 01:06 22/11/97 -0800, Mark L.
Fussell wrote: > >This is somewhat related to the recent threads on Integrity and >Inheritance. It is again a bit long so it will be duplicated at MONDO >(www.chimu.com/projects/mondo). Thanks Mark - extremely valuable. [... long insightful and stimulating discussion snipped ...] I think I understand the wishes of Mark and an increasing number of XML-DEVers and hope the following is useful... I come from a non-SGML background and have discovered the formal limitations of SGML/XML in what I want to do. [The DTD doesn't map onto my problems, no datatyping, no easy extensibility through inheritance, etc.] As Mark says, the background of XML is paper-based (and this is reflected in XSL which is essentially paper-based with very small concessions to paper-like screen display). Nevertheless the DTD-based approach is extremely powerful in the right cases and in the right hands. The problems I address have the following generic operations. - I want to author XML. Ideally this should be human- and machine-readable. I want this process to be controlled by software/data to make it both flexible and rigorous. [This is tough, but I'm starting to address it in JUMBO. Practical help will be appreciated :-)]. - I wish to be able to re-use other people's information objects. This is almost certainly going to break any DTD, but it is implicit in most of the current W3C activity. (RDF, MathML, XSL may have some sort of DTDs, but they will probably be used as components of larger documents, which cannot have DTDs) - I wish to be able to manage distributed and multicomponent objects. I think XML and related disciplines will solve this very well and excitingly. - I want to be able to validate XML 'objects'. XML can do this syntactically, but not semantically. For this I need additional 'recipes' and code - I want to be able to transform XML objects into other XML objects. 
XSL is tantalisingly close to being able to do this but I believe - at present - that W3C XML-transformation activity is 'undefined'. - I want to be able to send XML objects to other people *with* a prior contract as how these are to be used. XML can partially solve this at present using DTDs, controlled prose and vocabularies and *bespoke applications* (i.e. a different application for each DTD.) This is as far as X*L goes. Much of the X*L prose stresses that particular activity is left to the *application*. This means that XML documents often need to be authored, knowing what application is going to be used to process them. This is, presumably, the way that CDF is designed - you have to have a 'CDF processor'. However it does not support *generic* applications (or even generic components of applications). - I wish to be able to send hypermedia. XLL specifically declines to add any semantics to the syntax, other than an (implied) HTML-like behaviour for some of the SIMPLE links. - I wish to send objects to other people who will print them out and read them. XSL solves this. - I wish to be able to send XML objects to people who I don't know exist, have never heard of me or my domain. [Example, a supermarket may need to hyperlink to molecular information in labelling its food products.] They need to access my semantics in (a) human-readable and (b) machine-readable form. For this a *generic* XML processor (or processing component) is required. This *is* achievable (through XSL) if the processing activity consists of producing 2D human-readable objects. I, and I suspect many others, want to be able to create generic XML applications. [JUMBO is a *generic* XML application - it can process any XML document. The degree of added value depends on the components made available by the document's author or domain.] Most of these issues are not being addressed, and probably will not be addressed by the current XML activity. [Not a criticism - they are doing a fantastic job. 
Their time is taken with deciding on precise syntax, procedures, meaning of components in XML documents, etc. More difficult than I think a lot of people realise.] This is where XML-DEV has a role to play. Not formally - this list has no standing other than the high quality of its postings. Since many of these areas will give rise to 'colliding ontologies' (i.e. strongly held views on how to do things and what things mean) there are no single solutions. However, if we treat this in the spirit of a biological system, 'fit' solutions should arise. To be 'fit' a solution must: - reproduce readily. IOW it must be relatively easy to understand what it's about. Simplicity is very valuable here :-) - be useful. - have a modest degree of flexibility. Too much variation kills off complex organisms. - be aware of its environment. If it's competing in a niche which is already filled, it will have a hard time. i.e. if you haven't looked in other disciplines, you will probably reinvent something. [The biological metaphor isn't worth elaborating :-)] My hope, therefore, is that we can identify and systematise certain areas which are useful to a group of people. [There may be multiple and incompatible solutions - so long as they are identifiable that need not be a problem.] Among the *simple* ideas that might be tractable as XDEV proposals are: - parser APIs and the generic behaviour of applications. Whatever happened to Xapi-J? - datatyping - re-usable elements, probably with machine-readable schemas. - transformation language (this *might* spare us from my Monty Python proposal :-). - behaviour for XLL-based applications If you do take these ideas up, please use simple subject lines :-) P. > > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Nov 22 13:44:05 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:02 2004 Subject: Is XDEV useful? (was re: XDEV proposals) In-Reply-To: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk> References: <Pine.SOL.3.91.971122004509.17881A-100000@alumnae> <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk> Message-ID: <199711221344.IAA00417@unready.microstar.com> Peter Murray-Rust writes: > The problems I address have the following generic operations. > - I want to author XML. Ideally this should be human- and > machine-readable. I want this process to be controlled by > software/data to make it both flexible and rigorous. [This is > tough, but I'm starting to address it in JUMBO. Practical help will > be appreciated :-)]. I think that the latest version of Adept supports XML editing, and I announced some patches to PSGML a couple of months ago. > - I wish to be able to re-use other people's information > objects. This is almost certainly going to break any DTD, but it is > implicit in most of the current W3C activity. (RDF, MathML, XSL may > have some sort of DTDs, but they will probably be used as > components of larger documents, which cannot have DTDs) Actually, this turns out not to be the case -- this is actually very simple with XML in its current form, if you use XML as a data content notation. 
In the internal DTD subset:

  <!ENTITY myrdf SYSTEM "myrdf.xml" NDATA xml>
  <!ENTITY mymathml SYSTEM "mymathml.xml" NDATA xml>
  <!ENTITY myxsl SYSTEM "myxsl.xml" NDATA xml>

In the external or internal DTD subset:

  <!NOTATION xml PUBLIC "-//W3C//NOTATION eXtensible Markup Language//EN"
                        "http://www.w3.org/XML/">
  <!ELEMENT externalDoc EMPTY>
  <!ATTLIST externalDoc
    doc ENTITY #REQUIRED>

In the document instance:

  <para>Here is a reusable RDF object:</para>
  <externalDoc doc="myrdf"/>
  <para>Here is a reusable MathML object:</para>
  <externalDoc doc="mymathml"/>
  <para>Here is a reusable XSL object:</para>
  <externalDoc doc="myxsl"/>

Whenever your processing software finds an external data entity with the XML notation, it can simply call the parser recursively.

You could also take an HTML-like approach (especially in a DTD-less document), and simply do something like

  <include src="myrdf.xml"/>
  <include src="mymathml.xml"/>
  <include src="myxsl.xml"/>

Again, just have your processing software call your parser recursively.

> - I wish to be able to manage distributed and multicomponent
> objects. I think XML and related disciplines will solve this very
> well and excitingly.

Exactly -- this is where the entity structure of full SGML and XML is a big win.

> - I want to be able to validate XML 'objects'. XML can do this
> syntactically, but not semantically. For this I need additional
> 'recipes' and code

And you always will, no matter how XDEV is designed. I've implemented SQL-based data management systems, and SQL's type checking is _never_ enough (or even close). Certainly we could modify XML so that parsers could perform validations like

  - the contents of this element must be a number
  - the contents of this element must not be empty

but we'd just make the parsers bigger and it wouldn't help much anyway.
After all, in real-world applications you always need to perform validations along these lines: - the contents of the element must be the name of an American city with a population over 500,000 - the contents of the element must be a name mentioned in a list in a different XML document - the contents of the element must be a valid Internet domain name I think that XML and SGML were smarter to leave all of this to the application-specific processing software in the first place. > - I want to be able to transform XML objects into other XML > objects. XSL is tantalisingly close to being able to do this but I > believe - at present - that W3C XML-transformation activity is > 'undefined'. Architectural forms will bring you part-way there. For one proposal, see http://home.sprynet.com/sprynet/dmeggins/xml-arch.html > - I want to be able to send XML objects to other people *with* a > prior contract as how these are to be used. XML can partially solve > this at present using DTDs, controlled prose and vocabularies and > *bespoke applications* (i.e. a different application for each DTD.) > This is as far as X*L goes. Much of the X*L prose stresses that > particular activity is left to the *application*. This means that > XML documents often need to be authored, knowing what application > is going to be used to process them. This is, presumably, the way > that CDF is designed - you have to have a 'CDF processor'. However > it does not support *generic* applications (or even generic > components of applications). As, I think, Paul Prescod has noted, nothing but a Turing-complete language could do this. XML is a method for creating applications -- it is not an application itself, and each application will need its own conventions, etc. > - I wish to be able to send hypermedia. XLL specifically declines to add > any semantics to the syntax, other than an (implied) HTML-like behaviour > for some of the SIMPLE links. Are notations not suitable for specifying this information? 
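One of the application-level checks listed earlier in this message (a name that must appear in a list held in a different XML document) might be sketched as follows in Python. The element names `name` and `author` and both sample documents are hypothetical; the sketch only illustrates that such a check sits in the processing software, outside the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical reference document holding the permitted names.
REFERENCE_XML = """<authors>
  <name>Peter Murray-Rust</name>
  <name>David Megginson</name>
</authors>"""

# Hypothetical document whose <author> elements are to be checked.
DOCUMENT_XML = """<citations>
  <citation><author>David Megginson</author></citation>
  <citation><author>A. N. Other</author></citation>
</citations>"""

def allowed_names(reference_xml):
    """Collect the text of every <name> element in the reference document."""
    root = ET.fromstring(reference_xml)
    return {el.text.strip() for el in root.iter("name") if el.text}

def unknown_authors(document_xml, allowed):
    """Return <author> values that do not appear in the allowed set."""
    root = ET.fromstring(document_xml)
    return [el.text.strip()
            for el in root.iter("author")
            if el.text and el.text.strip() not in allowed]

if __name__ == "__main__":
    print(unknown_authors(DOCUMENT_XML, allowed_names(REFERENCE_XML)))
    # prints ['A. N. Other']
```

The parser only has to deliver two well-formed trees; the rule linking them is ordinary application code.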
> - I wish to send objects to other people who will print them out > and read them. XSL solves this. Yes, it may. I wonder if document-viewing will end up being a major XML application, when most of the effort right now seems to be going into transactions and meta-data. > - I wish to be able to send XML objects to people who I don't know exist, > have never heard of me or my domain. [Example, a supermarket may need to > hyperlink to molecular information in labelling its food products.] They > need to access my semantics in (a) human-readable and (b) machine-readable > form. For this a *generic* XML processor (or processing component) is > required. This *is* achievable (through XSL) if the processing activity > consists of producing 2D human-readable objects. I, and I suspect many > others, want to be able to create generic XML applications. [JUMBO is a > *generic* XML application - it can process any XML document. The degree of > added value depends on the components made available by the document's > author or domain.] Here, again, architectural forms will help. As long as you use a DTD, and the DTD implements a "food information" base architecture, the supermarket will be able to incorporate your molecular information automatically. > Most of these issues are not being addressed, and probably will not be > addressed by the current XML activity. [Not a criticism - they are doing a > fantastic job. Their time is taken with deciding on precise syntax, > procedures, meaning of components in XML documents, etc. More difficult > than I think a lot of people realise.] I agree. > This is where XML-DEV has a role to play. Not formally - this list has no > standing other than the high quality of its postings. Since many of these > areas will give rise to 'colliding ontologies' (i.e. strongly held views on > how to do things and what things mean) there are no single solutions. 
> However, if we treat this in the spirit of a biological system, 'fit' > solutions should arise. [remainder omitted] XML-DEV would provide simple solutions to a few additional simple problems, but in the end (as with SQL), people will still have to do a lot of work in the middleware. I cannot usefully dump the SQL tables from my database and send them to someone else without a lot of integration and customisation work, unless we planned our tables together from the start. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Sat Nov 22 15:44:19 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:03 2004 Subject: Sequence <was> Access Languages ... Message-ID: <199711221542.CAA07290@jawa.chilli.net.au> > From: Mark L. Fussell <fussellm@alumni.caltech.edu> > I am not saying sequence is unimportant, but I think SGML is overly > focused on it (from an IM perspective) because it comes from a > paper/linear background. Information is rarely linear: it is only time > that is, which has caused some media [and the humans who use them] to be > (mostly) linear also. It can be difficult to break that linear > assumption when it doesn't apply if your tools keep reinforcing it. But do you think HTML would have become a popular markup language if its DTD was like this? <!ELEMENT html ( h1*, h2*, h3*, p*, I*, table*, tr*, td*, th*)> This is reductio ad absurdum of what you are saying. A DTD where all sequence information is made explicit. 
In such a DTD, all the elements would have IDs, and either some external specification to set the sequence/containment, or a "next" IDREF attribute. SGML is not overly focused on sequence. Sequence is such a basic property of text that having to always mark it up explicitly is just bizarre. Of course boilerplate text can be removed and added. And of course chunks in one part can be usefully reflected into another part. But sequence is important because it is a prime property of language. Databases contain words and pictures and various fragments. However SGML/XML must be a format to allow these to be placed as cohesive language-mediating documents. If people just want a database dump format for nice relational tables, comma-delimiter formats are available and attractive. But when they have text which they don't want to have desequenced, SGML/XML can be useful. I think the other big trouble with trying to view SGML/XML as a poor database dump format, is that when you get too far from a markup paradigm, you have to involve programmers rather than writers. Just folks can write HTML, at a pinch. If you go too much to a database mentality, you move to requiring custom-tools for data entry, rather than simple text-editors. Rick Jelliffe xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Nov 22 15:59:46 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:03 2004 Subject: Is XDEV useful? 
(was re: XDEV proposals)
In-Reply-To: <199711221344.IAA00417@unready.microstar.com>
References: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk> <Pine.SOL.3.91.971122004509.17881A-100000@alumnae> <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>

At 08:44 22/11/97 -0500, David Megginson wrote:

Thanks very much David,
You pose - but do not answer - a question :-).

>Peter Murray-Rust writes:
> [...]
>
>I think that the latest version of Adept supports XML editing, and I
>announced some patches to PSGML a couple of months ago.

Indeed. I have no doubt that there are and will be some excellent commercial tools. My problem, which I think is not unique, is that I cannot persuade my (often conservative) colleagues in science to start using a new discipline if there is a significant entry cost in terms of tools. [I do not remember how much Adept is, but many SGML tools are beyond the reach of impecunious individuals :-)]. I also want to be able to customise the tools I work with, and - for example - to link in the conversion of legacy data 'on the fly'.

I did indeed note your posting on EMACS/pSGML, and thought about downloading it. But there were such dire warnings ('if you aren't fully familiar with major modes of EMACS don't try this...') that I didn't :-)

>
> > - I wish to be able to re-use other people's information
> > objects. This is almost certainly going to break any DTD, but it is
> > implicit in most of the current W3C activity. (RDF, MathML, XSL may
> > have some sort of DTDs, but they will probably be used as
> > components of larger documents, which cannot have DTDs)
>
>Actually, this turns out not to be the case -- this is actually very
>simple with XML in its current form, if you use XML as a data content
>notation.
> >In the internal DTD subset: > > <!ENTITY myrdf SYSTEM "myrdf.xml" NDATA xml> > <!ENTITY mymathml SYSTEM "mymathml.xml" NDATA xml> > <!ENTITY myxsl SYSTEM "myxsl.xml" NDATA xml> > >In the external or internal DTD subset: > > <!NOTATION xml PUBLIC "-//W3C//NOTATION eXtensible Markup Language//EN" > SYSTEM "http://www.w3.org/XML/"> > <!ELEMENT externalDoc EMPTY> > <!ATTLIST externalDoc > doc ENTITY #REQUIRED> > >In the document instance: > > <para>Here is a a reusable RDF object:</para> > <externalDoc doc="myrdf"> > <para>Here is a reusable MathML object:</para> > <externalDoc doc="mymathml"> > <para>Here is a reusable XSL object:</para> > <externalDoc doc="myxsl"> > >Whenever your processing software finds an external data entity with >the XML notation, it can simply call the parser recursively. This is very clever! Thanks for pointing this out. I wouldn't have thought of it. It does, however, require that each ENTITY consistently uses just one DTD. > >You could also take an HTML-like approach (especially in a DTD-less >document), and simply do something like > > <include src="myrdf.xml"> > <include src="mymathml.xml"> > <include src="myxsl.xml"> This is indeed what I do at present - but using XML-LINK specifically. <ITEM XML-LINK="SIMPLE" HREF="myrdf.xml" SHOW="EMBED" ACTUATE="AUTO"> (although the semantics of EMBED - just like SRC - may not be universally agreed.) > > > > - I want to be able to validate XML 'objects'. XML can do this > > syntactically, but not semantically. For this I need additional > > 'recipes' and code > >And you always will, no matter how XDEV is designed. I've implemented >SQL-based data management systems, and SQL's type checking is _never_ >enough (or even close). Certainly we could modify XML so that parsers >could perform validations like > > - the contents of this element must be a number > - the contents of this element must not be empty > >but we'd just make the parsers bigger and wouldn't help much anyway. 
>After all, in real-world applications you always need to perform >validations along these lines: > > - the contents of the element must be the name of an American city > with a population over 500,000 > - the contents of the element must be a name mentioned in a list in > a different XML document > - the contents of the element must be a valid Internet domain name My approach to this is to write Element-specific code which is activated at various processing times, e.g. Atom.process(). [I also have a Atom.display()] This, of course, implies that the validation (or display) of the element is context-independent, but I'm optimistic that - for the sort of things I'm interested in - that will be true. I can easily see: Float.validate(); Molecule.validate(); Table.validate(); URL.validate(); being standalone functions and re-usable in different environments. They can also easily be overridden at the same stages as stylesheets. Your first two examples are admittedly context-dependent. >I think that XML and SGML were smarter to leave all of this to the >application-specific processing software in the first place. Agreed. I think one role of XML-DEV is to see what agreement(s) are possible for the next step. > > > - I want to be able to transform XML objects into other XML > > objects. XSL is tantalisingly close to being able to do this but I > > believe - at present - that W3C XML-transformation activity is > > 'undefined'. > >Architectural forms will bring you part-way there. For one proposal, see > > http://home.sprynet.com/sprynet/dmeggins/xml-arch.html I have read - and appreciated this. I think that, without having an AF-aware processor to hand, and a friendly guru, it's too difficult for *me*. And certainly for my community. But I know there are a lot of devotees of AFs on this list, and perhaps they can come to a communal view as to whether there is agreement as to how they are to be used in XML and what software is required (because they do need software). 
> > > - I want to be able to send XML objects to other people *with* a > > prior contract as how these are to be used. XML can partially solve > > this at present using DTDs, controlled prose and vocabularies and > > *bespoke applications* (i.e. a different application for each DTD.) > > This is as far as X*L goes. Much of the X*L prose stresses that > > particular activity is left to the *application*. This means that > > XML documents often need to be authored, knowing what application > > is going to be used to process them. This is, presumably, the way > > that CDF is designed - you have to have a 'CDF processor'. However > > it does not support *generic* applications (or even generic > > components of applications). > >As, I think, Paul Prescod has noted, nothing but a Turing-complete >language could do this. XML is a method for creating applications -- >it is not an application itself, and each application will need its >own conventions, etc. Well, I'm probably mad. But I still feel that (at least parts of) an XML-processor can be document-independent. > > - I wish to be able to send hypermedia. XLL specifically declines to add > > any semantics to the syntax, other than an (implied) HTML-like behaviour > > for some of the SIMPLE links. > >Are notations not suitable for specifying this information? I don't know :-). I have never used NOTATION. Seeing your example above suggested that it may be useful. Maybe it will add type information to the thing pointed at? XLL states that there is an attribute 'BEHAVIOR' but says nothing about what it is for. It would be valuable (as I have already posted) if there is some consensus about the values and their meaning. > > > - I wish to send objects to other people who will print them out > > and read them. XSL solves this. > >Yes, it may. I wonder if document-viewing will end up being a major >XML application, when most of the effort right now seems to be going >into transactions and meta-data. 
I think the definition of 'document' will effectively broaden. I see no reason why non-textual objects cannot be regarded primarily as 'documents'. > [...] > >Here, again, architectural forms will help. As long as you use a DTD, >and the DTD implements a "food information" base architecture, the >supermarket will be able to incorporate your molecular information >automatically. Ah - but this is the problem. I have no idea who will use my information and that is why I think that AFs are limited in my area. In Java classes, for example, I can use the Date class without the authors knowing I exist. I hope that others can use my Molecule class/element in the same way. > >XML-DEV would provide simple solutions to a few additional simple >problems, but in the end (as with SQL), people will still have to do a >lot of work in the middleware. I cannot usefully dump the SQL tables >from my database and send them to someone else without a lot of >integration and customisation work, unless we planned our tables >together from the start. No question. Maybe I think there will be a lot of newcomers with simple problems to which there will be simple solutions. Just as there were with HTML. Maybe I'm wrong :-), and that most of the problems will have to map onto very thoroughly worked out solutions on a per-problem basis. We'll see. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From nelson at media.mit.edu Sat Nov 22 17:04:43 1997 From: nelson at media.mit.edu (Nelson Minar) Date: Mon Jun 7 16:59:03 2004 Subject: Integrity in the Hands of the Client In-Reply-To: <3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk> References: <199711220630.RAA28483@jawa.chilli.net.au> <3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk> Message-ID: <199711221704.MAA27293@pinotnoir.media.mit.edu> >We are at a very exciting, but critical, time in the development of XML Yes, definitely. The next six months are when XML stops being a small research effort and starts being used by people who don't care about how to structure documents, but just want to publish. If XML is rolled out correctly, we can make it easy for them. >XML will be used by vastly more people that current practise SGML. And by a lot of people who have never heard of SGML and don't care about it. >In many cases the XML specs (including XLL and XSL) deliberately do >not say how something should be done - only what syntax should be >used. The WG has (often rightly) taken the view that it should not >prescribe ways of doing things. But we are not at - or very near to - >the time when people will start doing things and there is a danger >that we shall end up with serious inconsistencies. The danger is more than inconsistencies. XML is complicated and hard to understand how to use well. In order to help the people who just want to publish, examples and tools need to be developed to help people not just build legal XML, but *good* XML. 
That's hard, both because you have to encapsulate a practice of good XML authoring and, even worse, come up with what we mean by "good" in the first place.

I'm reminded of what happened in the first few months of 1994, when a lot of people suddenly learned HTML. One of the most useful documents (for me) of that period was Eric Tilton's essay "Composing Good HTML" (since turned into a book, "Web Weaving", with Carl Steadman and Tyler Jones). It was a short essay, but it laid out many of the basics of writing HTML well - issues beyond syntax. Style issues like "don't say 'click here' in a document, integrate the anchor text into the narrative". Structural issues like "don't misuse headers" and "try to do logical formatting, not physical". And meta information recommendations, like "put your name on documents" and "put a last modified date on documents if it makes sense". For me, that essay made HTML make sense, and gave some order to the varied capabilities of the syntax.

I tried to do my bit back then by writing an HTML editor tool (an emacs mode) that made it easier to write good HTML. Indenting the HTML source to show the document structure, providing simple templates to get basic well formedness, automating last modified footers. And I think it was reasonably successful - pages written with my editor were at least a little better than pages written with nothing.

XML needs similar style guidelines and tools if people are going to use it well. The problem for XML is harder than with HTML since XML is more powerful. I think XML will be most successful for casual document writers when there are standard well-established DTDs combined with style sheets that are simple to use and very well documented as to what the tags mean and how to use them. I don't know how to smooth the process of helping people develop their own DTDs.

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Nov 22 19:09:49 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:03 2004 Subject: Is XDEV useful? (was re: XDEV proposals) In-Reply-To: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk> References: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk> <Pine.SOL.3.91.971122004509.17881A-100000@alumnae> <199711221344.IAA00417@unready.microstar.com> <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk> Message-ID: <199711221910.OAA00315@unready.microstar.com> Peter Murray-Rust writes: > Thanks very much David, > You pose - but do not answer - a question :-). Perhaps as I move into my mid-30's I'm assuming the modesty and humility of old age (though those who have endured a conversation with me may have their doubts). > >Whenever your processing software finds an external data entity with > >the XML notation, it can simply call the parser recursively. > > This is very clever! Thanks for pointing this out. I wouldn't have thought > of it. > It does, however, require that each ENTITY consistently uses just one DTD. Each entity can have its own DOCTYPE declaration -- simply start a new invocation of your parser. > I have read - and appreciated this. I think that, without having an > AF-aware processor to hand, and a friendly guru, it's too difficult for > *me*. And certainly for my community. But I know there are a lot of > devotees of AFs on this list, and perhaps they can come to a communal view > as to whether there is agreement as to how they are to be used in XML and > what software is required (because they do need software). 
The simplest approach to AF does not require an architectural engine
at all; instead, simply look at attribute values instead of element
type names; i.e., instead of

  IF element_name = "FOO" DO
    do_a_foo()
  END

try

  IF attribute_name("MYARCH") = "FOO" DO
    do_a_foo()
  END

 > > > - I wish to be able to send hypermedia. XLL specifically declines to add
 > > > any semantics to the syntax, other than an (implied) HTML-like behaviour
 > > > for some of the SIMPLE links.
 > >
 > >Are notations not suitable for specifying this information?
 >
 > I don't know :-). I have never used NOTATION. Seeing your example above
 > suggested that it may be useful. Maybe it will add type information to the
 > thing pointed at?

That is exactly its purpose -- the notation informs the processing
software of a binary entity's type:

  <!NOTATION EPS PUBLIC
    "+//ISBN 0-201-18127-4::Adobe//NOTATION PostScript Language Ref. Manual//EN"
    "postscript">
  <!ENTITY pic SYSTEM "pic.ps" NDATA EPS>

(I have to admit that I have no idea what to do with system identifiers
for notations in XML -- in full SGML, I just leave them out.)

[...]

 > >Here, again, architectural forms will help. As long as you use a DTD,
 > >and the DTD implements a "food information" base architecture, the
 > >supermarket will be able to incorporate your molecular information
 > >automatically.
 >
 > Ah - but this is the problem. I have no idea who will use my information
 > and that is why I think that AFs are limited in my area. In Java classes,
 > for example, I can use the Date class without the authors knowing I exist.
 > I hope that others can use my Molecule class/element in the same way.

How could they possibly use your information automatically if you
weren't using some kind of shared standard? How would they know what
information applied to what food, for example, unless you had somehow
encoded that information in advance for them?
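The attribute-based dispatch described earlier in this message can be
sketched in a few lines of Python. This is only an illustration under
stated assumptions: the sample document, the element names "widget" and
"gadget", and the handler do_a_foo() are hypothetical, and the standard
library's xml.etree.ElementTree stands in for whatever parser is actually
in use. MYARCH and "FOO" are taken from the pseudocode above.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: two differently named elements that both declare
# the architectural form "FOO" through a MYARCH attribute.
SAMPLE = (
    '<doc>'
    '<widget MYARCH="FOO">first</widget>'
    '<gadget MYARCH="FOO">second</gadget>'
    '<other>ignored</other>'
    '</doc>'
)


def do_a_foo(element):
    """Hypothetical handler: return the text of a FOO-form element."""
    return (element.text or "").strip()


def process(xml_text, arch_attr="MYARCH"):
    """Dispatch on the architectural attribute value, not the element name."""
    results = []
    for element in ET.fromstring(xml_text).iter():
        if element.get(arch_attr) == "FOO":
            results.append(do_a_foo(element))
    return results


print(process(SAMPLE))  # ['first', 'second']
```

The element type names never appear in process(), so a DTD is free to
rename or add elements as long as they carry the architectural attribute.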
All the best, and thanks for an interesting discussion, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Nov 22 20:03:39 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:03 2004 Subject: Is XDEV useful? (was re: XDEV proposals) In-Reply-To: <199711221910.OAA00315@unready.microstar.com> References: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk> <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk> <Pine.SOL.3.91.971122004509.17881A-100000@alumnae> <199711221344.IAA00417@unready.microstar.com> <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971122205518.0b37d2c8@pop3.demon.co.uk> At 14:10 22/11/97 -0500, David Megginson wrote: >Peter Murray-Rust writes: > > > Thanks very much David, > > You pose - but do not answer - a question :-). > >Perhaps as I move into my mid-30's I'm assuming the modesty and Well, so am I (but not in the decimal system :-) Age is unimportant on XML-DEV (except as datatype, of course :-) [...] > >Each entity can have its own DOCTYPE declaration -- simply start a new >invocation of your parser. > This raises a common problem I have. If I have an 'include file' (e.g. 
a chapter) I can 'include' by the following mechanisms:

  - declare it as an entity and use &chapter1;
    In this case it should not have any doctypes, or other header info
  - reference it by XML-LINK="SIMPLE" HREF="chapter1.xml"
  - use your NOTATION trick

The advantage of the last two is that they are standalone XML files and
can be validated independently, and so I'm leaning towards them in
general. They also have the merit that you load the TOC and then look at
whatever chapters you want. This takes less memory and is faster.
The advantage of the first is you have a single object in memory which
can be searched (e.g. Xpointers).

Any comments?

>
>The simplest approach to AF does not require an architectural engine
>at all; instead, simply look at attribute values instead of element
>type names; i.e., instead of
>
>  IF element_name = "FOO" DO
>    do_a_foo()
>  END
>
>try
>
>  IF attribute_name("MYARCH") = "FOO" DO
>    do_a_foo()
>  END

Oh dear! Like the man who didn't realise he had been using prose all his
life. This is exactly what I do for most of my stuff at present :-)
Its advantage is that it makes the DTD much more forgiving :-)

>
> > >Here, again, architectural forms will help. As long as you use a DTD,
> > >and the DTD implements a "food information" base architecture, the
> > >supermarket will be able to incorporate your molecular information
> > >automatically.
> >
> > Ah - but this is the problem. I have no idea who will use my information
> > and that is why I think that AFs are limited in my area. In Java classes,
> > for example, I can use the Date class without the authors knowing I exist.
> > I hope that others can use my Molecule class/element in the same way.
>
>How could they possibly use your information automatically if you
>weren't using some kind of shared standard? How would they know what
>information applied to what food, for example, unless you had somehow
>encoded that information in advance for them?
No :-) I produce something I think other people would value and just
produce it with (hopefully) good documentation. Thus I have a class
RealSquareMatrix in JUMBO. I may make an <!ELEMENT> out of it. I would
then document it with what I felt were ReallyUseful properties of
RealSquareMatrices. If people want to use it, they're welcome. This is
the way that we use java.* and other classes.

So, if I produce <MOLECULE> I will document what it is, what its
components are, and then offer Molecule.java as something that will
display()/validate() it. For example, a Molecule can have Atoms without
Bonds, but not Bonds without Atoms. If a food manufacturer reads my
documentation, they can decide for themselves whether it's useful. [I
have had interest from those involved in submission of drugs - e.g.
pharmaceutical companies and regulatory agencies.] The users then have
to satisfy themselves whether <MOLECULE> is robust, future-proofed, etc.

In the same way I shall take <MATHML> on trust. I shall create MATHML
objects (possibly with TeX or symbolic algebra) and use them for
chemistry. The original authors of MathML need never know what I am
doing (although I have actually met some and am very excited about what
they are doing).

>

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jlapp at acm.org Sat Nov 22 21:36:47 1997 From: jlapp at acm.org (Joe Lapp) Date: Mon Jun 7 16:59:03 2004 Subject: Integrity in the Hands of the Client In-Reply-To: <Pine.BSD/.3.91.971121112104.1431E-100000@mrburns.iosphere. net> References: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net> Message-ID: <3.0.3.32.19971122163645.00965aa0@pop.access.digex.net> Mark Baker <markb@iosphere.net> wrote: >But in *many* cases, you just want to make the *object* persist simply, >perhaps even on the machine with the browser. This is especially >suitable for agent systems; you bring the ability to persist along with >you instead of attempting to store it "behind" you. It's a move away from >TP-monitor style ACID transactions, and towards a more "make forward >progress" means of distributed computing. Object groups are a good >example of this. And in a subsequent posting he wrote: >[...] transactions are an >overrated means of reasoning about distributed systems. They try and >make distributed processing look like local processing, when we now know >how impractical that view is. I find these statements very thought-provoking. I'm not quite sure what you mean by them, at least not in the context of our discussion. It sounds like you are proffering a very important perspective that I'm going to need to carry around in my back pocket. In particular, I'm curious about the implications for data that is shared among many users? Are you saying that there is a model that accomplishes the same thing as sharing data but that does not require a central (or a partitioned and replicated but still synchronized) repository? 
-- Joe Lapp (Java Apps Developer/Consultant) Unite for Java! - http://www.javalobby.org jlapp@acm.org xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From fussellm at alumni.caltech.edu Sat Nov 22 21:38:00 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:03 2004 Subject: Sequence <was> Access Languages ... In-Reply-To: <199711221542.CAA07290@jawa.chilli.net.au> Message-ID: <Pine.SOL.3.91.971122124318.5297A-100000@alumnae> On Sun, 23 Nov 1997, Rick Jelliffe wrote: > > From: Mark L. Fussell <fussellm@alumni.caltech.edu> > > I am not saying sequence is unimportant, but I think SGML is overly > > focused on it (from an IM perspective) because it comes from a > > paper/linear background. Information is rarely linear: it is only time > > that is, which has caused some media [and the humans who use them] to be > > (mostly) linear also. It can be difficult to break that linear > > assumption when it doesn't apply if your tools keep reinforcing it. > > But do you think HTML would have become a popular markup language if its DTD > was like this? > > <!ELEMENT html > ( h1*, h2*, h3*, p*, I*, table*, tr*, td*, th*)> > > > This is reductio ad absurdum of what you are saying. A DTD where all > sequence information is made explicit. I certainly wasn't trying to say that sequencing should be removed but just that it can be difficult to see when it doesn't apply. Sometimes information is (at least dominantly) organized as a sequence: Ordered Sections contain Ordered Paragraphs. 
Sometimes information does not inherently need to be sequenced but the application would like it to be so it does not need to worry about ordering it at presentation (I am thinking of a list of citations where there is a natural ordering [by one of the columns/attributes of a citation]). And a variation of this case is: sometimes it is just easier for people to take direct control than to do informational markup. I think HTML and Word Processors represent this end of the spectrum. > If people just want a database dump format for nice relational tables, > comma-delimiter formats are available and attractive. But when they have > text which they don't want to have desequenced, SGML/XML can be useful. Well, I guess I have larger visions of what SGML/XML can do, and I think it is within (or at most a mild extension) of the original vision. Requoting [Goldfarb 90, A.2.40]: --- Generalized markup is based on two novel postulates: a) Markup should describe a document's structure and other attributes rather than specify processing to be performed on it, as descriptive markup need be done only once and will suffice for all future processing. b) Markup should be rigorous so that the techniques available for processing rigorously-defined objects like programs and databases can be used for processing documents as well. --- SGML is designed to describe information, and although the original vision may have been focused on describing documents I believe that was just because it was the particular task at hand. > ... Just folks can write > HTML, at a pinch. If you go too much to a database mentality, you move > to requiring custom-tools for data entry, rather than simple text-editors. No argument that HTML is easier for novices to directly write than more structured information, but that also applies to any of the more sophisticated DTDs. 
The benefit of a human-readable and human-understandable encoding like
SGML/XML is that people can progress from simple DTDs like HTML to more
complex ones and still understand what is going on. I have done this
with web-site development where content writers now use a "real" DTD
that allows generation of different HTML views (and more sophisticated
linking... etc.)

And I do agree that accurately modeled information (e.g. normalizing in
a RDB context) can make it too hard (for the desired writers) to enter
data directly. It is likely that some SGML/XML DTDs will be designed to
contain all the necessary information with explicitly desired redundancy
and artificial sequencing but with the assumption that the processing
will later remove them on the way to the information model. This is
almost exactly what UI Forms and relational views are doing.

So I don't want to get rid of sequence; I just believe people should
think twice about it and ascertain whether it is really part of the
information and is the best way to represent that information.

--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M  info@chimu.com         Object-Oriented Information Systems
C   u www.chimu.com          Architecture, Frameworks, and Mentoring


From ddb at criinc.com  Sat Nov 22 23:05:07 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:59:03 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971122150639.00a71d60@mailhost.criinc.com>

At 11:50 AM 11/21/97 -0500, Paul Prescod wrote:
>Derek Denny-Brown wrote:
>>
>> One of the things that I see as a potential problem is that HTML etc as it
>> is used now has 2 (as I count them this side of the morning) relatively
>> distinct uses.
>> 1) as an alternate form of (relatively) static information.
>> 2) as a (very-basic) cross-platform (g)ui.
>>
>> XSL and DSSSL are focusing rather hard on (1), but not on (2).
>
>I'm not sure what you mean by that. XSL as currently proposed has access
>to all of the form features of HTML, just as it has access to all of the
>static display features of HTML. It is correct to argue that we are
>spending more effort on *improving* HTML's static display features than
>improving its form features, but I think that that is probably
>appropriate considering the market's interest in better static pages,
>SGML's particular strengths in that area and Java's suitability for
>forms.

I have a difficult time understanding how a "call-back" would work in
XSL, since the processing model does not include any mechanism for such
callbacks. This makes sense given that XSL is (at least to this point)
about transforming XML to a displayable view. The problem is when that
displayable view is interactive. I am not talking about FORMs, where the
interaction is between the browser and the server, but rather
interaction between the user and the browser, a-la JScript/JavaScript,
onMouseOver, etc...
What if I have two (or more) possible ways to view (i.e. differing
applications of a stylesheet) some content? I am not looking to
reprocess the whole page, but rather to reprocess just a small portion
and swap out a portion of the existing/original flow-objects for the
newly generated flow-objects. Or, as a simpler case, to just modify the
attributes of existing flow objects.

Using HTML forms as an example, it would be really nice if you could
perform some sanity checks on the contents of the form before it was
sent to the server. One of the problems with most current schemes which
do this is that they provide no clear indication of what they think is
wrong when the sanity checking is done. It would be better if they
could use color or some such thing to indicate which fields have data
which they consider invalid.

The way all this (and most of JavaScript) is done now is through
call-backs. You register a function as a call-back in case a specific
event happens (such as the user pushing the submit button, or the user's
mouse moving over a specific image). I am unclear how XSL could handle
such a call-back. For what I would want, it really needs a queriable
model of the flow-objects which were created originally, and some way to
modify those flow-objects. Now your XSL style sheet has two portions:
one which takes the XML document and builds a complete flow-object
stream/tree, and another which handles callbacks from user-generated
events regarding flow-objects and modifies the flow-objects. This
second part is what "dynamic" HTML is all about. (Either through
Netscape's JavaScript or Microsoft's dHTML, though dHTML is more like
what I am talking about.)

If you have looked at some of Microsoft's MSXML samples which use DSO (I
think that is what they call it...), that is kind of in line with what I
am talking about, though in that case the original flow-objects were
directly HTML, not generated from XML...

-derek

Derek E.
Denny-Brown II                   ||  ddb@criinc.com
"Reality is that which,          ||  Seattle, WA USA
 when you stop believing in it,  ||  WWW/SGML/HyTime/XML
 doesn't go away." -- P. K. Dick ||  Java/Perl/Scheme/C/C++


From dgd at cs.bu.edu  Sun Nov 23 01:23:53 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:59:03 2004
Subject: Integrity in the Hands of the Client
Message-ID: <v03007808b09c3c674a91@[205.181.197.111]>

From: Joe Lapp <jlapp@acm.org>

In this posting I'm going to be a little bold and propose that both the
XML and DOM specifications are flawed. The existence of these flaws
rides on the assumption that we care to use SGML/XML to create domain
models for data where the data evolves over time. I'm also assuming
that it is unacceptable for the client objects of a document to maintain
the integrity of the document.

I've not been following this thread closely, so I apologize if I get
something wrong. I'll stop first, too, to note that when interconverting
data formats we rarely can represent every validity constraint in the
new format -- if I dump a DB record to tabbed files I lose referential
(and all other) integrity checks, but I may have much better luck moving
to a competing vendor's system. When using XML, we may reasonably expect
that the richer formalism will give us more control (and for
hierarchical data, that expectation is well (if not perfectly) met). We
may also expect that other properties can be preserved (e.g. IDrefs
eliminate broken pointers, but don't allow typed references), but some
probably won't be.

We need to design the DTD for this document.
Here is our first pass:

  <!DOCTYPE catalog [
    <!ELEMENT catalog (books, authors)>
    <!ELEMENT books (book*)>
    <!ELEMENT authors (author*)>
    <!ELEMENT book (summary)>
    <!ATTLIST book title CDATA #REQUIRED
                   author IDREF #REQUIRED>
    <!ELEMENT author (bio)>
    <!ATTLIST author id ID #REQUIRED
                     name CDATA #REQUIRED>
    <!ELEMENT summary (#PCDATA)>
    <!ELEMENT bio (#PCDATA)>
  ]>

To get a better feel for what we've designed, we create a little sample
document:

  <catalog>
    <books>
      <book title="The Postman" author="A1">
        <summary>Text goes here.</summary></book>
      <book title="Startide Rising" author="A1">
        <summary>Text goes here.</summary></book>
      <book title="Hitchhiker's Guide to the Galaxy" author="A2">
        <summary>Text goes here.</summary></book>
    </books>
    <authors>
      <author id="A1" name="David Brin"><bio>Text goes here.</bio></author>
      <author id="A2" name="Douglas Adams"><bio>Text goes here.</bio></author>
    </authors>
  </catalog>

This seems to work. It stores information about books and authors, and
it is not possible to add a book without associating it with the
description of some author. But we can see that it breaks as soon as we
add any other kind of element that has an ID. We know that every book
will eventually have an ID, because we'll soon want to have an element
whose content elements reference the New York Times Bestsellers. Once
we do that, nothing prevents an administrator (or the client program he
or she is using) from indicating that the author of a book is another
book. This DTD will not suffice.

The problem with this is that it uses database style "joins" on ID
values. XML's most powerful constraints are tree constraints, based on
containment. For example the following structure does not have this
problem:

  <catalog>
    <authors>
      <author id="A1"><name>David Brin</name>
        <bio>whatever</bio>
        <books>
          <book><title>The Postman
          whatever other books go here.
          If we have more than one author:
  ...etc
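The weakness of the ID/IDREF catalog above, namely that an IDREF only has
to match some ID and not an ID on the right kind of element, can be shown
with a short Python sketch. Everything here is an assumption made for
illustration: the book IDs "B1" and "B2", the trimmed-down catalog (it
omits the summary and bio children, so it is not valid against the DTD),
and the check_references() helper. The standard library's
xml.etree.ElementTree does not validate, so the two checks are written
out by hand.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant of the sample catalog in which books also carry IDs
# and one book wrongly names another book as its author.
CATALOG = (
    '<catalog>'
    '<books>'
    '<book id="B1" title="The Postman" author="A1"/>'
    '<book id="B2" title="Startide Rising" author="B1"/>'
    '</books>'
    '<authors>'
    '<author id="A1" name="David Brin"/>'
    '</authors>'
    '</catalog>'
)


def check_references(xml_text):
    """Return (dangling, mistyped) lists of book titles."""
    root = ET.fromstring(xml_text)
    # One shared ID space for every element, as with ID/IDREF.
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    dangling, mistyped = [], []
    for book in root.iter("book"):
        target = by_id.get(book.get("author"))
        if target is None:
            dangling.append(book.get("title"))  # an untyped IDREF check catches this
        elif target.tag != "author":
            mistyped.append(book.get("title"))  # an untyped IDREF check misses this
    return dangling, mistyped


print(check_references(CATALOG))  # ([], ['Startide Rising'])
```

Every reference resolves, so an untyped check reports nothing, yet the
second book points at a book rather than an author.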
Note that you do have to pick a "by author" or "by book" hierarchy to
use this technique. I also moved title and author into elements: titles
frequently contain markup, and names can be complex enough that it's
often a good idea to be prepared for the eventual need for markup.
Consider Chinese names, where the order of family and personal names is
different than it is in most European cultures.

It seems that we might have to use links, but let's look at other
approaches first. We entertain the idea that an author's books belong
to the content of the author. We quickly throw that one out when we
realize that a book can have more than one author.

Or take an alternative approach (as I sketched above).

I have not been able to find a way to have the document server force
clients to ensure that whenever they add a book, that book is associated
with some author. Clients are given the responsibility of maintaining
the integrity of the document.

No. Servers that want to impose non-XML integrity constraints (such as
you are demanding) must impose those constraints themselves. XML, like
traditional databases (which seem to be your starting point), represents
some things well, and some things very badly. Attempting to create
relational schemas for XML documents produces the same kind of hairy,
unnatural specifications and requires similar extra integrity checks on
update to represent typical document information. Basically, I think
that the flaw of not providing what you ask for is in fact no flaw, but
an artifact of different tools being targeted to different purposes.
There is a difference -- since XML is a data format and _not_ a
processing technology the way a database is, it may be useful as a way
to represent and transport data that is best _manipulated_ in non-XML
ways.
You get a rich language of structures for free by using an XML parser,
and that may save some time in writing data transporters -- for
instance, a DTD for the transport of complete RDB table sets would be
easy to write -- but checking those tables for semantic correctness
would not be one of the things you get for free.

I think the XML specification as it currently stands is extremely
well-suited for describing data that does not change over time, but that
it is lacking in specifying how documents are to evolve.

You overstate the case here. It's suited for describing how the data
whose integrity constraints correspond to XML validity should evolve.
These constraints are not theoretically justified, but are pragmatically
justified by the fact that people can get useful document management
work done using them. The same is true of relational databases -- all
those theorems about normal forms and algebra merely show that the
system is well defined -- the fact that tables are useful for many kinds
of data is still a pragmatic one, and not a theoretical one. The world
is still full of things that don't fit the relational model very well.
I know that our current data-manipulation-savior is OO databases, but
once we have experience with them we'll grow to understand the ways in
which they fall short of perfection as well.

Nevertheless, future versions of XML might have small improvements that
will help cases like this. The provision of multiple ID spaces (ability
to have typed IDs and typed IDrefs) is one that has been suggested a
number of times. It would also be very useful in documents, since
(begin example) only would have "fignum" attributes, and so the user of
"figref" attributes will be prevented from referring instead to a
paragraph of random text. Small suggestions like this that also offer a
lot of leverage may get considered for XML 1.1.
(Small in the sense that little syntax is required to support it, and little processing beyond that already required for ID/IDREF processing). To my mind, such suggestions are compelling to the extent that they are useful in _document_ management (as well as general data management) because that really describes the primary focus of XML design. XML may well be useful beyond that area, but I think it should stay away from bidding on the "universal data format of the ages" title, that may well be impossible to ever attain. -- David ------------------------------------------+---------------------------- David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com Boston University Computer Science | Dynamic Diagrams http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/ | MAPA: mapping for the WWW xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Sun Nov 23 16:57:31 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:03 2004 Subject: Integrity in the Hands of the Client References: Message-ID: <3478614E.239F097A@technologist.com> David G. Durand wrote: > To my mind, such suggestions are compelling to the extent that they > are useful in _document_ management (as well as general data > management) because that really describes the primary focus of XML > design. XML may well be useful beyond that area, but I think it should > stay away from bidding on the "universal data format of the ages" > title, that may well be impossible to ever attain. This is such an important point I felt I had to emphasize it. 
We could legally mandate every single byte that is stored on a computer hard drive must be in XML and the world would not be a better place. We would still have incompatibilities between software, we would still have trouble storing documents in relational databases and relational information in documents and so forth. Unifying notation is merely a convenience. It doesn't automatically buy a perfect world of seamless interoperability as some seem to believe. "Sometimes the actual claims for markup-based systems are overstated; the claim that SGML results in portable documents, for example, falls afoul of the observation that it is possible to put angle brackets around troff tags, supply a simple document type descrip- tor,and thereby achieve anSGML-compliant document, without gaining any portability or descriptiveness for the information. True portability requires not only that informa- tion be transportable from one machine to another,but that the semantics of that informa- tion be the same on either machine. SGML, in particular,claims to transfer no semantics, so it surely cannot guarantee portability." [1] Given this fact, we should focus on making the best notations we can for the data types we have to represent, rather than trying to stuff all data into the same notation, or worse, making a single notation that is adapted for all kinds of data. Putting angle brackets around troff does not make troff into a serialization of a Java Bean and the fact that Java Beans and Troff might share a notation does not make it easier to create troff files from Java or to render them IN Java. Paul Prescod [1] "Markup Reconsidered" http://www.sil.org/sgml/raymmark.ps xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Sun Nov 23 17:29:00 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:04 2004 Subject: Sequence Access Languages ... References: Message-ID: <347868B1.7604BDDD@technologist.com> Mark L. Fussell wrote: > > SGML is designed to describe information, and although the original vision > may have been focused on describing documents I believe that was just > because it was the particular task at hand. I think that you have this backwards. SGML was designed to represented documents and insofar as documents share properties with some other types of information, SGML can represent other information. I see no reason to believe that a single notation could efficiently represent all forms of information. If we take this to an extreme then most people seem to agree: how soon do you expect we will represent bitmapped graphics in XML? My personal rule of thumb is that it is okay to represent some non-document data type in SGML/XML if it is convenient to do so without extending SGML/XML in a way that would make it less appropriate for dealing with documents. Suboptimal extensions would be those that confuse the organizational principles of SGML or make it more complicated to implement or understand (such as complex validity constraints). Paul Prescod xml-dev: A list for W3C XML Developers. 
From mrc at allette.com.au Sun Nov 23 21:22:36 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:04 2004
Subject: Sequence Access Languages ...
References:
Message-ID: <34789E7C.F1E4FAB6@allette.com.au>

Mark L. Fussell wrote:
> SGML is designed to describe information, and although the original vision may
> have been focused on describing documents I believe that was just because it was
> the particular task at hand.

Actually, the task at hand has always been to capture the information as cleanly and thoroughly as possible with as little regard to the downstream applications as possible. Several years ago we converted to SGML a substantial amount of military data with an anticipated lifespan of fifty years. At the time, there were no satisfactory SGML repositories, yet we are not precluded from uploading to one when they arrive, even if it does mean (an easy) SGML to XML conversion of the data. Similarly, and with all due respect to the idea of a repository, by the time this data reaches its twilight there will be some very different mechanisms for managing data and I daresay the repository will be long gone.

We know that XML/SGML won't cover everything at once - the quickest path to failure is to try to make it do so. Stage one, capture the data as best you can anticipate and hopefully in a way that also works for authors; stage two, convert it for use in specific applications and write semantic support mechanisms. Disregard of any type of application is the greatest strength of XML/SGML - we must not be tempted to lose sight of that no matter how tempting the siren's call. It's a long game...
--
Regards

Marcus Carr                      email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)      email:  info@allette.com.au
Level 10, 91 York Street         www:    http://www.allette.com.au
Sydney 2000 NSW Australia        phone:  +61 2 9262 4777
                                 fax:    +61 2 9262 4774
_______________________________________________________________

From terje at in-progress.com Mon Nov 24 03:49:30 1997
From: terje at in-progress.com (terje@in-progress.com)
Date: Mon Jun 7 16:59:04 2004
Subject: Join the Document Interchange Initiative
Message-ID:

The Document Interchange Initiative (DII) is a campaign to make the content of a website more easily interchangeable between different software. To reach this goal, the campaign promotes adherence to markup standards as an alternative to proprietary markup extensions. In a way, the campaign is a public relations effort to increase the use of XML and related technologies.

The Document Interchange Initiative promotes markup that conforms to the established standards in both syntax and semantics.

The following location is updated with information about the campaign, and new information is added on a regular basis. Feel free to make a link to it:

http://interaction.in-progress.com/interchange

You are invited to join the Document Interchange Initiative by adding your name or organization to a list of those that support the goal of interchangeable documents through adherence to markup standards.
There are no obligations whatsoever, but your name or organization on the list will help to get the necessary attention for the campaign.

Please send an email to or directly to me to be listed as a supporter or become an activist, or if you have any questions related to the campaign.

-- Terje | Media Design in*Progress

C a s c a d e... a comprehensive Cascading Style Sheets editor for Mac
http://interaction.in-progress.com/cascade

Make your Web Site a Social Place with Interaction - The Most Powerful Companion to a Mac Web Server!
http://interaction.in-progress.com

From ricko at allette.com.au Mon Nov 24 05:20:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711240518.QAA09900@jawa.chilli.net.au>

> From: Paul Prescod
>
>
> "Sometimes the actual claims for markup-based systems are overstated;
> the claim that SGML results in portable documents, for example,
> falls afoul of the observation that it is possible to put angle
> brackets around troff tags, supply a simple document type descriptor,
> and thereby achieve an SGML-compliant document, without gaining
> any portability or descriptiveness for the information. True
> portability requires not only that information be transportable
> from one machine to another, but that the semantics of that
> information be the same on either machine. SGML, in particular, claims to
> transfer no semantics, so it surely cannot guarantee portability."
>
> [1] "Markup Reconsidered" http://www.sil.org/sgml/raymmark.ps

Without wishing to disagree in any way with Paul, the quote is perhaps not quite true, I think.

Sticking angle brackets on troff code may give you a document that is syntactically *valid* SGML but, to the extent that it uses elements to mark up processing instructions, the document does not *conform* to SGML. Such conformance cannot be judged mechanically, but only by looking at the definitions in ISO 8879 for processing instructions and elements.

People often seem to think "SGML is a grammar; I can mark up all sorts of sloppy things; therefore SGML is a bad grammar". But SGML is more than a queer grammar, it is a language: the terms "element" and "processing instruction" (etc) have broad but useable meanings.

I think one problem with XML is that these definitions of what an element, etc., actually means are not present. XML *is* just a grammar, more or less. But to convert it to a useful language, we often have to plug in SGML's definitions.

And again, we shouldn't then think that in all cases "SGML conformance=good; SGML non-conformance=bad". But that is separate from "do I need SGML validity? do I need XML well-formedness? do I need a custom syntax?".

Rick Jelliffe
From papresco at technologist.com Mon Nov 24 09:01:06 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:04 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
References: <3.0.2.32.19971124001024.00930d10@pop.iosphere.net>
Message-ID: <3479431B.BA9E33EE@technologist.com>

Mark Baker wrote:
>
> At 12:01 PM 23/11/97 -0500, Paul Prescod wrote:
> >Putting angle brackets around troff does
> >not make troff into a serialization of a Java Bean
>
> What if that troff document contained a link to an implementation of a
> troff formatter? What if that implementation described its interface using
> XML?

What if it didn't? What if it described its interface using CORBA or some proprietary language that is more powerful than CORBA? You don't lose any flexibility or expressive power, you just have to write another parser for CORBA or your proprietary language.

The hard part of writing a troff implementation is not writing the parser, but writing the formatter. So XML can only make a marginal difference in implementation time or effort. The hard part of writing an interface to a troff implementation is writing the interface, not publishing it (in my experience, anyway), so XML can only make a marginal difference there as well.

The same goes for writing an SGML DTD parser. The difficulty there is in keeping track of all of those elements, attributes and entities, not in parsing the syntax. So again you only get a marginal benefit from using XML as the representation language.

Now if a marginal benefit is enough to tip you into profitability, then I'm glad we were able to help you.
But there are costs associated with that marginal benefit. You will beat your head against the wall trying to express constraints that SGML cannot express directly. You will find that your files are much larger than they would be in an optimized notation. You will notice redundancy in places that you don't really need it.

On the other hand, there is a huge benefit to using SGML/XML *for documents* because SGML is the international standard for representing structured documents. Thus you get the benefit of hundreds of tools, books and experts, almost all of them specialized for document markup. You do not get that benefit when you ignore CORBA (the real object interface standard) to use XML instead. You do not get that benefit when you ignore TeX or troff to use XML as a page description language. You do not get that benefit when you ignore the existing DTD syntax to invent a new XML instance syntax.

When you use XML to replace an existing standard, you are, for a period at least, actually working against open standards and promoting a proprietary alternative, even if it is expressed in the standard notation of SGML/XML. This might be a good idea if there is a problem with the existing standard in a given area, but more often it is a better idea to work with the people who control the standard to improve it rather than striking out on your own (for all of the usual reasons).

Paul Prescod
From papresco at technologist.com Mon Nov 24 14:33:51 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
References: <199711240518.QAA09900@jawa.chilli.net.au>
Message-ID: <34793F19.2039C768@technologist.com>

Rick Jelliffe wrote:
> Sticking angle brackets on troff code may give you a document that is
> syntactically *valid* SGML but, because to the extent that it uses elements
> to markup processing instructions, the document does not *conform* to
> SGML. Such conformance cannot be judged mechanically, but by looking at the
> definitions in ISO 8879 for processing instructions and elements.

"Element: A component of the hierarchical structure defined by a document type definition;"

> People often seem to think "SGML is a grammar; I can markup all sorts of
> sloppy things; therefore SGML is a bad grammar".

I would have thought that that flexibility makes SGML a *good* grammar. SGML would be a GOOD encoding for (e.g.) a typesetting language. In fact, it already is used in this way for SPDL.

Paul Prescod
From ricko at allette.com.au Mon Nov 24 15:47:52 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711241545.CAA29767@jawa.chilli.net.au>

> From: Paul Prescod
> "Element: A component of the hierarchical structure defined by a
> document type definition;"

As distinct from "Processing Instruction: markup consisting of system-specific data that controls how a document is to be processed."

Rick Jelliffe

From cskerr at geocities.com Mon Nov 24 15:48:59 1997
From: cskerr at geocities.com (Charles Kerr)
Date: Mon Jun 7 16:59:04 2004
Subject: MS XML parser only works with IE...
References: <01bcf8a9$98d09820$0a08bdcc@infinity>
Message-ID: <3479A42C.A8BE480B@geocities.com>

(For those of you reading this on the xml-dev mailing list, the article referred to in this letter is at http://www.javalobby.org/jn001.htm#xml and WORA == Write Once, Run Anywhere)

> Stating that MSXML is "Arguably the best XML parser for Java today" I think
> is in error and inconsistent with the stated views of the Java Lobby and our
> commitment to WORA. MSXML is not 100% pure and the DSO (Data Source Object)
> applet only works with MS IE 4.0 browsers.
>
> These imports are hidden in the source
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;
>
> Not only that, but with this release of MSXML it appears that Microsoft is
> attempting to fragment the XML community by encouraging the use of
> non-standard end tags and other things like the inline '&' that break
> XML. I can only conclude that Microsoft is giving MSXML away for free --once
> again-- in order to fragment the emerging XML standard. This is not a Java
> application, and MSXML is something we should NOT endorse or support.

The MSXML dependencies on Windows are apparently trivial and fixable. Equally important, the MSXML EULA grants the right to redistribute such modified code. See the three letters (from the xml-dev mailing list) that I include at the end of this letter for more information.

What I'd like to see is someone post these fixes so that each person wanting the portable version doesn't have to make the changes by hand. If anyone does this (clovett, you listening? :) and lets me know, I'll write it up in the news.

Regarding MSXML's break with the XML spec, I was unaware of the </> and & notation -- it was discussed in the xml-dev mailing list right before I joined. It looks like Microsoft is, for once, interested in hearing constructive feedback. In particular Chris Lovett (clovett@microsoft.com) has encouraged such feedback. Anyone interested in this topic should send him polite mail requesting that MS stick to the spec.

I can understand why you would be upset about this. The splintering by Microsoft of a great new technology is something that we Java programmers seem mysteriously sensitive to. ;)

Nevertheless, I'll stand by my statement that MSXML is arguably the best XML parser for Java today. There are other choices, such as Lark (http://www.textuality.com/Lark/) and NXP (http://www.edu.uni-klu.ac.at/~nmikula/NXP/preview/).
I'm cc'ing this to the xml-dev mailing lists in the hopes that it will rekindle the discussion on the importance of sticking to the specs.

Charles
cskerr@geocities.org

Unite for Java! http://www.javalobby.org/

--

Excerpts from three letters on the xml-dev mailing list regarding MSXML's ties to Windows

[1]
> Windows dependency of MSXML is minimal. All you have to do is following:
> 1. remove com.ms.xml.dso package.
> Delete the class files from the jar and/or comment it out of the makefile.
> DSO is accessed by some of the samples but none of the other MSXML packages.
> 2. remove dependency on com.ms.xml.xmlstream package.
> Latest version of MSXML includes an alternate XMLInputStream class located
> inside the 'make' directory. Replace com.ms.xml.util.XMLInputStream with
> the alternate version to remove dependency on com.ms.xml.xmlstream package.
> With above two changes, you will end up with a pure-Java version of MSXML.
> MSXML is the most complete XML parser available right now and you get the
> source code on top of it. I would be smiling by now if I were you :-)

[2]
> The parser uses a newly-defined Interface to a stream library that is
> specific to XML. The parser does not use the implementations of streams
> provided in the JDK 1.1 packages for the internet. I believe that this has
> to do with byte-ordering problems in those implementations. I have not
> checked this for myself.
> The interface per se has no platform dependencies. It is shipped with two
> implementations. One implementation is specific to Windows, the other is
> generic Java using JDK packages. Neither has the byte-order flaw. You may
> use whichever one you prefer. Both work. The generic one has lower
> performance.
> --Andrew Layman AndrewL@microsoft.com

[3]
> How 'bout that! Microsoft's EULA even grants us the right to redistribute
> such modified code. Quite generous of them, I must say. Microsoft just
> went up a point in my rating system. I am indeed smiling now.
> :-)
> My apologies to the MSXML team.
> --
> Joe Lapp (Java Apps Developer/Consultant)
> Unite for Java! - http://www.javalobby.org
> jlapp@acm.org

From ddb at criinc.com Mon Nov 24 18:06:55 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:04 2004
Subject: Sequence Access Languages ...
Message-ID: <3.0.32.19971124100758.00936af0@mailhost.criinc.com>

>On Sun, 23 Nov 1997, Rick Jelliffe wrote:
>> If people just want a database dump format for nice relational tables,
>> comma-delimiter formats are available and attractive. But when they have
>> text which they don't want to have desequenced, SGML/XML can be useful.

It really depends on the requirements. For data with a long expected life-time, XML may actually be a better choice than a comma/tab delimited file _because_ it is so verbose. If the original architects choose tag names which are clear, then when someone approaches the data 10 years later, and the original authors are long gone, the chance of this new-comer understanding the data format increases significantly. This is what Steven Newcomb calls self-descriptive documents. (Steven/Peter, did I get that right?)

I have been bitten by the problem where I write a quick and dirty data-dump tool which dumps out to a tab-delimited file and then, a year later, I can't remember exactly what all the fields were. XML can help. It is not a perfect solution, but it beats re-engineering software (esp if you don't have source any more....) but again, it all goes back to your requirements.
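A minimal sketch of the tab-delimited versus XML trade-off described above (not from the original post). It uses only Python's standard `csv` and `xml.etree.ElementTree` modules; the record and its field names (`part_number`, `quantity`, `unit_price`, `order_line`) are invented for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are assumptions made for this sketch.
record = {"part_number": "A-1027", "quantity": "12", "unit_price": "3.50"}


def to_tsv(rec):
    """Dump only the values, tab-delimited, the way a quick data-dump tool would."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerow(rec.values())
    return buf.getvalue()


def to_xml(rec):
    """Dump the same values with each one wrapped in an element named after its field."""
    root = ET.Element("order_line")
    for name, value in rec.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


tsv_line = to_tsv(record)
xml_line = to_xml(record)
print(repr(tsv_line))
print(xml_line)
```

The tab-delimited line is shorter, but a reader who finds it a year later has to guess which column is which; the XML line carries the field names along with the data, which is the verbosity being argued for here.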
If your data is only going to be used by 3 programs you wrote, and the data has a short life expectancy, then tab-delimited files are a good choice.

-derek

Derek E. Denny-Brown II          || ddb@criinc.com
"Reality is that which,          || Seattle, WA USA
 when you stop believing in it,  || WWW/SGML/HyTime/XML
 doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++

From mecom-gmbh at mixx.de Mon Nov 24 18:23:16 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:04 2004
Subject: Inheritance
References:
Message-ID: <3479C712.985B01D9@mixx.de>

greetings,

sorry to start in the middle of this thread, but as an xml novice i'm wondering why one is at all concerned to extend a language intended to mark up "structure" in order to encode "behaviour". (this being the distinction made by separating 'class' and 'type').

why is it not sufficient to accept that a dtd form encodes the structure of one class only, and to encode the type and/or class relations in marked-up data, instead of adding new elements to the definition language? (eg ELEMTYPE). for example ANOTHER-TYPE ANY would encode the same information. what advantage do the special forms and the additional processing mechanisms offer?

why, for instance, isn't the generic dt-element definition a type a model ? why does there need to be a BNF for document type definitions?
granted, i have gathered only that sgml background which i need to vaguely understand XML's origins, but, in the process of writing an XML 'processor', i couldn't help but wonder why or whether all the special forms are required by anything other than historical contingency. (in point of fact, since it's possible to structure processors which transform all forms to a uniform intermediate representation, i doubt that the syntactic distinctions are necessary.)

which brings me to ask why one would want to add more. for whatever reason.

and, in passing, where it is noted

>And subclassing implies implementation-inheritance (i.e. code reuse),
>exactly what you were trying to avoid implying.

be careful not to conflate subclassing, through "implementation inheritance", with code reuse. that applies only in languages which identify class/structure-implementation with behaviour-implementation. for a 'generic-function' language (eg. CLOS, DYLAN) specifications for code reuse are in terms of the type relations, not the class relations.

bye,

james anderson,

From markb at iosphere.net Mon Nov 24 18:38:32 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:04 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
In-Reply-To: <3479431B.BA9E33EE@technologist.com>
Message-ID:

On Mon, 24 Nov 1997, Paul Prescod wrote:

> > What if that troff document contained a link to an implementation of a
> > troff formatter? What if that implementation described its interface using
> > XML?
>
> What if it didn't?
> What if it described its interface using CORBA or
> some proprietary language that is more powerful than CORBA? You don't
> lose any flexibity or expressive power, you just have to write another
> parser for CORBA or your proprietary language.

My point is that if it did, then clients are no longer responsible for interpreting the semantics of the data - a contained/referenced implementation is.

In compound document frameworks, when a new stream of data is introduced into a container, the framework decides the type of the data and then attempts to find an editor based on that type. The editor knows what to do with that data, and negotiates with the container for the real-estate for its presentation.

For XML docs, the "type" doesn't have to be a DTD, though that might still be useful. The "type" could just as easily be a tag (so a single document would contain many embedded types). So if a well-formed document comes streaming into our container, the framework would start parsing it, come across a tag called 'troff', and then proceed to try and discover and install a chunk of code that knows how to parse/render troff. Or the document could provide its own ref(s) (more likely for scalability purposes).

Either way, it's not the container (the client) that's responsible for interpreting the semantics of the data. It's the document itself that is responsible.

> When you use XML to replace an existing standard, you are, for a period
> at least, actually working against open standards and promoting a
> proprietary alternative, even if it is expressed in the standard
> notation of SGML/XML.

In the example above, how might we implement that framework without assuming a data format?

MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
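The container behaviour described above, where the framework parses a well-formed document and hands each recognised tag to code that understands it, can be sketched roughly as follows (not from the original post). This is a minimal illustration using Python's standard `xml.etree.ElementTree`; the `HANDLERS` registry, the `handler` decorator and `render_troff` are hypothetical names, and an in-process dictionary stands in for the "discover and install a chunk of code" step, which a real framework would do dynamically.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping a tag name to the code that understands it.
# A real container would discover and load these on demand.
HANDLERS = {}


def handler(tag):
    """Decorator that registers a function as the handler for one tag name."""
    def register(func):
        HANDLERS[tag] = func
        return func
    return register


@handler("troff")
def render_troff(elem):
    # Stand-in for a real troff formatter: it only reports what it received.
    return "troff handler got %d characters" % len(elem.text or "")


def dispatch(xml_text):
    """Walk a well-formed document and pass each recognised element to its handler.

    Elements with no registered handler are skipped; the container itself
    never interprets their content.
    """
    results = []
    for elem in ET.fromstring(xml_text).iter():
        func = HANDLERS.get(elem.tag)
        if func is not None:
            results.append(func(elem))
    return results


doc = "<memo><title>Status</title><troff>.sp 2\n.ft B</troff></memo>"
results = dispatch(doc)
print(results)
```

The container here knows only tag names and how to look up a handler. Whether that counts as the document being "responsible" for its own semantics, or merely moves the troff formatter to a different place, is the question the thread is arguing about.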
From cskerr at geocities.com Mon Nov 24 18:58:26 1997
From: cskerr at geocities.com (Charles Kerr)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML paser only works with IE...
Message-ID: <3479D099.310B46C1@geocities.com>

(For those of you reading this on the xml-dev mailing list, the article referred to in this letter is at http://www.javalobby.org/jn001.htm#xml. WORA == Write Once, Run Anywhere)

> Stating that MSXML is "Arguably the best XML parser for Java
> today" I think is in error and inconsistent with the stated
> views of the Java Lobby and our commitment to WORA.
> MSXML is not 100% pure and the DSO (Data Source Object)
> applet only works with MS IE 4.0 browsers.
>
> These imports are hidden in the source
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;
>
> Not only that, but with this release of MSXML it appears
> that Microsoft is attempting to fragment the XML community
> by encouraging the use of non-standard end tags and other
> things like the inline '&' that break XML. I can only conclude
> that Microsoft is giving MSXML away for free --once again-- in
> order to fragment the emerging XML standard. This is not a Java
> application, and MSXML is something we should NOT endorse
> or support.

The MSXML dependencies on Windows are apparently trivial and fixable. Equally important, the MSXML EULA grants the right to redistribute such modified code. See the three letters (from the xml-dev mailing list) that I include at the end of this letter for more information.
What I'd like to see is someone post these fixes so that each person wanting the portable version doesn't have to make the changes by hand. If anyone does this (clovett, you listening? :) and lets me know, I'll write it up in the news.

Regarding MSXML's break with the XML spec, I was unaware of the </> and & notation -- it was discussed in the xml-dev mailing list right before I joined. It looks like Microsoft is, for once, interested in hearing constructive feedback. In particular Chris Lovett (clovett@microsoft.com) has encouraged such feedback. Anyone interested in this topic should send him polite mail requesting that MS stick to the spec.

I can understand why you would be upset about this. The splintering by Microsoft of a great new technology is something that Java programmers seem mysteriously sensitive to. ;) And once you've written a body of code to work with the MSXML API it will be a nuisance to rewrite it if MS diverges even further from the Java or XML specs in the future.

Nevertheless, I'll stand by my statement that MSXML is arguably the best XML parser for Java today. I commend MS for their great work and challenge others to add some competition. Lark (http://www.textuality.com/Lark/) is one promising alternative to MSXML but doesn't have as many features.

I'm cc'ing this to the xml-dev mailing lists in the hopes that it will rekindle the discussion of the importance of sticking to the specs.

Charles
cskerr@geocities.org
--
Unite for Java! http://www.javalobby.org/

------------------------------------------------------

Excerpts from three letters on the xml-dev mailing list regarding MSXML's ties to Windows

[1]
> Windows dependency of MSXML is minimal. All you have to do
> is following:
> 1. remove com.ms.xml.dso package.
> Delete the class files from the jar and/or comment it out of
> the makefile. DSO is accessed by some of the samples but none
> of the other MSXML packages.
> 2. remove dependency on com.ms.xml.xmlstream package.
> Latest version of MSXML includes an alternate XMLInputStream
> class located inside the 'make' directory. Replace
> com.ms.xml.util.XMLInputStream with the alternate version to
> remove dependency on com.ms.xml.xmlstream package.
> With above two changes, you will end up with a pure-Java version
> of MSXML. MSXML is the most complete XML parser available
> right now and you get the source code on top of it. I would
> be smiling by now if I were you :-)

[2]
> The parser uses a newly-defined Interface to a stream library
> that is specific to XML. The parser does not use the
> implementations of streams provided in the JDK 1.1 packages for
> the internet. I believe that this has to do with byte-ordering
> problems in those implementations. I have not checked this
> for myself. The interface per se has no platform dependencies.
> It is shipped with two implementations. One implementation
> is specific to Windows, the other is generic Java using JDK
> packages. Neither has the byte-order flaw. You may
> use whichever one you prefer. Both work. The generic one has
> lower performance.
> --Andrew Layman AndrewL@microsoft.com

[3]
> How 'bout that! Microsoft's EULA even grants us the right
> to redistribute such modified code. Quite generous of them,
> I must say. Microsoft just went up a point in my rating system.
> I am indeed smiling now. :-)
> My apologies to the MSXML team.
> --
> Joe Lapp (Java Apps Developer/Consultant)
> Unite for Java! - http://www.javalobby.org
> jlapp@acm.org
From digitome at iol.ie Mon Nov 24 19:02:26 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
Message-ID: <199711241902.TAA09040@GPO.iol.ie>

Don't forget the DTD - the key difference between SGML/XML and other interchange formats IMHO. The DTD is at once rough sketch, formal blueprint, test-bed and QA check for perhaps gigabytes of data.

>For data with a long expected
>life-time, XML may actually be a better choice than comma/tab delimited
>file _because_ it is so verbose. If the original architects choose tag
>names which are clear, then when someone approaches the data 10 years
>later, and the original authors are long gone, the chance of this new-comer
>understanding the data format increases significantly. This is what Steven
>Newcomb calls self-descriptive documents. (Steven/Peter, did I get that
>right?)
>

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From elm at arbortext.com Mon Nov 24 20:13:06 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:05 2004
Subject: Integrity in the Hands of the Client
Message-ID: <3.0.32.19971124150739.0096d670@village.doctools.com>

At 12:04 PM 11/22/97 -0500, Nelson Minar wrote:
...
>I'm reminded of what happened in the first few months of 1994, when a
>lot of people suddenly learned HTML. One of the most useful documents
>(for me) of that period was Eric Tilton's essay "Composing Good HTML"
>(since turned into a book, "Web Weaving", with Carl Steadman and Tyler
>Jones). It was a short essay, but it laid out many of the basics of
>writing HTML well - issues beyond syntax. Style issues like "don't say
>'click here' in a document, integrate the anchor text into the
>narrative". Structural issues like "don't misuse headers" and "try to
>do logical formatting, not physical". And meta information
>recommendations, like "put your name on documents" and "put a last
>modified date on documents if it makes sense". For me, that essay made
>HTML make sense, gave some order to the varied capabilities of the syntax.
>
>I tried to do my bit back then by writing an HTML editor tool (an
>emacs mode) that made it easier to write good HTML. Indenting the HTML
>source to show the document structure, providing simple templates to
>get basic well formedness, automating last modified footers. And I
>think it was reasonably successful - pages written with my editor were
>at least a little better than pages written with nothing.
>
>
>XML needs similar style guidelines and tools if people are going to
>use it well. The problem for XML is harder than with HTML since XML is
>more powerful. I think XML will be most successful for casual document
>writers when there are standard well-established DTDs combined with
>style sheets that are simple to use and very well documented as to
>what the tags mean and how to use them. I don't know how to smooth the
>process of helping people develop their own DTDs.

I agree that XML needs similar guidelines; there's technology, and then there are the techniques with which you apply it. It's ideal if new users can get started with good habits as soon as possible.

I would say the problem for XML is harder because XML is more "meta" (and it derives its extra power from that). Each DTD and DTD fragment will need its own user/style guide -- many of the established DTDs already have user guides, and for some there are even courses that teach you how to use them.

If I may, I'd like to suggest that budding XML DTD writers check out my book, "Developing SGML DTDs: From Text to Model to Markup" (ISBN 0-13-309881-8, published by Prentice Hall Professional Technical Reference). It contains a system for doing the requirements analysis for, designing, implementing, and testing DTDs, and has a lot of technique advice in it (as well as some psychological advice for dealing with the shock of migration :-). Its focus is on publishing applications and corporate SGML use, but my co-author, Jeanne El Andaloussi, and I have used the basic methodology to create many DTDs for many different situations, and it seems to hold up very well. Also, the analysis and design phases can be completed with little detailed knowledge of SGML/XML language syntax.

We wrote the book precisely to "smooth the process of helping people develop their own DTDs" for SGML; I'm certainly hoping that new XML users will find it helpful too.

Best regards,
Eve

From mrc at allette.com.au Mon Nov 24 23:15:53 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
References: <3.0.32.19971124100758.00936af0@mailhost.criinc.com>
Message-ID: <347A0A8A.989C0D87@allette.com.au>

Derek Denny-Brown wrote:

> It really depends on the requirements.
For data with a long expected > life-time, XML may actually be a better choice than comma/tab delimited file > _because_ it is so verbose. If the original architects choose tag names which > are clear, then when someone approaches the data 10 years later, and the > original authors are long gone, the chance of this new-comer understanding the > data format increases significantly. Or, if you would prefer, you could use shortref in SGML and parse the comma delimited files, making your input both a database dump and a valid SGML instance and your output valid XML. The point is, those of us coming to XML from SGML have experienced, grappled with, partially solved or lived with a lot of issues that those from other backgrounds may regard as being imperatives. The current discussion is a natural result of diverse and intelligent opinions, but a natural enemy of moderation and controlled change. I hope XML is allowed to settle in before anyone tries to fix anything, as I doubt if anyone has clear and complete perspective from all sides of this very large baby. -- Regards Marcus Carr email: mrc@allette.com.au _______________________________________________________________ Allette Systems (Australia) email: info@allette.com.au Level 10, 91 York Street www: http://www.allette.com.au Sydney 2000 NSW Australia phone: +61 2 9262 4777 fax: +61 2 9262 4774 _______________________________________________________________ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Tue Nov 25 00:14:27 1997 From: donpark at quake.net (Don Park) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... 
Message-ID: <01bcf936$a360edc0$0100007f@localhost>

>The MSXML dependencies on Windows are apparently trivial and fixable.
>Equally important, the MSXML EULA grants the right to redistribute such
>modified code. See the three letters (from the xml-dev mailing list)
>that I include at the end of this letter for more information. What I'd
>like to see is someone post these fixes so that each person wanting
>the portable version doesn't have to make the changes by hand. If
>anyone
>does this (clovett, you listening? :) and lets me know, I'll write it up
>in the news.

FYI, after writing the first of the three letters mentioned above, I contacted Andrew Layman at MS and offered to help make MSXML completely portable without performance sacrifices. Both he and Chris Lovett liked the idea and we worked hard to make it happen over a weekend. There was never any hesitation from them about this effort and I am convinced that there was absolutely no ill will from them regarding peculiar 'features' of MSXML. They thought they were neat features and got their ears chewed off for it. All they needed was a gentle reminder instead of the slap they got. Let us not mix conspiracy theory with our judgement.

A WORA version of MSXML is coming soon from Microsoft. It will compile and run on any Java platform. It will take advantage of native libraries if available without recompilation. It's WORA without sacrifices. It's WORA-FOW (Write Once, Run Anywhere - Faster On Windows ;-p).

Don

From ddb at criinc.com Tue Nov 25 00:22:59 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
Message-ID: <3.0.32.19971124162449.00a7b100@mailhost.criinc.com>

At 10:15 AM 11/25/97 +1100, Marcus Carr wrote:
>Derek Denny-Brown wrote:
>> It really depends on the requirements. For data with a long expected
>> life-time, XML may actually be a better choice than comma/tab delimited file
>> _because_ it is so verbose. If the original architects choose tag names which
>> are clear, then when someone approaches the data 10 years later, and the
>> original authors are long gone, the chance of this new-comer understanding the
>> data format increases significantly.
>Or, if you would prefer, you could use shortref in SGML and parse the comma
>delimited files, making your input both a database dump and a valid SGML instance
>and your output valid XML.

But using shortref would defeat the whole point of helping the documents to be "self-describing". I agree that in some cases, SHORTREF is not a bad idea, but I believe it should be used sparingly. (Unless you are using it as a trick to import existing data... in which case all rules are off)

> The point is, those of us coming to XML from SGML have
>experienced, grappled with, partially solved or lived with a lot of issues that
>those from other backgrounds may regard as being imperatives. The current
>discussion is a natural result of diverse and intelligent opinions, but a natural
>enemy of moderation and controlled change.
>I hope XML is allowed to settle in
>before anyone tries to fix anything, as I doubt if anyone has clear and complete
>perspective from all sides of this very large baby.

There really is need of a good book, along the lines of what Nelson Minar was talking about when he referred to

>I'm reminded of what happened in the first few months of 1994, when a
>lot of people suddenly learned HTML. One of the most useful documents
>(for me) of that period was Eric Tilton's essay "Composing Good HTML"

and the need for something with XML. Such a task is much harder for XML since XML can be used for many purposes.

I fail to understand how "the current discussion" is an "enemy of moderation and controlled change". Which current discussion? In general, there has been a very small amount of talk about the need for things to change, and the significant comment (by Joe Lapp) to that effect has resulted in one of the better discussions on how an application architect should plan to incorporate XML into their application, without "fixing" the standard. It has produced a number of good, concise explanations of how to get the most out of XML, and of what the parser should do vs. the application.

It is amazing how trying to teach someone what you think you know can help you understand the material even better. I am hoping that is true for a group (XML-Dev) as well as for the individual...

-derek

Derek E. Denny-Brown II          || ddb@criinc.com
"Reality is that which,          || Seattle, WA USA
 when you stop believing in it,  || WWW/SGML/HyTime/XML
 doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++

From mrc at allette.com.au Tue Nov 25 02:52:39 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
References: <3.0.32.19971124162449.00a7b100@mailhost.criinc.com>
Message-ID: <347A3D57.2002F7FE@allette.com.au>

Derek Denny-Brown wrote:

> >Or, if you would prefer, you could use shortref in SGML and parse the comma
> >delimited files, making your input both a database dump and a valid SGML
> >instance and your output valid XML.
>
> But using shortref would defeat the whole point of helping the documents to be
> "self-describing".

As part of the document, the DTD would act as a formal centralised reference rather than having to infer the structure by examination of the instances; I was alluding to data handling generally rather than the point you were making about self-describing documents.

> There really is need of a good book, along the lines of what Nelson Minar was
> talking about when he referred to
> >I'm reminded of what happened in the first few months of 1994, when a
> >lot of people suddenly learned HTML. One of the most useful documents
> >(for me) of that period was Eric Tilton's essay "Composing Good HTML"
> and the need for something with XML. Such a task is much harder for XML since
> XML can be used for many purposes.

I agree, a flood of good books will be useful. I suspect that the diversity you mention will lead to smaller publications dealing with single or fairly tightly-grouped applications of XML.

> I fail to understand how "the current discussion" is an "enemy of moderation and
> controlled change".

Sorry, that does read very badly.
What I mean is that answers to XML issues should be given a fair chance to evolve naturally. HTML was allowed to be just a way to present documents while people figured out how to extend it in various directions. Although many lessons have been learned from HTML that XML can springboard from, I think there is some danger in the perception that XML is the best way to do almost everything. XML will be supplementary to what a number of organisations have been doing for a long time - for many it will just be a way of putting SGML on the web without converting to HTML first. "The current discussion" should of course go on - I also read it with interest. -- Regards Marcus Carr email: mrc@allette.com.au _______________________________________________________________ Allette Systems (Australia) email: info@allette.com.au Level 10, 91 York Street www: http://www.allette.com.au Sydney 2000 NSW Australia phone: +61 2 9262 4777 fax: +61 2 9262 4774 _______________________________________________________________ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Tue Nov 25 03:31:16 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... Message-ID: <199711250329.OAA07558@jawa.chilli.net.au> > From: Don Park > FYI, after writing the first of the three letter mentioned above, I > contacted Andrew Layman at MS and offered to help make MSXML completely > portable without performance sacrifices. Both he and Chris Lovett liked the > idea and we worked hard to make it happen over a weekend. 
> There was never
> any hesitation from them about this effort and I am convinced that there was
> absolutely no ill will from them regarding peculiar 'features' of MSXML.
> They thought they were neat features and got their ears chewed off for it.
> All they needed was a gentle reminder instead of the slap they got. Let us
> not mix conspiracy theory with our judgement.

The other point is that floating "&" is required in SGML (even with the WebSGML adaptations, which have been accepted and are now being wordsmithed). Short tagging "</>" is an optional feature that can be enabled.

If MSXML chooses to support some convenient SGML features on top of XML, I don't see what there is to complain of. It seems a bonus to me.

One of SGML's main attractive features is that it does not attempt to enforce policy in many areas: it provides a toolkit and gives the user the choice. This makes it more complex of course. XML is a choice of particular features by various boffins and experts, and so XML will inevitably be suboptimal for some uses.

And there is a lot of old SGML material. If having some clearly labelled SGML extensions makes MSXML handle other kinds of SGML as well as XML, great! In fact, the fuller the SGML implementation that MSXML provides, the better, IMHO. Give us more, Chris and Andrew! Allow entities to have attributes like SGML does. Allow tag omission like SGML and HTML do!

It is the nature of software to have experiments. It is futile, but still good, to try to freeze syntax. I think this is why in the future we will end up with a range of markup languages from XML to SGML '97. If this is an alarming option (and it is), then the discipline is for XML developers (not parser makers) to only use XML features in their systems. I am sure that everyone who has been through SGML will agree that it is difficult not to wish all the time for your favorite enhancements. And, if you bite the bullet and decide to go with the standard, you may then get flak for being an unthinking sheep :-)

The problem is not with Microsoft for making their XML parser also handle SGML better, the problem will be with users of the parser in software if they use these features over the web rather than inhouse. I.e. the problem is "us" not "them".

Rick Jelliffe

From SimonStL at classic.msn.com Tue Nov 25 14:09:48 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID:

>The other point is that floating "&" is required in SGML (even with the
>WebSGML adaptations, which have been accepted and are now being wordsmithed).
>Short tagging "</>" is an optional feature that can be enabled.

I think we would do well to remember that XML is NOT SGML and should not be allowed to fall prey to the incredible number of 'options' that have made SGML worthless to a large number of developers. Short tagging is NOT an optional feature of XML, and should NOT be a feature of MSXML either. If it is allowed to be an optional feature, then my XYZ parser is either going to have to accept Microsoft's 'extensions' or reject a lot of documents created by people who only tested on the Microsoft tools.

>XML is a choice of particular
>features by various boffins and experts, and so XML will inevitably be
>suboptimal for some uses.

Fine. Let's start off suboptimal and get a standard that works instead of a standard that can be embraced and extended by any software company that thinks it has a new grand idea.
>Give us more, Chris and Andrew! Allow entities to have
>attributes like SGML does. Allow tag omission like SGML and HTML do!

Do not give us more, Chris and Andrew, if you really like XML. If you want to kill it quickly, add lots of extra SGML parts.

>The problem is not with Microsoft for making their XML parser also handle
>SGML better, the problem will be with users of the parser in software if they
>use these features over the web rather than inhouse. I.e. the problem is
>"us" not "them".

The problem is an incompatibility between the "us"es and "them"s of the world. Keep XML as clean as possible, at least for now. Forget everything you knew about SGML's intricacies and focus on what XML, not SGML, can do for the world, and with any luck, the world might take XML seriously.

While working on XML: A Primer, I used the Alpha 1.0 MSXML to test my code, aware of many of its difficulties. As I discovered when 1.6 came out, it had let me wander outside the spec in a number of key places (mixed declarations, for one) that took my code outside of valid XML. I've fixed it all now, but the experience has left me extremely wary of tools that go beyond the standard, intentionally or accidentally.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

From andrewl at microsoft.com Tue Nov 25 16:14:43 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6639@red-03-msg.dns.microsoft.com>

I think a little more grace and courtesy is called for here.
Microsoft has been working very hard to ship parsers that track the evolving spec. As with any unfinished product, particularly one whose specifications are clearly marked "work in progress," there are going to be some areas where the product lags behind the spec or vice versa.

Regarding the short tagging, did anyone actually run the code? If so, you would have discovered that the parser does not respect short tagging unless you go out of your way to turn it on via an undocumented method that is not meant for clients to call. It is not a secret feature (we give away the source code) but it is not part of parsing normal XML. If we were trying to trick people into using this facility, we sure went out of our way to fail!

I recommend approaching this with a bit more benevolence and researching things a little more before assuming a conspiracy.

--Andrew Layman AndrewL@microsoft.com

> -----Original Message-----
> From: Simon St.Laurent [SMTP:SimonStL@classic.msn.com]
> Sent: Tuesday, November 25, 1997 6:07 AM
> To: Rick Jelliffe; Xml-Dev (E-mail)
> Subject: RE: MS XML parser only works with IE...
>
> >The other point is that floating "&" is required in SGML (even with the
> >WebSGML adaptations, which have been accepted and are now being wordsmithed).
> >Short tagging "</>" is an optional feature that can be enabled.
>
> I think we would do well to remember that XML is NOT SGML and should not be
> allowed to fall prey to the incredible number of 'options' that have made SGML
> worthless to a large number of developers. Short tagging is NOT an optional
> feature of XML, and should NOT be a feature of MSXML either. If it is allowed
> to be an optional feature, then my XYZ parser is either going to have to
> accept Microsoft's 'extensions' or reject a lot of documents created by people
> who only tested on the Microsoft tools.
>
> >XML is a choice of particular
> >features by various boffins and experts, and so XML will inevitably be
> >suboptimal for some uses.
>
> Fine. Let's start off suboptimal and get a standard that works instead of a
> standard that can be embraced and extended by any software company that thinks
> it has a new grand idea.
>
> >Give us more, Chris and Andrew! Allow entities to have
> >attributes like SGML does. Allow tag omission like SGML and HTML do!
>
> Do not give us more, Chris and Andrew, if you really like XML. If you want to
> kill it quickly, add lots of extra SGML parts.
>
> >The problem is not with Microsoft for making their XML parser also handle
> >SGML better, the problem will be with users of the parser in software if they
> >use these features over the web rather than inhouse. I.e. the problem is
> >"us" not "them".
>
> The problem is an incompatibility between the "us"es and "them"s of the world.
> Keep XML as clean as possible, at least for now. Forget everything you knew
> about SGML's intricacies and focus on what XML, not SGML, can do for the
> world, and with any luck, the world might take XML seriously.
>
> While working on XML: A Primer, I used the Alpha 1.0 MSXML to test my code,
> aware of many of its difficulties. As I discovered when 1.6 came out, it had
> let me wander outside the spec in a number of key places (mixed declarations,
> for one) that took my code outside of valid XML. I've fixed it all now, but
> the experience has left me extremely wary of tools that go beyond the
> standard, intentionally or accidentally.
>
> Simon St.Laurent
> Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From SimonStL at classic.msn.com Tue Nov 25 16:48:07 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... Message-ID: >I think a little more grace and courtesy is called for here. Microsoft has >been working very hard to ship parsers that track the evolving spec. >... >I recommend approaching this with a bit more benevolence and researching >things a little more before assuming a conspiracy. I wasn't promoting a conspiracy (the word appeared nowhere in my post), as you might know if you remembered my messages from earlier this month, which included a fairly extensive discussion of Microsoft's former demonstration of short-tagging in the MSXML site, all of which has been removed. I have researched this more extensively than I wanted to by a considerable margin. I do not hold Microsoft to be a villain in this case. The target of my post, which apparently lacked 'grace and courtesy' was not Microsoft - it was the SGML folks who clamor for every piece of junk that's littered the SGML spec to be included in XML. I clamored at one point for CDATA myself, but I've decided to rest and let the spec take its own course, as simple as possible. I'll elaborate on this in a more extended post later this week. Microsoft has created an excellent parser, and I'm very glad to hear regularly on this list about your continual willingness to produce XML compliant and 100% Java XML parsing solutions. Keep up the good work, but please try to read my postings a little more closely before assuming that I'm accusing Microsoft of fomenting world grief. 
Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Tue Nov 25 18:51:40 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... In-Reply-To: <199711250329.OAA07558@jawa.chilli.net.au> (ricko@allette.com.au) Message-ID: <199711251850.KAA16423@boethius.eng.sun.com> | If MSXML chooses to support some convenient SGML features on top of XML, | I dont see what there is to complain of. It seems a bonus to me. This is a license to repeat the browser wars of the last three years and hold users hostage to particular software packages. If you want full SGML support, then lobby for *consistent* full SGML support. Anything less than that will create exactly the kind of vendor dependence that we are trying to get away from. Jon xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Tue Nov 25 19:12:50 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... 
In-Reply-To: References: Message-ID: <199711251913.OAA00314@unready.microstar.com> Simon St.Laurent writes: > While working on XML: A Primer, I used the Alpha 1.0 MSXML to test > my code, aware of many of its difficulties. As I discovered when > 1.6 came out, it had let me wander outside the spec in a number of > key places (mixed declarations, for one) that took my code outside > of valid XML. I've fixed it all now, but the experience has left > me extremely wary of tools that go beyond the standard, > intentionally or accidentally. As I remember hearing it a few years back, one of the basic rules of the Internet was to be conservative in what you produce and liberal in what you accept. With that in mind, I'd suggest using a very strict, validating parser on the authoring side, like NSGMLS or NXP (I haven't tried Lark). On the production side, use whatever works for you. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Nov 26 01:17:31 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:05 2004 Subject: Parser considerations (was: MS XML parser only works with IE...) In-Reply-To: <199711251913.OAA00314@unready.microstar.com> References: Message-ID: <3.0.1.16.19971126020423.2a07fd4a@pop3.demon.co.uk> At 14:13 25/11/97 -0500, [many people] wrote about MSXML Some of the things we mustn't forget at this time are: - there is as yet no frozen XML 'recommendation' (I hope that's the correct term). 
Under those circumstances it is unlikely that there are any completing conforming parsers; the spec is still changing and so any parser has addressed a moving target. - for many people helping in the development of XML the question of 'best parser' is not appropriate at this stage - and I suspect not for at least 3 months. The spec is quite large and is a lot of effort to implement (those of us who have hacked parsers know). Many of us give up on points we don't understand (for me it was parameter entities, and that caused others grief as well :-). So until we see the next spec [is there a later public one than Aug 7?] we can't be sure whether a parser 'gets PEs right' :-). I sympathise with anyone who has failed to implement part of the current spec, and I hope that people trying out parsers and other software will take a constructive view of such 'failings'. - I believe that all parser writers at present would like their parsers validated. Validation *of* a parser seems to me to include checks on - reporting errors in non-conforming XML documents - asserting that a conforming XML document is conforming - carrying out defined transformations on the original input All of these require a set of test inputs, which I believe we badly need at present. It is very likely that a parser writer at present will overlook something in the spec. Checking the transformations is less easy as there is no defined output. How, for example, do we check that parser A transforms all the entities correctly? An important way is to make sure that the outputs of two independent parsers agree. To this extent, whatever we think about 'steenking ESIS' [a quote from the source code of a well known XML parser], it is at least checkable :-) - the really hard bit comes when the semantics of behaviour are unclear. Does the statement require the parser to *do* anything? 
Different authors will certainly have different ideas - some see it as a request by the author that the document must be validated - others see it as stating that if the reader wishes to validate it, then this is the doctype that should be used. There are many subtleties of this sort. I believe that the development of XML has been one of the outstanding achievements of the WWW. It has been fast, rigorous, fair, open, and required extraordinary commitment and patience from those involved. Often the SIG has had 50 emails a day, and many have required a great deal of careful reading. I have been very gratified by the level and amount of constructive contributions to XML-DEV as this is an important area for ironing parts the spec cannot reach. I remember the agonies of early C++ compilers where every platform and vendor had messages 'this feature not supported' and so on. I believe that all contributors on this list want to avoid this and want it to be true that 'any valid XML document can be parsed with any XML parser'. Since some parsers may purport to be XML compliant but not be, it is critical that this fact can be recognised, and a test suite of documents seems to be a key instrument. I hope very much that authors of such parsers will be able to find the energy to mend them :-) If - at some future time - I were looking for attractive features in an XML parser, then after discarding the non-compliant ones I would want to consider a wide range, and I doubt that any one parser would 'win' in all aspects. To this end I am trying to make JUMBO accept a range of parsers by a simple commandline switch (or button). Thus: java jumbo.sgml.SGMLTree foo.xml parser=NXP (or Lark) I can quite envisage a case where a user wants to use parser A to read in the initial document (perhaps because it is large, or tree-structured) and parser B to read the entities. I am delighted to hear about WORA-MSXML, and shall hope to look at it shortly. I hope it's easy to bolt into JUMBO.
I am slightly disappointed that Xapi-J seems to have become dormant, because with it the work inside JUMBO would be minimal. At present most of the parsers I have encountered are event-driven (e.g. doStartTag, doError...) and not all build trees (JUMBO is happy to build trees from streams). If, indeed, this is the model most people use, then let's get a standard terminology (Element, PI, ElementType, Attribute, etc.). It would make things so much simpler. I also expect we could get a very very simple API defined... P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From peter at ursus.demon.co.uk Wed Nov 26 01:20:17 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:05 2004 Subject: Inheritance In-Reply-To: <3479C712.985B01D9@mixx.de> References: Message-ID: <3.0.1.16.19971126021919.2f7726c0@pop3.demon.co.uk> At 19:27 24/11/97 +0100, james anderson wrote: >greetings, > >sorry to start in the middle of this thread, but as an xml novice i'm >wondering why one is at all concerned to extend a language intended to >mark up "structure" in order to encode "behaviour". (this being the >distinction made by separating 'class' and 'type'). We all started off as novices, so don't be afraid. I'm assuming that everyone on this list knows that the mails are hypermailed at: http://www.lists.ic.ac.uk/hypermail/xml-dev and that it is possible to search this archive. So you can go back to the start of this thread if it helps (I don't know whether it does or not).
Also I have attempted to abstract some of the postings that may have some lasting value at http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html. XML discussions have a cyclic nature - like sunspots - the same topic reoccurring at intervals of a few months. Since it's often due to 'novices' joining the club, we're delighted. You will also find that the SGML community is patient and does not regard ignorance as a crime (some other things are :-). Precision in language is highly valued. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From shibl at w4.ca Wed Nov 26 02:05:57 1997 From: shibl at w4.ca (Shibl Mourad) Date: Mon Jun 7 16:59:05 2004 Subject: MS XML parser only works with IE... References: <199711251850.KAA16423@boethius.eng.sun.com> Message-ID: <347B8421.E77@w4.ca> Jon Bosak wrote: > This is a license to repeat the browser wars of the last three years > and hold users hostage to particular software packages. I know that I am going to be hated for saying this, but the browser wars were a phenomenal success and prompted the development of excellent and useful technology very rapidly. Compare this with standards-first technology (e.g. SGML) where the rate of progress is much slower and the end benefits to the user (not the developer) much smaller. XML needs some breathing space where new features could be made to live if popular and die if irrelevant. Shibl From ak117 at freenet.carleton.ca Wed Nov 26 02:23:59 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:06 2004 Subject: MS XML parser only works with IE... Message-ID: <199711260224.VAA00626@unready.microstar.com> Shibl Mourad writes: > Jon Bosak wrote: > > > This is a license to repeat the browser wars of the last three years > > and hold users hostage to particular software packages. > > I know that I am going to be hated for saying this, but the browser wars > were a phenomenal success and prompted the development of excellent and > useful technology very rapidly. Both of these statements are, to an extent, correct. The browser wars introduced or brought into the mainstream many interesting innovations, but few (if any) of the good ones are a result of the mess that Netscape and Microsoft have both made of HTML. Applets, real-time audio and video, virtual-reality, animations, and other types of interaction have certainly made the web more exciting, but why is it so difficult to find web pages that display well on my 640x480 notebook screen (and what's going to happen on even lower-resolution TV screens)? How many web pages could visually impaired people usefully have their software read aloud to them? Why is it sometimes hard to write a web page that displays properly in both Netscape and MSIE? It is possible to innovate without messing around with the standards (though, to be fair, there won't be an XML standard as such for a couple more weeks). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ From ricko at allette.com.au Wed Nov 26 11:32:40 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:06 2004 Subject: MS XML parser only works with IE... Message-ID: <199711261130.WAA17193@jawa.chilli.net.au> > From: Simon St.Laurent > I think we would do well to remember that XML is NOT SGML and should not be > allowed to fall prey to the incredible number of 'options' that have made SGML > worthless to a large number of developers. The current XML draft says "(XML) is an extremely simple dialect of SGML." That is the first sentence of the abstract. I was a member of the SIG from quite early on, and it has always been the official line. So XML says it is SGML. Furthermore, the recent correction to SGML (WebSGML), which is in its next-to-final draft before release (it has already been voted on), means that there should be no doubt that the national standards bodies involved with ISO want SGML to be XML-accepting too. I have attended ISO meetings on this, and the ISO people certainly do not see XML as something independent of SGML either. The optional features of SGML have not made it worthless to developers. The complexity of unadorned SGML and the generality of its toolkit approach is the thing that made it difficult. The very thing that makes you rich makes you poor. XML (and the companion change to the SGML standard) have reduced this base level. It is pure blue-sky to think that one syntax can meet everyone's need. I am not saying there should be options in XML.
I am saying if someone wants more than XML, there are many things in SGML that are useful, and if Microsoft want to implement them, good for Microsoft. Of course, these should not be termed "experimental XML" features. They should be labelled "non-XML SGML" features. I already said words to that effect. > Fine. Let's start off suboptimal and get a standard that works instead of a > standard that can be embraced and extended by any software company that thinks > it has a new grand idea. Am I saying anything other? XML was developed as the technology of choice for delivering SGML on the Web. I support that 100%. But if a company wants to use something more powerful at their back-end, why shouldn't they use a more powerful language nearer SGML if that serves their in-house needs better? And why shouldn't Microsoft allow this in their parser? Any tools just need to have a checkbox marked "XML only" to keep things obvious. And XML has draconian error correcting, so data with more than XML will not work over the web anyway! > Keep XML as clean as possible, at least for now. Forget everything you knew > about SGML's intricacies and focus on what XML, not SGML, can do for the > world, and with any luck, the world might take XML seriously. The spanner is that many of SGML's intricacies are responses to real problems. For example, XML (and WebSGML) let you pass all whitespace to the application, which means the application itself must be more complicated since there is no standard way to cover the problem of what to do if your editor has a fixed line length and you need to stick in an element that would cause a wrap, but you do not want to put in a newline in the data. XML development has been an exhaustive analysis of every part of mainstream SGML. And I think almost everyone on the SIG would agree that there are good reasons for almost all the non-intuitive parts of SGML.
However, the need to be straightforward (the #1 goal of XML) means that there is a different cost/benefit trade-off for deciding what should go into the base language (compared to SGML in the early 1980s). The English-using world already runs on SGML. Computer chips, air transport, legal systems, the military, many stock markets, much print media, diagnostics of office equipment, and (with HTML 4.0) WWW. Any claim that SGML is not good for what it has tried to do is wrong, as far as the market has spoken. > The target of my post, which apparently lacked 'grace and courtesy' was not > Microsoft - it was the SGML folks who clamor for every piece of junk that's > littered the SGML spec to be included in XML. Do you have access to the deliberations of the XML SIG or WG? If you do not, you have no way of knowing what "SGML people" clamoured for, and if you do then you are just wrong. The minimal SGMLs that were proposed (by "SGML people" since there were no others) at the start were all substantially smaller than what we have now in XML. In fact, XML has grown largely because we found there was so much of SGML that was needed. Only this week there are last minute calls (from "SGML people", who Simon deems himself to be so different from) to make several quite important simplifications to XML. And, in any case, the distinction between SGML and XML people is entirely spurious. If you use XML, you are an SGML person. You have bought into the idea of using a human readable Language, of adding Markup to character data, of marking up Generalized elements rather than a fixed low level tagset, and you think it is good to have a common Standard. The fact that you find ISO 8879 baffling and horrible does not make you anti-SGML, any more than the fact that I cannot read my video recorder manual makes me anti-TV. SGML is not the enemy. The enemy is poorly described data that is no use, and systems that are inappropriately complicated (or simple) for their user requirements.
SGML is merely a toolkit for constructing markup languages, which includes a lot of features that are not relevant to delivering structured data over the Web. Rick Jelliffe From SimonStL at classic.msn.com Wed Nov 26 13:35:47 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:06 2004 Subject: SGML and XML Message-ID: XML is the best opportunity I've yet seen to create a standard which handles documents (and other data) intelligently yet simply. Whatever XML's roots (which of course are SGML), XML has the opportunity to reach an extremely broad audience - an audience the size of the current (and future) HTML audience, not just the established SGML community. The terms of the XML discussion have always been framed in SGML, and are likely to continue to be for a considerable time to come. While that has advantages, I don't think the concept of using XML as a Trojan Horse to introduce SGML proper to a larger audience is a good one. I gave a seminar two weeks ago in Washington DC to the ACM - a place and an organization that I would tend to think of as friendly to SGML. Of 50 people in the seminar (which was on Dynamic HTML), 15 had worked with SGML. Every time I brought up SGML (in connection with XML, CSS, and the DOM), I was greeted with questions about "is that really necessary?" "Are those SGML people trying to change _our_ world?" These questions didn't just come from the HTML beginners; many of them came from the developers who had worked with SGML, some quite extensively.
At lunch the discussion quickly turned to XML, and I had to do a lot of convincing to get people 'past' SGML. For public relations reasons, it seems like XML needs to be able to have it both ways. Companies already using SGML and developing SGML tools need to be encouraged to accept XML - not as a replacement for SGML, but as something to take seriously. The larger non-SGML community, however, needs to be given XML as something new and different. XML should not just carry in SGML's reputation as a complicated, slow-to-develop, and difficult-to-implement tool of the Federal Government. XML evangelists need to be able to describe the problems that XML fixes and how it fixes them, without reference to enormous systems that SGML has created in the past. >So XML says it is SGML. Furthermore, the recent correction to SGML (WebSGML), which >is in its next-to-final draft before release (it has already been voted) >means that there should be no doubt that the national standards bodies >involved with ISO want SGML to be XML-accepting too. I have attended ISO >meetings on this, and the ISO people certainly do not see XML as something >independent of SGML either. XML says it is SGML. Fine. But should the future development of XML be aimed at gradually including SGML features, or should it be aimed at meeting the needs of the developing XML community? I expect the XML community in six months to a year to be rather distinct from the SGML community and hopefully quite a bit larger. This issue will grow; we'll see what the W3C and ISO do. >The complexity of unadorned SGML and the generality of its toolkit approach >is the thing that made it difficult. The very thing that makes you rich makes >you poor. And conversely, the thing that makes you poor will make you rich. HTML took off because it was brilliantly simple. (There were plenty of other factors, of course, but simplicity was key.)
SGML has done very well in sectors that were able to make the investment in learning SGML, developing in SGML, and creating systems around SGML. XML has the opportunity to take its much simpler toolkit to a much larger audience. Simplicity is key to reaching that larger audience; adding SGML features, even with an on/off switch, is likely to confuse new users of XML while still disappointing the SGML community. >But if a company wants >to use something more powerful at their back-end, why shouldn't they use >a more powerful language nearer SGML if that serves their inhouse needs >better. And why shouldn't Microsoft allow this in their parser? If a company wants to use something more powerful, why don't they consider 'real' SGML and get a parser designed for that instead of creating documents that are called XML but are no longer XML? Using this suggestion effectively will require a new series of standards to define what features of SGML have been added to a set of documents so that people don't blindly run them through XML parsers with the switch set wrong. Data interchange will be a mess, once again. >XML development has been an exhaustive analysis of every part of mainstream >SGML. And I think almost everyone on the SIG would agree that there are >good reasons for almost all the non-intuitive parts of SGML. However, the >need to be straightforward (the #1 goal of XML) means that there is >a different cost/benefit trade-off for deciding what should go into the >base language (compared to SGML in the early 1980s). There is a completely different cost-benefit analysis. XML is the grand opportunity to extend generalized markup to a far larger audience than exists today. There may be good reasons for almost all the non-intuitive parts of SGML, but the fact remains that these non-intuitive features have been barriers to use and development.
After reading some of the ISO specs and too large a chunk of the SGML literature, it became quite clear to me why SGML never percolated down to small companies and developers. It's too complicated to be used without considerable upfront investment. >The English-using world already runs on SGML. Computer chips, air >transport, legal systems, the military, many stock markets, >much print media, diagnostics of office equipment, and (with HTML 4.0) >WWW. Any claim that SGML is not good for what it has tried to do >are wrong, as far as the market has spoken. The market has spoken that SGML does a great job for managing enormous amounts of information. It has also spoken that SGML presents enormous barriers to entry (steep learning curve, cost of development, etc.) that have kept a lot of people from using it. SGML does a great job in many systems. The "many" there, however, is a tiny select few compared to the many that a simpler syntax (i.e. XML) could reach. The scale of those projects is very different from those XML makes possible. >And, in any case, the distinction between SGML and XML people is entirely >spurious. If you use XML, you are an SGML person. This distinction will grow as XML is adopted more widely. Visit the high-end web development mailing lists and you'll find an incredible amount of hostility to SGML but a simmering interest in XML. If you use XML, you are using SGML tools. This does not make you an SGML person. As you may have detected, I do have a certain amount of hostility toward SGML and SGML culture, while remaining very enthusiastic about XML. >SGML is not the enemy. The enemy is poorly described data that is no use, >and systems that are inappropriately complicated (or simple) for their >user requirements. SGML is merely a toolkit for constructing markup >languages, which includes a lot of features that are not relevant >to delivering structured data over the Web. 
XML appears to be addressing the problems with SGML that have kept it from being used by a wider audience. Poorly described data is the real enemy, of course. Attacking that enemy in a larger sense requires a reconsideration of the weapons we have used previously and a refinement. XML's simplicity will encourage a large number of people to describe their data properly, people who wouldn't have bothered with SGML. This is an improvement, and the SGML community deserves great credit for the effort they have poured into building a simple but useful toolkit, which avoided the byzantine complexity SGML proposals are known for. XML is more than just SGML, however. XML is going to bring a lot of 'bozos' into the field of markup, people who care neither about the history nor the theory and just want to get things done. A different attitude and different needs will very likely increase the demands for XML to find its own voice. I could, of course, be dead wrong. We'll know in a couple of years. Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) From fussellm at alumni.caltech.edu Wed Nov 26 14:41:13 1997 From: fussellm at alumni.caltech.edu (Mark L.
Fussell) Date: Mon Jun 7 16:59:06 2004 Subject: MONDO Design Document v0.3 Message-ID: The first public release of the MONDO Design document is available at: http://www.chimu.com/projects/mondo/design/mondoDesign.pdf The home page for the MONDO project is: http://www.chimu.com/projects/mondo/ The following is the first overview paragraph: ========================================================= This document describes MONDO, a generalized architecture for encoding, modeling, and processing information. MONDO is the result of evolving and integrating the concepts from descriptive markup with the concepts from object-oriented information modeling. This produces a very flexible and powerful system for working with both structured documents and human-readable information models, and removes the boundaries separating them. The techniques and tools from multiple industries can be focused on common problems. ========================================================= The document is not quite where I was hoping it would be, but enough of the core concepts are there that it should be readable. Another version of the document will come out in the next week or so to address some of the deficiencies and the feedback that I receive. We normally publish documents in HTML as well as PDF but the conversion program is crashing over some of the diagrams and we have not had time to track them down and fix them. This will be fixed in the next release. The release of MONDO-J code will probably be next week and I will send out notice when it is downloadable. All feedback is very appreciated. --Mark mark.fussell@chimu.com ChiMu Corporation Architectures for Information info@chimu.com Object-Oriented Information Systems www.chimu.com Architecture, Frameworks, and Mentoring From ricko at allette.com.au Wed Nov 26 16:21:30 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:06 2004 Subject: SGML and XML Message-ID: <199711261619.DAA22835@jawa.chilli.net.au> > From: Simon St.Laurent > I gave a seminar > two weeks ago in Washington DC to the ACM - a place and an organization that I > would tend to think of as friendly to SGML. Of 50 people in the seminar > (which was on Dynamic HTML), 15 had worked with SGML. Every time I brought up > SGML (in connection with XML, CSS, and the DOM), I was greeted with questions > about "is that really necessary?" "Are those SGML people trying to change > _our_ world?" These questions didn't just come from the HTML beginners; many > of them came from the developers who had worked with SGML, some quite > extensively. At lunch the discussion quickly turned to XML, and I had to do a > lot of convincing to get people 'past' SGML. > For public relations reasons, it seems like XML needs to be able to have it > both ways. > As you may have > detected, I do have a certain amount of hostility toward SGML and SGML > culture, while remaining very enthusiastic about XML. So you are a speaker with hostility to SGML, and your audience picks up on it. Maybe that just means you are a sympathetic and hypnotic speaker :-) However, I do think that a lot of the antagonism against SGML is actually antagonism against the standard ISO 8879 (which is not intended to be remotely entry-level or novice-friendly) mixed with antagonism against the early HTML DTDs (which were overly-complicated, IMHO, in structure for their readerships, as it turned out).
Plus the fact that SGML implementations often involve converting people's minds from presentation structure to logical structure, which many people find is a big change in discipline and job description (XML won't alter that!). Plus many SGML editing environments are not set up to simulate element structures with different formatting, so an operator cannot use simple visual cues of presentation to keep track of their progress. I worked for a company (Allette) that gets most of its jobs from SGML projects that had failed at other companies. When we looked at what made them fail, it was very often because the DTD did not describe the structures required, or because of invalid documents which reflect poor QC, or because the programming systems used were not smart enough. XML does not address any of these issues, so I think that the kinds of projects Allette was troubleshooting (which are presumably ones that will feed out disgruntled programmers to ACM meetings) would not have been helped. Which is in no way to deny that SGML the technology has some dross, and that its wording can be improved. > This is an improvement, and the SGML community deserves great credit for the > effort they have poured into building a simple but useful toolkit, which > avoided the byzantine complexity SGML proposals are known for. Which proposals? > XML is more > than just SGML, however. XML is going to bring a lot of 'bozos' into the > field of markup, people who care neither about the history nor the theory and > just want to get things done. A different attitude and different needs will > very likely increase the demands for XML to find its own voice. Yes. And all the questions "are declarations good", "should we remove constants to headers, or allow inline declarations?", "why isn't everything an element, wouldn't that be simpler?", and "why can't we leave out these strings, since they are not needed for parsing?" and so on.
The trouble with slagging off at SGML is that because there is no difference in technology between XML and WebSGML, it all ends up in personal attacks on people who have been able to use SGML, or on the people who invented it, or even just us innocent bystanders who happen to go to committee meetings. I have seen this happen many times before. (I am not saying you are doing this Simon, merely that I have seen it many times. In any case, you are writing a book and your antagonism will teach a new generation of XML people, who may therefore feel less likely to buy my book :-) Please say "XML is simpler than SGML '86" and "XML is better for small systems than SGML '86" and "SGML has many things that are not needed" but not "SGML people are trying to make everything complicated and make XML as bad, complex, over-engineered and stinky as SGML". This demonizing of "SGML people" is a bad way to win people over to XML. Rick Jelliffe From papresco at technologist.com Wed Nov 26 18:12:13 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:06 2004 Subject: MS XML parser only works with IE... References: <199711251850.KAA16423@boethius.eng.sun.com> <347B8421.E77@w4.ca> Message-ID: <347C6745.DFDE457@technologist.com> Shibl Mourad wrote: > > Jon Bosak wrote: > > > This is a license to repeat the browser wars of the last three years > > and hold users hostage to particular software packages. > > I know that I am going to be hated for saying this, but the browser wars > were a phenomenal success and prompted the development of excellent and > useful technology very rapidly.
Yes, competition is good. But proprietary extensions like those made to HTML *retard* competition by raising the bar for new participants. > Compare this with standards-first technology (e.g. SGML) where the rate of > progress is much slower and the end benefits to the user (not the > developer) much smaller. Please back up this statement. Do we consider a Fortune 500 company slashing their technical writing budget "a user"? Or would we call the individual technical writers, who can reduce duplication, find information faster and reuse it more effectively, "the user"? In either case, how would you argue that SGML, Java, C++, IPNG, CORBA and other "standards first" technologies have "few benefits to the user"? Paul Prescod From papresco at technologist.com Wed Nov 26 18:17:29 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:06 2004 Subject: MS XML parser only works with IE... References: Message-ID: <347C688D.533E0037@technologist.com> > Short tagging is NOT an optional > feature of XML, and should NOT be a feature of MSXML either. If it is allowed > to be an optional feature, then my XYZ parser is either going to have to > accept Microsoft's 'extensions' or reject a lot of documents created by people > who only tested on the Microsoft tools. If a user enables a non-standard option, they get what they deserve. It's as simple as that. Every compiler I have ever used has had flags for non-standard options. When Microsoft serves non-standard documents over the Web, that's another issue. The web is the place for interoperability.
But in Microsoft's own source code, they can embed an RTF parser if they bloody well feel like it. They do have a responsibility to make clear the distinction between the RTF features and the XML features, of course, but they don't have a responsibility to make software that exclusively handles W3C XML.

Paul Prescod

From papresco at technologist.com Wed Nov 26 18:19:23 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References:
Message-ID: <347C6514.CB6804E8@technologist.com>

Simon St.Laurent wrote:
> XML says it is SGML. Fine. But should the future development of XML be aimed
> at gradually including SGML features, or should it be aimed at meeting the
> needs of the developing XML community?

This is a completely false dichotomy. XML will grow *both* to gradually include SGML features and to extend SGML in ways specific to the Web community. The relevant example is the short-tag syntax. This is *much more* appropriate on the Web, where everyone is used to editing things by hand, than in the SGML world, where we often buy expensive editors or use emacs. It is also much more appropriate in XML, which does not have tag minimization, than in general SGML, which does.

In other words, the Microsoft people were trying to solve a problem for Web users by recognizing a good idea in SGML. This is exactly *why* XML was designed to be a subset of SGML (it didn't have to be).
> If a company wants to use something more powerful, why don't they consider
> 'real' SGML and get a parser designed for that instead of creating documents
> that are called XML but are no longer XML? Using this suggestion effectively
> will require a new series of standards to define what features of SGML have
> been added to a set of documents so that people don't blindly run them through
> XML parsers with the switch set wrong. Data interchange will be a mess, once
> again.

I can't believe that this is your logical extrapolation from an *undocumented* switch in a parser for a language that doesn't exist yet. The mere hint of extra features is enough to bring the Web crashing to its knees.

Paul Prescod

From ddb at criinc.com Wed Nov 26 18:29:43 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
Message-ID: <3.0.32.19971126103137.00a9b6e0@mailhost.criinc.com>

At 01:18 AM 11/27/97 +1100, Rick Jelliffe wrote:
>> From: Simon St.Laurent
>> I gave a seminar
>> two weeks ago in Washington DC to the ACM ...
>> ... Every time I brought up
>> SGML (in connection with XML, CSS, and the DOM), I was greeted with questions
>> about "is that really necessary?" "Are those SGML people trying to change
>> _our_ world?" These questions didn't just come from the HTML beginners; many
>> of them came from the developers who had worked with SGML, some quite
>> extensively. At lunch the discussion quickly turned to XML, and I had to do a
>> lot of convincing to get people 'past' SGML.
>
>However, I do think that a lot of the antagonism against SGML is actually
>antagonism against the standard ISO 8879 (which is not intended to be remotely
>entry-level or novice-friendly) mixed with antagonism against the early HTML
>DTDs (which were overly-complicated, IMHO, in structure for their readerships,
>as it turned out).

I would tend to disagree. I have talked to a number of people who are antagonistic against SGML because the standard is so complicated. The fact that it takes a book that large to really give an implementor enough information to build a parser says something. As does the fact that SP is roughly 1Mb compiled. There are reasons for all of this, but people tend to avoid things which take too long to understand, and react adversely when they are forced to use something which they don't understand.

Part of the problem falls back to the tools, but if the initial standard had been more directed to a specific audience, then the tools would have been easier. Generality has its pros and cons. SGML was so general that it was extremely complicated and only the determined could wade through the initial waves of confusion. Thus there were very few people who 'understood' this SGML thing, so organizations trying to use SGML had to get by with people who "didn't get SGML," and as a result had a horrid time at it. Thus there are a number of people who think SGML is "a bad thing" because 3/4 projects using it crashed and burned... (the preceding figure is purely random. I personally have watched a number of projects fail, but I claim no knowledge of a general success/failure rate....)

This is not to say SGML is a bad thing. SGML is based on some extremely sound ideas, which are real driving requirements in a number of industries. (otherwise SGML would have been dead a long time ago) XML (hopefully) is the necessary compromises to get SGML used in more of the cases where it can really provide benefit.

-derek
Derek E.
Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++

From peter at ursus.demon.co.uk Wed Nov 26 18:51:12 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML, parsers, etc.
In-Reply-To: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971126185622.0a2728da@pop3.demon.co.uk>

The initial (and, I hope, the current) idea of this list is for 'XML developers'. This is very widely interpreted and there has been a very high percentage of top quality contributions. A few recent ones have tended to be statements of opinions and, although I certainly don't want to stifle discussion, they don't contribute to the *development* of XML.

There is still a serious lack of resources in the public arena. Maybe there are lots of people waiting to announce things as soon as the spec is 'frozen' :-). At present, however, we do not have any/sufficient:
- test documents
- tutorials
- editing tools
- post-parser applications
- class libraries for common functions (e.g. entitySubstitution)

Some posters have felt that XML is too rigid (i.e. we should break the specs), completely broken, not powerful enough etc. It's not helpful to elaborate these views here as they don't contribute to the development of XML. However, as a compromise, if anyone wishes to post such views here, I think we can allow them in a WF or valid XML document (self-contained, please).
Use as much markup as you can so that we can test parsers to destruction :-) In that way we can accumulate a body of XML documents...

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From mrc at allette.com.au Wed Nov 26 22:01:29 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References:
Message-ID: <347C9C17.F74678C7@allette.com.au>

Simon St.Laurent wrote:
> The market has spoken that SGML does a great job for managing enormous amounts
> of information. It has also spoken that SGML presents enormous barriers to
> entry (steep learning curve, cost of development, etc.) that have kept a lot of
> people from using it. SGML does a great job in many systems. The "many" there,
> however, is a tiny select few compared to the many that a simpler syntax (i.e.
> XML) could reach. The scale of those projects is very different from those XML
> makes possible.

Our company ramped into SGML by doing conversions from one proprietary format to another. Even on relatively small data sets, we frequently used SGML in the middle because tools like OmniMark made it easy to gather semantic information and apply context-sensitive formatting on the down-translate. This meant that many of our clients didn't even know that they used SGML. If you looked at this intermediate data, you would not be able to classify it as SGML or XML - it is both, leaving the only difference the tools that you use to manipulate the data.
You sound somewhat bitter about SGML, perhaps due to a large and difficult project, but there are numerous small, simple SGML implementations around as well. I'm not suggesting this approach is necessarily the norm, but nor do I think that the delineation between what should be an SGML or XML project is as clear as you imply - in many cases we plan to call the normalised output from an SGML parser XML. Why not?

--
Regards

Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________

From donpark at quake.net Wed Nov 26 23:20:22 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:06 2004
Subject: XML Example and DTD Archive?
Message-ID: <01bcfac1$6a9d3110$0100007f@localhost>

Fellow XML Developers,

I have searched for but could not find an extensive archive of XML examples and DTD. If there is such an archive, please let me know. If not, I would like to build one so we can all benefit.

Don "JStud" Park
Consultant
donpark@quake.net

From SimonStL at classic.msn.com Thu Nov 27 02:15:03 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:06 2004
Subject: EMBED and validation
Message-ID:

This may be obvious, but I can't find it in the spec.

In XML-Link, does XML content that is included by EMBED in a valid document have to go through validation like the other parts of the document? Is EMBEDded content considered part of the document for styling purposes, grove manipulation, etc.? This could potentially have an enormous impact on two DTDs I'm developing. At present, the material I would like to embed will validate anyway, but it may not always be the case in the future. Information embedded after the document has loaded appears to create an entirely new set of parsing and styling problems, but hopefully there's an answer already - the tool is too good to pass up.

There's always ANY...

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

From cbullard at hiwaay.net Thu Nov 27 03:25:20 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <347CE7DD.23FC@hiwaay.net>

Rick Jelliffe wrote:
>
> Please say "XML is simpler than SGML '86" and "XML is better for small
> systems than SGML '86" and "SGML has many things that are not needed"
> but not "SGML people are trying to make everything complicated and make
> XML as bad, complex, over-engineered and stinky as SGML". This demonizing
> of "SGML people" is a bad way to win people over to XML.

And as we have seen again and again, it will be the same arguments that the next person will use on XML and Simon's book. That's the way it's done.

len bullard

From peter at ursus.demon.co.uk Thu Nov 27 07:33:34 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:07 2004
Subject: XML Example and DTD Archive?
In-Reply-To: <01bcfac1$6a9d3110$0100007f@localhost>
Message-ID: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk>

At 15:17 26/11/97 -0800, Don Park wrote:
>Fellow XML Developers,
>
>I have searched for but could not find an extensive archive of XML examples
>and DTD. If there is such an archive, please let me know.
>If not, I would like to build one so we can all benefit.

Don,

This is a most exciting offer! You are right that there is no *extensive* archive of XML material and we are suffering because of that lack. Certain people have contributed things which may (or may not) be consistent with the latest draft :-) - that's one of the problems. The places where these are reported are:
- XML-DEV, and I try to extract things like this into XML-JEWELS at http://www.vsms.notingham.ac.uk/vsms/xml/jewels.html
- http://www.sil.org/sgml/xml.html - Robin Cover keeps an eagle eye for anything of value.

Jon Bosak's Shakespeare, and religion are pre-eminent and are a good test for whether a system can cope with 'real documents'. I haven't looked at religion, but Shakespeare has a clean and natural markup without attributes. So it's not a torture test. (I don't think there are DTDs - I think I hacked my own). I don't think there is any mixed content in Shakespeare.

Michael Sperberg-McQueen wrote a torture-test for XML parsers early this year. We seriously need this up-to-date - maybe Michael is reading this :-)

I have written a lot of Chemical markup language (CML) at http://www.vsms.nottingham.ac.uk/vsms/java/jumbo and it uses attributes heavily. However there is NO mixed content in CML, and the output is disappointing without a chemical browser :-)

There are snippets of XML in the XSL spec, in the RDF spec and in the MathML spec. None of these have (I think) DTDs [MathML has one in principle].

I have now tested 3.5 parsers under JUMBO and have found that there is sufficient variation between them that we really need some test documents. (Some of the variation is behavioural - i.e. should a browser fail if it reads and foo.dtd doesn't exist.)

In my view, collaborative *action* is worth many kilowords of discussion, and if you can help put together such a resource it would be extremely useful.

Best Wishes

P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Thu Nov 27 08:30:52 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:07 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID: <3.0.1.16.19971127090853.3dbf5b44@pop3.demon.co.uk>

At 02:13 27/11/97 UT, Simon St.Laurent wrote:
>This may be obvious, but I can't find it in the spec.

No - it's not obvious and - yes, it isn't in the spec. Deliberately, I think.

>
>In XML-Link, does XML content that is included by EMBED in a valid document
>have to go through validation like the other parts of the document? Is
>EMBEDded content considered part of the document for styling purposes, grove
>manipulation, etc.? This could potentially have an enormous impact on two
>DTDs I'm developing. At present, the material I would like to embed will
>validate anyway, but it may not always be the case in the future. Information
>embedded after the document has loaded appears to create an entirely new set
>of parsing and styling problems, but hopefully there's an answer already - the
>tool is too good to pass up.

When XML-LINK came out I asked (probably to the point of boredom) what the semantics associated with XML-LINK are. The answer (I hope I'm being fair) is that it's completely application-dependent. In particular this applies to the word 'EMBED'.
If, as I believe, the spec will stay in its very crisp and semantic-free form, then I believe it is critical for the XML community to get at least some communal consensus on XML-LINK semantics or I think we shall have serious interoperability problems. That's only *my* view - others seem either more relaxed, or seem to think it's a totally insoluble problem. That is why I have suggested that we use XDEV as a way of at least identifying different approaches.

I believe that the motivation for AUTO + EMBED was to replicate the construct in HTML (USER+REPLACE corresponds to so long as replace is the whole 'resource' (again I have asked repeatedly for clarification as to what a 'resource' is.) A 'resource' seems to be (according to different authorities):
- nodes in trees (Eliot Kimber on XML-DEV)
- the content of the linking element (e.g. the content of ...
- the whole containing element (i.e. as above but including the and tags
- the whole 'document' in which the link occurs (this emulates in HTML).

There seems to be no concern or urgency to clarify this further to webhackers like me, so perhaps I am the only one who sees a problem :-)

*What* embed *does* is even less talked about and defined by the experts. It is clearly seen as being able to support has any semantics suggesting that the document linked to should become part of the current document (I hope you understand what I mean :-). In this way linked-to 'resources' could become 'included' or 'transcluded' in the current document. JUMBO has the capability of doing this, though I haven't switched it on because I wanted someone other than me to come up with ideas.

Example:

The equation for exponential growth is
and its first derivative is identical

would link to an equation in MathML. The advantage is that your document could use a different DTD from MathML. Whether you process the MathML document on linking to it is 'application-dependent'. What happens if *it* points to further documents is most exciting and most certainly undefined.

Note that XML-LINK need not link to an XML document. It could link to a *.gif, or (as in the latest version of JUMBO) to *.txt and *.mol (molecules). In this way objects can sometimes be converted on the fly into XML trees, and could - if required - be displayed and treated (e.g. for searching) as part of the document. I have proposed that a MIME attribute be added to the XML-LINK repertoire.

IMO this is more powerful than entities because one can mix different DTDs, but it's also more complex. Even entities have undefined areas as I pick up that some parsers may allow expansion of entities or not at user control.

Some SGML experts may respond and say that NOTATION will manage this. I have to admit that I don't understand NOTATION. It seems to have implied semantics of linking to an entity. IMO XML-LINK can do everything I need and I don't require NOTATION. It will be a lot of work if I have to hack JUMBO to use NOTATION and no-one uses it.

QUESTION: does anyone actually intend to use NOTATION in XML-LINK? If so, what for, and where is the software coming from? :-)

So - I suggest that in XML-DEV we try to agree on sets (not a set) of semantics that can be used with XML-LINK. I know that most people think I'm mad to suggest this, but I *have* had some private support for the XDEV idea. Therefore let us suggest: (ignore capitalisation)

The BEHAVIOR value can be chosen from a list of values of the form XDEV:* (and of course 'application-dependent' values :-) Their semantics are determined by referring to postings on this list.
Let's kick off with:

XDEV:DISPLAY "a graphical or textual rendering of the object"
XDEV:DISPLAY_IN_CONTEXT "a graphical or textual rendering as part of another element or resource" (See PeterMR earlier on XML-DEV)
XDEV:INCLUDE "make the linked-to resource part of the current tree/document"
XDEV:INCLUDE_RECURSIVELY "make the linked-to resource and its child links part of the current document"

and include another attribute:

XDEV:MIME which can have values consistent with (whatever the MIME RFC is).

These are just starters. Please refine them, but please also keep them simpler than HyTime. I am sure the HyTime experts are planning to build HyTime on top of XML anyway, so if you want a full hypermedia system no doubt it will arrive sometime. I suggest we keep this very simple, with the main objective being to avoid inconsistent semantics if possible.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From Patrice.Bonhomme at loria.fr Thu Nov 27 09:01:44 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
Message-ID: <199711270901.KAA22662@chimay.loria.fr>

Is it possible to put the sequence ]]> within a CDATA marked section ?

Example:

Here is the beginning of the CDATA marked section:

Here is the true end. ]]>

Pat

--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================

From ak117 at freenet.carleton.ca Thu Nov 27 11:48:59 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
In-Reply-To: <199711270901.KAA22662@chimay.loria.fr>
References: <199711270901.KAA22662@chimay.loria.fr>
Message-ID: <199711271148.GAA00360@unready.microstar.com>

Patrice Bonhomme writes:
> Is it possible to put the sequence ]]> within a CDATA marked section ?

No -- in XML, there is no way at all. In full SGML, you could use RCDATA instead of CDATA:

(In the DTD)
(In the document instance) ]]>

I don't think this is a big problem, though, since CDATA marked sections are simply a typing convenience.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From jarle.stabell at dokpro.uio.no Thu Nov 27 12:17:51 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
Message-ID: <01BCFB36.A746EBE0@xyplex34.uio.no>

Pat wrote:
<<<<
Is it possible to put the sequence ]]> within a CDATA marked section ?

Example:

Here is the beginning of the CDATA marked section:

Here is the true end. ]]>

>>>>

[JS] I don't think so. A "workaround" is to close the first CDATA section, write the ]]> (or, for compatibility, it seems you have to use ]]&gt;) and then open up a new CDATA section to continue.

Example:

Here is the beginning of the CDATA marked section: ]]> Here is the true end. ]]>
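
For illustration only (this sketch is not part of the original post), the workaround described above can be written as a small Python helper. The names `cdata_wrap` and `round_trip` are hypothetical, and the standard-library `xml.etree.ElementTree` parser is assumed here purely as a way to check that the text survives a parse unchanged.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text):
    """Wrap text in CDATA sections without ever placing "]]>" inside one.

    Each literal "]]>" in the input closes the current section, is written
    as the escaped character data "]]&gt;", and a new section is opened.
    """
    pieces = text.split("]]>")
    return "]]&gt;".join("<![CDATA[" + piece + "]]>" for piece in pieces)


def round_trip(text):
    """Parse the wrapped text inside a dummy <doc> element and return it."""
    return ET.fromstring("<doc>" + cdata_wrap(text) + "</doc>").text


if __name__ == "__main__":
    sample = "if a[b[0]]>c then x<>nil & done ]]> true end"
    print(cdata_wrap(sample))
    print(round_trip(sample) == sample)
```

A common alternative, not the one described in the post, is to split the sequence itself across two sections (ending one after "]]" and starting the next before ">"), which avoids character data outside the sections at the cost of being harder to read.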

BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]&gt; ? (Or do I misinterpret the draft text:

'and must for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>", when that string is not marking the end of a CDATA section'

Does it mean that the user should better use "&gt;" to be compatible with SGML, or that the XML parser should report this as an error if not escaped using "&gt;"?)

I have some concerns related to & and < when not followed by a char which can start a name (or [...]

I assume the reasons for *not* allowing "if x<>nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this. (At least it seems trivial for parsers to check this situation)

Cheers,
Jarle

From fussellm at alumni.caltech.edu Thu Nov 27 12:21:32 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:07 2004
Subject: MONDO Design Document v0.3.1
Message-ID:

Not to be a notification pest (or turkey), but I decided the v0.3 MONDO document was missing some sections that were important to explaining the MONDO ObjectBuilder. So I added them, fixed a number of other sections, and put up a new version at:

http://www.chimu.com/projects/mondo/design/mondoDesign.pdf

The additions and changes include the following:

v0.3.1 971127 Added sections 4.2 through 4.5 (Building and Recipes), fixed the conclusion of chapter 5 and added a comparison table. Cleaned up chapter 10.
--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation   Architectures for Information
 h M  info@chimu.com      Object-Oriented Information Systems
C   u www.chimu.com       Architecture, Frameworks, and Mentoring

From Patrice.Bonhomme at loria.fr Thu Nov 27 13:11:48 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:07 2004
Subject: A Personnal XML release of the TEI Lite DTD
Message-ID: <199711271308.OAA23421@chimay.loria.fr>

Hi,

As I am working both with TEI and XML, I am pleased to announce the availability of my personal XML release of the TEI Lite DTD. The xteilite DTD and 2 famous TEI lite encoded documents are available at the following URL:

http://www.loria.fr/~bonhomme/xml.html

It is not an official release of the TEI Lite. A lot of things remain to be done, for example the use of the XML-LINK (XLL). And some of the problems are still pending (inclusion / exclusion on content model).

Both of the XML documents have been tested with the MSXML parser (v. 1.6) and the Lark parser (v. 0.92).

I am also trying to make an XML compatible version of the big TEI P3 DTD(s), but the task is much more difficult as it requires almost a complete rewriting of the TEI DTD modules.

Of course, all feedback is very appreciated.

Pat.
-- ============================================================== bonhomme@loria.fr | Office : B.228 http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37 -------------------------------------------------------------- * Projet Aquarelle : http://aqua.inria.fr * Serveur Silfide : http://www.loria.fr/Projet/Silfide ============================================================== xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Thu Nov 27 14:33:33 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:07 2004 Subject: ]]> within a CDATA marked section ? References: <01BCFB36.A746EBE0@xyplex34.uio.no> Message-ID: <347D8586.2DB5D07C@technologist.com> Jarle Stabell wrote: > BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]> ? I think that they should. This requirement seems strange at first, but it stops mistakes like the one you made. You can never accidently make a CDATA marked section end be content. > I assume the reasons for *not* allowing "if x<>nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this. > (At least it seems trivial for parsers to check this situation) Parser writers are rebelling at the number of trivial things that they must manage. Paul Prescod xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Thu Nov 27 14:37:11 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:07 2004 Subject: EMBED and validation References: Message-ID: <347D8643.64CFE9DD@technologist.com> Simon St.Laurent wrote: > > In XML-Link, does XML content that is included by EMBED in a valid document > have to go through validation like the other parts of the document? No. Validation is defined for XML documents. XML Link is a completely different spec and has no bearing on the definition of an XML document. You seem to rather be thinking of the XML "hyperdocument" (in hytime terms). > Is > EMBEDded content considered part of the document for styling purposes, grove > manipulation, etc.? XML has no style language yet and also has no definition of a grove. So the answer is "nobody knows yet." > Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) Do you really have an XML book coming out in January? What spec will it be based upon? Paul Prescod xml-dev: A list for W3C XML Developers. 
From papresco at technologist.com Thu Nov 27 14:52:30 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:07 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
References:
Message-ID: <347D89F2.362B8E61@technologist.com>

Mark Baker wrote:
>
> On Mon, 24 Nov 1997, Paul Prescod wrote:
> > > What if that troff document contained a link to an implementation of a
> > > troff formatter? What if that implementation described its interface using
> > > XML?
> >
> > What if it didn't? What if it described its interface using CORBA or
> > some proprietary language that is more powerful than CORBA? You don't
> > lose any flexibility or expressive power, you just have to write another
> > parser for CORBA or your proprietary language.
>
> My point is that if it did, then no longer are clients responsible for
> interpreting the semantics of the data - a contained/referenced
> implementation is.

Well at the hardware level, it is still the client. I think you are distinguishing between clients being hard-wired to accept a fixed number of notations and being extensible (e.g. through Java). That sounds reasonable.

> In comp doc frameworks, when a new stream of data is introduced into a
> container, the framework decides the type of the data and then attempts
> to find an editor based on that type. The editor knows what to do with
> that data, and negotiates with the container for the real-estate for its
> presentation.

I think this is more tricky than it sounds, especially that bit about "negotiating for real estate" (unless you are talking about unit squares). But okay.
> So if a well-formed document comes streaming into our container, the
> framework would start parsing it, come across a tag called 'troff', and
> then proceed to try and discover and install a chunk of code that knows
> how to parse/render troff. Or the document could provide its own ref(s)
> (more likely for scalability purposes). Either way, it's not the
> container (the client) that's responsible for interpreting the semantics
> of the data. It's the document itself that is responsible.

You seem to be arguing in favour of self-labelling data formats, which I agree could be quite useful. But XML doesn't give you that "for free" in any sense. There is no standard for having XML documents, entities or elements link to Java Beans or Active-X controls that can render them. You must invent such a standard and it will be only marginally easier to invent an XML-based one than to use OpenDoc or OLE Structured Storage which handle this already.

XML has the benefit that it has momentum today and may "take over the universe." It has the serious downside that it cannot (reasonably) encode binary information so .GIFs and .JPEGs cannot be self-describing in this way (whereas they could be in OpenDoc Bento or OLE Structured Storage).

In other words, something like Bento or OLESS is probably still needed. We could surely find a way to recreate it with XML and (e.g.) ZIP, but it seems to me that that would be more of a political decision than a technical one. The SGML standards family has something called "SDIF". There is also mime/multipart, Amiga IFF and probably a hundred other kicks at this can.

Anyhow, I think that a high priority of the XML WG/Community should be inventing the XML equivalent of the JAR file. It is way too much of a hassle to ship multipart documents (whether they be SGML, HTML or XML). It needn't be much harder than shipping around Word Docs (which are really multipart documents). These XAR files should be able to label their contents.
Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From SimonStL at classic.msn.com Thu Nov 27 15:16:24 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:07 2004 Subject: EMBED and validation Message-ID: Given the wide variety of possible interpretations Peter has enumerated, it looks like I'll be taking the most conservative road possible and developing documents and links in such a way that they will remain valid whether or not the EMBEDded material is included as part of the document. So far, I think I'll only need one ANY. The documents I want to link (at this point) all share the same DTD - I hate to imagine what will happen if I need to open that up. Still, this is a considerable improvement on the tools I've worked with before. This EMBED issue raises even more bizarre questions for styling - context-dependent styling could well be forced to adjust if EMBEDded material is considered part of the document tree. Taking this into account will be an interesting challenge that may force me to use some old-style CLASS attributes, but we'll see. CSS will have some problems, but they may be surmountable. XML styling hasn't exactly happened yet, but I hope the developers are keeping this in mind. XDEV sounds like a much-needed idea given the latitude of interpretation allowed to applications. It may also be needed (or need to be extended further) given some of the switches we may need for turning on and off these SGML features people seem to want included in their parsers. But maybe we _can_ make everyone happy. 
Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From rrseibel at att.com Thu Nov 27 17:09:35 1997 From: rrseibel at att.com (Seibel, Robert R) Date: Mon Jun 7 16:59:07 2004 Subject: Have we settled on XML and related mime types? Message-ID: <11BF90556669D01195F3080009B3AC813CA9A2@nj8102po01.lz.att.com> Team: I've seen bits and pieces of mail regarding XML mime types. Does anyone know of the official list of mime types for XML and related support applications like XSL? The ones I have seen are: 1) text/xml with .xml extension 2) application/xml with .xml extension 3) text/xsl with .xsl extension Are these correct? Are there more? Thanks for your help, Bob Seibel AT&T WorldNet xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jarle.stabell at dokpro.uio.no Thu Nov 27 18:12:38 1997 From: jarle.stabell at dokpro.uio.no (Jarle Stabell) Date: Mon Jun 7 16:59:07 2004 Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?) Message-ID: <01BCFB68.64DAE950@xyplex34.uio.no> I wrote: <<< > I assume the reasons for *not* allowing "if x<>nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this. 
> (At least it seems trivial for parsers to check this situation)
>>>

Paul Prescod wrote:
<<<
Parser writers are rebelling at the number of trivial things that they must manage.
>>>

[JS]
I'm actually surprised that I haven't heard much rebelling here. :-)
I think there are lots of *non-trivial* things parser writers must manage in XML, so I don't think they care much about trivial things if they actually are useful to many users.

I'm afraid of making my parser look stupid/stubborn, because that very likely means higher support costs, and also lowers the average user's impression of the quality of the product. Gurus may know why the parser complains, but perhaps not the average support personnel, and certainly not the average user.

My current "favourite XML annoyance" is the rules for entity expansion, which make writing the name AT&T in an entity rocket science for the average XML user, and probably give some implementors gray hairs. (I understand that these rules give maximum power, but I can hardly see the need for it. (Or is it "often" needed because one has chosen " or ' to mark the end of an entity value?))

I'll try to explain why it probably will give me some gray hairs when I implement it:
After attempting to process a document containing errors, I want to present to the user a list of error messages, and when the user clicks on one of these messages, I want to highlight the exact part of the document where the error occurs.
The problem with entity expansion is that the parser isn't parsing what the user literally wrote into the entity definitions, it is parsing a processed/"virtual" version, which *may* not be a real subpart of the document, so one has to map "virtual" locations/positions to physical (real document) positions, which doesn't seem trivial to me. It is also likely to give slightly confusing error messages, as it may be mentioning expanded stuff ("<xxx>") which the user never wrote; the user may have written "&lt;xxx&gt;" etc.
This single issue is likely to give me many hours of thinking (and programming) , while allowing stuff like "x < 5" in content only takes me a single line to handle. I sometimes get the impression that XML contains many hard to implement (and understand) things (which won't be useful to anyone but the gurus), while disallowing things that are easy to implement and also useful to the average user. Ok, enough rebelling for now... :-) Cheers, Jarle xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tms at ansa.co.uk Thu Nov 27 18:24:13 1997 From: tms at ansa.co.uk (Toby Speight) Date: Mon Jun 7 16:59:07 2004 Subject: Have we settled on XML and related mime types? In-Reply-To: "Seibel, Robert R"'s message of "Thu, 27 Nov 1997 12:07:34 -0500" References: <11BF90556669D01195F3080009B3AC813CA9A2@nj8102po01.lz.att.com> Message-ID: A non-text attachment was scrubbed... Name: not available Type: text/plain (pgp signed) Size: 1416 bytes Desc: not available Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971127/87031bad/attachment.bin From serres-doug at usa.net Thu Nov 27 19:38:44 1997 From: serres-doug at usa.net (Doug Serres) Date: Mon Jun 7 16:59:07 2004 Subject: Wanted: C/C++ based Validating XML Parser Message-ID: <347DCBF5.78A31AB6@usa.net> Hi, I'm looking for a C/C++ based Validating XML Parser. I see references to a few Java based ones and a TCL based one on the W3C page but none in C or C++. Any ideas? Thanks -- Doug Serres xml-dev: A list for W3C XML Developers. 
From ak117 at freenet.carleton.ca Thu Nov 27 19:47:56 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:08 2004
Subject: Wanted: C/C++ based Validating XML Parser
In-Reply-To: <347DCBF5.78A31AB6@usa.net>
References: <347DCBF5.78A31AB6@usa.net>
Message-ID: <199711271947.OAA05381@unready.microstar.com>

Doug Serres writes:

> I'm looking for a C/C++ based Validating XML Parser. I see
> references to a few Java based ones and a TCL based one on the W3C
> page but none in C or C++. Any ideas?

Get James Clark's SP:

  http://www.jclark.com/sp/

To use the command-line version with XML, you need to use the -wxml flag and prepend the SGML declaration included in the distribution; i.e.

  nsgmls -wxml /usr/lib/sgml/sgmldecl/xml.dcl myfile.xml

For easier use, make up a shell script or batch file.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Nov 27 21:39:40 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:08 2004 Subject: A Personal XML release of the TEI Lite DTD In-Reply-To: <199711271308.OAA23421@chimay.loria.fr> Message-ID: <3.0.1.16.19971127221239.3a4f6ef6@pop3.demon.co.uk> At 14:08 27/11/97 +0100, Patrice Bonhomme wrote: > >Hi, > >As i am working both with TEI and XML, i am pleased to announce the >availability of my personnal XML release of the TEI Lite DTD. The xteilite >DTD and 2 famous TEI lite encoded documents are available at the following URL: > This is a wonderful thing to have, thanks. I glanced at the DTD - haven't had time to download. It's certainly an excellent thing to test all our stuff on. [Perhaps the official custodians of the TEI could say how they see TEI being XMLised?] P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From mrc at allette.com.au Thu Nov 27 22:37:28 1997 From: mrc at allette.com.au (Marcus Carr) Date: Mon Jun 7 16:59:08 2004 Subject: XML Example and DTD Archive? 
References: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk> Message-ID: <347DF5ED.151DD891@allette.com.au> Peter Murray-Rust wrote: > At 15:17 26/11/97 -0800, Don Park wrote: > >I have searched for but could not find an extensive archive of XML examples > >and DTD. If there is such an archive, please let me know. If not, I would > >like to build one so we can all benefit. > > You are right that there is no *extensive* archive of XML material and we are > suffering because of that lack. Has anyone considered using OmniMark's 'The Compleat SGML' CD as a starting place? It was designed as a conformance suite for SGML parsers, with over 10,000 documents in various states of validity, size and degrees of complexity. I'm not sure of the legal issues related to copyright - it might be worth an inquiry to OmniMark - but it was a marketable product at a cost of about $200. With the number of XML parsers in the pipeline, a full conformance suite might even turn a few dollars. -- Regards Marcus Carr email: mrc@allette.com.au _______________________________________________________________ Allette Systems (Australia) email: info@allette.com.au Level 10, 91 York Street www: http://www.allette.com.au Sydney 2000 NSW Australia phone: +61 2 9262 4777 fax: +61 2 9262 4774 _______________________________________________________________ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjc at jclark.com Thu Nov 27 23:17:17 1997 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 16:59:08 2004 Subject: ]]> within a CDATA marked section ? 
References: <01BCFB36.A746EBE0@xyplex34.uio.no>
Message-ID: <347D91A2.B0689DB9@jclark.com>

Jarle Stabell wrote:

> BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]&gt; ?
> (Or do I misinterpret the draft text:
>
> 'and must for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>", when that string is not marking the end of a CDATA section'
>
> Does it mean that the user should better use "&gt;" to be compatible with SGML, or that the XML parser should report this as an error if not escaped using "&gt;"?)

A conforming XML parser *must* report this as an error. "For compatibility" just gives the rationale for the requirement; it doesn't lessen the requirement on parsers to report the error. The spec's definition of "for compatibility" makes this clear:

  for compatibility
    A feature of XML included solely to ensure that XML remains compatible with SGML.

Note that "for compatibility" is quite different from "for interoperability".

James

From tbray at textuality.com Thu Nov 27 23:36:19 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:08 2004
Subject: EMBED and validation
Message-ID: <3.0.32.19971127135240.00b7d260@pop.intergate.bc.ca>

At 02:13 AM 27/11/97 UT, Simon St.Laurent wrote:
>In XML-Link, does XML content that is included by EMBED in a valid document
>have to go through validation like the other parts of the document?

No; it's not part of the document; it's a hyperlink to something completely different; there's no reason to expect what it points at to be XML.
-Tim

From peter at ursus.demon.co.uk Fri Nov 28 01:33:28 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:08 2004
Subject: Revelling parser writers (was Rebelling)
In-Reply-To: <01BCFB68.64DAE950@xyplex34.uio.no>
Message-ID: <3.0.1.16.19971128022334.36a76ba0@pop3.demon.co.uk>

JUMBO now has an interface to 3.5 parsers including Lark and NXP. This means that the user can parse the same document with different parsers or can (in principle) use a different parser for the initial document than for the XML-LINKed ones (I haven't actually included a 'Change Parsers' button).

It has been 'quite easy'. Authors have generally provided a set of test routines to be either hacked or subclassed (see Lark for examples). I think this is a good model for distribution, as it's a quick way to make minor changes and get them hooked into your system. It shouldn't take more than about 2 hours per parser - I can't spare more.

I have not done the MSXML system because I don't know if it has been WORA'ed yet... have I missed it?

JUMBO may not be a complete test bed as it builds a tree and can then do things from that. It may lose information (it doesn't store comments at present). Since it was written before the WG decided on joined-up writing for XML names, it still uppercases everything and I'm waiting for the white smoke before I make that change. It *does* store PIs as children of the immediately preceding non-PCDATA Element. It does not store NOTATIONs as it has never seen one and doesn't know what to do with one when it gets it.
It is also not very good on things like IMPLIED attribute values since it may not always have a DTD.

If anyone can come up with simple rules for what a tree should contain, that could be useful. [Not a grove at this stage, as no one seems to write their parsers to create groves.]

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Fri Nov 28 01:55:35 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:08 2004
Subject: Editing text
In-Reply-To: <347CE7DD.23FC@hiwaay.net>
References: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>

I am writing an editor for JUMBO where I expect most of the characters like '"<>& to have been converted into entities (e.g. &apos, etc.). [I do not expect any raw [...]

Appendix B lists six and a half pages of potential NameChars for which JUMBO has to test - is this correct? If so I have code of the form:

  [...] ; }

I assume there is no short cut...

I applaud the work of the WG on the Internationalisation and I don't want to detract from it. What I would suggest is that because of the extreme likelihood of error if individuals do try to hack their own isNameChar(), and because if ever this list is revised software will be invalidated, that the WG, or W3C or whoever, maintain an isNameChar() routine in the common languages (C, C++, Java) so that we know we shall all be working with the same one.

There may be other similar aspects of the spec where it is worth having a central curated resource...

P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From SimonStL at classic.msn.com Fri Nov 28 05:10:55 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:08 2004 Subject: EMBED and validation Message-ID: >No; it's not part of the document; it's a hyperlink to something >completely different; there's no reason to expect what it points at >to be XML. -Tim While there is no reason to expect the target to be XML (which I strongly approve of), I have to wonder what's supposed to happen if the target _is_ XML. If the target is another complete XML document, including a document type declaration, then I can see the wisdom of parsing it separately and keeping it separate. If the target is XML but not a complete document, for instance a set of elements returned by a reference using XPointers, I'm not sure about what the application should do. Is the application supposed to treat this chunk as (hopefully) well-formed XML in a separate parsing process? Would it be legitimate for an application to fold EMBEDded chunks into the document containing the link for purposes of styling in particular but also validation in certain circumstances? Many situations will arise in which EMBEDded content needs to be styled, but the chunk of XML referenced by the link contains neither document type declaration or styling information. 
My instinct is to be as conservative as possible and make sure that all XML chunks EMBEDded by a link could be folded into the linking document without making it invalid, but this is a more radical constraint than I expect most developers would like. Leaving this behavior up to the application is probably the only course available at present, but I suspect this practice may lead to considerable chaos. XML-Link has opened up realms of capability that go far beyond those provided by entities and notations, and I look forward to using them. Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Fri Nov 28 05:30:25 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:08 2004 Subject: Editing text Message-ID: <199711280528.QAA04939@jawa.chilli.net.au> > From: Peter Murray-Rust > I assume there is no short cut... On the contrary, there *IS* a short cut: the most obvious one! Just treat the name as a token (i.e. terminated by whitespace or >, or any other delimiter if you want to be careful). Any valid XML will work with just that! If you want to completely validate your XML, then the more sophisticated checks are appropriate. The intent (as I see it) is to let people use customary words in their language and script, if they want to. It is bad practise to use crazy symbols and uncommon characters in markup, because the purpose of markup is to reveal meaning, not hide it. The complexity of the rules merely encodes that to give guidance in the peripheral cases. 
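The short cut described above can be sketched as follows. This is a rough illustration rather than anything from a real parser: the class name, the method name and the exact delimiter set ('>', '/', '=' and whitespace) are assumptions made for the example. The point is that no per-character table is consulted; the name is simply whatever lies before the next delimiter.

```java
final class NameTokenSketch {
    /**
     * Reads a name starting at 'start' by scanning to the next delimiter.
     * No NameChar table is consulted, so this accepts any script, and also
     * accepts names a validating processor would reject.
     */
    static String readName(String input, int start) {
        int end = start;
        while (end < input.length() && !isDelimiter(input.charAt(end))) {
            end++;
        }
        return input.substring(start, end);
    }

    // Assumed delimiter set for this sketch: whitespace, '>', '/', '='.
    private static boolean isDelimiter(char c) {
        return Character.isWhitespace(c) || c == '>' || c == '/' || c == '=';
    }

    public static void main(String[] args) {
        // Offset 1 skips the opening '<' of the start-tag.
        System.out.println(readName("<t\u00EDtulo id=\"x\">", 1));
        System.out.println(readName("<br/>", 1));
    }
}
```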
> I applaud the work of the WG on the Internationalisation and I don't want

Yes, they have been exemplary in this, I think. They have taken the issue very seriously, and kept their eyes on the goal. It is very easy for I18N to bamboozle people, in that there is always a fuzzy and heaving morass of quibbling that makes people want to give up.

But in the case of XML, we can have our cake (the fans of strict, codified naming rules can exactly specify what is allowed) *AND* eat it (bewildered parser-writers can just use simple tokenizing).

> to detract from it. What I would suggest is that because of the extreme
> likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
> languages

It is possible that isNameChar() will be adequate. The issue of how complex the naming rules should be is under last-minute finalization. The important thing is not to be distracted by how detailed the official list is. If you do not have a validating XML processor (which means you in fact are assuming that your documents are valid) then a much simpler tokenizing regime should work fine.

That was a thing explicit in the discussions for the naming system: it must be straightforward to implement a (non-validating) XML parser.

> (C, C++, Java) so that we know we shall all be working with the same one.

There is a draft ISO technical report on this issue, for future programming language standards. This technical report has clearly been influenced by XML and SGML's approaches to the problem. I know that the WG representatives who are looking after finalizing the naming rules are looking at that as well.

Rick Jelliffe
From tbray at textuality.com Fri Nov 28 06:54:27 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:08 2004
Subject: Editing text
Message-ID: <3.0.32.19971127225432.00bf150c@pop.intergate.bc.ca>

At 02:45 AM 28/11/97, Peter Murray-Rust wrote:
>Appendix B lists six and a half pages of potential NameChars for which
>JUMBO has to test - is this correct? If so I have code of the form:

Be warned; Appendix B will change again. Anyhow, if you really want an isNameChar() function, I recommend something along the lines of

  isNameChar(char c)
  {
    if (c < 128)
      return BooleanArrayOfSize127WithTrueInNameCharPositions[c];
    else
      return DoIckyLookupInBigTableFromAppendixB(c);
  }

Actually, I posted some Java code that reads the XML spec and generates a reasonably efficient Java version of DoIckyLookup... the Lark distribution currently has a CharClasses.java. I'll re-test and re-generate and re-distribute after the next cut of the spec.

-Tim
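The two-tier lookup suggested in this message might look roughly like the following in Java. It is a sketch under stated assumptions, not the Lark code: the class name is invented, the ASCII set (letters, digits, '.', '-', '_', ':') is assumed, and the handful of non-ASCII ranges is a hypothetical stand-in for the real Appendix B table, which is much longer and was still changing. The ASCII array has 128 entries so that index 127 is valid.

```java
final class NameCharSketch {
    // Fast path: one flag per ASCII code point 0..127.
    private static final boolean[] ASCII_NAME_CHAR = new boolean[128];

    // Hypothetical excerpt standing in for the Appendix B table:
    // sorted, non-overlapping, inclusive {start, end} ranges.
    private static final int[][] NON_ASCII_RANGES = {
        {0x00C0, 0x00D6}, {0x00D8, 0x00F6}, {0x00F8, 0x00FF},
        {0x0391, 0x03A1}, {0x3041, 0x3094}
    };

    static {
        for (char c = 'a'; c <= 'z'; c++) ASCII_NAME_CHAR[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) ASCII_NAME_CHAR[c] = true;
        for (char c = '0'; c <= '9'; c++) ASCII_NAME_CHAR[c] = true;
        for (char c : new char[] {'.', '-', '_', ':'}) ASCII_NAME_CHAR[c] = true;
    }

    static boolean isNameChar(char c) {
        if (c < 128) {
            return ASCII_NAME_CHAR[c];   // array lookup for the common case
        }
        return inNonAsciiRanges(c);      // the "icky lookup" for everything else
    }

    // Binary search over the sorted range table.
    private static boolean inNonAsciiRanges(int c) {
        int lo = 0;
        int hi = NON_ASCII_RANGES.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < NON_ASCII_RANGES[mid][0]) {
                hi = mid - 1;
            } else if (c > NON_ASCII_RANGES[mid][1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isNameChar('a'));
        System.out.println(isNameChar('<'));
        System.out.println(isNameChar('\u00E9'));
    }
}
```

Keeping the ranges in a data table rather than in a chain of comparisons means a revised Appendix B only requires regenerating the table, which is the approach the generated CharClasses.java mentioned above appears to take.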
From donpark at quake.net Fri Nov 28 07:51:52 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:08 2004
Subject: REQ: XML Example and DTD Catalog Submissions
Message-ID: <01bcfbd2$0d6889b0$0100007f@localhost>

Fellow XML Developers,

I have put up a catalog of XML Examples and DTDs to serve as the place to get links to samples and definition files. For now, the catalog is just a web page divided into sections, one for each XML application. It is my hope to fill the catalog with links to most of the available XML samples and DTDs out there.

If you have XML example files or DTDs you would like to see in the catalog, please send their URLs to me. I cannot host the actual files because I cannot handle the volume on my website.

The catalog is at:

  http://www.quake.net/~donpark/xmlcat.html

My sincere thanks in advance,

Don "JStud" Park
Java/MFC Consultant
donpark@quake.net

From richard at light.demon.co.uk Fri Nov 28 07:55:34 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:08 2004
Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?)
In-Reply-To: <01BCFB68.64DAE950@xyplex34.uio.no>
Message-ID:

In message <01BCFB68.64DAE950@xyplex34.uio.no>, Jarle Stabell writes

>After attempting to process a document containing errors, I want to present to
>the user a list of error messages, and when the user clicks on one of these
>messages, I want to highlight the exact part of the document where the error
>occurs.
>The problem with entity expansion is that the parser isn't parsing what the user
>literally wrote into the entity definitions, it is parsing a processed/"virtual"
>version, which *may* not be a real subpart of the document, so one has to map
>"virtual" locations/positions to physical (real document) positions, which
>doesn't seem trivial to me. It is also likely to give slightly confusing error
>messages, as it may be mentioning expanded stuff ("<xxx>") which the user never
>wrote, the user may have written "&lt;xxx&gt;" etc.

I don't think this is as much of a problem as you fear. Every entity is physically declared somewhere in a real source - usually a good ol' file on disc. Of course, that file may not be the one you started from ...

My RunSP program (http://www.light.demon.co.uk/runsp) does exactly what you describe (for nsgmls). It runs it under Windows and then allows the user to navigate from one error message to the next, in a simple editor environment that lets them sort out the problems they find.

All I did was to parse the error messages, pick out file name, line number and character offset, and place a bookmark at the relevant point in the file concerned. This works equally well for errors in the DTD or SGML Declaration as for those in what we think of as the 'real document'. (Which is something that never occurred to me when designing RunSP - but of course the Declaration and DTD are equally part of the document as far as the parser is concerned.)

Richard Light.

Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
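The approach described here, parsing each error message to recover file name, line number and character offset, can be sketched in Java. This is not RunSP's code. The message shape assumed below (program:file:line:column:type: text) is an assumption about the parser's output and should be checked against the actual messages before relying on it; the class and field names are invented for the example.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ErrorLocationSketch {
    // Assumed message shape: program:file:line:column:type: text
    // The file part is matched reluctantly so that a Windows drive letter
    // such as "C:" does not end the file name early.
    private static final Pattern MESSAGE =
        Pattern.compile("^([^:]+):(.+?):(\\d+):(\\d+):([A-Z]):\\s*(.*)$");

    final String file;
    final int line;
    final int column;
    final String text;

    private ErrorLocationSketch(String file, int line, int column, String text) {
        this.file = file;
        this.line = line;
        this.column = column;
        this.text = text;
    }

    /** Returns the location named in one message, or empty if the line has another shape. */
    static Optional<ErrorLocationSketch> parse(String message) {
        Matcher m = MESSAGE.matcher(message);
        if (!m.matches()) {
            return Optional.empty();
        }
        return Optional.of(new ErrorLocationSketch(
            m.group(2),
            Integer.parseInt(m.group(3)),
            Integer.parseInt(m.group(4)),
            m.group(6)));
    }

    public static void main(String[] args) {
        parse("nsgmls:doc.xml:12:7:E: element \"foo\" undefined")
            .ifPresent(loc -> System.out.println(loc.file + " line " + loc.line + " col " + loc.column));
    }
}
```

Because the file name comes from the message itself, an error inside an external entity or the DTD leads to a bookmark in that file rather than in the document the user started from.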

From peter at ursus.demon.co.uk Fri Nov 28 08:18:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID: <3.0.1.16.19971128091613.34efaa0c@pop3.demon.co.uk>

At 05:09 28/11/97 UT, Simon St.Laurent wrote:
>>No; it's not part of the document; it's a hyperlink to something
>>completely different; there's no reason to expect what it points at
>>to be XML. -Tim

No - and JUMBO can eat about 17 types of non-XML files (e.g. *.txt, *.gif, and lots of lovely chemistry). If *any of you* want to write a simple routine for RTF, Word binary, MAC BinHex, it would be marvellous. All you need to do is decide on the tree structure - JUMBO can then output it in shining XML.

>
>While there is no reason to expect the target to be XML (which I strongly
>approve of), I have to wonder what's supposed to happen if the target _is_

You approve that it must/needNot be XML. For me the latter is essential. Sometime ago I proposed an extra attribute MIME to describe the MIME type of the target HREF. (Note that this is NOT always available from contentType since it may be a local file.) If this doesn't get into the SPEC, I suggest we need an XDEV attribute and I proposed that 2 days ago...

>XML. If the target is another complete XML document, including a document
>type declaration, then I can see the wisdom of parsing it separately and
>keeping it separate. If the target is XML but not a complete document, for
>instance a set of elements returned by a reference using XPointers, I'm not

This is (I believe) 'application-dependent'. I see the following possibilities.

(A) Render the tree and paint the referred elements blue. JUMBO does this. You don't get a choice of colours at present.

(B) Render the event stream and paint the elements red. JUMBO cannot do joined up writing yet, but is gradually learning how to render event streams (it can do most of HTML 2.0).

(C) Regard this as a query (remember our discussions here?) and use the nodes in some other way. That's why I think XLL XPointer syntax is the appropriate base for a query language.

>sure about what the application should do.

The more I think about this, the more I think we have to delineate the possible actions and systematise them here. I think some people will want to treat XML-LINK as simply like HTML, others will want automatic inclusion. Since I am not a hypermedia expert, I am hoping to get some guidance.

The question is ACTUATE="AUTO" SHOW="EMBED". There are several options.

A. treat it as a separate object (possibly a BLOB like a gif), work out how big it is (pixel wise), create a pretty box and render it in there. JUMBO started to do this, but got lost in flowObjects. Now I think it would do better. But you need to be able to handle flowObjects in your metaphor.

B. parse it as a tree and replace the XML-LINK node. This would then look very similar to &foo;. The advantages are that the target can use a different DTD (although writing out the combined tree could be hairy). One disadvantage is we need a switch to do this, which is why I proposed XDEV:INCLUDE. A more serious disadvantage is that recursive following of EMBED/AUTO could give rise to all sorts of fun things, like cyclic recursion, getting into hairy areas, actuating buttons on nuclear power stations and so on.

C. render it as a thumbnail and get the user to click it.

In many ways EMBED/AUTO can do everything that &foo; does and (as far as I can see) everything that NOTATION does. The attraction is that it can be further customised through attributes. &foo; cannot refer to non-XML objects, NOTATION seems to have an additional level of indirection and I don't understand it yet, since I've never seen it used.

>Is the application supposed to treat this chunk as (hopefully) well-formed XML
>in a separate parsing process? Would it be legitimate for an application to

As with all tricky questions on XML the answer is 'application-dependent'. So - if we can agree some semantics here that would be very helpful.

>fold EMBEDded chunks into the document containing the link for purposes of
>styling in particular but also validation in certain circumstances? Many

Yes, if the application has been written to do so :-)

>situations will arise in which EMBEDded content needs to be styled, but the
>chunk of XML referenced by the link contains neither document type declaration
>or styling information.

I shall make something like this available in JUMBO. All the guts are there, it's just agreeing on the public face - i.e. whether there is an XDEV attribute.

Again it may be possible to request the application to supply styling and DTD (e.g. through an XDEV attribute or PI). Again I'd like to see public discussion on this.

>
>My instinct is to be as conservative as possible and make sure that all XML
>chunks EMBEDded by a link could be folded into the linking document without
>making it invalid, but this is a more radical constraint than I expect most

I think it is far better to have the semantics explicit and in the open, rather than for different application developers to think what is best here. This is an area where - without XDEV - we have severe problems of interoperability. I know that a lot of people think that interoperable XML applications is Quixotic, but *I* believe it's possible if we have the communal will. Otherwise the average user will pick up application A and find their HREFs folded in and swear and curse when application B doesn't.
Remember, if you don't like what I'm suggesting, you don't even have to read it :-)

>developers would like. Leaving this behavior up to the application is
>probably the only course available at present, but I suspect this practice may
>lead to considerable chaos.

See the idealistic ideas above :-)

>
>XML-Link has opened up realms of capability that go far beyond those provided
>by entities and notations, and I look forward to using them.

Yup - it has revolutionised my thinking. It means I can throw away 50% of my code because there are general solutions. XML-LINK EXTENDED is even more fun. I shall have some proposals there :-)

P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Fri Nov 28 08:24:36 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:09 2004
Subject: Editing text
In-Reply-To: <199711280528.QAA04939@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971128084309.36a76b24@pop3.demon.co.uk>

At 16:27 28/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust
>
>> I assume there is no short cut...
>
>On the contrary, there *IS* a short cut: the most obvious one!
>
>Just treat the name as a token (i.e. terminated by whitespace or >,
>or any other delimiter if you want to be careful). Any valid XML will
>work with just that!

I think I had a brownout over this. I thought that it could be difficult to find the balancing semicolon without scanning the NameChars. But rereading the spec (e.g. 2.4) convinces me that there isn't a problem. [Perhaps I thought that AT&T was now legal in PCDATA. I'm glad it isn't :-)]

But since the text is being *edited* it's probably a good thing to run isNameChar() over new entities, tagNames, etc. JUMBO can just about think as quickly as a human typing in.

P.

But I think we need IckyLookup as a communal resource :-)

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From richard at light.demon.co.uk Fri Nov 28 12:29:55 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:09 2004
Subject: New version of RunSP
Message-ID: <7dcYKBAiMrf0Ewm6@light.demon.co.uk>

I have just updated my RunSP program so that you can specify command-line arguments. This means that it can now be used to run NSGMLS on XML documents (with the -wno-valid switch introduced in version 1.2).

See http://www.light.demon.co.uk/runsp/ for details. (It may be up to a day before the new version is made available by my ISP.)

Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From ak117 at freenet.carleton.ca Fri Nov 28 12:30:32 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: <3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>
References: <199711261619.DAA22835@jawa.chilli.net.au> <347CE7DD.23FC@hiwaay.net> <3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>
Message-ID: <199711281230.HAA00341@unready.microstar.com>

Peter Murray-Rust writes:
> I am writing an editor for JUMBO where I expect most of the characters like
> '"<>& to have been converted into entities (e.g. &apos, etc.). [I do not
> expect any raw transformed by the parser. On the other hand there may be other entities
> which have not been expanded (e.g. &foo;
>
> My understanding of the spec [71] is that an entity is a Name and that Names
> [4], [5] and [6] are constructed from letters, digits and numbers. In
> determining whether something is an entity, I have to look for a string of
> the form: '&'(Letter | '_' | ':') (NameChar)* ';'
> NameChars are Digits, MiscNames and Letters.
>
> Appendix B lists six and a half pages of potential NameChars for which
> JUMBO has to test - is this correct? If so I have code of the form:
>
> public boolean isNameChar(char ch) {
> return ;
> }
>
> I assume there is no short cut...

I have not checked them for alignment, but there is a good chance that you could use Java's built-in java.lang.Character.isLetterOrDigit() predicate to eliminate most of it, something like this:

    public boolean isNameChar (char ch) {
        return java.lang.Character.isLetterOrDigit(ch) || isMiscChar(ch);
    }

    public boolean isMiscChar (char ch) {
        switch (ch) {
        case '.':
        case '-':
        case '_':
        case ':':
            return true;
        default:
            return isCombining(ch) || isIgnorable(ch) || isExtender(ch);
        }
    }

    public boolean isIgnorable (char ch) {
        int c = (int)ch;
        return ((c >= 0x200c && c <= 0x200f) ||
                (c >= 0x202a && c <= 0x202e) ||
                (c >= 0x206a && c <= 0x206f));
    }

    public boolean isExtender (char ch) {
        int c = (int)ch;
        switch (c) {
        case 0x00b7:
        case 0x02d0:
        case 0x02d1:
        case 0x0387:
        case 0x0640:
        case 0x0e46:
        case 0x0ec6:
        case 0x3005:
            return true;
        default:
            return ((c >= 0x3031 && c <= 0x3035) ||
                    (c >= 0x309b && c <= 0x309e) ||
                    (c >= 0x30fc && c <= 0x30fe));
        }
    }

    public boolean isCombining (char ch) {
        // lots of stuff
        return false; // placeholder so the sketch compiles; the ranges are not filled in
    }

The only long one left is isCombining(), which I haven't bothered to fill in. Before anyone uses these, please check them against both the XML spec and the Java Language Spec, to see if isLetterOrDigit() really aligns properly.

> I applaud the work of the WG on the Internationalisation and I don't want
> to detract from it. What I would suggest is that because of the extremely
> likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
> languages
> (C, C++, Java) so that we know we shall all be working with the same one.

Not a bad idea, but it is unlikely that everyone would want to use the same one. The fastest solution would be to maintain a static 65,536 (or at least 32,768) entry array, with bit flags for different character properties. That would be fine for big programs, but it would kill Java applets and other size-sensitive applications unless it were already built into the Java environment.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From richard at cogsci.ed.ac.uk Fri Nov 28 14:09:00 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: David Megginson's message of Fri, 28 Nov 1997 07:30:19 -0500
Message-ID: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>

> The fastest solution would be to maintain a static 65,536
> (or at least 32,768) entry array, with bit flags for different
> character properties. That would be fine for big programs, but it
> would kill Java applets

Bear in mind that the main problem of size for Java applets is the time taken for downloading, rather than the memory used at runtime. So it may well be practical to store the data in a compact-but-slow form and use that to initialise a large-but-fast lookup table.

-- Richard
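
The compact-then-expand idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `NameCharTable` and its `RANGES` list are hypothetical, and the ranges are a small sample (ASCII name characters plus a few values from the isIgnorable() and isExtender() methods posted earlier in this thread), not the complete character classes of the XML draft's Appendix B.

```java
/** Sketch: a compact range list shipped in the class file, expanded once at load time. */
final class NameCharTable {
    // Compact form: inclusive {first, last} pairs. SAMPLE DATA ONLY; a real table
    // would carry every range listed in the spec.
    private static final int[][] RANGES = {
        {'-', '.'}, {'0', '9'}, {':', ':'}, {'A', 'Z'}, {'_', '_'}, {'a', 'z'},
        {0x00b7, 0x00b7}, {0x200c, 0x200f}, {0x3031, 0x3035}
    };

    // Fast form: one entry per 16-bit char value (65,536 entries).
    private static final boolean[] TABLE = new boolean[0x10000];

    static {
        // Pay the expansion cost once, when the class is initialised.
        for (int[] range : RANGES) {
            for (int c = range[0]; c <= range[1]; c++) {
                TABLE[c] = true;
            }
        }
    }

    /** Constant-time lookup; a char is always a valid index into the table. */
    static boolean isNameChar(char ch) {
        return TABLE[ch];
    }
}
```

Only the short range list has to travel over the network; the large table exists only in memory after start-up, which is the trade-off Richard describes.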

From jjc at jclark.com Fri Nov 28 16:06:22 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:09 2004
Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?)
References:
Message-ID: <347ED4F1.C772F211@jclark.com>

Richard Light wrote:
>
> In message <01BCFB68.64DAE950@xyplex34.uio.no>, Jarle Stabell
> writes
>
> >After attempting to process a document containing errors, I want to present to
> >the user a list of error messages, and when the user clicks on one of these
> >messages, I want to highlight the exact part of the document where the error
> >occurs.
> >The problem with entity expansion is that the parser isn't parsing what the user
> >literally wrote into the entity definitions, it is parsing a processed/"virtual"
> >version, which *may* not be a real subpart of the document, so one has to map
> >"virtual" locations/positions to physical (real document) positions, which
> >doesn't seem trivial to me. It is also likely to give slightly confusing error
> >messages, as it may be mentioning expanded stuff ("<xxx>") which the user never
> >wrote, the user may have written "&lt;xxx&gt;" etc.
>
> I don't think this is as much of a problem as you fear. Every entity is
> physically declared somewhere in a real source - usually a good ol' file
> on disc. Of course, that file may not be the one you started from ...
>
> My RunSP program (http://www.light.demon.co.uk/runsp) does exactly what
> you describe (for nsgmls). It runs it under Windows and then allows the
> user to navigate from one error message to the next, in a simple editor
> environment that lets them sort out the problems they find. All I did
> was to parse the error messages, pick out file name, line number and
> character offset, and place a bookmark at the relevant point in the file
> concerned. This works equally well for errors in the DTD or SGML
> Declaration as for those in what we think of as the 'real document'.
> (Which is something that never occurred to me when designing RunSP - but
> of course the Declaration and DTD are equally part of the document as
> far as the parser is concerned.)

SP does exactly the sort of virtual location to physical location mapping that Jarle was talking about. For example, given a file test.xml:

]> &e2;

nsgmlsu -e will report:

In entity e2 included from test.xml:6:9
nsgmlsu:test.xml:3:16:E: "ELEMENT" declaration not allowed in instance

The position it reports (column 16 in line 3 of test.xml) is the position of "ELEMENT" in test.xml. It has kept track of the fact that the 3rd character in the replacement text of e2 came from the 2nd character in the replacement text of e1 and that the 1st character in the replacement text of e1 was specified at line 3 column 16 of test.xml. Implementing this is not trivial.

James

From jjc at jclark.com Fri Nov 28 16:06:51 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:09 2004
Subject: Editing text
References: <199711280528.QAA04939@jawa.chilli.net.au>
Message-ID: <347ECD85.49069FD@jclark.com>

Rick Jelliffe wrote:
> But in the case of XML, we
> can have our cake (the fans of strict, codified naming rules can exactly
> specify what is allowed) *AND* eat it (bewildered parser-writers can just
> use simple tokenizing).

Not if they want to be conforming. All conforming XML processors are required to detect well-formedness errors. If an XML document uses a character in a name that is not allowed, the document is not well-formed and every conforming XML parser is required to report it and is required not to process the document.

I think it would be better if well-formedness allowed simple tokenizing to be used, and the detailed checking of name characters was needed only for validity, but that's not how the spec is currently.

James

From ak117 at freenet.carleton.ca Fri Nov 28 16:21:40 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>
References: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>
Message-ID: <199711281620.LAA00769@unready.microstar.com>

Richard Tobin writes:
> > The fastest solution would be to maintain a static 65,536
> > (or at least 32,768) entry array, with bit flags for different
> > character properties. That would be fine for big programs, but it
> > would kill Java applets
>
> Bear in mind that the main problem of size for Java applets is the
> time taken for downloading, rather than the memory used at runtime.
> So it may well be practical to store the data in a compact-but-slow
> form and use that to initialise a large-but-fast lookup table.

(I hear that memory _is_ a problem right now on Windows systems, since both Netscape and (especially) MSIE 4 bloat to ridiculous sizes, sometimes double or triple the typical 32MB of RAM on people's systems; however, an extra 64k or so would make little difference).

The best optimisation will depend on your expected usage. If, for example, you expect that 80% of all characters would be <=0x007f, then Tim's approach of using a bit-array for those characters and jumping to a hairy lookup method for the rest would make sense; if, however, you expected that some documents might be almost entirely encoded with characters >=0x0080 (say, in Han Chinese characters), then a 64K lookup table would be necessary for acceptable performance.

If you were keeping only one bit for each character, then you could encode a compact lookup table in only 4K.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From Jon.Bosak at eng.Sun.COM Fri Nov 28 18:01:29 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:09 2004
Subject: XML Example and DTD Archive?
In-Reply-To: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk> (message from Peter Murray-Rust on Thu, 27 Nov 1997 01:00:02)
Message-ID: <199711281800.KAA17411@boethius.eng.sun.com>

[Peter Murray-Rust:]

| Jon Bosak's Shakespeare, and religion are pre-eminent and are a good
| test for whether a system can cope with 'real documents'. I haven't
| looked at religion, but Shakespeare has a clean and natural markup
| without attributes. So it's not a torture test. (I don't think there
| are DTDs - I think I hacked my own). I don't think there is any mixed
| content in Shakespeare

The current distributions at

http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip

include the DTDs. For the curious, I append them below; they are achingly simple. Frankly, I have lost track of whether they are conformant with our current case rules; I think so, but I would be grateful for any corrections from the parser writers.

Jon

========================================================================

========================================================================

From papresco at technologist.com Fri Nov 28 19:54:44 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:09 2004
Subject: Revelling parser writers (was Rebelling)
References: <3.0.1.16.19971128022334.36a76ba0@pop3.demon.co.uk>
Message-ID: <347EE127.623D7A12@technologist.com>

Peter Murray-Rust wrote:
> [Not a grove at this stage, as
> no one seems to write their parsers to create groves.]

I'm not sure what this means. Building a grove is not the job of a parser. Typically the parser outputs the events and some other process builds the grove from the information. The only way a parser could be not written to create groves is if the parser did not output sufficient information to build a grove conforming to a particular grove plan.

Paul Prescod
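
The division of labour Paul describes (the parser only emits events, a separate consumer assembles the structure) can be illustrated with a deliberately small sketch. Everything here is hypothetical: `Node`, `TreeBuilder` and the three event methods are invented for illustration, and the plain element/text tree they produce is far simpler than a real grove built to a grove plan.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical node: either an element (name set) or a text run (text set). */
final class Node {
    final String name;
    final String text;
    final List<Node> children = new ArrayList<>();

    Node(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

/** Consumes parser events and assembles a tree; the parser itself builds nothing. */
final class TreeBuilder {
    private final Deque<Node> open = new ArrayDeque<>(); // currently open elements
    private Node root;

    void startElement(String name) {
        Node element = new Node(name, null);
        if (open.isEmpty()) {
            root = element;                    // first element seen becomes the root
        } else {
            open.peek().children.add(element); // attach to the innermost open element
        }
        open.push(element);
    }

    void characters(String data) {
        if (!open.isEmpty()) {
            open.peek().children.add(new Node(null, data));
        }
    }

    void endElement(String name) {
        Node closed = open.pop();
        if (!closed.name.equals(name)) {
            throw new IllegalStateException("end-tag " + name + " does not match " + closed.name);
        }
    }

    Node root() {
        return root;
    }
}
```

Any event-emitting parser could drive such a builder; whether the events carry enough information for a given grove plan is exactly the question raised above.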

From dgd at cs.bu.edu Fri Nov 28 20:31:41 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID:

At 5:09 AM -0000 11/28/97, Simon St.Laurent wrote:
>While there is no reason to expect the target to be XML (which I strongly
>approve of), I have to wonder what's supposed to happen if the target _is_
>XML. If the target is another complete XML document, including a document
>type declaration, then I can see the wisdom of parsing it separately and
>keeping it separate. If the target is XML but not a complete document, for
>instance a set of elements returned by a reference using XPointers, I'm not
>sure about what the application should do.

It's a quotation. One thing you could do is put an embedded scrollable window in the linking document, so that the reader could read the entire linked-to document in context. Or you might want to format it inline as a "long quote" or something. Or you might want to simply note that a citation was made in the form of a quote of a particular region of the linked-to document as part of a citation-gathering process.

The Link records a relationship between a document and a portion of another document. I think the term EMBED is far from ideal because it encourages an operational definition that is not always appropriate (though it is probably the proper definition for simple browsing apps). As with most generic markup, how it is to be displayed or processed is something that information providers and users must be free to change as supporting technology and the use of the document evolve.

>Is the application supposed to treat this chunk as (hopefully) well-formed
>XML
>in a separate parsing process?

If that makes sense.

>Would it be legitimate for an application to
>fold EMBEDded chunks into the document containing the link for purposes of
>styling in particular but also validation in certain circumstances?

Not for XML validation, ever, because XML validation is only done according to the rules in the XML standard. Your application and DTDs might require such an extra kind of validation, although I think that this would be a very bad decision for a general-purpose XML processor since that requirement will _not_ be honored by many documents. Obviously, inline formatting is a reason for the processing hint intended by the word EMBED in the first place.

> Many
>situations will arise in which EMBEDded content needs to be styled, but the
>chunk of XML referenced by the link contains neither document type
>declaration
>or styling information.

To the extent this is done, such documents may be hard to process with some applications. However, there's nothing to prevent a reasonable formatting script from being provided as part of the format specification for the linking document that can properly format the EMBEDded data. In fact, that would probably be a requirement for providing such documents to browsers.

>My instinct is to be as conservative as possible and make sure that all XML
>chunks EMBEDded by a link could be folded into the linking document without
>making it invalid, but this is a more radical constraint than I expect most
>developers would like. Leaving this behavior up to the application is
>probably the only course available at present, but I suspect this practice
>may
>lead to considerable chaos.

It will only lead to chaos if people assume that an application is responsible for figuring out what to do in such cases.
If you are providing such documents as part of a publication process, you are well-served by providing stylesheets that will format the link _as you want_. If you are creating some form of repository, you need to document the intended meaning of such links so that future creators of presentation and interaction specifications can provide appropriate implementations for them.

>XML-Link has opened up realms of capability that go far beyond those provided
>by entities and notations, and I look forward to using them.

Definitely. One thing that takes a while to get used to is the "declarative" way of thinking required to make effective use of XML and content markup generally. Then, once you've done that, you need to apply the same abstractions to hypertext structures: a link is not just shorthand for a particular interaction behavior, but a description of a relationship between document portions that might be displayed, analyzed, or otherwise used in many different ways. This was common wisdom in the hypertext community, but is having to be rediscovered on the Web. (It's particularly ironic, since Tim Berners-Lee understood this from the beginning, even though he didn't invent it).

-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________

From dgd at cs.bu.edu Fri Nov 28 20:31:54 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971128091613.34efaa0c@pop3.demon.co.uk>
References:
Message-ID:

At 9:16 AM -0000 11/28/97, Peter Murray-Rust wrote:
>You approve that it must/needNot be XML. For me the latter is essential.
>Sometime ago I proposed an extra attribute MIME to describe the MIME type
>of the target HREF. (Note that this is NOT always available from
>contentType since it may be a local file. If this doesn't get into the
>SPEC, I suggest we need an XDEV attribute and I proposed that 2 days ago...

That's what NOTATION is for. Use an external entity, and make its notation be the MIME type of the content, and then you're all set.

>The more I think about this, the more I think we have to delineate the
>possible actions and systematise them here. I think some people will want
>to treat XML-LINK as simply like HTML, others will want automatic
>inclusion. Since I am not a hypermedia expert, I am hoping to get some
>guidance.

This is all a question for the stylesheet/processing languages, and not for XML per se.

>>Is the application supposed to treat this chunk as (hopefully) well-formed
>XML
>>in a separate parsing process? Would it be legitimate for an application to
>
>As with all tricky questions on XML the answer is 'application-dependent'.
>So - if we can agree some semantics here that would be very helpful.

Not strictly correct. The stylesheet processing is application dependent. The validation is _not allowed_ as part of XML validation. You can of course require that your application limit the valid XML documents that it will process, but then you are limiting the documents that it will process, which may not be a good idea.

>>fold EMBEDded chunks into the document containing the link for purposes of
>>styling in particular but also validation in certain circumstances? Many
>
>Yes, if the application has been written to do so :-)

Styling yes, validation no.
>I shall make something like this available in JUMBO. All the guts are >there, it's just agreeing on the public face - i.e. whether there is an >XDEV attribute > >Again it may be possible to request the application to supply styling and >DTD (e.g. through an XDEV attribute or PI). Again I'd like to see public >discussion on this. For an XML document, you can refer to the whole document and use extended pointers to pick out the linked sub-part -- this lets you get the DTD and the content type (via NOTATION). >>My instinct is to be as conservative as possible and make sure that all XML >>chunks EMBEDded by a link could be folded into the linking document without >>making it invalid, but this is a more radical constraint than I expect most This is not conservative, but radical, since it imposes an ad-hoc constraint on linking, based on a limited processing model. >I think it is far better to have the semantics explicit and in the open, >rather than for different application developers to think what is best >here. This is an area where - without XDEV - we have severe problems of >interoperability. I know that a lot of people think that interoperable XML >applications is Quixotic, but *I* believe it's possible if we have the >communal will. Otherwise the average user will pick up application A and >find their HREFs folded in and swear and curse when application B doesn't. >Remember, if you don't like what I'm suggesting, you don't even have to >read it :-) No, interoperability in the sense you mean (interoperable without a particular stylesheet to interpret against) is in fact a gigantic mistake that XML is designed to help people avoid. The point of XML is to separate rendering and other processing from document representation and semantics. This means that no viewer can process an XML document for display without a separate specification of what display is desired. It will be possible to test application compatibility when applications take XSL document + XML document pairs.
An XML document in isolation intentionally does _not_ have a single correct display. _Any_ display driven by a consistent transformation from the XML source is in some sense a sensible view _for some application._ >>developers would like. Leaving this behavior up to the application is >>probably the only course available at present, but I suspect this practice >may >>lead to considerable chaos. > >See the idealistic ideas above :-) Without a formatting specification you can't display an XML document "interoperably". With such a specification, it's a relatively simple matter of program correctness to determine whether you have done so or not. >Yup - it has revolutionised my thinking. It means I can throw away 50% of >my code because there are general solutions. XML-LINK EXTENDED is even more >fun. I shall have some proposals there :-) I'm glad to hear this. None of the problems you are mentioning are insignificant, but they are almost all problems for XSL, and _not_ XML itself. -- David _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________
From ricko at allette.com.au Fri Nov 28 20:56:12 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:10 2004 Subject: Editing text Message-ID: <199711282054.HAA27168@jawa.chilli.net.au> > From: James Clark > I think it would be better if well-formedness allowed simple tokenizing > to be used, and the detailed checking of name characters was needed only > for validity, but that's not how the spec is currently. That sounds sensible: any chance of it James? It was discussed before, but in the salad days of case insensitivity. There have been several proposals for what grain the naming rules should have: opinions range from "allow nearly everything" to "the grain of Unicode blocks" to "whatever Unicode says for identifiers" to "whatever the new ISO report on identifiers says" to "whatever the Java function does" to "almost nothing: just ASCII" to "let's look at each character individually and judge". Having quite a large grain (e.g., divide Unicode into 256 rows and disable or allow whole rows {but with special treatment for row 0}) also gets the SGML declaration into a less daunting size. This might be good enough namechecking for XML, in line with the 80% rule. Rick Jelliffe
From Jon.Bosak at eng.Sun.COM Fri Nov 28 21:30:59 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:59:10 2004 Subject: XML Example and DTD Archive? Message-ID: <199711282129.NAA17698@boethius.eng.sun.com> Thanks to several people, especially Eve Maler, for pointing out XML errors in my play and tstmt DTDs. In particular: | > | | Illegal; must be . | | > | | Illegal; must be . | | > ]> ...

This is Pythagoras' theorem:
&pythagoras;

and I run it through a parser what will happen? The answer is parser-dependent. It might: - always include and validate external entities in which case there will be a validation error (MathML uses a different DTD from HTML). If the entity is valid, then it creates a 'single document' which is easy to search, etc. One disadvantage is that (for Java) the document could get too big for the JVM. - offer a commandline switch that allows inclusion of external entities OR defers their expansion to the application/processor. In that case the *application* has to be able to run a parser over the 'included' MathML. (JUMBO can do this at present - it can even use a different parser from the initial one, which may be useful if they have different behaviours). P. Note, of course, that an application may also want to run a validating parser over the targets of HREF and JUMBO can do this as well. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From peter at ursus.demon.co.uk Sat Nov 29 16:05:46 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:10 2004 Subject: XML-DEV membership In-Reply-To: Message-ID: <3.0.1.16.19971129170008.2f57ad26@pop3.demon.co.uk> At 06:14 29/11/97 UT, Simon St.Laurent wrote: [...] >nor does it sound like the readership of this list Henry tells me that the membership is about 500-600. It is a relatively unrewarding task managing any mailing list, and Henry has been much encouraged by messages of thanks for this service.
We are both keen to see the scientific publishing process move beyond marks on paper, which is not yet a universal vision. We are making slow but steady progress in getting XML/CML known in the molecular sciences. At least it's now known and regarded as 'yet another file format'. So we have a little way to go yet. The likelihood that the rest of the world will adopt it will hopefully have a modest effect as well :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From SimonStL at classic.msn.com Sat Nov 29 20:36:47 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:10 2004 Subject: No subject Message-ID: >I don't think I've seen it explicitly suggested here, so here goes. If you >want to ensure that what's pointed to is real XML, and "belongs" in that >location, how about using a plain old external text entity? With a >validating XML processor, you can guarantee that (a) the entity will be >expanded in place before it even gets to the application and that (b) it >will be validated in context. Entities are extremely useful, to be sure, but don't offer the flexibility of XPointers by a long margin. The project I'm working on is really a feasibility study at this point, and that flexibility is key to this particular project.
It could in fact be implemented with entities, but that would require creating far more files than we had hoped for, as the data we want to reference comes from different and overlapping portions of a small set of documents. Implementing this as entities would require far more maintenance whenever a change came down the pike. The processing model we'd like to see for EMBED is very similar to that used for a text entity, but it doesn't look like we'll be getting there soon. Entities and NOTATIONs serve their purposes, but XML-Link seems far more flexible, especially for our needs. Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) From dgd at cs.bu.edu Sat Nov 29 22:03:09 1997 From: dgd at cs.bu.edu (David G. Durand) Date: Mon Jun 7 16:59:10 2004 Subject: EMBED and validation In-Reply-To: Message-ID: At 6:14 AM -0000 11/29/97, Simon St.Laurent wrote: >>It's a quotation. One thing you could do is put an embedded scrollable >>window in the linking document, so that the reader could read the entire >>linked-to document in context. > >Yes, it is effectively a quotation. Long quotations, however, often have >more >structure and require more formatting than a short quotation in a print >document, and I would like somehow to preserve that structure. Providing an >embedded scrollable window is a good idea for things you COULD do, but is not >something that can be counted on. We don't plan to create applications >specific to this document set at present; though we may do so in the future, >this document set would still be likely to cross into foreign applications.
I also listed several other things you could do with the link, such as formatting it inline. My point was that the _presentation_ of the linked data is a matter for the application and/or stylesheet -- not the XML document. Do stylesheets need to be able to process such stuff -- yes, that was the point I was trying to make. I chose the nested window as a main example because I think it's a good idea that is _not_ supported by current software, but that could be enabled by this type of link. And it's a great example of how you could instantly take your old-fashioned "long quotes" and turn them into "Browsers, Inc's Web-o-Namic Nested Documents (tm)" without changing your markup at all. >>As with most generic markup, how it is to be displayed or processed is >>something that information providers and users must be free to change as >>supporting technology and the use of the document evolve. > >While freedom to change is certainly valuable, freedom to work consistently >with a variety of applications is of considerably more importance on this >project. Communicating more clearly the way in which these documents should >be treated by applications appears to be a necessity, as XML itself >appears to >provide no such support, nor does it sound like the readership of this list >(with a few exceptions, of course) is particularly interested in providing >such support at this time. No, you need to make an appropriate decision about stylesheets if you are providing documents. Any requirements that you have for XSL would be well made public, as input to that standardization process, now underway. If you will be delivering your documents before the stylesheet work is complete you will have to work with prerelease software or roll your own, or use CSS or compile to HTML, or something else. The lack of such presentational details in XML itself is still a good thing.
You are free to create content today that will work with XSL -- even when XSL does not exist yet -- and can design your own processor if you need to. On the other hand if you invent a bunch of "conventions" that import presentation details into your documents you will simply be doing work that you will at best have to throw away, and that at worst may lead to bad encodings of your document semantics and send you back to square one to re-markup your documents. XML is the content part of the equation, and that's what it's for. XSL and its possible competitors (there _will be competitors_, because formatting is a place where people will want to compete) will be the way to realize the presentations you prefer (whatever they are) using the same technology-independent source files. > >>However, there's nothing to prevent a reasonable formatting >>script from being provided as part of the format specification for the >>linking document that can properly format the EMBEDded data. > >Formatting script? I think we were hoping to use something more in the order >of CSS or eventually XSL. While XSL will provide scripting capabilities, it >seems like we're piling on additional complexity and new problems for >applications. Though I haven't tried it yet, it seems like it will be an odd >challenge to create a specification for the linking document that will >contain >styles for linked information that isn't included as part of the parser tree >for the linking document, particularly if the type of information to be >linked >isn't known at the time the styles for the linking document are established. >It can be done; I just don't look forward to it. I used the term script to emphasize that any programmatic transformation for viewing is usable -- some people have been confused by my use of the term stylesheets for link-rendering, so I've tried to avoid using _only_ that term. Sorry for the confusion. CSS or XSL are _exactly_ where this whole problem belongs.
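To make the content/presentation split above concrete, here is a minimal sketch in Python, not XSL or CSS. The `quote` element, its `href` attribute, and the two rule tables are hypothetical names invented for illustration; the point is only that one unchanged source document can be given two different presentations of the same link by swapping the transformation applied to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical content markup: the link records a relationship, not a behaviour.
SOURCE = (
    '<doc><p>As the play says, '
    '<quote href="hamlet.xml#act3">to be, or not to be</quote>.</p></doc>'
)

def render(elem, rules):
    """Render elem to text using a rule table keyed by element name."""
    inner = (elem.text or "") + "".join(
        render(child, rules) + (child.tail or "") for child in elem
    )
    # Elements without a rule simply pass their content through.
    rule = rules.get(elem.tag, lambda e, text: text)
    return rule(elem, inner)

# Two stand-in "stylesheets" for the same markup (assumed, not a real format).
AS_LINK = {
    "quote": lambda e, text: f"{text} [see {e.get('href')}]",
}
AS_WINDOW = {
    "quote": lambda e, text: f"[embedded view of {e.get('href')}: {text}]",
}

doc = ET.fromstring(SOURCE)
link_view = render(doc, AS_LINK)
window_view = render(doc, AS_WINDOW)
print(link_view)
print(window_view)
```

Neither rendering is more "correct" than the other; which one a reader sees is decided by the rule table that accompanies the document, not by the document.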
>>if you are providing such documents as part of a publication process, you >>are well-served by providing stylesheets that will format the link _as you >>want_. if you are creating some form of repository, you need to document >>the intended meaning of such links so that future creators of presentation >>and interaction specifications can provide appropriate implementations for >>them. > >What I would like to be able to do is provide stylesheets and documentation >that can be understood by a variety of processing applications that will work >in a consistent manner with linked material. The paragraph above describes >quite neatly what I want; the rest of this conversation, however, has >indicated that you can't get there from here. No, it's indicated that without stylesheets or some other programmatic process, you can't get there. This is not a big surprise, as that's the point of content markup. Yes, stylesheets are going to have to handle links. In some cases you may have to write scripts to perform interactions you want, and embed them into stylesheets. That's just part of the work of document delivery. >>>My instinct is to be as conservative as possible and make sure that all XML >>>chunks EMBEDded by a link could be folded into the linking document without >>>making it invalid, but this is a more radical constraint than I expect most > >>This is not conservative, but radical, since it imposes an ad-hoc constraint on >>linking, based on a limited processing model. > >Radical? Radically practical, or so I thought. I'm hardly saying that >developers _should_ obey this, or that application developers should >implement >a limited processing model. Being conservative in this instance means >accepting a reasonably loose set of rules designed to make certain that >documents can still be processed in a wide variety of application processing >contexts.
As I'm developing document sets here, and not applications per se, >I'm not sure this ad-hoc constraint is anything but a simple concession to >the >vagaries of the standard. If you're just talking about a rule that you want to adopt as part of your authoring process, there's no problem. You posed a question about software, and so I answered in terms of how the software should work. If you're a document provider, you will need to specify how to format linked material if you want it formatted inline. If you want it to just show up pre-scrolled in another window, I expect that XSL will have a way to do that -- then you'll lose some context, but gain by having simpler stylesheets. This is a tradeoff that is independent of linking strategy. If you intend to use inline formatting, and are writing the stylesheets as well as the documents, such a discipline may well make your documents cleaner, and your stylesheets simpler. The question is what these presentation details have to do with XML. >I had hoped the standard would be clearer on these issues, but the wide >latitude given applications will have a dramatic, though not especially >painful, impact on this document set and others I may create. Paul Prescod >pointed out that yes, of course, applications CAN follow several of the >models >I proposed, but that this behavior cannot be counted upon. CAN is not good >enough in many situations, so I'll develop the document set so that it WILL >work regardless of the processing model applied. Seems simple enough, though >it requires some extra effort. Paul is right, but applications that purport to implement XSL will not have so much latitude when they are processing a document according to a stylesheet. In fact, they will be constrained by the XSL standard in exactly what liberties may be taken. So I think you're worrying about a non-problem, as long as you will be providing stylesheets for your documents.
>Sooner or later I'll write some applications, and maybe I'll be able to take >advantage of the freedom given application developers. In the meantime, I'll >explore the constraints set upon document developers that are imposed by that >freedom. The discipline will probably produce better DTDs and documents >anyway. I don't really know what this last means, but certainly we all look forward to improved and richer displays once the semantics of documents and the format for displaying them can be separated. -- David _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________ From dgd at cs.bu.edu Sat Nov 29 22:03:15 1997 From: dgd at cs.bu.edu (David G. Durand) Date: Mon Jun 7 16:59:11 2004 Subject: EMBED and validation In-Reply-To: <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk> References: <3.0.32.19971129100857.00ab9150@village.doctools.com> Message-ID: At 4:46 PM -0000 11/29/97, Peter Murray-Rust wrote: >The only area of fuzziness is what the default and optional behaviours of a >parser (sic) are. If I write: I don't think there is any fuzziness at all. > > >]> > >... > >

This is Pythagoras' theorem:
>&pythagoras; >

> > > >and I run it through a parser what will happen? The answer is >parser-dependent. It might: > - always include and validate external entities in which case there >will >be a validation error (MathML uses a different DTD from HTML). If the >entity is valid, then it creates a 'single document' which is easy to >search, etc. One disadvantage is that (for Java) the document could get too >big for the JVM. If the MathML elements are not declared in the DTD, _no_ validating parser can ever accept this as legal. > > - offer a commandline switch that allows inclusion of external >entities OR >defers their expansion to the application/processor. In that case the >*application* has to be able to run a parser over the 'included' >MathML. No, external entities are parsed in place. WF-only applications might not follow the entities (under user choice, whether interactive or command-line), or they might follow them and present the information. Parsing relative to a different DTD would be unfortunate behavior, since validation should be done according to the rules of XML. Of course, a WF application might just swallow the elements and use its own stylesheet language to format some math. >(JUMBO can do this at present - it can even use a different parser from the >initial one, which may be useful if they have different behaviours). You mean if they have bugs? >Note, of course, that an application may also want to run a validating >parser over the targets of HREF and JUMBO can do this as well. Sure... it could, but that would be odd, since you can't include a _valid_ XML document into either a valid or a well-formed document, since the Doctype declaration is not legal in the instance. You would have to refer to the external entity using an ENTITY attribute, rather than expanding it via an entity reference, if you want to make valid use of this kind of processing based on entities.
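The "follow the entity or not" choice for a well-formedness-only processor can be sketched with Python's standard `xml.parsers.expat` module, which is non-validating. This is an illustration under stated assumptions, not the behaviour of JUMBO or of any parser discussed in this thread: the `ENTITIES` table stands in for the file system, and `pythagoras.xml` and its `math`/`eq` content are made up. When the entity is followed, it is parsed in place by a child parser created from the referencing one.

```python
import xml.parsers.expat

# Hypothetical stand-in for the file system: system identifier -> entity text.
ENTITIES = {"pythagoras.xml": b"<math><eq>a2 + b2 = c2</eq></math>"}

DOC = b"""<?xml version="1.0"?>
<!DOCTYPE html [
  <!ENTITY pythagoras SYSTEM "pythagoras.xml">
]>
<html><p>This is Pythagoras' theorem:</p>&pythagoras;</html>"""

def element_names(expand):
    """Parse DOC and return the element names seen, in document order."""
    names = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        if expand:
            # Parse the entity in place, as part of the referencing document.
            # The child parser reuses the handlers set on the parent.
            child = parser.ExternalEntityParserCreate(context)
            child.Parse(ENTITIES[system_id], True)
        return 1  # a non-zero result tells expat to carry on

    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(DOC, True)
    return names

followed = element_names(expand=True)
skipped = element_names(expand=False)
print(followed)
print(skipped)
```

Because expat does not validate, it accepts the undeclared `math` elements; a validating parser reading the same document against an HTML DTD would have to report them as errors, as noted above.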
-- David _________________________________________ David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com Boston University Computer Science \ Sr. Analyst http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams --------------------------------------------\ http://www.dynamicDiagrams.com/ MAPA: mapping for the WWW \__________________________ From fussellm at alumni.caltech.edu Sat Nov 29 23:06:40 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:11 2004 Subject: Revelling parser writers (was Rebelling) In-Reply-To: Message-ID: Simon St.Laurent wrote: > One key piece of the XML puzzle that has consistently driven me crazy is the > lack of explanation for which part of an application is supposed to handle > which part of processing. You may want to look at the MONDO architecture and processing model. Its components and flows (IMO) subsume all the processing models I have seen for SGML and XML documents. It is also unusually flexible and general. The basic "forward" (from text to application functionality) flow is: 1. [Parser] Parse the text (say XML) and turn it into a recipe (what objects to build and what ingredients to use) 2. [ObjectBuilder] Build the recipe and construct objects within the ObjectBase 3. [ObjectBase & App] Interact with the resulting objects Note that the recipe is usually virtual: The interface between (1) and (2) could be approximated with parse event notifications. The interface between (2) and (3) is done (usually) with Factories that know how to build particular types of objects.
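The three-step flow above can be approximated in a few lines of Python. This is a rough sketch of the general idea only, not MONDO's actual API: the class names `Node`, `Heading`, and `ObjectBuilder`, and the `factories` table, are assumptions made for illustration. The parse events play the role of the virtual "recipe", and the factory table decides which class is built for each element name.

```python
import xml.parsers.expat

class Node:
    """A generic built object; a stand-in for whatever the application needs."""
    def __init__(self, name, attrs):
        self.name, self.attrs, self.children, self.text = name, attrs, [], ""

class Heading(Node):
    """An element type the application wants its own class for."""

class ObjectBuilder:
    """Step 2: turn parse events (the virtual 'recipe') into objects."""
    def __init__(self, factories):
        self.factories = factories  # element name -> class to instantiate
        self.stack = []
        self.root = None

    def start(self, name, attrs):
        node = self.factories.get(name, Node)(name, attrs)
        if self.stack:
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def end(self, name):
        self.stack.pop()

    def text(self, data):
        self.stack[-1].text += data

def build(source, factories):
    """Steps 1 and 2: parse the text and hand each event to the builder."""
    builder = ObjectBuilder(factories)
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.text
    parser.Parse(source, True)
    return builder.root

def visit(node):
    """Step 3 (one possible interaction): walk the built objects in order."""
    yield node
    for child in node.children:
        yield from visit(child)

root = build("<doc><title>Hi</title><p>Body</p></doc>", {"title": Heading})
kinds = [(type(n).__name__, n.name, n.text) for n in visit(root)]
print(kinds)
```

Swapping the factory table changes what gets built without touching the parser, which is the separation of responsibilities the flow describes.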
As an example of an ObjectBuilder, a GroveBuilder is a particular type of ObjectBuilder that builds a Grove-based object model (possibly using a GroveObjectFactory). --- At the point of (3) the application can do whatever it wants, but it is likely to want to: 3.a. Visit the objects [traverse from one to another doing some task] 3.b. Inspect their properties 3.c. Modify the objects or ask for more sophisticated behavior 3.d. Create new objects that transform the old ones 3.e. Produce changes to the world outside of the ObjectBase. For example: 3.e.1 Present the objects to the UI 3.e.2 Write the objects to a database 3.e.3 Convert the objects to an external stream Although not complete, the above describes common behavior that applications are likely to want to do with information. The high level architecture and component responsibilities can be useful for organizing an application. There are also well known techniques and available code for all of these pieces of functionality. MONDO itself is supplying an architecture, frameworks, and some of the functionality listed above. But many tools could do the same work. ====== Another good source for architecture and flow models for SGML/XML is: Developing SGML DTDs: From Text to Model to Markup. Eve Maler and Jeanne El Andaloussi. Prentice Hall, Upper Saddle River, NJ, 1996. ISBN: 0-13-309881-8 This is one of my favorite SGML books. It describes how to think about and put together SGML processing systems from the for-everyone basics to large-scale system issues. It is also very readable. ====== For information on MONDO, see: http://www.chimu.com/projects/mondo/ You may also be interested in: http://www.chimu.com/publications/oopsla96tutorial23/ Those tutorial slides may be a little impenetrable if you cannot see the connection to an XML/SGML flow. SGML/XML and the MONDO architecture populate the DomainModel layer of the system. The rest of the architecture then shows how you can present or store that DomainModel.
The architecture may look a bit like overkill but it actually reduces to simpler models easily or parts can be fully realized without much pain (for example, by using a good MVC UI framework). --Mark mark.fussell@chimu.com From fussellm at alumni.caltech.edu Sat Nov 29 23:36:23 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:11 2004 Subject: EMBED and validation In-Reply-To: Message-ID: Peter Murray-Rust wrote: > >Note, of course, that an application may also want to run a validating > >parser over the targets of HREF and JUMBO can do this as well. David Durand wrote: > sure... it could, but that would be odd, since you can't include a _valid_ > XML document into either a valid or a well-formed document, since the > Doctype declaration is not legal in the instance. I think Peter Murray-Rust was suggesting that a running application may want to subsequently read in and process another XML document based on a reference to it (at a semantic level) in a first document. That is at a stage after the actual "Parsing", but not very far after (in MONDO it is called Building) because the application simply wants to see all the information together when the stage is done. As people have mentioned, one of the difficulties is in separating what SGML and XML as "parsers"/technology do from what SGML and XML as "concepts" (all the possible applications) encompass. I think the terms are commonly used as concepts and only rarely used to mean the precise technology.
As precise technology XML is currently just a semi-configurable parser specification, so whatever back end you want to place on a parser is up to you. The rest is all nebulous "spirit". I think this is a bit deficient. Even if flexibility should be allowed, some precise definition of the goals and meaning of XML markup would be useful to those building applications. Having a general parser is not that useful (parsers are pretty easy to create nowadays), but having a general model for encoding information and interpreting the meaning of that information (I feel) is extremely useful. A standardized DTD plus common applications for that DTD provide an interpretation for a particular domain (either large [TEI] or small [HTML]). I believe MONDO provides a useful overall picture that provides structure and meaning even before the applications have been developed. --Mark mark.fussell@chimu.com From peter at ursus.demon.co.uk Sun Nov 30 00:04:18 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:11 2004 Subject: EMBED and validation In-Reply-To: References: <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk> <3.0.32.19971129100857.00ab9150@village.doctools.com> Message-ID: <3.0.1.16.19971130004046.2237dd84@pop3.demon.co.uk> At 17:02 29/11/97 -0500, David G. Durand wrote: >At 4:46 PM -0000 11/29/97, Peter Murray-Rust wrote: >>The only area of fuzziness is what the default and optional behaviours of a >>parser (sic) are. If I write: > >I don't think there is any fuzziness at all.
Well, please pardon my slowness and be patient - it has taken me a long time to get this far with SGML. The spec repeatedly uses the word 'may', which I take to be optional behaviour (e.g. 4.3.3 'may, but need not, include the entity's replacement text'). I expect that some parsers may allow the user to decide, some may take unilateral action. Perhaps 'fuzziness' was the wrong word - a 'variety of options with which the user may be confronted' might be more accurate. Other actions which a parser 'may' take could include: - whether to read the external DTD subset - whether to read the internal subset - whether to validate - whether to expand the external entities or not Some of these may be defined clearly in the new spec, some may not. It may be that most parsers end up with a list of command-line options like sgmls. >> [...] >>be a validation error (MathML uses a different DTD from HTML). If the >>entity is valid, then it creates a 'single document' which is easy to >>search, etc. One disadvantage is that (for Java) the document could get too >>big for the JVM. > >If the MathML elements are not declared in the DTD, _no_ validating parser >can ever accept this as legal. Fair enough - what I wrote was incorrect :-) Sorry. > >> >> - offer a commandline switch that allows inclusion of external >>entities OR >>defers their expansion to the application/processor. In that case the >>*application* has to be able to run a parser over the 'included' >>MathML. > >No, external entities are parsed in place. WF-only applications might not >follow the entities (under user choice, whether interactive or >command-line), or they might follow them and present the information. >Parsing relative to a different DTD would be unfortunate behavior, since >validation should be done according to the rules of XML. > >Of course, a WF application might just swallow the elements and use its own >stylesheet language to format some math. Understood. Thanks.
>
>>(JUMBO can do this at present - it can even use a different parser from the
>>initial one, which may be useful if they have different behaviours).
>
>You mean if they have bugs?

No. They may deliberately have different behaviours. Some may be very good at handling large documents, others may be validating and possibly slower. Some may offer more information as a result of the parse.

P.

Thanks for the help - I keep learning :-)

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From gfrer at luna.nl Sun Nov 30 00:26:37 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971130004046.2237dd84@pop3.demon.co.uk>
References: <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk> <3.0.32.19971129100857.00ab9150@village.doctools.com>
Message-ID:

As an outsider I follow the discussions about the topic.

Within Health Care I foresee a need to achieve the following:

- there will be one Universal DTD (or whatever)
- based on this one DTD users will select portions of it to construct messages
- these messages might contain other messages or references to them
- depending on circumstances decided upon by the user, he might or might not want to view the whole collection of data as one piece (merged) or as data plus references
- messages will be added to a receiving master patient record and either be shown as references or merged.

So which way you organise it, I don't mind.

And Oh Yes.
We in medicine count upon the fact that all DTD's and subDTD's will be stored in an Internet repository.

Keep up the good work :-)

Gerard Freriks

Gerard Freriks, huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands
Telephone: (+31) (0)174-384296 / Fax: -386249
Mobile: (+31) (0)6-54792800
ARS LONGA, VITA BREVIS

From tbray at textuality.com Sun Nov 30 01:19:55 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
Message-ID: <3.0.32.19971129172115.00955a70@pop.intergate.bc.ca>

At 04:25 PM 29/11/97, Peter Murray-Rust wrote:
>C javac/jvc
>
> I have problems compiling Lark V097 under javac. It throws a compiler
>error trying to load Lark.class. (java.io.UTFDataFormatException). This
>suggests that Lark which may have been compiled with jvc (Tim?) will not
>load with javac. (It does *run* with java)

OK, I've figured out what's going on. Lark.java contains some compiled data structures, stored, for compactness, as strings.
Example:

    static final String sOCT =
        "\u003c\u0302\u3c03\u0321\u0903\u2f84\u033f\u0405\u3f07\u063f\u0807" +
        "\u3e8f\u083e\u8f09\u2d0a\u0944\u0109\u5b0e\u0a2d\u0b0b\u2d0c\u0c2d" +
        "\u0d0d\u3e8f\u0e43\u010f\u5b10\u105d\u1111\u5d12\u123e\u8f15\u3e8f" +
        "\u1550\u1615\u531b\u155b\u2116\u5501\u1822\u1f18\u2524\u1827\u1e19" +
        "\u221f\u1925\u2419\u271e\u1b59\u011d\u221f\u1d25\u241d\u271e\u1e27" +
        "\u8f1f\u228f\u203e\u8f20\u5b21\u2125\u2421\u3c27\u215d\u2222\u3e8f" +
        "\u225d\u2323\u3e21\u253b\u8f26\u3e8f\u2721\u2827\u3f04\u282d\u2d28" +
        "\u4101\u2845\u5c28\u5b29\u2925\u2429\u492a\u2a47\u012a\u4e01\u2b5b" +
        "\u212c\u5b21\u2d2d\u2e2e\u2d2f\u2f2d\u3030\u3e21\u3225\u2433\u3e21" +
        "\u3425\u2434\u3e21\u3625\u2436\u2837\u3643\u0136\u453d\u3649\u4636" +

... and so on for lots more. Possibly javac detects that one or more of the characters may not be legal per Unicode? Or it's just tough to compile... I can repeatably force jvc to generate bogus code by changing the *indentation* of the stuff above :)

I stopped using javac when I got Sun's "fastjavac" that comes with jws; it has never had this problem. I'll give it a try and report. Peter, is the use of javac strategic to you at this time? -T.

From fussellm at alumni.caltech.edu Sun Nov 30 01:19:59 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
In-Reply-To: <3.0.1.16.19971129162530.1fef159c@pop3.demon.co.uk>
Message-ID:

Peter Murray-Rust :
> I am having a number of problems turning JUMBO into a WORA-compliant animal
> (WriteOnceReadAnywhere)....
You might also want to ask these questions on advanced-java. There are a lot of good people on that list and your questions are interesting and applicable [the only ones "questionable" to the charter of the list have to do with applets, but your code is for both]. I will try to answer some of them here. All my answers will be 1.1 oriented since this is all I have been using recently.

> A JDK1.02/JDK1.1.x
> I have refrained from converting to 1.1. since I have been told that not
> all browsers supported it. Is this still true? Or should I convert now?

I think you will be "trapped" supporting 1.02 for a while if you want to support as many browsers as possible. For example, I still use Netscape 3.x because it is extremely stable. My web logs show that a lot of 3.x (and earlier) browsers are still out there.

Note that at the VM level there are very, very few changes between 1.0 and 1.1. The real problem is that the class libraries have migrated, and if you migrate too you will not be backward compatible.

> B.1 Is there a function I can call to tell whether I am in an applet or
> application?

Yes (maybe) but I don't remember what it is :-( It also depends somewhat on what you want to know. You can find out about the overall environment with System.getProperties() [if you want to find out about the host VM, which may indicate appletness] and you can find out about the ClassLoader and SecurityManager from their respective sources. I think the only object that really knows it's an Applet is the Applet itself, so to propagate this knowledge outward requires either a web of associations to the Applet or some "static" information. The latter can be very clean if you simply have a registry where you can put applet information (which obviously includes their existence). It sounds like you are doing something like that anyway.

Again, I recall there being another approach but don't remember it. Someone on advanced-java would probably know.
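
The registry idea above can be sketched roughly as follows. This is an illustration only: `AppletRegistry` and its method names are hypothetical (nothing from JUMBO or the JDK), applets are typed as plain `Object` so the sketch does not depend on the `java.applet` package, and it uses current Java collections rather than 1.1-era classes. An applet would call `register(this)` from its `init()` method and `unregister(this)` from `destroy()`; `System.getProperty` is the standard call for reading individual host VM properties.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical static registry: the one place that knows whether any applet exists.
final class AppletRegistry {
    private static final Set<Object> APPLETS =
        Collections.synchronizedSet(new LinkedHashSet<Object>());

    private AppletRegistry() { }

    /** Called by an applet (for example from init()) to announce itself. */
    static void register(Object applet) {
        APPLETS.add(applet);
    }

    /** Called by an applet (for example from destroy()) when it goes away. */
    static void unregister(Object applet) {
        APPLETS.remove(applet);
    }

    /** True if at least one applet has registered and not yet unregistered. */
    static boolean runningAsApplet() {
        return !APPLETS.isEmpty();
    }

    /** Describes the host VM using two standard system properties. */
    static String hostVm() {
        return System.getProperty("java.vendor") + " " + System.getProperty("java.version");
    }
}
```

Any other class can then ask `AppletRegistry.runningAsApplet()` without holding a reference to the applet itself, which is the "static information" route rather than the web of associations.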
> B.2 I use ancillary files located in the *.class directories (e.g
> icon.gif). A nice extension in JUMBO is a per-class schema.xml file, with
> additional class information. Since CLASSPATH may contain many components,
> how can I tell which component was used for the class I am now running, so
> I can locate these files?

Use Class#getResource(...). If you are in a particular object:

    URL url = this.getClass().getResource(relativePathName+fileName);

The relativePathName should be relative to the current Class (in your case it would probably be empty). If you are in a static method you need to specify the class object explicitly:

    URL url = ThisClass.class.getResource...

because static methods are not connected to any object (they are completely resolved at compile time).

> D java/jview
> There are significant differences here, especially with filenames/URLs.

No solutions, just a couple of comments. The 100% Pure tester warns about all hardcoded '/' and '\', so these are obviously considered non-WORA. For most of these you can use either the functionality within a class (File concatenation) or the System.properties:

  * file.separator   File separator ("/" on Unix)
  * user.home        User home directory
  * user.dir         User's current working directory
  * java.home        Java installation directory

URL's should generally work with standard URL notation (it is up to the implementation to work correctly).

--Mark
mark.fussell@chimu.com

From peter at ursus.demon.co.uk Sun Nov 30 08:31:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
In-Reply-To:
References: <3.0.1.16.19971129162530.1fef159c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971130091051.3e6f6f2e@pop3.demon.co.uk>

At 17:19 29/11/97 -0800, Mark L. Fussell wrote:

Many thanks Mark.

> [...]
>
>No solutions, just a couple comments. The 100% Pure tester warns about
>all hardcoded '/' and '\', so these are obviously considered non-WORA.
>For most of these you can use either the functionality within a class
>(File concatenation) or the System.properties:
>  * file.separator   File separator ("/" on Unix)
>  * user.home        User home directory
>  * user.dir         User's current working directory
>  * java.home        Java installation directory

At one stage I *did* use File.separator. However I don't think it's relevant, except when the system interacts directly with the local file system. I converted this to SLASH (i.e. '/'), since this is what is required in URLs. My problem was/is that I believe some of the class libraries to be buggy on some implementations (or at least poorly documented).

>
>URL's should generally work with standard URL notation (it is up to the
>implementation to work correctly).

If I come up with what I think is a bug, I'll post it :-)

P.

>
>--Mark
>mark.fussell@chimu.com

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From gfrer at luna.nl Sun Nov 30 09:14:26 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To:
References:
Message-ID:

In my parlance I almost equate DTDs and data models. Since DTDs and tags are derived from the same data model and use the same terminologies (concepts), I think we both agree that XML might work in this respect.

GF

>Have you looked at the RDF metadata work from the W3C? It might be the
>case that you could have multiple DTDs in use, and tie them together using
>RDF statements which document their various elements using formal
>standardised vocabularies from various fields of medicine. UMLS provides
>one possible knowledge-base for constructing RDF schemas. So... what I'm
>sketching is a world in which there are many DTDs, many DTD-less
>documents, but semantic interoperability achieved by using RDF to
>associate DTDs or chunks of XML with thesauri or classification schemes
>from various domains. Does this sound a plausible alternative to the
>single-DTD vision you outline below?
>
>Dan

Gerard Freriks, huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands
Telephone: (+31) (0)174-384296 / Fax: -386249
Mobile: (+31) (0)6-54792800
ARS LONGA, VITA BREVIS

From fussellm at alumni.caltech.edu Sun Nov 30 10:02:56 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:11 2004
Subject: Revelling parser writers (was Rebelling)
In-Reply-To:
Message-ID:

Another book that has descriptions of the structure and flow of document processing is:

    ABCD...SGML: A User's Guide to Structured Information
    Liora Alschuler
    Thomson, London, England. 1995.
    ISBN: 1-850-32197-3

This book divides the possible states between nine boxes resulting from two axes: Input, Manage, Output and WYSIWYG, SGML, Data. Different projects are then evaluated by how the information moved between the boxes.

--Mark
mark.fussell@chimu.com

From ricko at allette.com.au Sun Nov 30 10:30:46 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:11 2004
Subject: Response to Simon St.L. on Entities v. XLL
Message-ID: <199711301029.VAA06940@jawa.chilli.net.au>

> From: Simon St.Laurent
> The processing model we'd like to see for EMBED is very similar to that used
> for a text entity, but it doesn't look like we'll be getting there soon.
> Entities and NOTATIONs serve their purposes, but XML-Link seems far more
> flexible, especially for our needs.

There is actually a fundamental difference in paradigm between XLL and entities too!

* The SGML entity mechanism is based on having type information as part of the declaration of the entity, not in the entity reference and not in the entity itself.

* The XLL mechanism (well, I should say the MIME mechanism really) is based on the entity being self-identifying as to type (aided by any additional attributes you like on the linking element).

The first way works best in heterogeneous and dumb systems, and large systems where you need to keep track of entities in one place (i.e. it is moving constants to the prolog of the document). The second way is more appropriate for the Web.

So XML/XLL is very rich. I think it is important to note that even though XML is "SGML for the Web", it has always been assumed, I think, that XML will be powerful enough to be more than just a delivery format: it has entities, for example, to allow it to be used for simple processing before and after it gets sent over the web.

Rick Jelliffe

From peter at ursus.demon.co.uk Sun Nov 30 11:01:11 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To:
References: <3.0.1.16.19971130004046.2237dd84@pop3.demon.co.uk> <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk> <3.0.32.19971129100857.00ab9150@village.doctools.com>
Message-ID: <3.0.1.16.19971130115815.3e6f75be@pop3.demon.co.uk>

At 01:25 30/11/97 +0200, Gerard Freriks wrote:
>As an outsider I follow the discussions about the topic.

Welcome Gerard. We do not have 'outsiders' here :-). We welcome the diversity and cross-fertilisation from other disciplines.

I was invited to a Drug Information Association mtg 2 weeks ago about e-submissions for new drugs. There was a lot of excitement about XML. :-)

>Within Health CAre I forsee a need to achieve the following:

I assume you are familiar with the HL7 effort - I believe they are seriously thinking of using XML.

>- there will be one Universal DTD (or whatever)
>- based on this one DTD users will select portions of it to construct messages
>- these messages might contain other messages or references to it
>- depending on circumstances decided upon by the user he might or might not
>want to view the whole collection of data as one piece (merged) or as data
>plus references
>- messages will be added to a receiving master patient record and either be
>shown as references or merged.

I think this is a very general concern among the XML/SGML community. A useful concept is 'information objects' or 'DTD fragments' [please correct me if these are not identical :-)].
Essentially they are 'Pick-N-Mix' DTDs, which you combine for your own purposes. Thus in submitting a new drug, you have to submit clinical records, manufacturing processes, personal data, documents, safety, statistics, and (yes) chemistry. IMO it is impossible to create a single DTD that covers all of this. These are all different and complex disciplines and it is much better to re-use the work that people who are experts have done. (So, gratifyingly, there was interest in using CML for drug submissions.)

I would therefore strongly advise people not to develop a multidiscipline DTD at present without looking carefully at what is being done by the specialist communities. That may even extend to textual passages (at least for technical documents). For example I use XMLised HTML for all my chemical stuff rather than invent my own , , etc.

The technical problem of how these are combined in any given document is a very active concern of the W3C and related community. The problem is that if you simply combine all the relevant DTDs you will get name clashes. E.g. <A> means anchor for HTML, may mean Answer for someone else, and may mean Author for another. If these are blindly combined, the validation will fail (DavidD has pointed this out clearly). Two current XML ways to get round this are:

XLL, where sections from different DTDs are XML-LINKed, rather than being merged or included via entities. If the two components are to be jointly displayed or otherwise combined the application has to be quite flexible. JUMBO does this by using different java.awt.Frames to display them.

Namespaces. The W3C/XML community is investigating namespaces as a way of tackling this. There are no firm recommendations yet, so treat this with great caution. The formal position is (XML 2.3) that 'the colon character is [...] reserved for experimentation with name spaces'.
So, if JUMBO (which is nothing if not experimental :-) is given two elements <MathML:VAR> and <CML:VAR> it knows they are different and can also link to different 'schema' files which will tell you about the different namespaces (and will enable namespace-dependent display).

>
>So which way you organise it, I don't mind.
>
>And Oh Yes.
>We in medicine count upon the fact that all DTD's and subDTD's will be
>stored in an Internet repository.

Absolutely essential. The curation of DTDs and semantics (e.g. through terminology) is a critical part of markup. Most DTDs are semantically void (the semantics are added through prose) and this worries me. I therefore see the need for additional representation of semantics in machine-readable form, and XML is the obvious format. Therefore JUMBO is able to read 'schemas' (which use DTD information if available) and include help, datatyping, etc.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Sun Nov 30 11:31:08 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:12 2004
Subject: Response to Simon St.L. on Entities v. XLL
In-Reply-To: <199711301029.VAA06940@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971130122745.2ec75a98@pop3.demon.co.uk>

At 21:29 30/11/97 +1100, Rick Jelliffe wrote:
>
>
>* The SGML entity mechanism is based on having type information as part
>of the declaration of the entity, not in the entity reference and not in the
>entity itself.
I am very interested in automatic Typing of information components and think that this will be a very active area for the XML community. XML (SGML) entities (NOTATION) have traditionally used PUBLIC and FPIs (Formal Public Identifiers) for adding type information. This works if there is a registry of FPIs for this purpose. Without one it is not much use. My impression - and I'm happy to be corrected - is that there are few useful FPIs for Typing objects. Using a SYSTEM Id is subject to the problem of permanence and uniqueness of URLs.

>
>* The XLL mechanism (well, I should say the MIME mechanism really) is
>based on the entity being self-identifying as to type (aided by
>any additional attributes you like on the linking element).

Unfortunately, not all targets of XLL HREFs will be self-identifying. This is true of local files and not-very-smart servers. It is therefore useful for the author to be able to add MIME types to the target. As yet, MIME is not part of the XLL mechanism. I wish it was, and keep squeaking for it. If it isn't, I suggest we use XDEV:MIME as a FUA ('frequently used attribute') in XML-LINKs.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From Patrice.Bonhomme at loria.fr Sun Nov 30 16:57:29 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:12 2004
Subject: Entities vs #PCDATA with msxml 1.6 ?
Message-ID: <199711301657.RAA08971@chimay.loria.fr>

Hi,

I have a problem with msxml 1.6.
If I put only one entity reference within an element, that element must be able to contain some PCDATA, because msxml treats an entity as a piece of PCDATA! If I have:

    <?XML VERSION="1.0" ?>
    <!DOCTYPE EXAMPLE [
    <!ELEMENT EXAMPLE (P+)>
    <!ELEMENT P (S+)>
    <!ELEMENT S (#PCDATA)>
    <!ATTLIST S ID ID #IMPLIED>
    <!-- ENTITY incs SYSTEM "inc-s.xml" -->
    <!ENTITY incs "<S>A third in a new paragraph.</S>">
    ]>
    <EXAMPLE>
    <P><S ID="s1">A sentence.</S><S ID="s2">An another.</S></P>
    <P>&incs;</P>
    </EXAMPLE>

I get this message:

    % java msxml2 -i -d test-ext-ent.xml
    Invalid element 'PCDATA' in content of 'P'. Expected [S]
    Location: file:test-ext-ent.xml(12,5)
    Context: <EXAMPLE><P>

And with this one, it works (just because P contains PCDATA in its content!):

    <?XML VERSION="1.0" ?>
    <!DOCTYPE EXAMPLE [
    <!ELEMENT EXAMPLE (P+)>
    <!ELEMENT P (#PCDATA | S+)><!-- <<= here -->
    <!ELEMENT S (#PCDATA)>
    <!ATTLIST S ID ID #IMPLIED>
    <!-- ENTITY incs SYSTEM "inc-s.xml" -->
    <!ENTITY incs "<S>A third in a new paragraph.</S>">
    ]>
    <EXAMPLE>
    <P><S ID="s1">A sentence.</S><S ID="s2">An another.</S></P>
    <P>&incs;</P>
    </EXAMPLE>

Is there something broken in the msxml kingdom?

Pat.
--
==============================================================
bonhomme@loria.fr             | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================

From peter at ursus.demon.co.uk Sun Nov 30 22:48:03 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:12 2004
Subject: Response to Simon St.L. on Entities v. XLL
In-Reply-To: <199711302224.OAA13898@dynamicdiagrams.com>
Message-ID: <3.0.1.16.19971130231621.20575406@pop3.demon.co.uk>

At 14:24 30/11/97 -0800, David G. Durand wrote:
[...]
>I don't understand why (if you are putting the information in the
>source document) you don't simply use NOTATION, which works very well
>with XLL without the need to invent your own private attribute convention.
>SGML entity declarations allow the association of a type with a
>destination in the source document. Untyped XLL links should only be
>used in cases (and they exist) where the HTTP MIME type information is
>preferable to static in-document declarations.

I'm obviously missing something very fundamental here. If I have a document foo.xml on my file system and it contains:

    <FOO>Plugh! Y2?</FOO>

that's all. What sort of file is it? It is not self-identifying. However it's a legal XML file. Suppose I know that and I want to process it as XML: I need to be able to tell software that it is XML.

If we tell the software that it is of type "text/xml", this is understood by millions of browsers all over the world. If I refer to it by:

    <!NOTATION XML PUBLIC "-//W3C//DTD XML Version 1.0//EN"
        "http://www.w3.org/TR/WD-xml-971117">

then I do not know of any software in the world that will work out what the file type is.

My point is that browsers and mailers use MIME types. They don't use FPIs.
Unless the file includes its own MIME type, how do I find the MIME type of a file using NOTATION, especially when the file is not on a server but on local filestore?

I can't see the objection to telling software what the MIME type of a file is :-)

P.

>
> -- David------------------------------------------+----------------------------
>David Durand dgd@cs.bu.edu | david@dynamicDiagrams.com
>Boston University Computer Science | Dynamic Diagrams
>http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
> | MAPA: mapping for the WWW
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
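
One way to read the request above is "let the author's declared type win, and only guess when nothing is declared". The sketch below illustrates that rule and nothing more. It makes several assumptions: the class `LinkTargetType`, its `resolve` method, and the three-entry extension table are all hypothetical, and the declared type stands in for whatever attribute a link might carry (such as the XDEV:MIME attribute proposed earlier in the thread, which was never part of XLL).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: decides which MIME type to use for a link target.
final class LinkTargetType {
    // Tiny illustrative fallback table; a real application would use a fuller one.
    private static final Map<String, String> BY_EXTENSION = new HashMap<>();
    static {
        BY_EXTENSION.put("xml", "text/xml");
        BY_EXTENSION.put("html", "text/html");
        BY_EXTENSION.put("gif", "image/gif");
    }

    private LinkTargetType() { }

    /**
     * Returns the author-declared type if present; otherwise guesses from the
     * file extension of href; returns null when neither source gives an answer.
     */
    static String resolve(String declaredType, String href) {
        if (declaredType != null && !declaredType.trim().isEmpty()) {
            return declaredType.trim();   // the author's statement takes priority
        }
        int dot = href.lastIndexOf('.');
        if (dot < 0 || dot == href.length() - 1) {
            return null;                  // no extension to guess from
        }
        String extension = href.substring(dot + 1).toLowerCase(Locale.ROOT);
        return BY_EXTENSION.get(extension);
    }
}
```

With this rule a local file named `foo.dat` that the author declares as "text/xml" is handled as XML, while an undeclared `foo` with no extension stays untyped, which is the situation the message above describes for local filestore.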