From SimonStL at classic.msn.com Sun Nov 2 01:45:01 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID:
While looking over the release notes for the 31 October 97 version of the Java
MSXML parser, I noticed that they've added a 'feature' that allows for 'Short
end tags,' using </>. This won't be too difficult to implement, perhaps, but
it seems like an odd break with XML's (so far) rather strict rules for start
and end tags, particularly 3.1 of the 7 August 97 Working Draft:
>The end of every element may (for elements which are not empty, must) be
>marked by an end-tag containing a name that echoes the element's type as
>given in the start-tag...
>Well-Formedness Constraint - GI Match:
>The Name in an element's end-tag must match that in the start-tag.
Is this something new going on with the spec, or is it just Microsoft? It
looks like they fixed a lot of the bugs, but this may introduce some new
problems. (They also allow ampersands in PCDATA, as long as they're 'not
followed by a valid name character.') It seems a little early for XML to begin
fragmenting.
Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.
Simon St.Laurent
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From btrafford at worldnet.att.net Sun Nov 2 06:10:04 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <345C18F0.DFF9C0DD@worldnet.att.net>
Hello!
Has anybody tried out Microsoft's latest download of MSXML? I'm finding
that parsing the HTML 4.0 DTD causes it to crash out. Here's the error:
JVIEW caused an invalid page fault in module MSJAVA.DLL at
014f:7c009445.
Anybody else seeing this error? Any ideas why? For info's sake, the
version of MSJAVA.DLL I'm using is:
4.79.1518 of May 5th of 1997.
--->Ben Trafford
From jjc at jclark.com Sun Nov 2 07:00:42 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:47 2004
Subject: MSXML comments
Message-ID: <345C2331.9C0F2D9A@jclark.com>
I played a little today with MSXML and have a couple of suggestions:
- MSXML wrongly rejects this
]>
It appears to require notations to be declared before use in entity
declarations rather than just declared in the DTD. The XML spec could
probably be clearer here, but this definitely is not desirable: you
often need to declare external entities in the DTD subset that use
notations declared in the DTD. It's also incompatible with SGML.
- It appears to be impossible to prevent MSXML performing certain
validity checks. Worse, MSXML appears to apply Draconian error handling
to validity errors not just to well-formedness errors. This makes it
impossible to parse some well-formed XML documents. For example:
]>
I would suggest that applications should be able to control whether
validation is performed. I would also suggest that validity errors not
be handled as fatal errors using exceptions; instead, the parser should
continue processing in the presence of validity errors, and should make
information about validity errors available in the object model.
- It requires the version attribute in the XML declaration to be in
upper-case. The draft still has "version" in lower case.
James
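
[Editor's sketch] The split suggested above (validity errors reported but not fatal, well-formedness errors fatal) can be illustrated roughly as follows. This does not use MSXML. It assumes the later standard Java JAXP/SAX API (javax.xml.parsers, org.xml.sax), in which a validating parser reports validity errors through ErrorHandler.error() and well-formedness errors through ErrorHandler.fatalError(). The names ValidityErrorDemo and collectValidityErrors are hypothetical, chosen only for this example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class ValidityErrorDemo {

    /** Records validity errors and keeps going; only fatal errors abort the parse. */
    static class CollectingHandler implements ErrorHandler {
        final List<String> validityErrors = new ArrayList<>();

        @Override
        public void warning(SAXParseException e) {
            // Warnings are ignored in this sketch.
        }

        @Override
        public void error(SAXParseException e) {
            // Validity errors arrive here; record them instead of throwing.
            validityErrors.add(e.getMessage());
        }

        @Override
        public void fatalError(SAXParseException e) throws SAXParseException {
            // Well-formedness errors are fatal: stop parsing.
            throw e;
        }
    }

    /** Parses the document and returns the validity errors that were reported. */
    static List<String> collectValidityErrors(String xml, boolean validate) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validate); // the application decides whether to validate
        DocumentBuilder builder = factory.newDocumentBuilder();
        CollectingHandler handler = new CollectingHandler();
        builder.setErrorHandler(handler);
        builder.parse(new InputSource(new StringReader(xml)));
        return handler.validityErrors;
    }

    public static void main(String[] args) throws Exception {
        // Well-formed, but invalid: doc is declared EMPTY and has content.
        String xml = "<!DOCTYPE doc [<!ELEMENT doc EMPTY>]><doc>text</doc>";
        System.out.println("validating: " + collectValidityErrors(xml, true));
        System.out.println("non-validating: " + collectValidityErrors(xml, false));
    }
}
```

With validation switched on, the invalid document is still parsed to the end and the problems are available afterwards as data; with validation switched off, nothing is reported for the same input.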
From clovett at microsoft.com Mon Nov 3 01:21:42 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: ANNOUNCE: New MSXML Java Parser Available
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA7@red-17-msg.dns.microsoft.com>
See http://www.microsoft.com/standards/xml/xmlparse.htm for details.
I would like to thank all those people who sent bug reports and
suggestions. This is a newer version of the parser than the one
included in IE 4.0. A lot went into this version including:
* Case sensitivity
* Conditional sections in the DTD (INCLUDE and IGNORE keywords)
* Full support for NAMESPACES (see
http://www.microsoft.com/standards/xml/Namespaces.htm).
* Support for the ENCODING attribute on the XML tag
* Support for the XML-SPACE attribute in regular XML and in the
DTD
* Support for the RMD attribute on the XML tag
* Support for W3C DOM ignorable whitespace nodes.
* Support for processing of external text entities.
* Optimization on Windows that makes parsing 4 times faster.
Other non-spec experimental things that were added:
* New Document save options for COMPACT and PRETTY save formats
(the default save option uses the ignorable whitespace nodes to save in
exactly the same format as the original document).
* Support for floating ampersands, e.g., "This & that"
* Support for empty end tags, e.g., bar>
* New helper classes like ElementCollection, TreeEnumeration, etc.
For a detailed description of changes see
http://www.microsoft.com/standards/xml/xmlchgs.htm.
Enjoy !!
From clovett at microsoft.com Mon Nov 3 01:27:10 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
This is totally optional and experimental. The only rationale is that for
large documents or documents with long tag names, this saves a lot of bytes.
Think of it as a kind of compression technique that would only be enabled
when both ends of the pipe can handle it.
As for the ampersands, this is a real problem. We found with our experience
with CDF that customers just can't handle putting &amp; inside their URLs.
We want to comply with XML standards, but we also want XML to be successful
in the marketplace. One area where we didn't compromise is case
sensitivity. The new parser is fully case sensitive - but with a switch
that sets it back to case insensitive for those people who are reading XML
that was generated before case sensitivity was decided. You have to make
some tough compromises sometimes.
> -----Original Message-----
> From: Simon St.Laurent [SMTP:SimonStL@classic.msn.com]
> Sent: Saturday, November 01, 1997 5:43 PM
> To: Xml-Dev (E-mail)
> Subject: </> as end tag
>
> While looking over the release notes for the 31 October 97 version of the
> Java MSXML parser, I noticed that they've added a 'feature' that allows for
> 'Short end tags,' using </>. This won't be too difficult to implement,
> perhaps, but it seems like an odd break with XML's (so far) rather strict
> rules for start and end tags, particularly 3.1 of the 7 August 97 Working
> Draft:
>
> >The end of every element may (for elements which are not empty, must) be
> >marked by an end-tag containing a name that echoes the element's type as
> >given in the start-tag...
> >Well-Formedness Constraint - GI Match:
> >The Name in an element's end-tag must match that in the start-tag.
>
> Is this something new going on with the spec, or is it just Microsoft? It
> looks like they fixed a lot of the bugs, but this may introduce some new
> problems. (They also allow ampersands in PCDATA, as long as they're 'not
> followed by a valid name character.') It seems a little early for XML to
> begin fragmenting.
>
> Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.
>
> Simon St.Laurent
From clovett at microsoft.com Mon Nov 3 01:29:39 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCA9@red-17-msg.dns.microsoft.com>
Try turning off the I/O optimization as follows:
c:\msxml> regsvr32 /u classes\com\ms\xml\xmlstream\xmlurlstream\xmlurlstream.dll
> -----Original Message-----
> From: Ben Trafford [SMTP:btrafford@worldnet.att.net]
> Sent: Saturday, November 01, 1997 10:09 PM
> To: xml-dev@ic.ac.uk
> Subject: Unusual error with MSXML
>
> Hello!
>
> Has anybody tried out Microsoft's latest download of MSXML? I'm
> finding
> that parsing the HTML 4.0 DTD causes it to crash out. Here's the error:
>
> JVIEW caused an invalid page fault in module MSJAVA.DLL at
> 014f:7c009445.
>
> Anybody else seeing this error? Any ideas why? For info's sake, the
> version of MSJAVA.DLL I'm using is:
>
> 4.79.1518 of May 5th of 1997.
>
> --->Ben Trafford
From clovett at microsoft.com Mon Nov 3 01:38:34 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: MSXML comments
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCAA@red-17-msg.dns.microsoft.com>
> I played a little today with MSXML and have a couple of suggestions:
> - MSXML wrongly rejects this
>
>
>
> ]>
>
> It appears to require notations to be declared before use in entity
> declarations rather than just declared in the DTD. The XML spec could
> probably be clearer here, but this definitely is not desirable: you
> often need to declare external entities in the DTD subset that use
> notations declared in the DTD. It's also incompatible with SGML.
Yes, order independence of DTDs is not yet implemented. We're still trying
to figure out how to implement this without slowing down the parser?!
> - It appears to be impossible to prevent MSXML performing certain
> validity checks. Worse, MSXML appears to apply Draconian error handling
> to validity errors not just to well-formedness errors. This makes it
> impossible to parse some well-formed XML documents. For example:
>
>
> ]>
>
> I would suggest that applications should be able to control whether
> validation is performed. I would also suggest that validity errors not
> be handled as fatal errors using exceptions; instead, the parser should
> continue processing in the presence of validity errors, and should make
> information about validity errors available in the object model.
Yes, I agree. Perhaps we should add a handleValidityError method to the
ElementFactory, and if the subclass returns true we continue on parsing or
something. You can turn off DTD processing altogether using RMD="NONE".
> - It requires the version attribute in the XML declaration to be in
> upper-case. The draft still has "version" in lower case.
What's the story here anyway ? We added support for case sensitivity, and
decided to make all XML reserved keywords upper case for consistency.
From Jon.Bosak at eng.Sun.COM Mon Nov 3 01:54:13 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com> (message from Chris Lovett on Sun, 2 Nov 1997 17:27:01 -0800)
Message-ID: <199711030153.RAA09117@boethius.eng.sun.com>
[Chris Lovett:]
| This is totally optional and experimental. The only rational is that
| for large documents or documents with long tag names, this saves a lot
| of bytes.
Tests have shown that this difference disappears under compression.
| Think of it as a kind of compression technique that would
| only be enabled when both ends of the pipe can handle it.
Empty end tags are a well formedness error, and the behavior of a
conforming XML processor upon encountering such an error is to stop
parsing.
The prohibition on empty end tags was adopted specifically to enable
users to perform a large class of maintenance operations on XML
documents without having to buy commercial software. I'm having a
very difficult time seeing this as anything but a blatant attempt to
subvert the standard by implementing a nonstandard feature in a widely
disseminated parser. Please help me to understand this differently.
Jon
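
[Editor's sketch] To make the GI Match constraint under discussion concrete, here is a rough illustration of the check involved: every end-tag must carry a name, and that name must equal the name of the innermost open start-tag. It is a deliberately simplified tag scanner, not a conforming XML processor; it skips declarations, comments and processing instructions, and it assumes no '<' or '>' inside attribute values. The class name GiMatchCheck and the method endTagsMatch are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GiMatchCheck {

    // group 1: "/" for an end-tag; group 2: the element name (absent in "</>");
    // group 3: "/" for an empty-element tag such as <br/>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z_:][\\w:.-]*)?[^<>]*?(/?)>");

    /** Returns true only if every end-tag names the innermost open element. */
    static boolean endTagsMatch(String xml) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(xml);
        while (m.find()) {
            boolean isEndTag = !m.group(1).isEmpty();
            String name = m.group(2);
            boolean isEmptyElement = !m.group(3).isEmpty();

            if (isEndTag) {
                // An empty end-tag ("</>") has no name, so it can never match.
                if (name == null || open.isEmpty() || !open.pop().equals(name)) {
                    return false;
                }
            } else if (name == null) {
                continue; // <!...> and <?...?> are out of scope for this sketch
            } else if (!isEmptyElement) {
                open.push(name);
            }
        }
        return open.isEmpty(); // anything still open was never closed
    }

    public static void main(String[] args) {
        System.out.println(endTagsMatch("<foo>bar</foo>"));
        System.out.println(endTagsMatch("<foo>bar</>"));
    }
}
```

Because the end-tag repeats the name, a mismatch can be caught with nothing more than a stack and a string comparison, which is the kind of check that simple text-processing tools can perform.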
From tbray at textuality.com Mon Nov 3 05:24:17 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <3.0.32.19971102212301.00b323a8@pop.intergate.bc.ca>
At 05:53 PM 02/11/97 -0800, Jon Bosak wrote:
>| Think of it as a kind of compression technique that would
>| only be enabled when both ends of the pipe can handle it.
>Empty end tags are a well formedness error, and the behavior of a
>conforming XML processor upon encountering such an error is to stop
>parsing.
Seconded. I am flabbergasted. In November 1997, we should be forgiving
about well-intentioned parsers missing details of compliance with a spec
which we keep changing, but this apparently-deliberate step out of bounds
is incomprehensible; let us assume that it is a transient error which
will soon be rectified. -Tim
From peter at ursus.demon.co.uk Mon Nov 3 08:50:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To:
Message-ID: <3.0.1.16.19971103094614.295f36e0@pop3.demon.co.uk>
At 01:43 02/11/97 UT, Simon St.Laurent wrote:
>While looking over the release notes for the 31 October 97 version of the
>Java MSXML parser, I noticed that they've added a 'feature' that allows for
>'Short end tags,' using </>. This won't be too difficult to implement,
>perhaps, but it seems like an odd break with XML's (so far) rather strict
>rules for start and end tags, particularly 3.1 of the 7 August 97 Working
>Draft:
>
>>The end of every element may (for elements which are not empty, must) be
>>marked by an end-tag containing a name that echoes the element's type as
>>given in the start-tag...
>>Well-Formedness Constraint - GI Match:
>>The Name in an element's end-tag must match that in the start-tag.
>
>Is this something new going on with the spec, or is it just Microsoft? It
>looks like they fixed a lot of the bugs, but this may introduce some new
>problems. (They also allow ampersands in PCDATA, as long as they're 'not
>followed by a valid name character.') It seems a little early for XML to
>begin fragmenting.
I would urge readers of this list to adhere to the specs absolutely. [I
pass no comments on the motivation for msxml supporting </> or &.] XML can
ONLY be implemented if everyone adheres totally to the specs. I believe
this list shares the view that both data and software can be modularised so
that different parts of the effort can be investigated by different people.
For example I intend to rely completely on parsers (e.g. Lark, NXP) to
provide the parsing part of JUMBO at present. In similar vein I am
developing JUMBO with the clear motivation that it tracks everything in the
spec and implements it as far as possible. [I hope to release a new JUMBO
in a few days - I want to see how it sits on top of Lark first.]
Even when everyone agrees to implement the spec it is not easy. Any spec
has ambiguities and special cases. There are also genuine differences of
opinion about procedure in areas where the spec makes no comment. For
example there is an uncertainty about when a document can be validated -
can the author or the document assert that validation should/not take place?
It is going to be EXTREMELY important that documents circulating in the
developers' community adhere to the specs as closely as possible. We already
have challenges from capitalisation (we are case-sensitive now, but await a
final resolution on the exact case of some names and a possible policy more
generally.) I know that when a policy is announced I am going to have to
transform many of my current prototype documents - that's the price of
being a developer :-) But I do not intend to change any to deal with
software that *knowingly* does not conform to the spec, nor do I intend to
test my software on any documents that knowingly do not conform to the
spec. There is quite enough problem with ones (including mine:-) that do it
unknowingly :-)
P.
>
>Source: http://www.microsoft.com/standards/xml/xmlchgs.htm.
>
>Simon St.Laurent
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From richard at light.demon.co.uk Mon Nov 3 09:21:12 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:47 2004
Subject: Unescaped '&'
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID:
In message <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-
msg.dns.microsoft.com>, Chris Lovett writes
>As for the ampersands, this is a real problem. We found with our experience
>with CDF that customers just can't handle putting &amp; inside their URL's.
I don't follow the logic here. My understanding is that spaces within
URLs have to be escaped (hence most of the changes, for XPointers, to
the TEI Extended Pointer spec). So if '&' has to be followed by a space
in order to be unescaped, but that space itself has to be escaped - what
exactly are you gaining?
Richard.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
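
[Editor's sketch] For reference, the escaping at issue in this exchange is small: a URL placed in XML content or an attribute value needs each '&' written as &amp;. A minimal sketch of a helper that a document generator could apply follows; the class name AmpersandEscape and the method escapeAttribute are hypothetical, and the sketch assumes the attribute value is delimited by double quotes.

```java
class AmpersandEscape {

    /** Escapes the characters that may not appear literally in a double-quoted attribute value. */
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':
                    sb.append("&amp;");   // every ampersand, whether or not a name character follows
                    break;
                case '<':
                    sb.append("&lt;");
                    break;
                case '"':
                    sb.append("&quot;");  // the assumed attribute delimiter
                    break;
                default:
                    sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/search?a=1&b=2";
        System.out.println("<LINK HREF=\"" + escapeAttribute(url) + "\"/>");
    }
}
```

Escaping unconditionally on output means the reader never has to guess whether an ampersand begins an entity reference, which is the ambiguity the 'not followed by a valid name character' rule tries to work around.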
From neil at bradley.co.uk Mon Nov 3 09:46:53 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:58:47 2004
Subject: more link questions
Message-ID: <199711030946.JAA19093@andromeda.ndirect.co.uk>
I am wondering why it is stated that a STEPS value of 2 is
appropriate when using a hub document that contains all the extended
links. Surely, STEPS=1 is all that is required, as it should not be
necessary to process the other documents in the collection to look
for extended links.
When using a URL to locate a document on the local system, should the
protocol be 'file://', or is this a default if no protocol is given?
This question applies to entities as well as links, of course.
-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk
From crism at ora.com Mon Nov 3 16:12:58 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
In-Reply-To: <345C18F0.DFF9C0DD@worldnet.att.net> (message from Ben Trafford
on Sat, 01 Nov 1997 23:08:48 -0700)
Message-ID: <199711031616.LAA13873@geode.ora.com>
[Ben Trafford]
> Has anybody tried out Microsoft's latest download of MSXML?
> I'm finding that parsing the HTML 4.0 DTD causes it to crash
> out. Here's the error:
Why are you parsing the HTML 4.0 DTD with an XML parser? It's not
XML; it uses AND groups and exclusions.
True, MSXML should probably fail more gracefully on non-XML data, but
hey - it's beta.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From aray at q2.net Mon Nov 3 17:25:16 1997
From: aray at q2.net (Arjun Ray)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID:
On Sun, 2 Nov 1997, Chris Lovett wrote:
> This is totally optional and experimental. The only rational is that for
> large documents or documents with long tag names, this saves a lot of bytes.
Sorry, this rationale (among others) was discussed to death in Sept 96 on
the old XML-WG list and found inadequate. Please review the archives
() for
anything we might have missed. The good arguments for empty end-tags have
nothing to do with byte economy, but they involve other design issues that
impinge in a non-trivial way on SGML's minimization rules -- the upshot is
that empty end-tags as an *isolated* option (i.e. just an option per se)
is a very bad idea for XML. I say this even though I was one of those
arguing for empty end-tags back then.
Please reconsider.
Arjun
From clovett at microsoft.com Mon Nov 3 18:44:43 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com>
Enough already !!
I can tell that none of you actually tried the latest MSXML parser. To even
get short end tags the programmer has to explicitly turn on the option as
follows:
Document d = new Document();
d.load("http://www.somewhere.com/somexml.xml");
OutputStream o = new FileOutputStream("test.xml");
XMLOutputStream out = d.createOutputStream(o);
// Set the option on the XMLOutputStream; java.io.OutputStream has no such method.
out.setShortEndTags(true);
d.save(out);
In other words, it is a completely experimental feature that is thoroughly
buried in the API and the naive user won't even know it exists. The only
reason it is there is because of the very fact that there was a lot of
discussion about short end tags in the first place. So I decided to play
with the idea and quite frankly I thought it was kind of cool that XML was
so simple that end tags were redundant. I think this further emphasizes the
simplicity of XML.
As for blatant attempts at subversion, I'm just a country boy from
Australia, I don't get involved in that sort of thing :-) So, enough
politics. I'm more interested in constructive feedback from people who have
actually played with the new parser....
From Jon.Bosak at eng.Sun.COM Mon Nov 3 19:12:47 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com> (message from Chris Lovett on Mon, 3 Nov 1997 10:43:57 -0800)
Message-ID: <199711031911.LAA09574@boethius.eng.sun.com>
[Chris Lovett:]
| As for blatant attempts at subversion, I'm just a country boy from
| Australia, I don't get involved in that sort of thing :-) So, enough
| politics. I'm more interested in constructive feedback from people
| you have actually played with the new parser....
Thank you so much for providing the alternative interpretation that I
asked for. I really didn't want to see the inclusion of a nonstandard
extension in your parser as an attempt to make changes to the standard
by going outside of the process. I'm very glad to hear that that is
not what you had in mind.
So, now that you're aware that the effect of leaving the code as an
option in the parser will be to encourage nonstandard implementations,
when do you intend to remove it?
Jon
From clovett at microsoft.com Mon Nov 3 19:36:01 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:47 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCBA@red-17-msg.dns.microsoft.com>
I notice you are using a JDK 1.0.2 version of MSJAVA.DLL. I've
reproduced the problem: it crashes on the line Document d = new
Document(), so I'm guessing this has something to do with JavaBeans,
since there is a DocumentBeanInfo. I'll try building another version
that works with JDK 1.0.2. In the meantime you should be able to get it
to work using Microsoft's Java SDK 2.0.
> -----Original Message-----
> From: Ben Trafford [SMTP:btrafford@worldnet.att.net]
> Sent: Saturday, November 01, 1997 10:09 PM
> To: xml-dev@ic.ac.uk
> Subject: Unusual error with MSXML
>
> Hello!
>
> Has anybody tried out Microsoft's latest download of MSXML? I'm
> finding
> that parsing the HTML 4.0 DTD causes it to crash out. Here's the
> error:
>
> JVIEW caused an invalid page fault in module MSJAVA.DLL at
> 014f:7c009445.
>
> Anybody else seeing this error? Any ideas why? For info's sake,
> the
> version of MSJAVA.DLL I'm using is:
>
> 4.79.1518 of May 5th of 1997.
>
> --->Ben Trafford
From Ingo.Macherius at TU-Clausthal.de Mon Nov 3 20:18:35 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:47 2004
Subject: </> as end tag
In-Reply-To: <41135C785691CF11B73B00805FD4D2D703E4FCB9@red-17-msg.dns.microsoft.com>
Message-ID: <199711032018.VAA20687@sinfonix.rz.tu-clausthal.de>
> From: Chris Lovett
> To: xml-dev@ic.ac.uk
> Subject: RE: </> as end tag
> Date: Mon, 3 Nov 1997 10:43:57 -0800
Chris Lovett said:
> I'm more interested in constructive feedback from people
> you have actually played with the new parser....
I spent half of the day playing with msxml, which means I tried to
get it running on Linux with Java SDK 1.1.3. Here are some points I
found:
1) Case folding
The filenames in the msxml.tar.gz are all folded to lowercase. This
might not matter with DOS filesystems, but Unix is a bit harsher here.
One has to rebuild all *.class files.
2) Makefile
Rebuilding the *.class files from source is not easy (haven't managed
yet) because of various cross-dependencies. Is it possible for you to
provide a Unix style Makefile, or give me a pointer to docs on how to
read your MS specific Makefile ?
3) Missing imports
These imports are hidden in the source. Surely I could copy them from
IE4's Java files, but it'd be nice if msxml was self-contained.
import com.ms.com.*;
import com.ms.com.IUnknown;
import com.ms.com.Variant;
import com.ms.osp.*;
import netscape.javascript.JSObject;
Has anyone successfully run msxml on non-Windows platforms? Is this
at least intended?
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From agreene at bitstream.com Mon Nov 3 20:27:01 1997
From: agreene at bitstream.com (Andrew Greene)
Date: Mon Jun 7 16:58:48 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
Message-ID: <19971103195249.AAA18429@AGREENE-PC.bitstream.com>
If you have a Unicode-friendly XML environment, then users can create
elements whose GIs or attribute names contain "interesting"
characters. (Yes? A NAME token can contain "BaseChars", which includes
characters beyond ASCII and even beyond Latin-1.)
So, if a user requests that the document instance be saved as an ASCII
file, what is the best way for a Unicode-aware and standards-compliant
application to represent these characters? It's not legal to say
and the user may already have an element type called "Strasse" so it
would be inappropriate to "reduce" it. [I chose this example because
it is easy to describe in email; the problem is much more difficult
if, instead of German, the user has used Cyrillic or Hebrew NAMEs.]
I've thought of three solutions:
1. It's an error. Tell the user "Sorry, your file could not be saved
in that character encoding because the element name 'StraBe' could
not be represented."
Advantages: It's fully compliant and no data can get lost.
Disadvantages: No data can get out, either. Perhaps the user has
an 8-bit app to massage the data in a particular way, and she
doesn't want to rename all her elements.
2. Rename all the offending elements and attributes, and use PIs to
ensure that when they're read back in we can patch things up.
So, for example, the file could contain:
foo bar
Advantages: It's fully compliant.
Disadvantages: It assumes that all other processing applications
will be nice and won't lose my processing instructions, and it
makes the file hard to read. It's also non-portable; unless we
as a community decide on a "semi-standard" PI to use, no one else
will know how to interpret this convention. (On the other hand,
this is exactly why I'm bringing the issue up here. Maybe we can
all agree on a semi-standard and I'll feel less uneasy about
doing something like this....)
3. Violate the standard and use character entities to represent the
ineffable, for example:
foo bar
Advantages: It's compact and unambiguous (even if it's illegal :-).
Disadvantages: It violates both XML and 8879 in a new and perverse
way. The user's file will not be usable by any other piece of
standards-compliant software. That's worse than refusing to write
the file at all (number 1).
My questions to the assembled multitudes are:
* Is there a need for a "semi-standard" solution to this problem, or am
I the only one struggling with it?
* Is there interest in adopting some variation of number 2 so that we're
better able to exchange such data?
* I can't help but think that number 3 would be the most elegant solution
if it were only legal. Yet I'm also sure that the XML committee had a
good reason for disallowing it. I'd be interested in hearing what their
reason was, so that I may become enlightened. :-)
Thanks for your thoughts,
Andrew Greene
From SimonStL at classic.msn.com Tue Nov 4 00:37:09 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:48 2004
Subject: > as end tag
Message-ID:
>In other words, it is a completely experimental feature that is thoroughly
>buried in the API and the naive user won't even know it exists.
It is deeply buried in the API, yes, but it was shown vividly in the site
demonstrating XML with IE 4.0, where it's presented as perfectly ordinary XML
and parsed as such. For instance, the DSO example:
Number, the Language of Science>
Danzig>
5.95>
3>
>
192817265>
etc.... (from http://www.microsoft.com/standards/xml/ - XML Parser, Samples,
DSO example.)
(I just revisited the site, and the files all still seem to be there, though
my IE 4.0 browser is exploding with JavaScript errors that I hope are a sign
that the site is under construction and coming down.)
>As for blatant attempts at subversion, I'm just a country boy from
>Australia, I don't get involved in that sort of thing :-) So, enough
>politics. I'm more interested in constructive feedback from people you have
>actually played with the new parser....
On Sunday, I ran a lot of my files that had demonstrated freaky parsing
behavior in the past - files that parsed as valid when they were explosively
wrong (some referred to the wrong DTD, for instance), files that used
parameter entities, and files that wouldn't parse at all. They all seem to
work properly now - so at least the bugs I had found are now dead. (I only
used jview - I'll run it through Sun's JDK and see what happens when I have a
chance.)
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From btrafford at worldnet.att.net Tue Nov 4 00:37:45 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
References: <199711031616.LAA13873@geode.ora.com>
Message-ID: <345E6F88.6C4D8B18@worldnet.att.net>
Chris Maden wrote:
>
> [Ben Trafford]
> > Has anybody tried out Microsoft's latest download of MSXML?
> > I'm finding that parsing the HTML 4.0 DTD causes it to crash
> > out. Here's the error:
>
> Why are you parsing the HTML 4.0 DTD with an XML parser? It's not
> XML; it uses AND groups and exclusions.
>
> True, MSXML should probably fail more gracefully on non-XML data, but
> hey - it's beta.
Well, I often use parsers to find errors in DTDs. Since XML is
nominally SGML compatible, an XML parser should find the errors in an
SGML DTD (even if just to say that it's got a bunch of stuff XML doesn't
recognize). What I was hoping to do was to parse the DTD, read the
errors, then figure out what I need to change in the HTML DTD to make it
XML-compliant. I've already made a number of changes according to the
revised note on the differences between XML and SGML.
As I'm currently working with other people's SGML in my professional
life, I've found it very useful to parse their DTDs with James Clark's
NSGMLS, and to correct their DTDs from that. I'd hoped to do the same
thing with MSXML.
In addition, not everyone's copy of MSXML crashes on this DTD; I've
been working with Simon St.Laurent on the problem, and his MSXML parsed
without crashing, using an older version of MSJAVA.DLL.
Oh, and just in case Chris Lovett is reading this message, thanks for
your initial advice, Chris, but it doesn't appear to have had an impact.
I get the same error. Do you have any other suggestions?
--->Ben Trafford
From btrafford at worldnet.att.net Tue Nov 4 00:44:36 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
References: <41135C785691CF11B73B00805FD4D2D703E4FCBA@red-17-msg.dns.microsoft.com>
Message-ID: <345E7134.3E8606EE@worldnet.att.net>
Chris Lovett wrote:
>
> I notice you are using a JDK 1.0.2 version of MSJAVA.DLL. I've
> reproduced the problem, it crashes on the line Document d = new
> Document() and so I'm guessing this has something to do with JavaBeans,
> since there is a DocumentBeanInfo. I'll trying building another version
> that works with JDK 1.0.2. In the meantime you should be able to get it
> to work using Microsoft's Java SDK 2.0.
Please ignore my previous plea for advice in a letter directed to Chris
Maden. I'll download the 2.0 SDK tonight and give 'er a whirl. Is there
any public documentation on the error messages that MSXML gives out?
Simon St.Laurent sent me some the other night, and they were a little...
confusing.
Here's the copy of the error message he sent me:
Error: null(24,9)
Context: -
com.ms.xml.ParseException: Expected: Doctype
at com.ms.xml.Parser.error(Parser.java:110)
at com.ms.xml.Parser.parseToken(Parser.java:583)
at com.ms.xml.Parser.parseKeyword(Parser.java:599)
at com.ms.xml.Parser.tryDocTypeDecl(Parser.java:748)
at com.ms.xml.Parser.parseProlog(Parser.java:676)
at com.ms.xml.Parser.parseDocument(Parser.java:642)
at com.ms.xml.Parser.parse(Parser.java:58)
at com.ms.xml.Document.load(Document.java:183)
at msxml.main(msxml.java:48)
Can you explain to me what that error represents, in terms of what each
field means? It's obviously sorted in a rational fashion, but I'm not
very good at guesswork. Any help you could proffer would be very much
appreciated, even if it's just pointing me to some more narrative-style
documentation (if any exists).
--->Ben Trafford
From clovett at microsoft.com Tue Nov 4 01:12:21 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: > as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCCC@red-17-msg.dns.microsoft.com>
> 1) Case folding
>
> The filenames in the msxml.tar.gz are all folded to lowercase. This
> might not matter with DOS filesystems, Unix is a bit harsher here.
> One has to rebuild all *.class files.
I have a new tar file going out at 6pm. Turns out I had the wrong
environment variables set in my C-Shell for Windows.
> 2) Makefile
>
> Rebuilding the *.class files from source is not easy (haven't managed
> yet) because of various cross-dependencies. Is it possible for you to
> provide a Unix style Makefile, or give me a pointer to docs on how to
> read your MS specific Makefile ?
Any volunteers ?
> 3) Missing imports
>
> These imports are hidden in the source. Surely I could copy them from
> IE4's Java files, but I'd be nice if msxml was self-contained.
>
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;
All these go away if you remove XMLDSO.java, I believe. You shouldn't even
try to build this file on other platforms anyway, since it is designed to
work only with the Data Binding features of IE 4.0.
From clovett at microsoft.com Tue Nov 4 01:22:58 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCCF@red-17-msg.dns.microsoft.com>
> Is there
> any public documentation on the error messages that MSXML gives out?
> Simon St.Laurent sent me some the other night, and they were a little. .
> .confusing.
>
> Here's the copy of the error message he sent me:
>
> Error: null(24,9)
> Context: -
> com.ms.xml.ParseException: Expected: Doctype
> at com.ms.xml.Parser.error(Parser.java:110)
> at com.ms.xml.Parser.parseToken(Parser.java:583)
> at com.ms.xml.Parser.parseKeyword(Parser.java:599)
> at com.ms.xml.Parser.tryDocTypeDecl(Parser.java:748)
> at com.ms.xml.Parser.parseProlog(Parser.java:676)
> at com.ms.xml.Parser.parseDocument(Parser.java:642)
> at com.ms.xml.Parser.parse(Parser.java:58)
> at com.ms.xml.Document.load(Document.java:183)
> at msxml.main(msxml.java:48)
>
> Can you explain to me what that error represents, in terms of what
> each
> field means? It's obviously sorted in a rational fashion, but I'm not
> very good at guesswork. Any help you could proffer would be very much
> appreciated, even if it's just pointing me to some more narrative-style
> documentation (if any exists).
>
[Chris Lovett] This is the old error message format; are you sure it
installed OK? The new msxml.java doesn't print out the exception stack any
more.
From clovett at microsoft.com Tue Nov 4 01:23:35 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: > as end tag
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCD0@red-17-msg.dns.microsoft.com>
> >In other words, it is a completely experimental feature that is
> >thoroughly buried in the API and the naive user won't even know it exists.
>
> It is deeply buried in the API, yes, but it was shown vividly in the site
> demonstrating XML with IE 4.0, where it's presented as perfectly ordinary
> XML and parsed as such. For instance, the DSO example:
>
>
>
> Number, the Language of Science>
> Danzig>
> 5.95>
> 3>
> >
> 192817265>
> etc.... (from http://www.microsoft.com/standards/xml/ - XML Parser,
> Samples, DSO example.)
Whoops - this was an oversight. It turns out this XML is generated
dynamically via JavaScript and I forgot to update the script. This will be
fixed tonight.
> >As for blatant attempts at subversion, I'm just a country boy from
> >Australia, I don't get involved in that sort of thing :-) So, enough
> >politics. I'm more interested in constructive feedback from people you
> >have actually played with the new parser....
>
> On Sunday, I ran a lot of my files that had demonstrated freaky parsing
> behavior in the past - files that parsed as valid when they were
> explosively wrong (some referred to the wrong DTD, for instance), files
> that used parameter entities, and files that wouldn't parse at all. They
> all seem to work properly now - so at least the bugs I had found are now
> dead. (I only used jview - I'll run it through Sun's JDK and see what
> happens when I have a chance.)
Music to my ears. Finally some good news :-)
From clovett at microsoft.com Tue Nov 4 01:54:55 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FCC6@red-17-msg.dns.microsoft.com>
I was able to fix the error on my machine by removing all references to
java.io.Serializable. I will be posting a fixed version soon.
-----Original Message-----
From: Ben Trafford [SMTP:btrafford@worldnet.att.net]
Sent: Monday, November 03, 1997 4:43 PM
To: xml-dev@ic.ac.uk
Subject: Re: Unusual error with MSXML
Chris Maden wrote:
>
> [Ben Trafford]
> > Has anybody tried out Microsoft's latest download of MSXML?
> > I'm finding that parsing the HTML 4.0 DTD causes it to crash
> > out. Here's the error:
>
> Why are you parsing the HTML 4.0 DTD with an XML parser? It's not
> XML; it uses AND groups and exclusions.
>
> True, MSXML should probably fail more gracefully on non-XML data, but
> hey - it's beta.
Well, I often use parsers to find errors in DTDs. Since XML is
nominally SGML compatible, an XML parser should find the errors in an
SGML DTD (even if just to say that it's got a bunch of stuff XML doesn't
recognize). What I was hoping to do was to parse the DTD, read the
errors, then figure out what I need to change in the HTML DTD to make it
XML-compliant. I've already made a number of changes according to the
revised note on the differences between XML and SGML.
As I'm currently working with other people's SGML in my professional
life, I've found it very useful to parse their DTDs with James Clark's
NSGMLS, and to correct their DTDs from that. I'd hoped to do the same
thing with MSXML.
In addition, not everyone's copy of MSXML crashes on this DTD; I've
been working with Simon St.Laurent on the problem, and his MSXML parsed
without crashing, using an older version of MSJAVA.DLL.
Oh, and just in case Chris Lovett is reading this message, thanks for
your initial advice, Chris, but it doesn't appear to have had an impact.
I get the same error. Do you have any other suggestions?
--->Ben Trafford
From SimonStL at classic.msn.com Tue Nov 4 02:44:01 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
Message-ID:
My apologies to all, especially Ben; it looks like I dropped into the wrong
directory when I ran MSXML on Ben's DTD - there are two copies floating on each
of the hard drives of two machines, one with IE 4 and one with IE 3. I now
get the same weird Java errors he did running the correct combination of the
new version of MSXML with the old version of jview.
I used the viewer applet included in the MSXML package under IE 4 to test a
number of files. When there are errors, the viewer still brings up quite a
list of errors that look a lot like the list from the previous version, but
the errors seem accurate, a significant improvement on version 1.0. The
viewer is much handier than the command line was.
A preliminary run-through of the new version using Sun's JDK 1.1.3 produced:
java.lang.NoClassDefFoundError: com/ms/xml/om/Document
at msxml.main(msxml.java:28)
No idea why, yet.
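A NoClassDefFoundError of this kind generally means the JVM could not
locate or load the named class at run time. A classpath that does not
include the directory holding com/ms/xml/om/Document.class, or class files
whose names were folded to lowercase on a case-sensitive filesystem, are two
possible causes, though nothing in this message confirms either. As a
minimal sketch for narrowing it down, the hypothetical helper below
(ClassCheck is not part of MSXML) reports whether a class name is visible
to the current class loader:

```java
// Hypothetical diagnostic helper; ClassCheck is an illustrative name and is
// not part of the MSXML distribution.
final class ClassCheck {

    /** Returns true if the named class can be found and loaded. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: load the class without running its static
            // initializers, so only visibility on the classpath is tested.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The default name is the class reported in the error above.
        String name = args.length > 0 ? args[0] : "com.ms.xml.om.Document";
        System.out.println(name + (isLoadable(name) ? " is loadable" : " is NOT loadable"));
    }
}
```

Running it with the same classpath that was used for msxml would show
whether the class is reachable at all before looking for subtler causes.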
Simon St.Laurent
From donpark at quake.net Tue Nov 4 04:10:20 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:58:48 2004
Subject: > as end tag
Message-ID: <01bce8ce$45bad370$0100007f@localhost>
>> 3) Missing imports
>>
>> These imports are hidden in the source. Surely I could copy them from
>> IE4's Java files, but I'd be nice if msxml was self-contained.
>>
>> import com.ms.com.*;
>> import com.ms.com.IUnknown;
>> import com.ms.com.Variant;
>> import com.ms.osp.*;
>> import netscape.javascript.JSObject;
>
>All these go away if you remove XMLDSO.java I believe. You shouldn't even
>try and build this file on other platforms anyway since it designed to only
>work with the the Data Binding features of IE 4.0.
I am not sure where JSObject is imported from, but the only problem I had
running MSXML under JDK 1.1.4 was with XMLInputStream's use of XMLStream (and
IXMLStream), which has native methods. I made a minor change to XMLInputStream
and it now runs wonderfully under the JDK. I could post my changes if someone
wants them and if MS approves.
Don Park
From btrafford at worldnet.att.net Tue Nov 4 04:34:40 1997
From: btrafford at worldnet.att.net (Ben Trafford)
Date: Mon Jun 7 16:58:48 2004
Subject: Unusual error with MSXML
References: <41135C785691CF11B73B00805FD4D2D703E4FCCF@red-17-msg.dns.microsoft.com>
Message-ID: <345EA735.9CFF15B0@worldnet.att.net>
Chris Lovett wrote:
> > very good at guesswork. Any help you could proffer would be very much
> > appreciated, even if it's just pointing me to some more narrative-style
> > documentation (if any exists).
> >
> [Chris Lovett] This is the old error message format are you sure it
> installed ok ? The new msxml.java doesn't print out the exception stack any
> more.
> of the hard drives of two machines, one with IE 4 and one with IE 3. I now
> get the same weird Java errors he did running the correct combination of the
> new version of MSXML with the old version of jview.
So, let's see if I get this straight:
Simon has experienced the same crashes as myself, which Chris hopes to
fix with his latest build. Now, the error message I got from Simon was
from an old version of MSXML?
Regardless, is there more documentation on the error messages MSXML is
giving?
--->Ben Trafford
From jjc at jclark.com Tue Nov 4 12:12:03 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:48 2004
Subject: XML processing experiments
Message-ID: <345F0F2E.C3FF9AD7@jclark.com>
One nice feature of XML is that it is easily processable by the
Desperate C/C++/Java/Perl hacker: the syntax is simple enough that you
can do useful things with XML without a full XML parser. I've been
exploring this sort of processing. If all you want to do is be able to
correctly parse well-formed XML, and you don't care about detecting
whether or not it is well-formed, how much code does it take and is it
significantly faster than using an XML parser that does proper
well-formedness or validity checking?
I used Jon's Old Testament XML file as test data (after removing the
doctype line), which is about 3.7Mb. I ran the tests on a Toshiba Tecra
720CDT (133MHz Pentium, 80Mb RAM) with Windows NT 4.0. I used the IE
4.0 Java VM. The timings I give are after a couple of runs, so there's
little or no disk I/O involved. Lark 0.97 parsed the file in about 10.5
seconds, MSXML in about 24 seconds. I suspect the difference is partly
because MSXML is building a tree (I didn't see any command line switch
to turn this off). By comparison nsgmlsu -s took about 8 seconds. I
also tried LT XML (which is written in C). I didn't find a program that
did nothing but parsing. The fastest one I found was the sgcount
program (which counts the number of each element type); it took about 11
seconds. That's much slower than I expected; I suspect there may be
some Windows-specific performance problems.
The code I wrote is available at
. First I wrote a little
library in C for doing XML "tokenization". This code just splits the
input up into "tokens" where each token is data or some kind of XML
markup (start-tag, end-tag, comment etc). The idea is that it does the
minimum necessary to do any kind of useful XML-aware processing. I wrote
a little application xmlec that just counts the number of elements in an
XML document. This can be compiled either to use Win32 file mapping (if
FILEMAP is defined) or normal read() calls. You'll probably have to
tweak the code a little if you're using anything other than Visual C++.
I then translated this into Java (I'm not much of a Java programmer, so
there's probably plenty of scope for improvement in the Java version).
xmlec parses the test file in about 0.5 seconds. Using read() instead of
file mapping increases the time to about 0.65 seconds. The Java
version takes about 1.5 seconds.
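The tokenization idea described above can be sketched in a few dozen lines
of Java. This is not the code from the message (its URL is missing from
this archive) but an illustrative approximation under the same assumption:
the input is well-formed, so nothing is checked. ElementCounter and its
method names are hypothetical, and the sketch does not handle a DOCTYPE
with an internal subset, which matches the test set-up above where the
doctype line was removed.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical sketch of "quick and dirty" XML tokenization: it trusts the
// input to be well-formed and only does enough work to count elements.
final class ElementCounter {

    /** Counts start-tags and empty-element tags in well-formed XML. */
    static int countElements(String xml) {
        int count = 0;
        int i = 0;
        while (i < xml.length()) {
            int lt = xml.indexOf('<', i);
            if (lt < 0) break;                           // only character data is left
            if (xml.startsWith("<!--", lt)) {
                i = skipPast(xml, "-->", lt + 4);        // comment
            } else if (xml.startsWith("<![CDATA[", lt)) {
                i = skipPast(xml, "]]>", lt + 9);        // CDATA section
            } else if (xml.startsWith("<?", lt)) {
                i = skipPast(xml, "?>", lt + 2);         // processing instruction
            } else if (xml.startsWith("<!", lt) || xml.startsWith("</", lt)) {
                i = skipPast(xml, ">", lt + 2);          // simple declaration or end-tag
            } else {
                count++;                                 // start-tag or empty-element tag
                i = skipTag(xml, lt + 1);
            }
        }
        return count;
    }

    /** Returns the index just after the next occurrence of terminator. */
    private static int skipPast(String s, String terminator, int from) {
        int end = s.indexOf(terminator, from);
        return end < 0 ? s.length() : end + terminator.length();
    }

    /** Skips to just after the '>' closing a tag, ignoring '>' inside quoted attribute values. */
    private static int skipTag(String s, int from) {
        char quote = 0;
        for (int i = from; i < s.length(); i++) {
            char c = s.charAt(i);
            if (quote != 0) {
                if (c == quote) quote = 0;
            } else if (c == '"' || c == '\'') {
                quote = c;
            } else if (c == '>') {
                return i + 1;
            }
        }
        return s.length();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ElementCounter file.xml");
            return;
        }
        // Assumes the file is UTF-8; a real tool would honour the XML declaration.
        String xml = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        System.out.println(countElements(xml) + " elements");
    }
}
```

Because the scanner never looks inside a tag beyond finding its end, it
does none of the name matching or attribute checking a conforming parser
must do, which is where the speed difference would be expected to come from.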
I also wrote a Java version of the LT XML textonly program (which
extracts the non-markup of an XML document). The LT XML version ran in
about 13.5 seconds. My Java version ran in about 3.5 seconds.
The class files for the Java element counting program total about 6k.
The source for the C version is about 750 lines, including both the file
mapping and read()ing version.
I was quite surprised that there was such a big performance difference
between real, conforming XML processing that does well-formedness
checking, and quick and dirty XML processing that does the minimum
necessary to get the correct result. This doesn't seem right to me...
James
From jjc at jclark.com Tue Nov 4 12:31:17 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:48 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
References: <19971103195249.AAA18429@AGREENE-PC.bitstream.com>
Message-ID: <345E9AA0.96BDDE55@jclark.com>
Andrew Greene wrote:
>
> If you have a Unicode-friendly XML environment, then users can create
> elements whose GIs or attribute names contain "interesting"
> characters. (Yes? A NAME token can contain "BaseChars", which includes
> characters beyond ASCII and even beyond Latin-1.)
>
> So, if a user requests that the document instance be saved as an ASCII
> file, what is the best way for a Unicode-aware and standards-compliant
> application to represent these characters?
I would use numeric character references wherever XML allows them; if
there are non-ASCII characters in places where numeric character
references aren't allowed I would use UTF-8 and give a warning to the
user. The ASCII characters will still be there as ASCII, and the
non-ASCII characters won't get lost, although they will look a bit funny
in an 8-bit editor. An interesting case is when there are non-ASCII
characters in places where numeric character references are not
recognized but do not cause an error (eg PIs, comments); one could have
an application convention that recognizes numeric character references
in these cases.
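As a minimal sketch of the first suggestion, the hypothetical helper below
rewrites character data or attribute values so that every non-ASCII
character becomes a decimal numeric character reference. It assumes markup
characters such as & and < have already been escaped, and it must not be
applied to element or attribute names, where XML does not allow character
references.

```java
// Hypothetical helper; AsciiEscaper is an illustrative name. Intended only
// for character data and attribute values, never for names.
final class AsciiEscaper {

    /** Replaces each non-ASCII code point with a reference of the form &#NNN; */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);        // handles surrogate pairs as one character
            if (cp < 0x80) {
                out.append((char) cp);           // ASCII passes through unchanged
            } else {
                out.append("&#").append(cp).append(';');
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00DF is the German sharp s; prints Stra&#223;e
        System.out.println(escapeNonAscii("Stra\u00dfe"));
    }
}
```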
> 2. Rename all the offending elements and attributes, and use PIs to
> ensure that when they're read back in we can patch things up.
> So, for example, the file could contain:
>
>
> foo bar
>
> Advantages: It's fully compliant.
If I was going to do this sort of thing, I think I would use a variation
on URL % encoding. I would have a convention that underscore (say)
followed by 4 hex digits represented the Unicode character with that hex
code.
James
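The underscore convention could look roughly like the following sketch.
NameCodec is a hypothetical name. Encoding a literal underscore as _005F so
that decoding is unambiguous, and working on UTF-16 code units so that
characters outside the 16-bit range become two escapes, are assumptions
added here and are not part of the suggestion above.

```java
// Hypothetical sketch of the "underscore plus 4 hex digits" naming convention.
final class NameCodec {

    /** Encodes a name so that the result contains only ASCII characters. */
    static String encode(String name) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char c = name.charAt(i);
            if (c < 0x80 && c != '_') {
                out.append(c);
            } else {
                // Assumption: '_' itself is escaped so decode() is unambiguous.
                out.append('_').append(String.format("%04X", (int) c));
            }
        }
        return out.toString();
    }

    /** Reverses encode(); throws IllegalArgumentException on a malformed escape. */
    static String decode(String encoded) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c != '_') {
                out.append(c);
                continue;
            }
            if (i + 4 >= encoded.length()) {
                throw new IllegalArgumentException("truncated escape at index " + i);
            }
            int value = 0;
            for (int k = 1; k <= 4; k++) {
                int digit = Character.digit(encoded.charAt(i + k), 16);
                if (digit < 0) {
                    throw new IllegalArgumentException("bad hex digit at index " + (i + k));
                }
                value = value * 16 + digit;
            }
            out.append((char) value);
            i += 4;                              // skip the four hex digits just consumed
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("Stra\u00dfe");
        System.out.println(encoded);             // prints Stra_00DFe
        System.out.println(decode(encoded).equals("Stra\u00dfe"));   // prints true
    }
}
```

Since underscore, letters, and digits are all legal in XML names, the
encoded form of a legal name remains a legal name.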
From ricko at allette.com.au Tue Nov 4 14:43:34 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:48 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
Message-ID: <199711041438.BAA25777@jawa.chilli.net.au>
> From: Andrew Greene
> * Is there a need for a "semi-standard" solution to this problem, or am
> I the only one struggling with it?
>
> * Is there interest in adopting some variation of number 2 so that we're
> better able to exchange such data?
>
> * I can't help but think that number 3 would be the most elegant solution
> if it were only legal. Yet I'm also sure that the XML committee had a
> good reason for disallowing it. I'd be interested in hearing what their
> reason was, so that I may become enlightened. :-)
When I proposed the "native language markup" scheme (for the ERCS project of
the Standardization Project Regarding East Asian Documents of the China/Japan/
Korea Document Processing Group) which XML implements, we also developed
the idea of "lowest-common-denominator naming".
This means that you should only use characters in names which are available
in all the systems through which the document will pass. So, if you have
a requirement (known upfront) to save in ASCII, then you should use "ue" not
"u umlaut". The best solution is to not create one in the first place!
(For example, Japanese users should restrict themselves to only using
characters in Shift JIS for names, not in JIS 212 or the additional sets coming.)
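Whether a name stays within the lowest common denominator can be checked
mechanically. The sketch below (NameRepertoire is a hypothetical helper)
uses the standard java.nio.charset API to test that a name is encodable in
every target encoding; which encodings to list is an assumption that
depends on the systems involved.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for "lowest-common-denominator naming": a name is
// acceptable only if every target encoding can represent all its characters.
final class NameRepertoire {

    /** Returns true if the name can be encoded in each of the given charsets. */
    static boolean fitsAll(String name, Charset... targets) {
        for (Charset cs : targets) {
            if (!cs.newEncoder().canEncode(name)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Assumed targets for illustration: plain ASCII and Latin-1.
        Charset ascii = StandardCharsets.US_ASCII;
        Charset latin1 = StandardCharsets.ISO_8859_1;
        System.out.println(fitsAll("Strasse", ascii, latin1));       // prints true
        System.out.println(fitsAll("Stra\u00dfe", ascii, latin1));   // prints false
    }
}
```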
I do not think there is any requirement for global interoperability of
DTDs: if there is, then some system of numeric character references in
names would be appropriate.
However, I can suggest a 4th approach that may be better than your three.
It is to provide a language or encoding specific fixed attribute, giving
the ASCII form of the GI for use in dumping. Of course, it requires a
minimum of smarts to convert to the new IDs. You might have an "also known
as" aka attribute (I'll use B instead of esszet):
Rick Jelliffe
From richard at cogsci.ed.ac.uk Tue Nov 4 15:46:09 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:48 2004
Subject: XML processing experiments
Message-ID: <199711041546.PAA29474@stevenson.cogsci.ed.ac.uk>
>I also tried LT XML (which is written in C). I didn't find a program that
>did nothing but parsing. The fastest one I found was the sgcount
>program (which counts the number of each element type); it took about 11
>seconds. That's much slower than I expected; I suspect there may be
>some Windows-specific performance problems.
It's true that we do our development under unix, and I don't have any
benchmarks for MS Windows. I just ran "sgcount [...]
>I was quite surprised that there was such a big performance difference
>between real, conforming XML processing that does well-formedness
>checking, and quick and dirty XML processing that does the minimum
>necessary to get the correct result. This doesn't seem right to me...
It isn't, and we're hoping to reduce it.
-- Richard
From istvanc at microsoft.com Tue Nov 4 16:57:42 1997
From: istvanc at microsoft.com (Istvan Cseri)
Date: Mon Jun 7 16:58:49 2004
Subject: XML processing experiments
Message-ID: <91B7E292027DCF1195CD08002BB690B00298B81E@red-93-msg.dns.microsoft.com>
I can offer a couple of reasons why a 'real' parser would be slower than
an ad-hoc processor:
- abstraction for encapsulating different encodings
- keeping track of line and column information for error reporting
- storing attributes and checking for uniqueness
- checking for valid element close tags
- processing entity references
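Two of the checks in the list above (attribute uniqueness and matching close tags) can be sketched as follows. This is only an illustration of the bookkeeping a conforming parser carries per tag, not MSXML's code; the class name WellFormednessChecks and its methods are assumptions made up for the example.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of per-tag bookkeeping that an ad-hoc processor can skip.
final class WellFormednessChecks {
    // Stack of element names that have been opened but not yet closed.
    private final Deque<String> open = new ArrayDeque<>();

    /** Records a start-tag after verifying that no attribute name repeats. */
    void startTag(String name, String... attributeNames) {
        Set<String> seen = new HashSet<>();
        for (String a : attributeNames) {
            if (!seen.add(a)) {
                throw new IllegalStateException("duplicate attribute " + a + " on " + name);
            }
        }
        open.push(name);
    }

    /** Verifies that an end-tag names the most recently opened element. */
    void endTag(String name) {
        if (open.isEmpty() || !open.peek().equals(name)) {
            throw new IllegalStateException("unexpected end-tag " + name);
        }
        open.pop();
    }

    /** True once every opened element has been closed. */
    boolean allClosed() {
        return open.isEmpty();
    }
}
```

Each call costs a set insertion or a stack operation plus a string comparison, which is work a tokenizer that trusts its input never does.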
In addition to this the MSXML parser is building the tree. We are going
to have a version where this can be turned off but when XML is used as
data it is extremely useful to have the tree around so you can actually
do different kinds of lookups, navigation on it and can update it.
Istvan
> ----------
> From: James Clark[SMTP:jjc@jclark.com]
> Reply To: James Clark
> Sent: Tuesday, November 04, 1997 4:03 AM
> To: XML Developers' List
> Subject: XML processing experiments
>
> One nice feature of XML is that it is easily processable by the
> Desperate C/C++/Java/Perl hacker: the syntax is simple enough that you
> can do useful things with XML without a full XML parser. I've been
> exploring this sort of processing. If all you want to do is be able to
> correctly parse well-formed XML, and you don't care about detecting
> whether or not it is well-formed, how much code does it take and is it
> significantly faster than using an XML parser that does proper
> well-formedness or validity checking?
>
> I used Jon's Old Testament XML file as test data (after removing the
> doctype line), which is about 3.7Mb. I ran the tests on a Toshiba Tecra
> 720CDT (133MHz Pentium, 80Mb RAM) with Windows NT 4.0. I used the IE
> 4.0 Java VM. The timings I give are after a couple of runs, so there's
> little or no disk I/O involved. Lark 0.97 parsed the file in about 10.5
> seconds, MSXML in about 24 seconds. I suspect the difference is partly
> because MSXML is building a tree (I didn't see any command line switch
> to turn this off). By comparison nsgmlsu -s took about 8 seconds. I
> also tried LT XML (which is written in C). I didn't find a program that
> did nothing but parsing. The fastest one I found was the sgcount
> program (which counts the number of each element type); it took about 11
> seconds. That's much slower than I expected; I suspect there may be
> some Windows-specific performance problems.
>
> The code I wrote is available at
> . First I wrote a little
> library in C for doing XML "tokenization". This code just splits the
> input up into "tokens" where each token is data or some kind of XML
> markup (start-tag, end-tag, comment etc). The idea is that it does the
> minimum necessary to do any kind of useful XML-aware processing. I wrote
> a little application xmlec that just counts the number of elements in an
> XML document. This can be compiled either to use Win32 file mapping (if
> FILEMAP is defined) or normal read() calls. You'll probably have to
> tweak the code a little if you're using anything other than Visual C++.
> I then translated this into Java (I'm not much of a Java programmer, so
> there's probably plenty of scope for improvement in the Java version).
>
> xmlec parses the test file in about 0.5 seconds. Using read() instead of
> file mapping increases the time to about 0.65 seconds. The Java
> version takes about 1.5 seconds.
>
> I also wrote a Java version of the LT XML textonly program (which
> extracts the non-markup of an XML document). The LT XML version ran in
> about 13.5 seconds. My Java version ran in about 3.5 seconds.
>
> The class files for the Java element counting program total about 6k.
> The source for the C version is about 750 lines, including both the file
> mapping and read()ing version.
>
> I was quite surprised that there was such a big performance difference
> between real, conforming XML processing that does well-formedness
> checking, and quick and dirty XML processing that does the minimum
> necessary to get the correct result. This doesn't seem right to me...
>
>
> James
>
From dgd at cs.bu.edu Tue Nov 4 17:37:09 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:58:49 2004
Subject: How best to represent unrepresentable characters in NAME
tokens?
In-Reply-To: <19971103195249.AAA18429@AGREENE-PC.bitstream.com>
Message-ID:
At 2:52 PM -0500 11/3/97, Andrew Greene wrote:
>If you have a Unicode-friendly XML environment, then users can create
>elements whose GIs or attribute names contain "interesting"
>characters. (Yes? A NAME token can contain "BaseChars", which includes
>characters beyond ASCII and even beyond Latin-1.)
Sure can...
I'll give my solution at the end, but first, a few comments on the suggestions.
>So, if a user requests that the document instance be saved as an ASCII
>file, what is the best way for a Unicode-aware and standards-compliant
>application to represent these characters?
>I've thought of three solutions:
>
>1. It's an error. Tell the user "Sorry, your file could not be saved
> in that character encoding because the element name 'StraBe' could
> not be represented.
>
> Advantages: It's fully compliant and no data can get lost.
>
> Disadvantages: No data can get out, either. Perhaps the user has
> an 8-bit app to massage the data in a particular way, and she
> doesn't want to rename all her elements.
This works, but isn't needed.
>2. Rename all the offending elements and attributes, and use PIs to
> ensure that when they're read back in we can patch things up.
> So, for example, the file could contain:
>
>
> foo bar
>
> Advantages: It's fully compliant.
>
> Disadvantages: It assumes that all other processing applications
> will be nice and won't lose my processing instructions, and it
> makes the file hard to read. It's also non-portable; unless we
> as a community decide on a "semi-standard" PI to use, no one else
> will know how to interpret this convention. (On the other hand,
> this is exactly why I'm bringing the issue up here. Maybe we can
> all agree on a semi-standard and I'll feel less uneasy about
> doing something like this....)
This is actively evil, in that it obfuscates the markup, and makes it
impossible to validate against the original DTD. Validating against a DTD
at all requires a DTD translation tool to change element and attribute
names there as well. The use of PIs to affect the meaning of markup (as
opposed to enable additional application processing that can't be expressed
in markup) is generally a bad idea. In fact, most SGML experts concur that
PIs are best used in _exceptional_ cases. The reason for this is that
applications are allowed to (and usually do) ignore any PIs that they are not
specialized for.
>
>3. Violate the standard and use character entities to represent the
> ineffable, for example:
>
> foo bar
>
> Advantages: It's compact and unambiguous (even if it's illegal :-).
>
> Disadvantages: It violates both XML and 8879 in a new and perverse
> way. The user's file will not be usable by any other piece of
> standards-compliant software. That's worse than refusing to write
> the file at all (number 1).
Yes, this is not good.
>* Is there a need for a "semi-standard" solution to this problem, or am
> I the only one struggling with it?
Yes, but it's already built into XML.
>* Is there interest in adopting some variation of number 2 so that we're
> better able to exchange such data?
Not from me...
>* I can't help but think that number 3 would be the most elegant solution
> if it were only legal. Yet I'm also sure that the XML committee had a
> good reason for disallowing it. I'd be interested in hearing what their
> reason was, so that I may become enlightened. :-)
Part of it is simply compatibility -- this cannot be done in SGML. The
argument about SGML compatibility is not worth rehashing here; the archive
of the working group discussions includes many messages on it.
So now that I've objected to all three solutions, you may think I'm a
negative kind of guy... But I do have a suggestion.
Support for UTF-8 is required for XML processors, so that an "8-bit" tool
can always be fed something that it can understand, even though some
strings may look funny in some editors. Since XML parsers do _not_ perform
any kind of character format normalization (e.g. of diacritical marks) each
element name will be a constant string, even if that string is not readable.
[[ Note for anyone who may be puzzled: UTF-8 is a clever little encoding
trick that uses variable length character codes to represent the larger
space of Unicode (and 10646) codes in 8-bit chunks. Codes < 128 represent
USASCII, and codes above are concatenated together to represent large
values. The details (and sample code in C) can be found at
http://www.unicode.org/ So a plain ASCII file in UTF-8 looks the same, but
other characters show up as strings with leading chars >= 128. One detail
is that Latin-1 etc., are _not_ valid UTF-8 because they use the eighth-bit
high codes for single characters.]]
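The properties described in the note can be checked directly. The following is a small sketch, assuming Java's standard charset classes; the class name Utf8Demo and its two helpers are invented for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo of the UTF-8 properties described in the note above.
final class Utf8Demo {
    /** Counts the bytes of the UTF-8 form of s that have the eighth bit set. */
    static int highBytes(String s) {
        int n = 0;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if ((b & 0x80) != 0) {
                n++;
            }
        }
        return n;
    }

    /** Returns true if bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Plain ASCII has no high bytes; the sharp s becomes a two-byte sequence.
        System.out.println(highBytes("Strasse"));
        System.out.println(highBytes("Stra\u00dfe"));
        // The Latin-1 encoding of the same name is rejected by a strict UTF-8 decoder.
        System.out.println(isValidUtf8("Stra\u00dfe".getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

An 8-bit tool that passes high bytes through untouched will preserve such names, since each name is always the same byte string.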
The core of your problem is a very good, and very real, point: writers of
XML processors need to remember that the Unicode basis of XML is
fundamental -- so conversion to another character set may fail because the
characters in a document may simply not exist in the target code. Of
course, for many documents, the markup will allow transcoding to Latin-1
(and other local processing codes), but this does depend on the document.
Text can be modified to use numeric character references but this is
probably too horrible, especially for the Asian ideographic scripts.
So, you can keep your 8-bit tools, but you may need UTF-8 display code to
make them maximally usable.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Tue Nov 4 18:23:37 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:58:49 2004
Subject: Ampersand in URLs (was: RE: > as end tag)
In-Reply-To:
<41135C785691CF11B73B00805FD4D2D703E4FCA8@red-17-msg.dns.microsoft.com>
Message-ID:
At 8:27 PM -0500 11/2/97, Chris Lovett wrote:
>As for the ampersands, this is a real problem. We found with our experience
>with CDF that customers just can't handle putting &amp; inside their URL's.
>We want to comply with XML standards, but we also want XML to be successful
>in the marketplace. One area that we didn't compromise is with case
>sensitivity. The new parser is fully case sensitive - but with a switch
>that sets it back to case insensitive for those people that are reading XML
>that was generated before case sensitivity was decided. You have to make
>some tough compromises sometimes.
There was a query on the XML-SIG about HTML and the ampersand rule (XML
agrees with the HTML standard, but not all HTML implementations). I thought
that my answer fits well in this discussion as well.
Internet Explorer, ironically, already insists on the escaping of ampersand
in some circumstances. All that I've tested, actually, but I won't assume
that it follows the standard -- if they do, "some" should be changed to
"all". I am not sure about the story with whitespace, but in fact, if they
don't require &amp; before space, it matters little to me, since space
isn't legal in a URL.
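For tools that generate markup, the escaping is mechanical. A minimal sketch, assuming a double-quoted attribute value; the class and method names are made up for the example, and the example.org URL is a placeholder.

```java
// Hypothetical helper: makes a URL safe to place inside a double-quoted attribute value.
final class AttrEscape {
    /** Replaces the characters that may not appear literally in such a value. */
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;   // the case under discussion
                case '<': sb.append("&lt;"); break;
                case '"': sb.append("&quot;"); break;  // would otherwise end the value
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "http://example.org/search?a=1&b=2";
        System.out.println("<A HREF=\"" + escapeAttribute(url) + "\"/>");
    }
}
```

A parser reading the result hands the application the original URL with a single ampersand, so nothing downstream needs to know the escaping happened.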
I don't see ampersand as a show stopper, especially once people realize how
useful entities can be in modularizing long URLs. And, as Paul notes, we
can fall back on the authoring tool argument. More important, since we have
"draconian" error handling in XML, simple testing of the document will
ensure that the error is detected (rather than the HTML case, where it
depends on the browser that you test with). One of the biggest problems
with HTML has been that the standard and the implementations differ(ed)
so widely and on so many points -- this is a primary reason that we should
be very careful to implement XML exactly. Consistent parsing will go a long
way to salve the wounds of slight differences from HTML. Divergent syntax
in any software that purports to be XML-compliant will cause real problems
for users, who may not be technical enough to read and understand the
specification to judge correctness of implementations.
We're sure to have bugs, but implementors have a very real
responsibility to conform in every way that they can, regardless of what
design decisions they would rather have made differently. This truth is
what makes standards such an object of (seemingly pointless) passion --
because you have to take them as they _actually are_ if they are to have
the value that they promise (even when you feel that that value is
unconscionably less than it could have been).
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From andrewl at microsoft.com Tue Nov 4 19:20:46 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:49 2004
Subject: How best to represent unrepresentable characters in NAMEtoken
s?
Message-ID: <7BB61B44F197D011892800805FD4F79201CD64AF@red-03-msg.dns.microsoft.com>
I'm left unclear by this response. Suppose that I have a Java program with
an object class called "$Price" and I want to serialize this into XML.
Something such as the following is not legal XML:
<$Price>15.95</$Price>
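To make the failure concrete, here is a rough sketch of a name check restricted to ASCII. It is a simplification of XML's Name production (which also admits many non-ASCII letters, combining characters and extenders), and the class and method names are hypothetical.

```java
// Hypothetical, ASCII-only approximation of the XML Name production.
final class AsciiNameCheck {
    /** Characters allowed at the start of a name: letters, '_' and ':'. */
    static boolean isNameStart(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '_' || c == ':';
    }

    /** Characters allowed after the first: name-start characters, digits, '.' and '-'. */
    static boolean isNameChar(char c) {
        return isNameStart(c) || (c >= '0' && c <= '9') || c == '.' || c == '-';
    }

    /** Returns true if s is a legal name under the simplified rules above. */
    static boolean isAsciiName(String s) {
        if (s.isEmpty() || !isNameStart(s.charAt(0))) {
            return false;
        }
        for (int i = 1; i < s.length(); i++) {
            if (!isNameChar(s.charAt(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isAsciiName("$Price")); // '$' is not a name character
        System.out.println(isAsciiName("Price"));
    }
}
```

A serializer could run such a check on each Java identifier before emitting it as an element type name, and fall back to some other representation when it fails.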
What can I do? One thing I could do is to avoid such names when writing
Java. But suppose that isn't an option. I could do the following:
But, as you say, this "obfuscates the markup, and makes it impossible to
validate against the original DTD" (in the sense that the declaration for
the OBJECT element type would be almost meaningless).
What is the recommended solution?
--Andrew Layman
AndrewL@microsoft.com
From ak117 at freenet.carleton.ca Tue Nov 4 19:29:37 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:58:49 2004
Subject: Useless XML Statistics
Message-ID: <199711041929.OAA03197@unready.microstar.com>
Here are some stats from Alta Vista:
Number of web pages mentioning:
SGML and not XML.......................109,790
XML and not SGML.........................5,083
SGML and XML.............................8,409
Neither.............................77,726,900
By this measurement, full SGML is nearly three times as popular as
Monty Python (40,546 pages) and slightly more popular than even the
Spice Girls (105,228 pages).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From scott at iguana.co.nz Tue Nov 4 23:43:22 1997
From: scott at iguana.co.nz (Scott Cooper)
Date: Mon Jun 7 16:58:49 2004
Subject: new msxml behaviour
Message-ID: <3.0.1.32.19971105124130.00a8d220@mail.iguana.co.nz>
the new msxml contains this code within getText of ElementImpl which
changes the behaviour of entity expansions from the previous version:
    for (Enumeration en = children.elements(); en.hasMoreElements(); )
    {
        // a space is inserted between the text of consecutive children
        if (sb.length() > 0)
            sb.append(' ');
        sb.append(((Element)en.nextElement()).getText());
    }
    return sb.toString();
notice the appending of a space. is this appropriate? it means constructs
like 'abc&SOME.ENTITY;def' expand to 'abc SOME.ENTITY.CONTENTS def' rather
than 'abcSOME.ENTITY.CONTENTSdef' like it used to which really stuffs my
application.
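The effect of the separator can be reproduced outside msxml. The sketch below is not the msxml source; it only mirrors the loop's logic on a plain list of strings, with the class name TextJoin and the separator parameter made up for the comparison.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the child-text concatenation loop quoted above.
final class TextJoin {
    /** Concatenates parts, inserting separator before every part once output is non-empty. */
    static String join(List<String> parts, String separator) {
        StringBuilder sb = new StringBuilder();
        for (String p : parts) {
            if (sb.length() > 0) {
                sb.append(separator);
            }
            sb.append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Text, an expanded entity, then more text, as three child nodes.
        List<String> children = Arrays.asList("abc", "SOME.ENTITY.CONTENTS", "def");
        System.out.println(join(children, " ")); // new behaviour: spaces appear
        System.out.println(join(children, ""));  // old behaviour: seamless text
    }
}
```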
the last time i tried to 'improve' msxml the damn thing proved incredibly
difficult to recompile due to some ridiculous circular dependencies among
the files - god knows how ms compiled it in the first place - anyway i now
see this awful dll rubbish in there so before i attempt to make a makefile
(*please* supply one next time ms :) ) is this in fact a problem? or should
i change my approach.
i was also wondering whether defaults for attributes should appear to the
application if the attribute isn't explicitly given in the markup. right
now i've added a function to traverse the tree and insert all attribute
defaults (if needed) before i start processing the document - what do you
think of that?
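the defaulting pass described above can be sketched independently of any parser api. this is an assumption-laden illustration: attributes are modelled as plain maps, and the names AttributeDefaults and withDefaults are invented, not part of msxml.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: merge declared defaults into the attributes given in the markup.
final class AttributeDefaults {
    /** Returns the specified attributes plus any declared default that was not specified. */
    static Map<String, String> withDefaults(Map<String, String> specified,
                                            Map<String, String> defaults) {
        // Copy first so the caller's map is left untouched.
        Map<String, String> result = new LinkedHashMap<>(specified);
        for (Map.Entry<String, String> e : defaults.entrySet()) {
            // An explicitly given value always wins over the default.
            result.putIfAbsent(e.getKey(), e.getValue());
        }
        return result;
    }
}
```

doing this once per element while the tree is built means the application never has to ask whether a value came from the markup or from the declaration.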
the msxml api was awful for getting schema information such as default
values. now it has a 'toSchema' function which returns an element with
child elements for each attribute. the child element's tag is 'ATTRIBUTE'
and it contains attributes such as 'XML:ID' containing the attribute name
and 'XML:DEFAULT' containing the default, for instance. this is an
incredibly convoluted method for accessing such information - are there any
other xml parsers out there that attach schema information to the markup
element itself - like element.getAttribute("xyz").getDefaultValue() rather
than
document.getElementDecl("abc").getChild("xyz").getAttribute("XML:DEFAULT").
finally (i've been saving up questions) i'd like this construct to be
parsed as an element...
blah blah
]]>'>
...
&foo;
but instead, &foo; is processed as PCDATA (by msxml). is this correct
behaviour? section 4.4 of the xml ref contains the following: '6.For an
internal (text) entity, the processor must include the entity; that is,
retrieve its replacement text and process it as a part of the document
(i.e. as content or AttValue, whichever was being processed when the
reference was recognized), passing the result to the application in place
of the reference. The replacement text may contain both text and markup,
which must be recognized in the usual way...'
well i haven't received any messages from the list today (maybe you're all
in bed on US time) so how about chewing on that for me 'cos i must say it's
a pain to rewrite your code when you had to hack it in the first place to
work.
p.s. i'm using xml to define the syntax and byte data of a peer-to-peer
network interaction over pacnet and there aren't any PIs. if anyone would
like to check out what i've done i'd greatly appreciate any opinions.
---------------------------------------------------------------------
Iguana Information Services Ph +64 4 499 9782
PO Box 10 609 Fax +64 4 499 4439
Wellington Email scott@iguana.co.nz
New Zealand HTTP http://www.iguana.co.nz
From Ingo.Macherius at TU-Clausthal.de Wed Nov 5 04:16:08 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:49 2004
Subject: announce: slides for talk on xml
Message-ID: <199711050416.FAA07513@sinfonix.rz.tu-clausthal.de>
Hello,
there are PostScript and PowerPoint versions of a 23-slides talk on
XML available at
http://www.heim9.tu-clausthal.de/~inim/xml/dfn-bt-97/
I'm very interested in feedback (including spelling and translation
errors in the english version). The URL is not yet permanent. Enjoy.
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From tbray at textuality.com Wed Nov 5 08:35:32 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:49 2004
Subject: Entity processing (was new msxml behaviour)
Message-ID: <3.0.32.19971104214605.00b621d8@pop.intergate.bc.ca>
At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
>the new msxml contains this code ...
> if (sb.length() > 0)
> sb.append(' ');
>notice the appending of a space. is this appropriate?
Recent WG decisions require that parameter entity expansions (outside
of entity values) should be forced to match an even number of tokens
simply by appending & prepending spaces to their expansion; I
hypothesize that this is what the msxml code is doing.
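As a rough sketch of the rule as I read it (the class and method names here are hypothetical, and this is not taken from Lark or msxml): a parameter entity included in the DTD proper gets one space on each side, while one referenced inside an entity value is spliced in unchanged.

```java
// Hypothetical illustration of the padding rule for parameter entity expansions.
final class ParameterEntityPadding {
    /** Expansion used when the reference occurs in the DTD outside an entity value. */
    static String includeInDtd(String replacementText) {
        // The surrounding spaces keep adjacent expansions from fusing into one token.
        return " " + replacementText + " ";
    }

    /** Expansion used when the reference occurs inside an entity value. */
    static String includeInEntityValue(String replacementText) {
        return replacementText;
    }

    public static void main(String[] args) {
        // Two adjacent references whose replacement texts are "foo" and "bar".
        System.out.println("[" + includeInDtd("foo") + includeInDtd("bar") + "]");
        System.out.println("[" + includeInEntityValue("foo") + includeInEntityValue("bar") + "]");
    }
}
```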
Of course, this can't be done when building the replacement text
of an internal text entity. To aid in sorting this out, I attach
a couple of my test files; credit is due to Henry Thompson, Michael
Sperberg-McQueen, and likely others for helping cook these up.
I *think* that the behavior of Lark 0.97 on these is per the spec.
But there's enough hair on this set of problems that there are
lots of ways I could be wrong. -Tim
-------------- next part --------------
' >
%xx;
An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&).
" >
]>
This sample shows a &tricky; method.&example;&book;
-------------- next part --------------
'>
%bazatt;
]>
&weird;
From tbray at textuality.com Wed Nov 5 08:36:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:49 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971105003221.00b6529c@pop.intergate.bc.ca>
First off, thanks to James for some very thought-provoking work.
At 07:03 PM 04/11/97 +0700, James Clark wrote:
>If all you want to do is be able to
>correctly parse well-formed XML, and you don't care about detecting
>whether or not it is well-formed, how much code does it take and is it
>significantly faster than using an XML parser ...
>Lark: 10.5 seconds .. MSXML: 24 .. nsgmlsu: 8 .. sgcount:11 ..
>xmlec (C): 0.5 seconds .. (Java): 1.5 seconds.
[BTW, when I got Lark to run "almost as fast as SP", I decided that
was qualitatively fast enough for now].
>I was quite surprised that there was such a big performance difference
No kidding.
Discussions here are a bit dangerous, since in the Java domain, we are
kind of operating in the dark; we don't have profiling tools
with really good granularity. This is my excuse for engaging in
performance analysis based on intuition, something for which I have
personally fried more than one junior programmer.
Let's look at James' code eating up a "-quoted literal, where characters
are in the byte array buf[], start and end being integer indices therein:
    case (byte)'"':
      {
        for (++start; start != end; ++start) {
          if (buf[start] == (byte)'"') {
            nextTokenIndex = start + 1;
            return TOK_LITERAL;
          }
The following are candidates for why a program like Lark or MSXML
might run slower.
- works with Java char rather than byte variables
- does a method dispatch (or at least a few conditionals) per
character processed for at least two reasons: to manage the entity
stack, and to have a place to put the different character encoding
processing modules.
[Note: A look at James' code makes me wonder if this is
*really* as necessary as I thought]
- does quite a bit more work upon recognizing some markup
constructs; in particular for a start tag it pulls
apart the attribute list and packages up the element type
& attributes in a nice structure convenient for an API user
I went and looked at Lark's main loop, and for a 'typical' character
processing mode, i.e. it's not the begin or end of a tag or attribute or
something and no buffers run out but the text is being saved, it ends up
executing 25 lines of Java including one getXmlCharacter() method
dispatch; none of them are monster conditionals or anything.
James' code above, in the equivalent case, is executing 3, I think.
So while lines-of-code is a very shaky yardstick indeed, the difference is
8 or 9 to 1, which is not out of line with the observed performance
difference.
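To make the comparison concrete, here is a minimal sketch of the two styles side by side. It is not code from Lark, MSXML or James' tokenizer: `LiteralScan`, `CharSource`, `scanDirect`, `scanViaDispatch` and `over` are invented names, and the `CharSource` interface merely stands in for whatever entity-stack and encoding layer a fuller parser would put between the buffer and the scanner.

```java
// Hypothetical sketch contrasting a tight byte loop with per-character dispatch.
final class LiteralScan {
    /**
     * Tight loop in the style of the excerpt: buf[start] is the opening quote.
     * Returns the index just past the closing quote, or -1 if none is found.
     */
    static int scanDirect(byte[] buf, int start, int end) {
        for (++start; start != end; ++start) {
            if (buf[start] == (byte) '"') {
                return start + 1;
            }
        }
        return -1; // ran out of input before the closing quote
    }

    /** Stand-in for an entity stack plus decoder: one call per character. */
    interface CharSource {
        int next(); // returns -1 at end of input
    }

    /**
     * Same job through the interface; the source is positioned just after
     * the opening quote. Returns the number of characters consumed,
     * including the closing quote, or -1 if none is found.
     */
    static int scanViaDispatch(CharSource in) {
        int consumed = 0;
        for (int c = in.next(); c != -1; c = in.next()) {
            ++consumed;
            if (c == '"') {
                return consumed;
            }
        }
        return -1;
    }

    /** Trivial CharSource over a byte array, starting at index from. */
    static CharSource over(final byte[] buf, final int from) {
        return new CharSource() {
            private int pos = from;
            public int next() {
                return pos < buf.length ? (buf[pos++] & 0xFF) : -1;
            }
        };
    }
}
```

Both methods do the same work per character; the second simply pays for an interface call each time, which is the kind of cost being guessed at here.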
My intuition is that what's holding Lark back is
(a) the per-char dispatching, and
(b) turning the DFA crank, which requires a 2D array reference, then
a shift & mask
I have some ideas on how to fix both, but first I have to make Lark
do conditional sections and validate (neither should slow it down
significantly).
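For readers unfamiliar with the phrase, "turning the DFA crank" with a 2D array reference followed by a shift and mask could look something like the sketch below. Lark's real tables are not shown in this thread, so everything here is assumed for illustration: the two states, the three character classes, and the packing of an action code above an 8-bit next-state field.

```java
// Hypothetical table-driven DFA; the layout is invented, not Lark's.
final class PackedDfa {
    static final int TEXT = 0, IN_TAG = 1;                   // states
    static final int NONE = 0, TAG_OPEN = 1, TAG_CLOSE = 2;  // actions
    static final int STATE_MASK = 0xFF, ACTION_SHIFT = 8;

    /** Packs an action and a next state into one table entry. */
    static int pack(int action, int next) {
        return (action << ACTION_SHIFT) | next;
    }

    /** Rows are states; columns are character classes: '<', '>', other. */
    static final int[][] TABLE = {
        /* TEXT   */ { pack(TAG_OPEN, IN_TAG), pack(NONE, TEXT),      pack(NONE, TEXT)   },
        /* IN_TAG */ { pack(NONE, IN_TAG),     pack(TAG_CLOSE, TEXT), pack(NONE, IN_TAG) },
    };

    static int classify(char c) {
        return c == '<' ? 0 : c == '>' ? 1 : 2;
    }

    /** Counts complete tags by cranking the DFA once per character. */
    static int countTags(String input) {
        int state = TEXT;
        int tags = 0;
        for (int i = 0; i < input.length(); i++) {
            int entry = TABLE[state][classify(input.charAt(i))]; // 2D array reference
            int action = entry >> ACTION_SHIFT;                  // shift
            state = entry & STATE_MASK;                          // mask
            if (action == TAG_CLOSE) {
                tags++;
            }
        }
        return tags;
    }
}
```

Each character costs two array index operations plus the unpacking, which is cheap individually but adds up when it sits in the innermost loop.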
One other experiment would be useful, that might shed light from
a different angle. James, how about doing element counts per type;
i.e. actually *using* some of the info coming back from the tokenizer,
nothing fancy, just use a java.util.Hashtable or some such; should be
able to run very similar code on Lark and your TokenStream thing; I
wonder if it would change the numbers. I'll get around to this sometime
if nobody else does, but not for the next 2-3 weeks. -Tim
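The proposed experiment needs very little code on the application side. A minimal sketch, assuming only that whichever tokenizer is used can hand over the element type name once per start tag (the class `ElementCounter` and its `startElement` and `count` methods are hypothetical, not part of Lark or of James' TokenStream):

```java
import java.util.Hashtable;

// Hypothetical per-type element counter; the tokenizer hookup is assumed.
final class ElementCounter {
    private final Hashtable<String, Integer> counts = new Hashtable<String, Integer>();

    /** To be called by the tokenizer (or its driver) once per start tag. */
    void startElement(String type) {
        Integer old = counts.get(type);
        counts.put(type, old == null ? 1 : old + 1);
    }

    /** Number of start tags seen so far for the given element type. */
    int count(String type) {
        Integer n = counts.get(type);
        return n == null ? 0 : n;
    }

    public static void main(String[] args) {
        ElementCounter counter = new ElementCounter();
        for (String type : new String[] { "doc", "p", "p", "em", "p" }) {
            counter.startElement(type);
        }
        System.out.println("p: " + counter.count("p"));
    }
}
```

Because the counter forces a String to be built for every start tag, it would exercise exactly the packaging work that the raw tokenizer timings leave out.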
From crism at ora.com Wed Nov 5 15:37:15 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:49 2004
Subject: new msxml behaviour
In-Reply-To: <3.0.1.32.19971105124130.00a8d220@mail.iguana.co.nz> (message
from Scott Cooper on Wed, 05 Nov 1997 12:41:30 +1300)
Message-ID: <199711051541.KAA00205@geode.ora.com>
> notice the appending of a space. is this appropriate? it means
> constructs like 'abc&SOME.ENTITY;def' expand to 'abc
> SOME.ENTITY.CONTENTS def' rather than 'abcSOME.ENTITY.CONTENTSdef'
> like it used to which really stuffs my application.
That's absolutely uncool. My résumé is NOT a r é sum é .
This will, I hope, be fixed before the actual release.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From jimg at digitalthink.com Wed Nov 5 17:28:49 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun 7 16:58:49 2004
Subject: Entity processing (was new msxml behaviour)
Message-ID: <01BCE9CD.0E694A00.jimg@digitalthink.com>
Hi all,
Could somebody post the result of parsing the files Tim Bray posted according
to spec since there seems to be some question as to whether or not msxml is
doing it properly.
Thanks in advance.
Jim
On Wednesday, November 05, 1997 12:36 AM, Tim Bray [SMTP:tbray@textuality.com]
wrote:
> At 12:41 PM 05/11/97 +1300, Scott Cooper wrote:
> >the new msxml contains this code ...
> > if (sb.length() > 0)
> > sb.append(' ');
> >notice the appending of a space. is this appropriate?
>
> Recent WG decisions require that parameter entity expansions (outside
> of entity values) should be forced to match an even number of tokens
> simply by appending & prepending spaces to their expansion; I
> hypothesize that this is what the msxml code is doing.
>
> Of course, this can't be done when building the replacement text
> of an internal text entity. To aid in sorting this out, I attach
> a couple of my test files; credit is due to Henry Thompson, Michael
> Sperberg-McQueen, and likely others for helping cook these up.
>
> I *think* that the behavior of Lark 0.97 on these is per the spec.
> But there's enough hair on this set of problems that there are
> lots of ways I could be wrong. -Tim << File: EntVal.xml >> << File:
> EntVal2.xml >>
From richard at cogsci.ed.ac.uk Wed Nov 5 17:39:07 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:50 2004
Subject: Entity processing (was new msxml behaviour)
In-Reply-To: Jim Gindling's message of Wed, 5 Nov 1997 09:27:36 -0800
Message-ID: <199711051738.RAA04357@stevenson.cogsci.ed.ac.uk>
>Could somebody post the result of parsing the files Tim Bray posted according
>to spec since there seems to be some question as to whether or not msxml is
>doing it properly.
Well here's what my XML parser makes of them. It's *intended* to
parse according to spec :-)
-- Richard
EntVal.xml:
' >
%xx;
An ampersand (&) may be escaped
numerically (&#38;) or with a general entity
(&)." >
]>
This sample shows a error-prone method.
An ampersand (&) may be escaped
numerically (&) or with a general entity
(&).
" (which we
certainly want to be legal) the last character of the pcdata is in a
different entity from the first.
-- Richard
From jarle.stabell at dokpro.uio.no Fri Nov 7 17:08:51 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107181156.00849e80@hedvig.uio.no>
Richard Tobin wrote:
>I don't see how that excludes my example. The tags and elements *do*
>begin and end in the same entity.
Sorry, the sentence "it may end in a different entity from the one it
started in" "tricked" me into not reading your example fully, I thought I
saw "a<b>c" and not "a<b/>c" (as you stated).
>(suppose foo is defined as "a<b/>c";
>then the first bit returned from "x&foo;y" is "xa".
Ok. My current design will first return PCData="x", then entity ref="foo",
and (if the client wants entities expanded) PCData="a" followed by
EmptyElement="b" and then PCData="c".
I.e. it may return two consecutive PCData's, with perhaps some
EntityExpansionStart and -End signals between them.
(Is this design flawed?)
Cheers,
Jarle
From dmck at cogsci.ed.ac.uk Fri Nov 7 17:09:33 1997
From: dmck at cogsci.ed.ac.uk (David McKelvie)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
In-Reply-To: <199711071648.QAA07983@stevenson.cogsci.ed.ac.uk> (message from
Richard Tobin on Fri, 7 Nov 1997 16:48:16 GMT)
Message-ID: <4468.199711071708@scotus.cogsci.ed.ac.uk>
>> " ...
my name is &name;
"
It's worth pointing out that Richard wants ALL of the PCDATA of the
element to be returned as one string of characters "my name is
Richard", rather than as two strings "my name is " and "Richard".
David
From richard at cogsci.ed.ac.uk Fri Nov 7 17:24:14 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
In-Reply-To: Jarle Stabell's message of Fri, 07 Nov 1997 18:11:57 +0100
Message-ID: <199711071724.RAA08821@stevenson.cogsci.ed.ac.uk>
> Ok. My current design will first return PCData="x", then entity ref="foo",
> and (if the client wants entities expanded: PCData="a" followed by
> EmptyElement="b" and then PCData="c".)
> ie it may return two consecutive PCData's, with perhaps some
> EntityExpansionStart and -End signals between them.
> (Is this design flawed?)
This is reasonable, it's just not what we wanted to do, because we
have existing programs (which previously processed "normalised SGML")
which (a) expect to see all entities fully expanded and (b) expect to
see pcdata including references put together into a single bit. For
example, our grep-like program should match the example when searching
for the text "xa".
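A minimal sketch of the merging step, for the case quoted above where the parser reports PCData "x", then "a", then an empty element, then "c", then whatever follows. The event model is invented for the example: events are plain strings, and anything starting with `<` stands for a markup event. The class name `TextMerger` is likewise hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: glue consecutive PCDATA chunks into single strings.
final class TextMerger {
    /**
     * Items starting with '<' stand for markup events; everything else is a
     * PCDATA chunk. Consecutive chunks are concatenated, so text that came
     * partly from an entity expansion ends up in one piece.
     */
    static List<String> merge(List<String> events) {
        List<String> out = new ArrayList<String>();
        StringBuilder pending = new StringBuilder();
        for (String event : events) {
            if (event.startsWith("<")) {
                if (pending.length() > 0) {
                    out.add(pending.toString());
                    pending.setLength(0);
                }
                out.add(event);
            } else {
                pending.append(event);
            }
        }
        if (pending.length() > 0) {
            out.add(pending.toString());
        }
        return out;
    }
}
```

With the chunks "x", "a", an empty element, "c" and "y" as input, the first item out is "xa", which is what a grep-like search needs to see.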
-- Richard
From jarle.stabell at dokpro.uio.no Fri Nov 7 17:39:16 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:51 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107184336.00844660@hedvig.uio.no>
David McKelvie wrote:
>>> " ...
my name is &name;
"
>
>It's worth pointing out that Richard wants ALL of the PCDATA of the
>element to be returned as one string of characters "my name is
>Richard", rather than as two strings "my name is " and "Richard".
Yes. But this requires one to copy (at least the first string) and a
concatenation.
Some applications may be more interested in the speedup which may result
from not doing this copying/concatenation, and happily accept the small
increase in complexity handling it.
I'm playing with a design involving two pluggable "ESIS-handlers", one
"low-level", where GI's, attribute names, attribute values, comments etc.
point directly into the source (typically via a filemapping or an
in-memory buffer).
The "low-level" ESIS-handler may copy the data into "real" strings,
concatenate the consecutive PCDATA sections, build the tree, do validation
etc. and pass the events to an optional "higher-level" ESIS-handler.
I think/hope the layer which triggers the low-level events won't be very
different from Mr Clark's "quick and dirty" parser.
(Not sure yet whether the low-level handler should just receive events, or
whether it should query for the next event/token.)
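One way to picture the two layers, as a rough sketch under stated assumptions: the low-level handler is pushed events that carry only an offset and length into the source buffer, and an adapter copies them into strings for a higher-level handler. All names (`Handlers`, `LowLevelHandler`, `HighLevelHandler`, `CopyingAdapter`) are invented, only text and start tags are modelled, and the push style is just one of the two options mentioned above.

```java
// Hypothetical two-layer handler design; interfaces invented for illustration.
final class Handlers {
    /** Low level: data is described by offsets into the source; nothing is copied. */
    interface LowLevelHandler {
        void text(char[] src, int offset, int length);
        void startTag(char[] src, int offset, int length);
    }

    /** Higher level: real strings, consecutive text already concatenated. */
    interface HighLevelHandler {
        void text(String s);
        void startTag(String name);
    }

    /** Copies low-level events into strings and forwards them upward. */
    static final class CopyingAdapter implements LowLevelHandler {
        private final HighLevelHandler next;
        private final StringBuilder pending = new StringBuilder();

        CopyingAdapter(HighLevelHandler next) {
            this.next = next;
        }

        public void text(char[] src, int offset, int length) {
            pending.append(src, offset, length); // the copy happens only here
        }

        public void startTag(char[] src, int offset, int length) {
            flush();
            next.startTag(new String(src, offset, length));
        }

        /** Delivers any buffered text; call at markup boundaries and at the end. */
        void flush() {
            if (pending.length() > 0) {
                next.text(pending.toString());
                pending.setLength(0);
            }
        }
    }
}
```

An application that wants speed can implement `LowLevelHandler` directly and skip the copying; one that wants convenience plugs in the adapter.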
Cheers,
Jarle
From tbray at textuality.com Fri Nov 7 17:40:10 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971107093943.00a52858@pop.intergate.bc.ca>
At 06:11 PM 07/11/97 +0100, Jarle Stabell wrote:
>Ok. My current design will first return PCData="x", then entity ref="foo",
>and (if the client wants entities expanded: PCData="a" followed by
>EmptyElement="b" and then PCData="c".)
>ie it may return two consecutive PCData's, with perhaps some
>EntityExpansionStart and -End signals between them.
>(Is this design flawed?)
If "foo" is an *internal* entity, the spec clearly requires your
parser to expand it for the application. But letting the app know
that the ref was encountered is also fine.
However, the spec says nothing that would require you to merge the
text from a variety of entities. For example, Lark's event-stream
API will generate a series of Text object events in just this
situation. On the other hand, once you've seen the end of the element,
Lark has an API just to get all the text. This is strictly a matter
of a design choice; as Richard points out, if you want to support a
"grep" application, you'd probably like to have entity replacements
merged for you. On the other hand, if you're building a full-text
index, you probably need to have the separate chunks made visible
so that you know what to point at from the index.
As James has pointed out more than once, there is no universal
document API that meets everybody's application needs. One of
the nice things about XML is that if you can't find a parser that
has the API you need, you can go build your own without excessive
pain. -Tim
From dmck at cogsci.ed.ac.uk Fri Nov 7 18:04:28 1997
From: dmck at cogsci.ed.ac.uk (David McKelvie)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
In-Reply-To: <3.0.32.19971107184336.00844660@hedvig.uio.no> (message from
Jarle Stabell on Fri, 07 Nov 1997 18:43:37 +0100)
Message-ID: <4610.199711071804@scotus.cogsci.ed.ac.uk>
>> Some applications may be more interested in the speedup which may result
>> from not doing this copying/concatenation, and happily accept the small
>> increase in complexity handling it.
As Tim Bray says, that is another fine way to do it.
>> I'm playing with a design involving two pluggable "ESIS-handlers", one
>> "low-level", where GI's, attribute names, attribute values, comments etc
>> point directly into the source. (typically via a filemapping or an
>> in-memory-buffer)
We started off doing something like this in LTNSL, but stopped doing
filemapping (a) because it wasn't very portable and (b) because either
you have to make some tricky decisions about when to free these pointers
into the source, or it makes reading huge corpora like the 2 gigabyte BNC
corpus impossible, which we wanted to be able to do.
David
From clovett at microsoft.com Fri Nov 7 21:46:49 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <41135C785691CF11B73B00805FD4D2D703E4FD2F@red-17-msg.dns.microsoft.com>
The Object Model in MSXML handles this by providing a convenience getText()
function on all Element nodes that returns the concatenated text. If
someone really wants to see the entity ref nodes, they can enumerate the
child nodes and find them. This way the client decides what they want.
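The idea can be illustrated with a toy tree. This is not the MSXML object model: apart from the method name `getText()`, which is mentioned above, every class and member here (`MiniTree`, `Node`, `Text`, `Element`, `EntityRef`, `add`, `children`) is invented for the sketch, and an entity reference is assumed to be a node whose children hold its expansion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy tree; only the getText() name comes from the discussion.
final class MiniTree {
    static abstract class Node {
        final List<Node> children = new ArrayList<Node>();

        Node add(Node child) {
            children.add(child);
            return this;
        }

        /** Concatenated text of this node and everything below it. */
        String getText() {
            StringBuilder sb = new StringBuilder();
            for (Node child : children) {
                sb.append(child.getText());
            }
            return sb.toString();
        }
    }

    static final class Text extends Node {
        final String value;
        Text(String value) { this.value = value; }
        @Override String getText() { return value; }
    }

    static final class Element extends Node {
        final String type;
        Element(String type) { this.type = type; }
    }

    /** Entity reference kept in the tree; its children are the expansion. */
    static final class EntityRef extends Node {
        final String name;
        EntityRef(String name) { this.name = name; }
    }
}
```

A client that only wants the string calls `getText()` on the element; one that cares where the text came from walks `children` and finds the `EntityRef` node.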
> -----Original Message-----
> From: Jarle Stabell [SMTP:jarle.stabell@dokpro.uio.no]
> Sent: Friday, November 07, 1997 9:44 AM
> To: xml-dev@ic.ac.uk
> Subject: Re: XML processing experiments
>
> David McKelvie wrote:
> >>> " ...
my name is &name;
"
> >
> >It's worth pointing out that Richard wants ALL of the PCDATA of the
> >element to be returned as one string of characters "my name is
> >Richard", rather than as two strings "my name is " and "Richard".
>
> Yes. But this requires one to copy (at least the first string) and a
> concatenation.
>
> Some applications may be more interested in the speedup which may result
> from not doing this copying/concatenation, and happily accept the small
> increase in complexity handling it.
>
> I'm playing with a design involving two pluggable "ESIS-handlers", one
> "low-level", where GI's, attribute names, attribute values, comments etc.
> point directly into the source (typically via a filemapping or an
> in-memory buffer).
> The "low-level" ESIS-handler may copy the data into "real" strings,
> concatenate the consecutive PCDATA sections, build the tree, do
> validation etc. and pass the events to an optional "higher-level"
> ESIS-handler.
>
> I think/hope the layer which triggers the low-level events won't be very
> different from Mr Clark's "quick and dirty" parser.
>
> (Not sure yet whether the low-level handler should just receive events, or
> whether it should query for the next event/token.)
>
>
> Cheers,
> Jarle
>
>
From jjc at jclark.com Sat Nov 8 06:03:35 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
References: <4610.199711071804@scotus.cogsci.ed.ac.uk>
Message-ID: <3463F857.31040CF6@jclark.com>
David McKelvie wrote:
> We started off doing something like this in LTNSL, but stopped doing
> filemapping (a) because it wasn't very portable and
What systems did you have problems with? Win32 supports it and I thought
most modern Unix systems now did.
> (b) either you do
> some tricky decisions about when you free these pointers into the
> source or it makes reading huge corpora like the 2 gigabyte BNC corpus
> impossible which we wanted to be able to do.
Yes, I can see that's a problem. How common do people think XML files
bigger than 1 gigabyte or so are going to be? How hard would it be to
use external entity references to split it up into files smaller than 1
gigabyte?
James
From jjc at jclark.com Sat Nov 8 12:22:47 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
References: <3.0.32.19971107093943.00a52858@pop.intergate.bc.ca>
Message-ID: <346456E7.88135CDA@jclark.com>
Tim Bray wrote:
>
> At 06:11 PM 07/11/97 +0100, Jarle Stabell wrote:
> >Ok. My current design will first return PCData="x", then entity ref="foo",
> >and (if the client wants entities expanded: PCData="a" followed by
> >EmptyElement="b" and then PCData="c".)
> >ie it may return two consecutive PCData's, with perhaps some
> >EntityExpansionStart and -End signals between them.
> >(Is this design flawed?)
>
> If "foo" is an *internal* entity, the spec clearly requires your
> parser to expand it for the application. But letting the app know
> that the ref was encountered is also fine.
I think it's also fine to give the app control over when the parser
performs the expansion. One reason to do this is that the internal
entity may be defined in an external parameter entity or external DTD
subset. An app may not want to wait to retrieve this when it could be
continuing to parse the entity in which the reference occurs.
James
From tbray at textuality.com Sat Nov 8 16:25:31 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <3.0.32.19971108082338.00a2105c@pop.intergate.bc.ca>
At 07:11 PM 08/11/97 +0700, James Clark wrote:
>> If "foo" is an *internal* entity, the spec clearly requires your
>> parser to expand it for the application. ...
>
>I think it's also fine to give the app control over when the parser
>performs the expansion.
This may be the case, but it's not what the spec says today. From
4.4 in the 970807 version:
For an internal (text) entity, the processor must include the entity;
that is, retrieve its replacement text and process it as a part
of the document (i.e. as content or AttValue, whichever was being
processed when the reference was recognized), passing the
result to the application in place of the reference.
>One reason to do this is that the internal
>entity may be defined in an external parameter entity or external DTD
>subset. An app may not want to wait to retrieve this when it could be
>continuing to parse the entity in which the reference occurs.
I think we're OK on this one. I think we voted that entities whose
declarations are not available because they were in an external part
of the DTD and the processor skipped that part (as it's allowed to)
are treated as external entity refs and may be skipped even if they
happened to be internal entities. -T.
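The behaviour described here, expanding what has a known declaration and skipping what does not, might be sketched like this. It is a deliberately naive illustration, not a conforming processor: `EntityExpander` and `expand` are made-up names, the expansion is not recursive, character references and predefined entities are ignored, and a skipped reference is simply left in the output unchanged.

```java
import java.util.Map;

// Hypothetical sketch: expand declared internal entities, skip the rest.
final class EntityExpander {
    /**
     * Replaces each &name; whose declaration is in the map by its
     * replacement text. References without a known declaration (for example
     * because the external DTD subset was not read) pass through untouched.
     */
    static String expand(String content, Map<String, String> declared) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < content.length()) {
            int amp = content.indexOf('&', i);
            int semi = amp < 0 ? -1 : content.indexOf(';', amp);
            if (amp < 0 || semi < 0) {
                out.append(content, i, content.length()); // no further references
                break;
            }
            out.append(content, i, amp);
            String name = content.substring(amp + 1, semi);
            String replacement = declared.get(name);
            if (replacement != null) {
                out.append(replacement);          // declared: include it
            } else {
                out.append(content, amp, semi + 1); // unknown: skip, keep the reference
            }
            i = semi + 1;
        }
        return out.toString();
    }
}
```

A real processor would also report the skipped reference to the application rather than silently copying it through.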
From jarle.stabell at dokpro.uio.no Sun Nov 9 22:53:07 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:58:52 2004
Subject: XML processing experiments
Message-ID: <01BCED6A.8AD2D340@xyplex04.uio.no>
At 07:11 PM 08/11/97 +0700, James Clark wrote:
>> If "foo" is an *internal* entity, the spec clearly requires your
>> parser to expand it for the application. ...
>
>I think it's also fine to give the app control over when the parser
>performs the expansion.
This may be the case, but it's not what the spec says today. From
4.4 in the 970807 version:
For an internal (text) entity, the processor must include the entity;
that is, retrieve its replacement text and process it as a part
of the document (i.e. as content or AttValue, whichever was being
processed when the reference was recognized), passing the
result to the application in place of the reference.
[JS] Some apps need entity expansion *not* to happen, so I think the spec
shouldn't forbid the processor from letting the app decide this. (I can't
see any harm in it; it is just a *very* useful feature for those apps
which need it.)
For instance, authoring tools which load documents into some sort of
"structured editor" shouldn't "flatten" the document if the user doesn't
want this.
The same applies to tools which update documents, for instance by
synchronizing documents with other data (data in a database etc.).
(Converters may also want entity expansion not to happen.)
Of course, one has to add some special logic to the processor in order to
fully validate/check the document in this case (or just validate it up
front with a "normal" parse, if validation is necessary at all, followed
by "semi-parsing" it with "no-expansion").
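A small sketch of what "no-expansion" buys an editor or an updating tool: if references are kept as separate pieces instead of being replaced, the content can be written back out exactly as it came in. The names `RefPreservingSplitter`, `split` and `join` are hypothetical, and the reference syntax handling is as crude as it looks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: keep entity references as their own pieces.
final class RefPreservingSplitter {
    /** Splits content into plain text pieces and unexpanded &name; pieces. */
    static List<String> split(String content) {
        List<String> parts = new ArrayList<String>();
        int i = 0;
        while (i < content.length()) {
            int amp = content.indexOf('&', i);
            int semi = amp < 0 ? -1 : content.indexOf(';', amp);
            if (amp < 0 || semi < 0) {
                parts.add(content.substring(i)); // trailing text, no more references
                break;
            }
            if (amp > i) {
                parts.add(content.substring(i, amp));
            }
            parts.add(content.substring(amp, semi + 1)); // the reference itself
            i = semi + 1;
        }
        return parts;
    }

    /** Writes the pieces back out; with nothing edited this is the original. */
    static String join(List<String> parts) {
        StringBuilder sb = new StringBuilder();
        for (String part : parts) {
            sb.append(part);
        }
        return sb.toString();
    }
}
```

A tool could edit the text pieces, leave the reference pieces alone, and save the document without flattening it.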
Cheers,
Jarle
From thoki at csi.com Mon Nov 10 11:53:55 1997
From: thoki at csi.com (Thorsten Kitz)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
Message-ID: <01bcedcf$08216230$0100007f@potter>
Hey,
I have a really simple problem: I would like to define a choice list in a
DTD, e.g. for an element a list of possible values like "Monday",
"Tuesday", etc.
How can I do this? As far as I know, it can't be done with an entity
declaration, because that is just text replacement, and it can't be done
with an element declaration either (maybe I have overlooked something).
Thanks for any help,
Thorsten.
From Ingo.Macherius at TU-Clausthal.de Mon Nov 10 12:19:52 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
In-Reply-To: <01bcedcf$08216230$0100007f@potter>
Message-ID: <199711101219.NAA03392@sinfonix.rz.tu-clausthal.de>
> From: "Thorsten Kitz"
> Subject: List of possible choices in a DTD
> I have a really simple problem: I like to define a choice list in a
> DTD, eg for element a list of possible values like
> "Monday", "Tuesday", etc.
So the name of the weekday is data content, not structural
information. Consider how this is done with HTML 4:
so a typical instance would be
Display of the list is a processing semantic that can't be expressed
in the DTD. It's up to your application to make this a choice list,
either as a pulldown menu, an item list, etc.
Of course you could use specific names for your list, like
Monday
[...]
Sunday
or even
[...]
[...]
[...]
but the more tags you have, the more complicated your stylesheets etc.
become. To me names of days are data, not structure. Anyway, the last
example is different, as it may contain information *about* the
weekdays, e.g. an hourly schedule.
Alternatively, weekdays could be attributes
[...]
[..]
[...]
The choice is up to you, but nowhere can I see a need for entities.
++im
BTW: XSL does not say anything about forms ! Should there be a
standard forms set, just like there are CALS tables and MathML ?
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From pam.gennusa at DPSL.CO.UK Mon Nov 10 14:04:33 1997
From: pam.gennusa at DPSL.CO.UK (Pam Gennusa)
Date: Mon Jun 7 16:58:52 2004
Subject: XML product survey
Message-ID: >
On 18 November 1997, Technology Appraisals will again be hosting a one-day
seminar on XML in the UK (the first one was last April). I have been asked
to present a survey of the work to date on XML products and tools.
The presentation will not include any evaluation of the products or tools
mentioned. However, I would like to be able to give the following
information:
Vendor or independent developer's name
Contact information for vendor or developer (if desired)
Name of tool or product
General category of tool or product
Status (released, in beta, etc.)
Commercial details (price, public domain, etc.)
Brief description of product or tool highlighting distinguishing
characteristics.
If you have not got an XML offering yet, can you please let me know if your
company:
a) has taken a position on XML product support and if so what
b) has made any announcements about XML product support and if so what (also
any caveats that apply)
c) is planning any XML product support that you are comfortable talking
about at this time
I appreciate any information you can supply. Ideally, I would like to get
the information by Thursday 13 November (earlier would be delightful). If
you intend to respond, but cannot by that date, please let me know as well.
Kind regards,
Pam
P.S. On another topic, please note that the SGML/XML Europe '98 Call for
Papers is out. It is available on the GCA website at www.gca.org or you can
contact their office at +1 703 519 8167 to send a copy of the brochure. The
closing date for abstract submission is 19 December 1997. We are planning a
very high profile for XML at this conference including a new technologies
track.
****************************************************************************
*********
Pamela L. Gennusa, Managing Director, email: Pam.Gennusa@dpsl.co.uk
Database Publishing Systems Ltd, 608 Delta Business Park
Great Western Way, Swindon, Wiltshire SN5 7XF, UK
Tel:+44 1793 512 515;fax +44 1793 512 516 URL:www.dpsl.co.uk
****************************************************************************
*********
From ricko at allette.com.au Mon Nov 10 14:12:44 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
Message-ID: <199711101408.BAA19941@jawa.chilli.net.au>
Is this what you are asking?
USING ELEMENT TYPES
-------------------
In standard SGML you can select days using parameter
entities like this:
In XML, I think you may have to give a different ELEMENT declaration
for each day (I cannot remember what was decided, sorry).
USING ATTRIBUTES
----------------
Again, in XML I think you may have to dereference the entity yourself.
(Even if you don't, it is probably good practice since parameter entities
will not be the first things implemented in beta XML parsers.)
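One way to "dereference the entity yourself" is a plain textual substitution before the DTD is handed to a parser. The sketch below is only an illustration of that idea: the entity name `days`, the element names and the helper function are hypothetical, and it handles only simple double-quoted parameter entity declarations.

```python
import re

# Hypothetical DTD fragment; the entity and element names are illustrative only.
DTD = '''<!ENTITY % days "Monday | Tuesday | Wednesday">
<!ELEMENT day (%days;)>
<!ATTLIST appointment day (%days;) #REQUIRED>'''

# Matches declarations of the form <!ENTITY % name "replacement text">
ENTITY_DECL = re.compile(r'<!ENTITY\s+%\s+([\w.-]+)\s+"([^"]*)"\s*>\s*')
# Matches references of the form %name;
ENTITY_REF = re.compile(r'%([\w.-]+);')

def dereference_parameter_entities(dtd_text):
    """Replace each %name; reference with its declared text and drop the declarations."""
    entities = dict(ENTITY_DECL.findall(dtd_text))
    body = ENTITY_DECL.sub('', dtd_text)
    # Repeat so that entities referring to other entities are resolved too;
    # the bound keeps a recursive definition from looping forever.
    for _ in range(10):
        expanded = ENTITY_REF.sub(
            lambda m: entities.get(m.group(1), m.group(0)), body)
        if expanded == body:
            break
        body = expanded
    return body

print(dereference_parameter_entities(DTD))
```

The output is a DTD with the choice list written out in full in both declarations, which a parser without parameter entity support should be able to read.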
Rick Jelliffe
From thoki at csi.com Mon Nov 10 15:18:32 1997
From: thoki at csi.com (Thorsten Kitz)
Date: Mon Jun 7 16:58:52 2004
Subject: Escaping in entities
Message-ID: <01bcedeb$99037790$0100007f@potter>
Hello,
I have another question concerning entities. My problem is that I would like
to generate an HTML file out of an XML document with German umlauts.
Normally, a "ü" (ue) in HTML is written as &uuml;. I tried the following
entity declaration in my DTD,
The result was just "ü". Then I tried
and
Both resulted in a Jade error complaining that no entity "uuml" is defined.
How do I define an entity so that the correct HTML code is used?
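For illustration only, here is a rough sketch of the general idea outside Jade: declare the entity as a numeric character reference so the parser delivers the real character, and let the output stage decide to write it as a named HTML entity. The sample document and the `to_html_text` helper are hypothetical, and Python's standard library stands in for the DSSSL tool chain.

```python
import xml.etree.ElementTree as ET
from html.entities import codepoint2name

# Hypothetical document: uuml is declared as a numeric character reference,
# so the application receives the actual character U+00FC.
XML_DOC = '''<!DOCTYPE doc [
<!ENTITY uuml "&#252;">
]>
<doc>M&uuml;ller &amp; S&uuml;&#223;</doc>'''

def to_html_text(text):
    """Escape markup characters and write non-ASCII characters as HTML entities."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in '<>&"' or cp > 127:
            name = codepoint2name.get(cp)
            # Fall back to a numeric reference when HTML has no named entity.
            out.append('&%s;' % name if name else '&#%d;' % cp)
        else:
            out.append(ch)
    return ''.join(out)

text = ET.fromstring(XML_DOC).text
print(to_html_text(text))
```

The point of the split is that the DTD only has to name the character; which spelling ends up in the HTML file is a decision for whatever writes the output.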
Thanks,
Thorsten.
From papresco at technologist.com Mon Nov 10 16:12:54 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:52 2004
Subject: List of possible choices in a DTD
References: <01bcedcf$08216230$0100007f@potter>
Message-ID: <346732D7.9E1EA710@technologist.com>
Thorsten Kitz wrote:
>
> Hey,
>
> I have a really simple problem: I like to define a choice list in a DTD, eg
> for element a list of possible values like "Monday", "Tuesday",
> etc.
Try this:
]>
Paul Prescod
From dalapeyre at mulberrytech.com Mon Nov 10 19:57:57 1997
From: dalapeyre at mulberrytech.com (Deborah Aleyne Lapeyre)
Date: Mon Jun 7 16:58:52 2004
Subject: Please do an XML Poster at SGML/XML'97!!!!
Message-ID:
Dear Developer's List,
Posters are a great way to advertise your product at the
SGML/XML'97 Conference! (This year in Washington D.C. USA
on December 8-11) If you are coming to the conference
anyway, it's FREE advertising. So is the New Technology
Nursery in the Exhibit Hall! (If you aren't registered,
I can sneak you in for ONE day only to do a poster.)
I would also really like a few XML case studies and
a few XML technical posters, Please! The average SGMLer
is very curious as to what is going on in actual XML
development, and is also very afraid that XML is a
dream and not real. This is your chance to tell them.
Don't know what a poster is? Drop me a private email
and I'll tell you. Know all about them? The technical
details are given below.
--Debbie (Co-chair of SGML/XML'97)
USA Phone: 301/315-9633
****** SGML/XML'97 POSTER GUIDELINES ******
----------------------------------------------------------------------
WHAT YOU SEND for the POSTER PROGRAM
(Deadline November 24, 1997)
(E-mail to: Melanie Yunk)
1. Title of your poster presentation
2. Poster Abstract (1-3 short paragraphs)
3. Your name(s) and address(es) (including email)
----------------------------------------------------------------------
WHAT YOU BRING TO SGML/XML'97 (or ship)
(Deadline December 7/8, 1997)
(To post on a 4 foot by 8 foot cork board)
1. Poster(s) --
Text big enough to read from 4 or 5 feet away.
Size approximately 22 x 28" (56 by 71 cm).
(22 x 26" is fine.) Thin paper, not foam core.
2. Handouts (Optional)
----------------------------------------------------------------------
POSTER CATEGORIES
1. Technical poster (case study or technical topic)
2. Vendor posters (free advertising)
----------------------------------------------------------------------
*** FREE ENLARGING ***
(Deadline: Received BEFORE November 12, 1997)
Send 8 1/2 x 11 or A4 paper to GCA and they will
enlarge to poster size for free. GCA's address:
Graphic Communications Association; Poster Submission;
ATTN: Tanya Bose; 100 Daingerfield Road; Alexandria, VA
USA 22314-2888
----------------------------------------------------------------------
Don't know what a poster is, want to know if there is a
reward in all this, other questions, comments or for
sending title/abstract/name by email:
Melanie Yunk
---- end ----
======================================================================
Deborah A. Lapeyre Phone: 301/315-9631
Mulberry Technologies, Inc. Fax: 301/315-8285
17 West Jefferson Street, Suite 207 E-mail: dalapeyre@mulberrytech.com
Rockville, MD 20850 WWW: http://www.mulberrytech.com
======================================================================
From peter.bergstrom at eurostep.se Wed Nov 12 13:19:30 1997
From: peter.bergstrom at eurostep.se (Peter Bergstrom)
Date: Mon Jun 7 16:58:52 2004
Subject: Software for MathML?
Message-ID: <01BCF1D1.ACAFEA00@WIN95.swipnet.se>
I'm trying to find software that works with MathML, especially browsers for the display part of the language. Can someone please point me at something?
Peter
--
Peter Bergstrom EuroSTEP AB
mobile phone: +46 708 111 966 Drottninggatan 71 D
mobile fax: +46 708 111 965 S-111 36 Stockholm
Sweden
http://www.eurostep.se/
Open solutions for open organisations and people
From crism at ora.com Wed Nov 12 15:18:28 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:58:52 2004
Subject: Software for MathML?
In-Reply-To: <01BCF1D1.ACAFEA00@WIN95.swipnet.se> (message from Peter
Bergstrom on Sat, 15 Nov 1997 14:05:43 +-100)
Message-ID: <199711121522.KAA24649@geode.ora.com>
[Peter Bergstrom]
> I'm trying to find software that works with MathML, especially
> browsers for the display part of the language. Can someone please
> point me at something?
From _World Wide Web Journal_, Volume 2, Issue 4, "XML: Principles,
Tools, and Techniques", p. 85 "HTML-Math":
There are already two early rendering prototypes:
o WebEQ, a Java development, from the Geometry Center at the
University of Minnesota
o An inclusion in the Techexplorer product from the Interactive
Document labs of IBM
HTH,
Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From jlapp at acm.org Fri Nov 14 03:47:31 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
Message-ID: <3.0.5.32.19971113224751.00937100@pop.access.digex.net>
What query languages are under development for use with XML
documents? I'm talking about data management query languages
akin to SQL and OQL. I can envision immense repositories of
highly dynamic content. It seems to me that such a query
language will eventually become necessary to access or change
XML documents that are shared among many users. The language
would also need to be a standard to ensure that the clients
and servers of different vendors will interoperate.
Designing a standard that queries a single repository might
be relatively straightforward. Designing a standard that
allows queries across multiple repositories might be a bit
more of a challenge. (Think of a future internet in which
documents are related by extended links that assign roles
to everything, and imagine performing a read-only query
across the whole mesh of globally distributed documents.)
I am aware of the SgmlQL and SDQL languages, although I know
only what can be gleaned from an hour's browsing on the web.
(See http://www.lpl.univ-aix.fr/projects/SgmlQL/ for info on
SgmlQL.) I'd rather see something more object-oriented, like
ODMG's OQL, or something that uses XML to specify queries.
BTW, Microsoft's XML-Data would be quite a boon for such a
large XML repository. Clients could use XML to specify new
document types or to change existing document types, and
the whole DTD schema could itself reside in the repository.
One query language could be used to maintain both the data
and the DTDs. If the query language were in XML, it itself
would be extensible.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From Ingo.Macherius at TU-Clausthal.de Fri Nov 14 04:19:10 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <3.0.5.32.19971113224751.00937100@pop.access.digex.net>
Message-ID: <199711140418.FAA01837@sinfonix.rz.tu-clausthal.de>
> Date: Thu, 13 Nov 1997 22:47:51 -0500
> From: Joe Lapp
> Subject: Query Languages for XML
Joe asks many questions I've asked myself, let me add some more.
> I am aware of the SgmlQL and SDQL languages, although I know
> only what can be gleaned from an hour's browsing on the web.
IMO there are three query languages, one for each part of XML:
1) In XLL there are XLinks
2) In XSL there are the pattern parts of a rule
3) In DOM there are navigation functions that query parts of the grove
To me these are all highly similar. So why was the DSSSL
approach of having a single SDQL abandoned? Why isn't there an
"XML-query" draft, which is mapped to a concrete syntax by XLL, XSL
and DOM? There is much redundancy in this.
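The overlap can be illustrated with a small sketch in which the same selection is written once as step-by-step DOM navigation and once as a declarative path pattern. Python's `xml.dom.minidom` and the path syntax of `xml.etree.ElementTree` are used here purely as stand-ins; they are not the DOM, XSL or XLL drafts under discussion, and the sample document is invented.

```python
import xml.dom.minidom as minidom
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = ('<book><chapter><title>One</title></chapter>'
       '<chapter><title>Two</title></chapter></book>')

def titles_by_navigation(source):
    """Select chapter titles by walking the tree node by node (DOM style)."""
    dom = minidom.parseString(source)
    result = []
    for chapter in dom.documentElement.childNodes:
        if chapter.nodeType == chapter.ELEMENT_NODE and chapter.tagName == 'chapter':
            for child in chapter.childNodes:
                if child.nodeType == child.ELEMENT_NODE and child.tagName == 'title':
                    result.append(child.firstChild.data)
    return result

def titles_by_pattern(source):
    """Select the same nodes with a declarative path pattern."""
    return [t.text for t in ET.fromstring(source).findall('chapter/title')]

print(titles_by_navigation(DOC))
print(titles_by_pattern(DOC))
```

Both functions answer the same question about the same tree; only the concrete syntax differs, which is the redundancy being described.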
> BTW, Microsoft's XML-Data would be quite a boon for such a
> large XML repository.
Aren't XML-Data and MCF superseded by RDF (resource description
framework) ? Are there features in XML-Data and MCF that are not to
become part of RDF ?
> If the query language were in XML, it itself would be extensible.
Agreed. This approach was taken by XSL. This is a strong feature, as
one may use the same tools on document and meta level. There should
be a query language in XML syntax, and it should be modularized. This
query module should be imported by XSL, XLL and DOM.
The main obstacle is the fact that XLinks and DOM API functions don't
use XML syntax, for obvious reasons. But this feature is closely related to
namespaces (or architectural forms) questions, because ideally names
need to be changed to fit the conventions of the importing language.
This ain't easy, because DOM is a programming language. In XML
terseness matters, so do characters that have to be escaped in URLs.
How can functionality and/or semantics of XML languages be mapped
to non-XML languages? Do architectural forms offer such
functionality?
Clueless,
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From Patrice.Bonhomme at loria.fr Fri Nov 14 08:17:27 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:58:53 2004
Subject: case insensitive and ID
Message-ID: <199711140817.JAA09810@chimay.loria.fr>
Hi,
A very short question.
Are "P1S1" and "p1s1" the same ID attribute value within an XML document?
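To make the question concrete, here is a small sketch of how a case-sensitive parser treats the two spellings. It assumes the rule of XML 1.0 as eventually published, where names are case-sensitive; Python's ElementTree is a stand-in that does not validate ID attributes, so the values are compared as plain strings, and the sample markup is invented.

```python
import xml.etree.ElementTree as ET

# Hypothetical document with two id values that differ only in case.
DOC = '<doc><s id="P1S1"/><s id="p1s1"/></doc>'

def distinct_ids(source):
    """Return the set of id attribute values, compared exactly as written."""
    return {el.get('id') for el in ET.fromstring(source).iter() if el.get('id')}

def is_well_formed(source):
    """Report whether the parser accepts the document."""
    try:
        ET.fromstring(source)
        return True
    except ET.ParseError:
        return False

print(sorted(distinct_ids(DOC)))
print(is_well_formed('<Section></section>'))
```

Under that assumption the two values count as different IDs, and a start-tag and end-tag that differ only in case do not match.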
Thanks
Pat.
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
From jlapp at acm.org Fri Nov 14 14:59:43 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <199711141437.OAA15051@nathaniel.eps.inso.com>
References: <199711140418.FAA01837@sinfonix.rz.tu-clausthal.de>
Message-ID: <3.0.5.32.19971114095951.0093b2e0@pop.access.digex.net>
At 02:37 PM 11/14/1997 GMT, you wrote:
>>To me all those are similar in a high degree. So why was the DSSSL
>>approach to have a single SDQL abadoned ? Why there isn't a
>>"XML-query" draft, which is mapped to a concrete syntax by XLL, XSL
>>and DOM ? There is much redundancy in this.
>
>I have been trying to get the DOM WG to realise this, and for us
>to work on a standard DOM API to queries (including syntax, and return
>result).
Is there any interest in putting a draft together to present to
the WG? Having a draft to work with could jump-start things.
I am currently on a self-funded sabbatical, and I have a lot of
time to devote to such an effort. I wouldn't mind coordinating
the process so long as we can gather a pool of people who are
willing to put some time into thinking about the issues.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From wendling at ganymede.isdn.uiuc.edu Fri Nov 14 16:50:51 1997
From: wendling at ganymede.isdn.uiuc.edu (Bill Wendling)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <3.0.5.32.19971114095951.0093b2e0@pop.access.digex.net>
Message-ID:
On Fri, 14 Nov 1997, Joe Lapp wrote:
}
}Is there any interest in putting a draft together to present to
}the WG? Having a draft to work with could jump-start things.
}
}I am currently on a self-funded sabbatical, and I have a lot of
}time to devote to such an effort. I wouldn't mind coordinating
}the process so long as we can gather a pool of people who are
}willing to put some time into thinking about the issues.
}
Excuse me for jumping in in the middle of a thread, but could someone tell
me what kind of query language is being asked for? I'm working in a group
which is trying to develop just such a thing in XML...
|| Bill Wendling wendling@ncsa.uiuc.edu
From richard at cogsci.ed.ac.uk Fri Nov 14 17:21:23 1997
From: richard at cogsci.ed.ac.uk (richard@cogsci.ed.ac.uk)
Date: Mon Jun 7 16:58:53 2004
Subject: Alpha-test release of RXP
Message-ID: <23145.199711141721@pitcairn.cogsci.ed.ac.uk>
To celebrate XML's first birthday, I am releasing an alpha-test version
of RXP, an XML parser in C. RXP will be the parser in the next release
of the LT XML system.
RXP goes some way to addressing the concerns about XML processing
speed raised on this mailing list. It can parse ot.xml in 0.8 seconds
on a 233MHz Pentium II.
This is not a public release, so please don't redistribute the system.
RXP is available (in source form only) at
ftp://ftp.cogsci.ed.ac.uk/pub/richard/rxp.tar.gz
-- Richard
From eliot at isogen.com Fri Nov 14 18:01:02 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
References:
Message-ID: <346C90F2.A4566219@isogen.com>
Any query language for XML would, by necessity, be a syntax for
accessing the properties of XML objects (where the object schema would
come from either the SGML property set, the DOM, or something derived
from one or both). Certainly SDQL provides this. Any new language
would, I think, mostly be an exercise in syntax definition, most of
which is already inherent in the development of XSL (which is nothing
more than a language for applying processes to the results of queries on
XML groves).
From another point of view, it's not possible to have *an* XML query
language because there are too many different ways that you might want
to access XML data: as nodes in groves ala SDQL, as full text using some
full-text index, as semantic-specific objects using some domain-specific
query mechanism, etc.
A language like SDQL coupled with an XML property set (that is, the
subset of the SGML property set needed to represent XML documents)
provides a complete set of operations for querying XML documents
represented as groves. These operations can be used either as
primitives from which more specialized languages are created or as a
design spec that drives the development of a new syntax for expressing
the equivalent queries.
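As a loose illustration of "primitives from which more specialized languages are created", the sketch below defines two generic operations over a parsed tree and builds a narrower query from them. It is not SDQL and not a grove implementation: Python's ElementTree plays the part of the node model, and all function and element names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
DOC = '''<doc>
  <sec id="s1"><para lang="en">Hello</para><para lang="de">Hallo</para></sec>
  <sec id="s2"><para lang="de">Welt</para></sec>
</doc>'''

def descendants(node):
    """Primitive: every element below node, in document order."""
    it = node.iter()
    next(it)  # iter() yields the node itself first; skip it
    return it

def select(nodes, **properties):
    """Primitive: keep nodes whose tag or attribute values match."""
    for node in nodes:
        if all((node.tag if key == 'tag' else node.get(key)) == value
               for key, value in properties.items()):
            yield node

def german_paragraphs(root):
    """A more specialised query composed from the two primitives."""
    return [p.text for p in select(descendants(root), tag='para', lang='de')]

root = ET.fromstring(DOC)
print(german_paragraphs(root))
```

A different surface syntax could compile down to the same two calls, which is the sense in which a new query language would mostly be a syntax exercise.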
So my question is: is what is desired only a new *syntax* or is there a
requirement for a fundamentally different query mechanism? Or have I
entirely missed the point of the original question?
Cheers,
Eliot
From jlapp at acm.org Fri Nov 14 19:01:30 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
I've been looking at Microsoft's MSXML. Although it is written
in Java, it is tied to the Windows platform. Instead of using
Java's URL facilities in java.net.*, it provides an ActiveX
control to do the job. Class com.ms.xml.util.XMLInputStream
relies on COM interface IXMLStream, which MSXML provides as a
DLL written in C++ (see com\ms\xml\XMLStream\XMLURLStream).
I'm looking for a 100% pure Java XML parser that is being
actively maintained. I've got a few projects up my sleeve,
and I want to be sure that the code I write is cross-platform.
If I write to MSXML, I tie myself into Microsoft's API. Given
that the only implementation of that API works only on Windows,
to write to MSXML would be to tie my Java tool to Windows.
It seems that Microsoft has the most complete implementation
of an XML parser, so Microsoft is doing a very good job of
trying to get me to write Java that works only on Windows.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From donpark at quake.net Fri Nov 14 19:18:56 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <01bcf131$bdda8ec0$0100007f@localhost>
Joe,
Windows dependency of MSXML is minimal. All you have to do is the following:
1. remove com.ms.xml.dso package.
Delete the class files from the jar and/or comment it out of the makefile.
DSO is accessed by some of the samples but none of the other MSXML packages.
2. remove dependency on com.ms.xml.xmlstream package.
The latest version of MSXML includes an alternate XMLInputStream class located
inside the 'make' directory. Replace com.ms.xml.util.XMLInputStream with
the alternate version to remove the dependency on the com.ms.xml.xmlstream package.
With the above two changes, you will end up with a pure-Java version of MSXML.
MSXML is the most complete XML parser available right now and you get the
source code on top of it. I would be smiling by now if I were you :-)
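The first step (deleting the class files from the jar) can be scripted. The following is a minimal sketch that relies only on a jar being an ordinary zip archive; the helper name and the jar file names in the comment are hypothetical, and it does not perform the second step of swapping in the alternate XMLInputStream.

```python
import zipfile

def strip_package(src_jar, dst_jar, package_prefix='com/ms/xml/dso/'):
    """Copy src_jar to dst_jar, leaving out every entry under package_prefix."""
    removed = []
    with zipfile.ZipFile(src_jar) as src, \
         zipfile.ZipFile(dst_jar, 'w', zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename.startswith(package_prefix):
                removed.append(info.filename)  # skip the DSO classes
                continue
            dst.writestr(info, src.read(info.filename))
    return removed

# Example call (file names are assumptions, not the real distribution names):
# strip_package('msxml.jar', 'msxml-portable.jar')
```

The function returns the names of the entries it dropped, which makes it easy to check that nothing outside the DSO package was touched.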
Don Park
From andrewl at microsoft.com Fri Nov 14 19:47:36 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6571@red-03-msg.dns.microsoft.com>
I believe that the dependency on the XMLInputStream interface is to avoid
some bugs in the JDK 1.1 libraries that do not handle byte ordering
correctly on Apple platforms. That is my memory; you could alter the code to
use the JDK packages and test on Apple if you like.
The packages com.ms.xml.xmlstream and the alternate version are functionally
equivalent, but the Windows-specific one has much higher performance. Choose
portable or fast depending on your needs.
--Andrew Layman
AndrewL@microsoft.com
From Per-Ake.Ling at uab.ericsson.se Fri Nov 14 20:03:34 1997
From: Per-Ake.Ling at uab.ericsson.se (Per-Ake Ling)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <199711142001.VAA15294@uabs19c26.eua.ericsson.se>
> From: Andrew Layman
...[snip]
> I believe that the dependency on the XMLInputStream interface is to avoid
> some bugs in the JDK 1.1 libraries that do not handle byte ordering
> correctly on Apple platforms. That is my memory; you could alter the code to
> use the JDK packages and test on Apple if you like.
>
> The packages com.ms.xml.xmlstream and the alternate version are functionally
> equivalent, but the Windows-specific one has much higher performance. Choose
> portable or fast depending on your needs.
I can accept the second paragraph but the first one is very confusing: if
the portable code may have problems on Apple, use the Windows-specific
code so it can't run on Apple or on any other platform ?
Per-Åke
--
Per-Åke Ling (note: Per-Åke, transliteration Per-Ake)
email: Per-Ake.Ling@uab.ericsson.se phone: +46 8 727 5674
Ericsson Utvecklings AB mobile: +46 70 790 2446
AXE Research and Development fax: +46 8 727 3463
From jlapp at acm.org Fri Nov 14 20:05:17 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
In-Reply-To: <01bcf131$bdda8ec0$0100007f@localhost>
Message-ID: <3.0.5.32.19971114150447.0093ec10@pop.access.digex.net>
At 11:15 AM 11/14/1997 -0800, you wrote:
>Windows dependency of MSXML is minimal. All you have to do is following:
>[...]
>WIth above two changes, you will end up with a pure-Java version of MSXML.
>MSXML is the most complete XML parser available right now and you get the
>source code on top of it. I would be smiling by now if I were you :-)
How 'bout that! Microsoft's EULA even grants us the right to redistribute
such modified code. Quite generous of them, I must say. Microsoft just
went up a point in my rating system. I am indeed smiling now. :-)
My apologies to the MSXML team.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From andrewl at microsoft.com Fri Nov 14 21:39:16 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: MSXML is tied to Windows
Message-ID: <7BB61B44F197D011892800805FD4F79201CD657B@red-03-msg.dns.microsoft.com>
Maybe I should have been more clear.
The parser uses a newly-defined Interface to a stream library that is
specific to XML. The parser does not use the implementations of streams
provided in the JDK 1.1 packages for the internet. I believe that this has
to do with byte-ordering problems in those implementations. I have not
checked this for myself.
The interface per se has no platform dependencies. It is shipped with two
implementations. One implementation is specific to Windows, the other is
generic Java using JDK packages. Neither has the byte-order flaw. You may
use whichever one you prefer. Both work. The generic one has lower
performance.
--Andrew Layman
AndrewL@microsoft.com
> -----Original Message-----
> From: Per-Ake.Ling@uab.ericsson.se [SMTP:Per-Ake.Ling@uab.ericsson.se]
> Sent: Friday, November 14, 1997 12:02 PM
> To: xml-dev@ic.ac.uk
> Subject: RE: MSXML is tied to Windows
>
>
> > From: Andrew Layman
> ...[snip]
> > I believe that the dependency on the XMLInputStream interface is to
> avoid
> > some bugs in the JDK 1.1 libraries that do not handle byte ordering
> > correctly on Apple platforms. That is my memory; you could alter the
> code to
> > use the JDK packages and test on Apple if you like.
> >
> > The packages com.ms.xml.xmlstream and the alternate version are
> functionally
> > equivalent, but the Windows-specific one has much higher performance.
> Choose
> > portable or fast depending on your needs.
>
> I can accept the second paragraph but the first one is very confusing: if
> the portable code may have problems on Apple, use the Windows-specific
> code so it can't run on Apple or on any other platform ?
>
> Per-Åke
> --
> Per-Åke Ling (note: Per-Åke, transliteration Per-Ake)
> email: Per-Ake.Ling@uab.ericsson.se phone: +46 8 727 5674
> Ericsson Utvecklings AB mobile: +46 70 790 2446
> AXE Research and Development fax: +46 8 727 3463
>
From andrewl at microsoft.com Fri Nov 14 22:06:05 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:53 2004
Subject: How best to represent unrepresentable characters in NAME tokens?
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6581@red-03-msg.dns.microsoft.com>
Thank you all for the suggestions you have made to me (many privately)
regarding this question. Here is the policy I intend to follow and to
recommend:
Sometimes you will want to use a character in a name, but that character is
not an XML NameChar. In that case, encode it, using a sequence such as
"_#xHHHH_" where "HHHH" is a hexadecimal rendition of the Unicode character.
For example "Two Words" would encode as "Two_#x0020_Words". Such encoding
(and subsequent decoding) is an application function, not part of the XML
specification per se.
(This is the closest mapping I could make to using character entities in
names.)
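
The encoding policy above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function names
encode_name and decode_name are hypothetical, the set of allowed characters
is a simplified ASCII stand-in for the real XML NameChar production, and a
name that already contains a literal "_#xHHHH_" sequence is not protected
against being decoded by mistake.

```python
import re

# Simplified, ASCII-only stand-in for the XML NameChar production.
# The real production admits many more Unicode characters (assumption).
_ALLOWED = re.compile(r"[A-Za-z0-9._-]")

# Matches the escape form _#xHHHH_ (4 to 6 hex digits are accepted here
# so that characters above U+FFFF also round-trip).
_ESCAPE = re.compile(r"_#x([0-9A-Fa-f]{4,6})_")


def encode_name(text):
    """Replace every character outside the allowed set with _#xHHHH_."""
    parts = []
    for ch in text:
        if _ALLOWED.fullmatch(ch):
            parts.append(ch)
        else:
            parts.append("_#x%04X_" % ord(ch))
    return "".join(parts)


def decode_name(name):
    """Turn each _#xHHHH_ escape back into the character it stands for."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


print(encode_name("Two Words"))
print(decode_name("Two_#x0020_Words"))
```

Both directions are application code, as the message says: an XML parser
sees only the encoded name.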
--Andrew Layman
AndrewL@microsoft.com
From richard at light.demon.co.uk Sat Nov 15 08:08:28 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:53 2004
Subject: Query Languages for XML
In-Reply-To: <346C90F2.A4566219@isogen.com>
Message-ID:
In message <346C90F2.A4566219@isogen.com>, "W. Eliot Kimber"
writes
>
>From another point of view, it's not possible to have *an* XML query
>language because there are too many different ways that you might want
>to access XML data: as nodes in groves ala SDQL, as full text using some
>full-text index, as semantic-specific objects using some domain-specific
>query mechanism, etc.
>
>So my question is: is what is desired only a new *syntax* or is there a
>requirement for a fundamentally different query mechanism? Or have I
>entirely missed the point of the original question?
One important thing about "Structured Query Language" is that it doesn't
just query. It is actually a complete language for "defining, accessing
and otherwise managing relational databases".
I think that anyone coming from an SQL background would find SDQL very
restricted, _in the sense that_ it provides a set of 'read-only'
functions that you can carry out on SGML documents which are, magically,
already there. Unlike SQL, SDQL provides you with no means to:
- create a schema;
- create a new document;
- edit an existing document;
- delete a document;
- manage access to documents;
- etc.
If you want to use SDQL as the basis for document management, you would
have a very hard time of it. And yet, surely that is what someone
looking to create and manage XML repositories is going to be interested
in having?
Obviously this is not just an XML problem: it applies equally to SGML,
which is in effect a "read only" standard. One example of this is in
the style language's (DSSSL or XSL) online support. As far as I am
aware, there is no support for any features of forms. Yet, if SDQL had
primitives such as (insert-node), (replace-node) and (add-text-to-node)
it wouldn't be too hard to add an "input-line" flow object type. And
the implications of even that simple addition would be pretty far-
reaching.
Richard.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From tbray at textuality.com Sat Nov 15 09:59:12 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:53 2004
Subject: case insensitive and ID
Message-ID: <3.0.32.19971114113638.00a9156c@pop.intergate.bc.ca>
At 09:17 AM 14/11/97 +0100, Patrice Bonhomme wrote:
>Are "P1S1" and "p1s1" the same ID attribute within an XML document?
No. -Tim
From peter at ursus.demon.co.uk Sat Nov 15 17:55:21 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:54 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
Message-ID: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk>
In this posting I make a proposal for the treatment of a certain class of
XML files. I offer this in the belief that there is a subset of the XML
community who will find this proposal useful and may wish to refine it
gently. There is also a meta-proposal for the use of a PI-target
associated in a general way with this list. This PI-target could in principle
be used for a wide class of applications. If there are people who
believe that the *meta-proposal* is harmful or beneficial to the XML
community, I'd be grateful for their views posted to the list. If they
believe that the meta-proposal is acceptable, but they don't like the
proposal, then they can delete subsequent correspondence on the proposal
before reading. Constructive and destructive criticism of the
meta-proposal is appropriate; destructive criticism of the proposal is
pointless.
A considerable amount of decision-making in XML is left 'to the
application' (i.e. some or all of the processing software after the
document has been parsed). In some cases the whole
authoring/distribution/parsing/application process is under the effective
control of some 'organisation'. They will develop their applications to be
consistent with the authoring tools and the document instances; this need
not concern us.
A number of groups and individuals are, however, proposing XML
'applications' where there is unlikely to be a single 'application' for
processing. Moreover, many of these may be DTD-less in some way and may
also not use style sheets. There is often an implied need for these
'applications' to set constraints on the processing software in ways that
are not covered, and not likely to be covered, by the formal specs.
In other cases the specs provide syntax, but no semantics, for certain
important operations. I believe that there may be cases where many people
want a particular generic behaviour where a broad consensus can be obtained
and which need not affect the formal spec development.
- In any of these cases there is no general solution acceptable to
  everyone.
- If no attempt is made to address these problems we shall either
  end up with a Babel of incompatible solutions, or wait feebly for some
  powerful autonomous entities to dictate a limited set of actions.
- We have to be careful to avoid the 'only processable with
  software X' syndrome.
- There is a critical mass of readers of this list who feel the
  need to address the problem.
- Anyone can use any PIs they like in their documents for whatever
  purposes they like without breaking the spirit of XML.
- Processing software need not (and so far won't) take any
  notice of these (or perhaps any) PIs.
- If a few people find a way of doing something that works for
  them, and isn't against the spirit of the XML specs, then flaming their
  ideas is pointless.
The proposal I really want to address is, like Monty Python's joke,
so potentially dangerous that I dare not reveal it yet. The proposal here
is also important to me - perhaps to others - and I hope serves as a
useful example. It is NOT in a finalised form, but as can be seen from the
meta-proposal, there is a method for referring to a 'pseudo-final' form
that is, at least, usable.
That a PI of the form is 'reserved' by members of this list for
PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly
reserved.]
That anyone can post a proposal to this list for the use of this PI.
That any author can include an instance of such a proposed PI in their
document.
That any writer of application software can write software to process such
a PI.
That both of these should refer to an appropriate URL on this list's
archive discussing or outlining the use of this PI.
That if someone doesn't approve of a proposal they ignore it rather than
flaming it. The fittest ideas will survive.
elements. There is much discussion on XML-DEV.
FatherBear's views are too hot and do not find favour.
MotherBear proposes an alternative. There is much more discussion but it
doesn't get much further. MotherBear is cool, but too cool.
BabyBear makes a third proposal. This is 'just right' for many people (but
not everyone, of course). Various people suggest that they could work along
with BabyBear's proposal. BabyBear and others hack it into shape. A set of
suggested guidelines is posted to XML-DEV.
Goldilocks E-pubCorp says that its userAgent will now support BabyBear's
proposal, and bolts it into their authoring tool.
[None of this need come anywhere near the XML-WG, XML-SIG, W3C or anything
else.]
The documents authored according to the BabyBear proposal might look like:
kilograms100
NULL
The JUMBO porridge-cooker (from all good e-marts) is XML-aware and
recognises the XDEV PI. The author of PORRIDGE.java makes sure that the
software is compliant with the proposal in BabyBear's posting to XML-DEV.
The JUMBO corp publishes some amendments to the cooking process (e.g.
).
That's all, folks. Nobody *has* to do any of this. FatherBear teams up
with the BigBadWolf corp. Their ideas do not flourish. They simply miss out
on porridge.
]]>
Now a real proposal.
I wish to display objects on the screen in a way not supported by XLL and
XSL. Specifically I have an element (object) which may be displayable on
its own ('standalone') or may be displayable in the context of another
object (perhaps a parent container). An example might be a PERSON in an
ORGCHART. (Actually I want to display ATOMS in MOLecules, of course). I
might wish to create an XML-LINK to a PERSON which displayed that PERSON.
Alternatively I might wish to create a link to that PERSON for display in
the context of the org-chart (i.e. when I actuate that link, the org-chart
is displayed and all other linked PERSONs).
I wish to use the BEHAVIOR attribute of XLL for this. No values for this
attribute are defined at present, and some XML-SIGers have suggested that
there will never be a definitive list. If values are chosen at random by
the community, then in a year's time we shall have chaos on the behaviour
attribute. So this proposal can be seen as suggesting a wider discussion of
possible values for BEHAVIOR.
Note that we don't all have to agree :-). It is perfectly possible that two
incompatible proposals appear. Both can use XDEV, but point to different
URLs (and hopefully have different mnemonics). No problem. A user agent can
implement one, both or neither. What I want to avoid is nine-and-sixty ways
that BEHAVIOR is used with no public specification of any semantics.
That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
through an XDEV PI:
BEHAVIOR="DisplayStandAlone"
BEHAVIOR="DisplayInContext"
That for the second option an additional attribute CONTEXTREF is required,
whose value is a valid URL and points to the XML element providing the
display context of the current element.
The actual details of display are application (and possibly stylesheet)
dependent.
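
As a rough illustration of how a user agent might act on these two values,
here is a minimal Python sketch. Everything beyond the attribute names and
values given in the proposal is an assumption: the function name
display_plan, the tuple it returns, the fallback for unknown values, and
the example URL "orgchart.xml" are hypothetical.

```python
def display_plan(attributes):
    """Map a link's attributes to a (mode, context) display plan.

    `attributes` is a plain dict of attribute names to values, standing in
    for whatever attribute access the parser provides (assumption).
    """
    behavior = attributes.get("BEHAVIOR")
    if behavior == "DisplayStandAlone":
        return ("standalone", None)
    if behavior == "DisplayInContext":
        context = attributes.get("CONTEXTREF")
        if not context:
            # The proposal requires CONTEXTREF for this value.
            raise ValueError("DisplayInContext requires a CONTEXTREF attribute")
        return ("in-context", context)
    # Any other value falls back to the agent's default or previous semantics.
    return ("default", None)


print(display_plan({"BEHAVIOR": "DisplayStandAlone"}))
print(display_plan({"BEHAVIOR": "DisplayInContext",
                    "CONTEXTREF": "orgchart.xml"}))
```

How the two modes are actually drawn stays with the application, as stated
above; the sketch only decides which mode applies.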
*This* proposal is identifiable through
HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/9711/xxxx.html"
(where xxxx represents the actual address of *this* hypermail (e.g. 6789 :-)
A user agent can be given the option of operating these semantics by a PI
of the form:
and can revert to the default or previous semantics by:
Note that the BEHAVIOR attribute is 'inherited' by all the PERSONs (see XLL
spec).
]]>
The display attribute for boss.xml is not activated by this proposal (there
may be a default local protocol for displaying bosses.) The PI switches on
a display protocol whereby the team members Fred, Wilma and Sally are
displayed in the context of their org-charts (in this case different). Sue
is displayed standalone. [The user agent knows how to display objects
standalone and in the context of other objects. Note that this can be a
fairly generic mechanism - JUMBO acts on any objects which provide a
display() method - not just PERSON. It also has/will_have a
highlightInContext(Node n) method for displaying *this* in the context of
Node n.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From papresco at technologist.com Sat Nov 15 19:58:35 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
References:
Message-ID: <346DFFA1.8C0F1ECD@technologist.com>
Richard Light wrote:
> Obviously this is not just an XML problem: it applies equally to SGML,
> which is in effect a "read only" standard. One example of this is in
> the style language's (DSSSL or XSL) online support. As far as I am
> aware, there is no support for any features of forms.
I do not believe that this is true. As I understand it, you can create a
"HTML form element" flow object and an "HTML input element" flow object
within it.
> Yet, if SDQL had
> primitives such as (insert-node), (replace-node) and (add-text-to-node)
> it wouldn't be too hard to add an "input-line" flow object type.
DSSSL has no provisions for adding flow object types in DSSSL code. So
we are essentially talking about the DSSSL implementation language (Java
or C, probably). It is as easy in these languages to define and implement
an "input field" flow object as it is to make a "hyperlink" flow object.
There is no need to extend SDQL. The input text does not have to be part
of the flow object tree. The "input text" flow object can handle the
interactivity just as the "hyperlink" flow object does.
Paul Prescod
From papresco at technologist.com Sat Nov 15 21:17:44 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
References:
Message-ID: <346E1234.D549A734@technologist.com>
Graydon Hoare wrote:
> If you run jade in sgml-to-sgml mode, you can make HTML out of arbitrary
> XML, but I don't think there are flow-objects representing forms.
Sorry I was talking about XSL. As I understand it, XSL will allow you to
create any HTML element.
> What would it mean to take a form
> flow object and render it through a TeX backend? The "interactive" nature
> is gone. What happens to a combo-box?
About the same as the printed rendition of a link or scroll flow object.
It would be completely useless. Stylesheets are tied to a particular
medium. Online stylesheets should have elements (link, input, scroll)
that allow interactivity and print-oriented stylesheet languages should
have elements that describe pages etc.
> I think the question being asked is whether you could make an input-text
> flow object which had a clearly defined semantics in altering your XML
> grove, not your flow object tree. For this, you would need an abstraction
> for the form submission/editing cycle, and such SDQL primitives as Richard
> was mentioning.
Only if you take the approach that DSSSL code must manage the form
interactivity process. I don't see why it must. It seems simplest to
me that it should do the moral equivalent of "put a button here" and
leave the processing of the button click to JavaScript, Java or C++.
> It makes sense -- he's basically talking about getting
> full grove-manipulation into the query language so you can consider the
> grove a simple object database. ODMG OQL is probably worth looking over.
I can see that, but I don't think it necessarily has anything to do with
form input. The SQL model (I'm not familiar with OQL) is that a host
language (COBOL, PowerScript, JavaScript, Java, whatever) handles the
interactivity and issues data model update instructions. SQL does not
handle the user interface itself.
In other words, the vast majority of forms will have nothing to do with
the document grove itself. They may be forms designed to talk to
relational databases or object databases or CGI or whatever. We can
create these forms immediately, without touching SDQL. Yes, it would be
cool if SDQL allowed grove updates, and of course we expect that if it
did, you would be able to call it from the code that handles your
button, just as you could call SQL or OQL etc.
Anyhow, I think that the DOM allows updates, so if you use DOM functions
as your "query language" and the DOM model as your grove, then you will
have a read/write document query language.
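
To make the idea concrete, here is a small sketch using Python's standard
xml.dom.minidom module, which implements a later, finalised version of the
DOM than the draft under discussion. The ORGCHART/PERSON document and the
helper person_names are made up for illustration.

```python
from xml.dom.minidom import parseString

# Hypothetical document used only for this example.
SOURCE = "<ORGCHART><PERSON>Fred</PERSON><PERSON>Wilma</PERSON></ORGCHART>"


def person_names(doc):
    """'Query': collect the text of every PERSON element."""
    return [p.firstChild.data for p in doc.getElementsByTagName("PERSON")]


doc = parseString(SOURCE)
root = doc.documentElement
before = person_names(doc)

# 'Insert': add a new PERSON element to the tree.
sally = doc.createElement("PERSON")
sally.appendChild(doc.createTextNode("Sally"))
root.appendChild(sally)

# 'Edit' and 'delete': change one node's text, remove another node.
persons = doc.getElementsByTagName("PERSON")
persons[0].firstChild.data = "Frederick"
root.removeChild(persons[1])

after = person_names(doc)
print(before, after)
```

The same calls both read and modify the tree, which is the sense in which
DOM functions can serve as a read/write "query language".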
Paul Prescod
From ricko at allette.com.au Sun Nov 16 03:08:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:54 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
Message-ID: <199711160305.OAA25372@jawa.chilli.net.au>
> From: Peter Murray-Rust
>
> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
> through an XDEV PI:
> BEHAVIOR="DisplayStandAlone"
> BEHAVIOR="DisplayInContext"
> That for the second option an additional attribute CONTEXTREF is required,
> whose value is a valid URL and points to the XML element providing the
> display context of the current element.
> The actual details of display are application (and possibly stylesheet)
> dependent.
>
Another approach might be to use the name prefix XDEV: on attribute
values, e.g.
BEHAVIOUR="XDEV:DisplayStandAlone"
and the contextref attribute you suggest, e.g.
BEHAVIOUR="XDEV:DisplayInContext"
XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"
Rick Jelliffe
From peter at ursus.demon.co.uk Sun Nov 16 10:51:21 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:54 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <199711160305.OAA25372@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971116114140.2d8f7cae@pop3.demon.co.uk>
At 14:09 16/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust
>
>>
>> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
>> through an XDEV PI:
>> BEHAVIOR="DisplayStandAlone"
>> BEHAVIOR="DisplayInContext"
>> That for the second option an additional attribute CONTEXTREF is required,
>> whose value is a valid URL and points to the XML element providing the
>> display context of the current element.
>> The actual details of display are application (and possibly stylesheet)
>> dependent.
>>
>
>Another approach might be to use the name prefix XDEV: on attribute
>values, e.g.
>
> BEHAVIOUR="XDEV:DisplayStandAlone"
I hadn't thought of these possibilities, thanks Rick.
This one is fine and legal, but requires the processor (all friendly
processors) to look for namespaces in attribute values. Since attributes
can have colons for many other reasons I suspect this approach will cause
problems. For example:
WAKE-UP-TIME="12:00"
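
The ambiguity can be shown with a short Python sketch. The function names
and the idea of a registry of known prefixes are assumptions made for
illustration, not part of any proposal on the list.

```python
# Hypothetical registry of prefixes a processor has agreed to recognise.
KNOWN_PREFIXES = {"XDEV"}


def naive_split(value):
    """Treat whatever precedes the first colon as a prefix."""
    prefix, sep, rest = value.partition(":")
    return (prefix, rest) if sep else (None, value)


def guarded_split(value, known=KNOWN_PREFIXES):
    """Only honour the prefix when it is one the processor knows about."""
    prefix, rest = naive_split(value)
    if prefix in known:
        return prefix, rest
    return None, value


print(naive_split("XDEV:DisplayStandAlone"))
print(naive_split("12:00"))      # the time is wrongly split into a 'prefix'
print(guarded_split("12:00"))    # left intact
```

A guarded split avoids the false match on "12:00", but only by requiring
every processor to carry the list of prefixes, which is the extra burden
described above.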
>
>and the contextref attribute you suggest, e.g.
>
> BEHAVIOUR="XDEV:DisplayInContext"
> XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"
This relies on the namespace proposal being adopted for attribute names. I
don't know where that has got to, and it's probably confidential to XML-SIG.
I can see its attraction in cases like this.
The namespace allows the pre-colon prefix to be mapped to a schema file,
which could - in turn - contain a reference to the XML-DEV posting(s).
Something like XML:BEHAVIOR would *not* be a good idea because it would be
a different attribute from BEHAVIOR. So if only the BEHAVIOR attribute were
altered (BEHAVIOR="XDEV:BLINK") there would be no formal method of picking
it up.
I cannot remember whether PIs can be linked to schema files, i.e. something
like:
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From lauren at sqwest.bc.ca Sun Nov 16 18:48:36 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346E1234.D549A734@technologist.com>
Message-ID:
> From: Paul Prescod
> Anyhow, I think that the DOM allows updates, so if you use DOM
> functions as your "query language" and the DOM model as your grove,
> then you will have a read/write document query language.
And the DOM group itself does not wish to use a different syntax for
the generalized queries if we can find one already developed that
meets our needs. So we will be looking at XLL Xpointer syntax as
well as whatever XSL does, and watching what the RDF group
chooses since they are also talking about query interfaces. I
personally would be happy if everyone could use the same syntax, as
long as it meets the DOM needs.
cheers,
Lauren
--
Lauren Wood, SoftQuad, Inc
Chair, W3C DOM Activity
From jlapp at acm.org Sun Nov 16 19:35:59 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346C90F2.A4566219@isogen.com>
References:
Message-ID: <3.0.3.32.19971116132158.00953c30@pop.access.digex.net>
At 09:57 AM 11/14/1997 -0800, you wrote:
W. Eliot Kimber wrote:
>[...] Any new language
>would, I think, mostly be an exercise in syntax definition, most of
>which is already inherent in the development of XSL (which is nothing
>more than a language for applying processes to the results of queries on
>XML groves).
The language would also consist of semantics -- a standard
interpretation of the syntax. Some application will have to
make sense of the query based on its semantics. The syntax
and the semantics of that syntax would have to be standardized
so that the applications that are developed to interpret the
queries all interpret the queries identically.
XSL may be thought of as a query-only tool or as a translation
tool. As Richard Light explained, it doesn't provide a mechanism
for modifying the originally queried XML document. XSL could be
used to convert portions (or all) of an XML document from one
XML representation to another, provided that flow object types
were available for all element types in the XML document.
However, there is no standard mechanism in place to update the
original XML document. You might use XSL to create a replacement
document and then upload the replacement, but this is not
conducive to having many users concurrently querying and updating
the document (you'd have to lock the whole document).
>From another point of view, it's not possible to have *an* XML query
>language because there are too many different ways that you might want
>to access XML data: as nodes in groves ala SDQL, as full text using some
>full-text index, as semantic-specific objects using some domain-specific
>query mechanism, etc.
I agree that there will be many different ways to query a document
and that it is not possible to anticipate them all in advance. One
(read-only) query might be analogous to XSL's pattern rules, which
query based on the physical structure of the document. Another
query might be analogous to an AltaVista-style word-based search.
Still another might operate by traversing XML's linking facilities
or by traversing RDF's associations.
I think this is where XML's extensibility comes into play. We
would define a standard query language for the most common querying
activities, such as those in XSL patterns (XSL patterns might be the
basis of the language). If a user wishes to query an engine that
handles extensions, it is likely that the user will want to mix
standard query operations with the extended queries. Each query
(even each extended query) is likely to return a result set that
takes the form of an XML document (a virtual one). Such result sets
would be amenable to additional querying via the XML-Query standard
(perhaps all within the same complex query statement).
Furthermore, every XML query engine would be able to parse every
query. Each would be able to identify the constructs that are not
available to the engine. There might be a way for the engine to
delegate the queries or operations associated with those constructs
to other (perhaps specialized) query engines. If not, the engine
could return a meaningful error message to the user (e.g. "Element
type 'full-word-index' not supported.")
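
A minimal Python sketch of that parse-then-delegate behaviour follows. It
assumes, purely for illustration, that a query is itself a small XML
document; the QueryEngine class and the "query" and "select" element names
are hypothetical, and only "full-word-index" comes from the error message
above.

```python
import xml.etree.ElementTree as ET


class QueryEngine:
    """Toy engine that knows a set of query element types (hypothetical)."""

    def __init__(self, supported, delegate=None):
        self.supported = set(supported)
        self.delegate = delegate  # optional specialised engine

    def handles(self, tag):
        """True if this engine, or one it delegates to, supports the tag."""
        if tag in self.supported:
            return True
        return self.delegate is not None and self.delegate.handles(tag)

    def check(self, query_xml):
        """Parse the query and report constructs no engine can handle."""
        root = ET.fromstring(query_xml)
        return ["Element type '%s' not supported." % elem.tag
                for elem in root.iter() if not self.handles(elem.tag)]


QUERY = ('<query><select pattern="PERSON"/>'
         '<full-word-index word="porridge"/></query>')

basic = QueryEngine({"query", "select"})
text_engine = QueryEngine({"full-word-index"})
combined = QueryEngine({"query", "select"}, delegate=text_engine)

print(basic.check(QUERY))
print(combined.check(QUERY))
```

Every engine can parse the whole query because the syntax is shared; only
the set of constructs each one can execute differs.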
>A language like SDQL coupled with an XML property set (that is, the
>subset of the SGML property set needed to represent XML documents)
>provides a complete set of operations for querying XML documents
>represented as groves. [...]
I don't know a thing about SDQL, and I'm having trouble finding
useful material on the net. Could someone please point me to
something that might be accessible to someone having no DSSSL
experience and having only a very rudimentary knowledge of LISP?
From jlapp at acm.org Sun Nov 16 19:36:14 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:54 2004
Subject: Query Languages for XML
In-Reply-To: <346DFFA1.8C0F1ECD@technologist.com>
References:
Message-ID: <3.0.3.32.19971116143342.0095e950@pop.access.digex.net>
Paul Prescod wrote:
>[...] As I understand it, you can create a
>"HTML form element" flow object and an "HTML input element" flow object
>within it. [...]
>DSSSL has no provisions for adding flow object types in DSSSL code. So
>we are essentially talking about the DSSSL implementation language (Java
>or C, probably) [...]
I'm not sure that this approach addresses the need to have a standard
mechanism by which (server-side) XML documents are updated. We'd simply
be relegating the standard to being defined by OMG IDL interfaces, as is
done in DOM. We could rely on (future) DOM-defined query mechanisms,
except that the DOM approach does not provide the kind of flexibility
that an XML-based language would provide. For example, in the DOM
approach, our queries must be programs, whether they are written in Java,
C, C++, VB, or some script language. But then we'd wish we had defined
the script language. Users wouldn't have to learn a different language
for generating queries on each platform, and the queries themselves
would be transportable between platforms.
From jlapp at acm.org Sun Nov 16 19:36:33 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To: <346E1234.D549A734@technologist.com>
References:
Message-ID: <3.0.3.32.19971116143400.0095e210@pop.access.digex.net>
At 04:20 PM 11/15/1997 -0500, you wrote:
>[...]
>> I think the question being asked is whether you could make an input-text
>> flow object which had a clearly defined semantics in altering your XML
>> grove, not your flow object tree. For this, you would need an abstraction
>> for the form submission/editing cycle, and such SDQL primitives as richard
>> was mentioning.
>
>Only if you take the approach that DSSSL code must manage the form
>interactivity process. I don't see why it must. It seems simplest to
>me that it should do the moral equivalent of "put a button here" and
>leave the processing of the button click to JavaScript, Java or C++.
A standard, platform-independent query language has its merits:
(1) Many people use SQL without knowing a thing about programming. It's
easier to learn a tiny language than to have to learn a big language in
order to make use of a tiny library. SQL is very useful as a filter-
specification language. This allows database administrators to manage
a database by specifying complex filters that the database tool uses
to select the elements to process. The user does not have to know the
language in which processes are defined. Imagine trying to manage a
database by having to write a different program (or plug-in) for each
query operation you wished to perform.
(2) If the query language were defined as APIs (interfaces or IDLs) for
use in an existing programming language, a person versed in manipulating
a database from one language may find his skills of less value when he's
asked to manipulate a database in a language he does not yet know. An
administrator's skills (or DB developer's skills) are much more valuable
if they are directly usable in many different environments.
(3) If a query must be expressed in a particular programming language,
that query will not be directly usable in other programming languages
or other environments. It is very likely that the query would have to
be embedded in a plug-in module (or COM component or JavaBean), and
that module will not be directly usable in any other environment --
perhaps not even outside the original application for which it was
intended. If a standard language were used, applications could share
queries, queries could be stored away for future retrieval, and users
could share each other's queries just by handing each other files.
>[...] The SQL model (I'm not familiar with OQL) is that a host
>language (COBOL, PowerScript, JavaScript, Java, whatever) handles the
>interactivity and issues data model update instructions. SQL does not
>handle the user interface itself.
OQL looks very much like SQL, except that it has extensions for
accessing object-oriented databases, and except that it throws out
the non-object-oriented update mechanisms of SQL. It still uses the
SELECT ... FROM ... WHERE syntax. However, both SELECT statements
and object methods can return result sets that can be further
operated on. OQL does not have the UPDATE or INSERT statements.
To perform equivalent actions you must use methods on objects. Such
objects might be individuals or the objects in collections retrieved
via the query semantics.
>In other words, the vast majority of forms will have nothing to do with
>the document grove itself. They may be forms designed to talk to
>relational databases or object databases or CGI or whatever. We can
>create these forms immediately, without touching SDQL. Yes, it would be
>cool if SDQL allowed grove updates, and of course we expect that if it
>did, you would be able to call it from the code that handles your
>button, just as you could call SQL or OQL etc.
I think there is a whole class of applications that could arise from
being able to manage XML documents from clients. Consider knowledge
repositories that retain data in a semantic form (in XML). Users
could perform semantic-based queries and updates and all participate
together in generating a semantic model and information warehouse.
It may be that existing applications won't have much use for the
kind of query language I'm proposing -- I'm looking toward the future.
>Anyhow, I think that the DOM allows updates, so if you use DOM functions
>as your "query language" and the DOM model as your grove, then you will
>have a read/write document query language.
This is a very significant point. I expect that DOM will define
query operations on its objects, so that via IDLs, programs will be
able to remotely manage persistent XML databases. However, for
reasons I've given in other posts, I think an XML-based query
language is necessary. The form of that query language might
mirror the form defined by DOM, but the query language will
necessarily provide constructs not named by DOM. DOM assumes the
existence of a Turing-complete programming language. Just as SQL
has, we would need to have mechanisms for piping filters through
each other and for performing operations on the result sets.
From ak117 at freenet.carleton.ca Sun Nov 16 22:07:39 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:58:55 2004
Subject: XML and case-folding
Message-ID: <199711162207.RAA00749@unready.microstar.com>
I remember some postings recently wondering about the implications of
making elements (etc.) case-sensitive in XML. I remember only Tim's
followup about IDs -- apologies if I'm going over well-worn ground
here.
As I understand it, if you set both NAMECASE GENERAL and NAMECASE
ENTITY to "NO" in full SGML, then there will be no case substitution
anywhere. Since XML is an SGML application profile, that means that
you may use
, , , and
but NOT
, , , and
or
, , , and
Furthermore, all element type names, attribute names, notation names,
entity names, _and_ attribute values (of any type) are also case
sensitive. As a result, if you had this in your XML DTD:
and this in your XML document:
the parser should report an error. It also means that something like
this is legal (though pathologically weird):
The contents of processing instructions are never subject to case
substitution anyway, though the validation of their contents is also
mostly beyond (full) SGML's mandate; for consistency, however, it
would make sense to require everything there to be in upper-case as
well. In other words,
would be acceptable, but not
Any comment on this last point?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mtbryan at sgml.u-net.com Mon Nov 17 08:50:46 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID:
Joe Lapp wrote:
>You might use XSL to create a replacement
>document and then upload the replacement, but this is not
>conducive to having many users concurrently querying and updating
>the document (you'd have to lock the whole document).
There is no need to lock the whole document, just that part of the document
that constitutes an updatable record for the database it is being used to
update or being updated from. Such a record could consist of a number of
contiguous fields, a set of discrete fields taken from appropriate parts of
a document, or even a single field. There is no reason why fields not likely
to be affected by change need to be locked in any way.
Martin Bryan
From mecom-gmbh at mixx.de Mon Nov 17 11:17:25 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:58:55 2004
Subject: what's the reason for mixed data?
Message-ID: <199711171117.MAA19602@hermes.mixx.net>
greetings,
we're trying to understand why the XML spec (wrt
"http://www.w3.org/TR/WD-xml-970807.html")
specifies a special status for elements which contain mixed data. to make
it specific, why is
[43] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
not
[43] cp ::= (#PCDATA | CDATA | Name | choice | seq) ('?' | '*' | '+')?
?
what's the reason to specify a form (mixed data) which must
permit repetition and
arbitrary order as soon as PCDATA is allowed?
to give an example of the problem, assume the following CLOS declarations:
(defClass class-1 ()
((slot-1 :type string)))
(defClass class-2 ()
((slot-2 :type (or string class-1))))
how would this be declared?
makes sense, but would seem to be disallowed by
[50] Mixed ::= '(' S? %( %'#PCDATA' (S? '|' S? %Mtoks)* ) S? ')*'
| '(' S? %('#PCDATA') S? ')'
which would appear to stipulate the repetition as soon as elements and
PCDATA appear together. on the other hand,
would not be a correct translation, since that is the equivalent of
(defClass class-2 ()
((slot-2 :type (list (or string class-1)))))
can anyone explain the ')*' requirement in [50]?
thanks, james.
From jlapp at acm.org Mon Nov 17 14:45:05 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To:
Message-ID: <3.0.3.32.19971117094533.0094fce0@pop.access.digex.net>
"Martin Bryan" wrote:
>Joe Lapp wrote:
>>You might use XSL to create a replacement
>>document and then upload the replacement, but this is not
>>conducive to having many users concurrently querying and updating
>>the document (you'd have to lock the whole document).
>
>There is no need to lock the whole document, just that part of the document
>that constitutes an updatable record for the database it is being used to
>update or being updated from. Such a record could consist of a number of
>contiguous fields, a set of discrete fields taken from appropriate parts of
>a document, or even a single field. There is no reason why fields not likely
>to be affected by change need to be locked in any way.
I agree that under the appropriate circumstances you wouldn't have
to lock the whole document. However, were you to do the trick with
what is currently XSL, it seems to me that you would have to create
a _replacement_ document and then replace the original document.
If in the time between reading the original and generating the
replacement another user reads the original, and if the other
user posts his replacement after you post your replacement, then
your changes do not take.
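The lost-update sequence described above can be sketched in a few lines.
This is only an illustration under stated assumptions: `DocumentStore`, its
version counter, and the sample order document are hypothetical and are not
defined by XSL or any XML draft. The second function shows one common
remedy (an optimistic version check), which is an addition for contrast and
not something proposed in this thread.

```python
class StaleVersionError(Exception):
    """Raised when a replacement was built from an outdated copy."""


class DocumentStore:
    """Hypothetical whole-document store; not part of XSL or any XML spec."""

    def __init__(self, text):
        self.text = text
        self.version = 0

    def read(self):
        # Return the content together with the version it was read at.
        return self.text, self.version

    def replace_blindly(self, new_text):
        # Whole-document replacement with no check: the last writer wins.
        self.text = new_text
        self.version += 1

    def replace_if_unchanged(self, new_text, read_version):
        # Optimistic check: refuse the write if someone else posted first.
        if read_version != self.version:
            raise StaleVersionError("document changed since it was read")
        self.text = new_text
        self.version += 1


ORIGINAL = "<order><qty>1</qty><note/></order>"


def lost_update_demo():
    """Two users read the same original; the second post erases the first."""
    store = DocumentStore(ORIGINAL)
    mine, _ = store.read()
    theirs, _ = store.read()
    store.replace_blindly(mine.replace("<qty>1</qty>", "<qty>2</qty>"))
    store.replace_blindly(theirs.replace("<note/>", "<note>rush</note>"))
    return store.text


def guarded_update_demo():
    """Same sequence, but the stale second post is rejected."""
    store = DocumentStore(ORIGINAL)
    mine, my_version = store.read()
    theirs, their_version = store.read()
    store.replace_if_unchanged(
        mine.replace("<qty>1</qty>", "<qty>2</qty>"), my_version)
    try:
        store.replace_if_unchanged(
            theirs.replace("<note/>", "<note>rush</note>"), their_version)
    except StaleVersionError:
        return "rejected", store.text
    return "accepted", store.text
```

In `lost_update_demo` the quantity change silently disappears, which is the
"your changes do not take" case; in `guarded_update_demo` the second writer
is told to re-read instead.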
Or maybe you are suggesting there is no need to replace the whole
document using an XSL approach. XSL or some other XML standard would
need to define a standard mechanism for identifying and modifying a
portion of a document. I am aware of some sort of 'chunking'
initiative, but I don't know exactly what the scope of the effort is.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From mtbryan at sgml.u-net.com Mon Nov 17 16:06:07 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID:
Joe Lapp wrote:
>I agree that under the appropriate circumstances you wouldn't have
>to lock the whole document. However, were you to do the trick with
>what is currently XSL, it seems to me that you would have to create
>a _replacement_ document and then replace the original document.
This presumes that the "document" is the thing you want to remove. What if:
a) the document was built from a set of entities?
b) only part of the document consisted of updatable data fields?
The key factor is "what proportion of the data needs to be modified?"
>If in the time between reading the original and generating the
>replacement another user reads the original, and if the other
>user posts his replacement after you post your replacement, then
>your changes do not take.
Always a problem with databases, but fields that are "temporarily locked"
can always be assigned an attribute that the presentation software can use
to indicate that the data is in a state of flux to read-only users of the
data during the update period.
>Or maybe you are suggesting there is no need to replace the whole
>document using an XSL approach. XSL or some other XML standard would
>need to define a standard mechanism for identifying and modifying a
>portion of a document.
The XML/EDI crew will be looking into this problem as it is key to running
an electronic business using XML.
> I am aware of some sort of 'chunking'
>initiative, but I don't know exactly what the scope of the effort is.
Why not join the XML/EDI research teams (see http://www.xmledi.net for
details)
From simeons at allaire.com Mon Nov 17 16:07:51 1997
From: simeons at allaire.com (Simeon Simeonov)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <01bcf373$8dcc8230$4a15b5cd@sim.allaire.com>
Joe Lapp wrote:
>This is a very significant point. I expect that DOM will define
>query operations on its objects, so that via IDLs, programs will be
>able to remotely manage persistent XML databases. However, for
>reasons I've given in other posts, I think an XML-based query
>language is necessary. The form of that query language might
>mirror the form defined by DOM, but the query language will
>necessarily provide constructs not named by DOM. DOM assumes the
>existence of a Turing-complete programming language. Just as SQL
>has, we would need to have mechanisms for piping filters through
>each other and for performing operations on the result sets.
The simpler operations of an XML-based query language can have profound
impact on the usability of data on the Web. The high demand for web
applications is drawing individuals with little to no programming experience
to web development. They may find it quite difficult to write a script-based
traversal algorithm using the XML DOM to extract some piece of information
from a document. However, experience from the client-server world tells us
that most people can easily learn how to formulate simple SELECT statements
in SQL.
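To make the contrast concrete, here is a rough sketch in Python. The sample
catalog and both function names are made up for illustration. The path
expression syntax comes from a present-day library (`xml.etree.ElementTree`)
and is used only as a stand-in for the kind of declarative query under
discussion; it is not a proposal for what an XML query language should look
like.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Hypothetical sample data.
CATALOG = """<catalog>
  <book year="1997"><title>XML Primer</title></book>
  <book year="1996"><title>SGML Handbook</title></book>
  <book year="1997"><title>DSSSL Notes</title></book>
</catalog>"""


def titles_by_traversal(xml_text, year):
    """Walk the DOM by hand, as a script author would have to."""
    doc = minidom.parseString(xml_text)
    found = []
    for book in doc.getElementsByTagName("book"):
        if book.getAttribute("year") != year:
            continue
        for title in book.getElementsByTagName("title"):
            # Join the text nodes directly under <title>.
            found.append("".join(node.data for node in title.childNodes
                                 if node.nodeType == node.TEXT_NODE))
    return found


def titles_by_query(xml_text, year):
    """State what is wanted and let the library do the walking."""
    root = ET.fromstring(xml_text)
    return [t.text for t in root.findall(f".//book[@year='{year}']/title")]
```

Both functions return the same titles for a given year; the second is the
one a non-programmer could plausibly write, in the same way that a simple
SELECT is within reach of people who would never write a cursor loop.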
I would speculate that, if a standard does not emerge by the next browser
releases, vendors will move to provide their own query mechanisms. Why?
Because they would like to make the consumption of arbitrary XML from within
HTML as easy as possible. (IE4 DSOs are a move in the right direction.)
Simeon Simeonov
Allaire
From jlapp at acm.org Mon Nov 17 17:16:39 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To:
Message-ID: <3.0.3.32.19971117121706.0094fc90@pop.access.digex.net>
"Martin Bryan" wrote:
>This presumes that the "document" is the thing you want to remove. What if:
>
>a) the document was built from a set of entities?
>b) only part of the document consisted of updatable data fields?
>
>The key factor is "what proportion of the data needs to be modified?"
I think I've just discovered that we are both arguing for the same
thing. My point is exactly that the _document_ is not the smallest
unit we care to change. I just meant to point out that because we
care for finer granularity, and because currently no standard exists
for updating at arbitrary granularity, we need a standard. I was
giving an example with XSL only to demonstrate that XSL does not
itself provide us with a way to work at that granularity. Currently,
using XSL alone, we'd be replacing the entire document -- which is
exactly what we _do_not_ want to do.
>The XML/EDI crew will be looking into this problem as it is key to running
>an electronic business using XML.
I am preparing a report on what I believe is the fundamental
issue, and I hope to post it before the day is out.
>Why not join the XML/EDI research teams (see http://www.xmledi.net for
>details)
I'll look into it as soon as I finish this report.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From dgd at cs.bu.edu Mon Nov 17 17:28:35 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:58:55 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk>
References: <3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
Message-ID:
I want to respond to the "meta-proposal" a bit, because I disagree with
some of the axioms, and the proposed procedures. I don't have time or
energy right now to respond to the specific proposal, though I may well do
so later (based on my own, somewhat divergent, axioms).
At 6:45 PM -0000 11/15/97, Peter Murray-Rust wrote:
>
> In any of these cases there is no general solution acceptable to
>everyone
>
> If no attempt is made to address these problems we shall either
>end up with a Babel of incompatible solutions, or wait feebly for some
>powerful autonomous entities to dictate a limited set of actions.
>
Not necessarily. In fact, for many problems the correct response is to
ensure that the stylesheet and processing specification languages can
_implement_ each of the specific solutions desired, _without_ forcing the
specific solutions on which divergence of opinion may exist. More on this
with the "PI" axioms.
> We have to be careful to avoid the 'only processable with
>software X' syndrome
Yes. The way to do this is to _avoid_ PIs as much as possible. PIs that are
required to interpret a document correctly are _inherently_
anti-portability, since the rule for PIs is that _any application_ should
be free to ignore them without changing the meaning of the document. The use
of SGML's PI syntax in XML is not a good model for the use of PIs in
general, since they are being used in XML as a syntactic "escape hatch" for
compatibility with SGML. It would not be necessary (or desirable) if XML
were not (to some very small extent) changing SGML facilities (as with
specifying the character encoding of entities in PIs, rather than an SGML
declaration).
If XML had been able to add declarations to SGML, that would have been done
instead of using the PI syntax.
> There is a critical mass of readers of this list who feel the
>need to address the problem.
Without a problem statement I'm not sure how to judge this, but it may well
be true.
> Anyone can use any PIs they like in their documents for whatever
>purposes they like without breaking the spirit of XML.
This is assuredly incorrect. PIs are intended for use in the case where a
practical _use_ of a document with _particular software_ requires
additional information that _should not_ have been indicated in a structural
description of the content. A paradigmatic example is the occasional need
to insert a page or column break in order to get acceptable formatting in a
particular processing situation (including: software, stylesheet, output
device). This is not information that _should_ be encoded in the abstract
representation of a document, but _may be essential_ for "getting the thing
to print right".
> That processing software need not (and so far won't) take any
>notice of these (or perhaps any) PIs
>
This is certainly essential. If you are saying something about your document
that you can imagine being useful to some software that you aren't using
right now -- then it should probably be in the markup. PIs are for things
that can be ignored without changing the interpretation of a document.
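As a small illustration of "can be ignored", the sketch below parses the
same content with and without a page-break PI. The PI target `pagebreak`
and the sample documents are invented for the example. It relies on the
default behaviour of Python's `xml.etree.ElementTree` parser, which leaves
processing instructions out of the tree it builds; other toolkits may
expose them.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: identical except for a formatting hint in a PI.
WITH_PI = "<doc><p>one</p><?pagebreak column?><p>two</p></doc>"
WITHOUT_PI = "<doc><p>one</p><p>two</p></doc>"


def content(xml_text):
    """Return the element names and text an application would see."""
    root = ET.fromstring(xml_text)
    return [(el.tag, el.text) for el in root.iter()]


# An application that ignores the PI sees exactly the same document.
same_content = content(WITH_PI) == content(WITHOUT_PI)
```

If dropping the PI changed what `content` returned, the information would
belong in the markup, which is the test being argued for here.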
> If a few people find a way of doing something that works for
>them, and isn't against the spirit of the XML specs, then flaming their
>ideas is pointless.
Even this is not necessarily true -- attacking the dissemination of false
or bad ideas is _never_ pointless, in that dissemination of bad information
(even if it serves a local purpose adequately well) can seriously mislead
people. For instance the use of styles in word-processing programs is
usually a very good idea. The fact that in some instances direct formatting
may work out, or even work better, should not stop people from quarreling
with public assertions about the utility of stylesheets based on those
situations.
To the extent that these axioms seem to be intended to rule out
disagreement of the merits of future proposals, I must take immediate and
strong exception to them. It's not possible for a responsible discussant
who disagrees with a public proposal of working practice to remain silent
on the topic. "Flaming" is usually not responsible discussion, but
principled disagreements should be expressed so that the issues are clear
to all.
>
>The proposal I really want to address is, like Monty Python's joke,
>so potentially dangerous that I dare not reveal it yet. The proposal here
>is also important to me - perhaps to others - and I hope serves as a
>useful example. It is NOT in a finalised form, but as can be seen from the
>meta-proposal, there is a method for referring to a 'pseudo-final' form
>that is, at least, usable.
>
This makes me nervous.
>
>That a PI of the form is 'reserved' by members of this list for
>PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly
>reserved.]
We can certainly do this -- but as I said above, there are good reasons to
oppose the use of PIs for _any_ use that affects the semantics of
documents.
For example, even the proposed namespace PI would be vulnerable on this
account, except for the facts that:
1. It's intended for use in _experiment_ with a proposed _extension_ of
XML. (In other words, the PI, should it be generally accepted for use with
all interested XML applications, would become part of XML).
2. The prefix can be processed (and thus, the semantic information
accessed) _without_ software having to be aware of the namespace PI. In
other words, the PI can be treated as equivalent to a comment describing
the proposed intent of the tags that share a prefix. (In other words, you
can ignore the namespace PI, and still detect the semantic distinctions in
the document)
>
>That anyone can post a proposal to this list for the use of this PI.
Anyone can post anything anywhere.
>That any author can include an instance of such a proposed PI in their
>document.
Again, any author can put anything they want anywhere, good idea or not.
>That any writer of application software can write software to process such
>a PI.
Again, how could anyone stop them?
>That both of these should refer to an appropriate URL on this list's
>archive discussing outlining the use of this PI.
Certainly not a bad idea..
>That if someone doesn't approve of a proposal they ignore it rather than
>flaming it. The fittest ideas will survive.
In the long run this may (or for a number of reasons may not) be true.
However, bad ideas that are initially plausible but unworkable in the long
term (e.g., from a related, but different domain, the creation and
management of large structured information corpora in raw HTML) would get
an artificial (and community-harmful) boost if an effective social
convention forbidding disagreement were in effect.
I agree that polite, reasoned disagreement is better than flaming
(impolite, ad-hominem disagreement) but in the intellectual world the unfit
perish faster under the lash of criticism.
>
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From eliot at isogen.com Mon Nov 17 17:48:41 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971117112909.00bc757c@swbell.net>
At 08:04 AM 11/15/97 +0000, Richard Light wrote:
>One important thing about "Standard Query Language" is that it doesn't
>just query. It is actually a complete language for "defining, accessing
>and otherwise managing relational databases".
In other words, SQL, in addition to enabling *queries* (that is, request
for information about tables) is *also* an editor scripting language where
the documents are relational tables.
The SGML/XML world view can be thought of as a place where there are two
fundamental types of activity: query and edit. A query is always
read-only. An edit results in a new document. This also suggests that
there is no fundamental difference between editors and document management
systems that manage abstractions of documents (like Chrystal's Astoria or
Texcel's Information Manager). In other words, a document management system
is just a very beefy editor with a poor user interface or editors are weak
document management systems with poor persistence but good interfaces.
Thus, SDQL is a "pure" query language in that its only purpose is to
return the results of queries on the properties of nodes in groves.
However, the DSSSL transformation language can be thought of as an editing
scripting language because the result of applying a DSSSL transformation to
a document is a new document.
Note that it doesn't matter how the creation of the new document is
*implemented*. Whether you literally generate an entirely new grove from
scratch or simply add and remove nodes and properties from the one you
have, the result is the same: a new grove, which means a new document.
DSSSL simplifies its abstract processing model by making groves static *in
the abstract*. However, implementations are free to make groves dynamic
*under the covers*.
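A rough sketch of that point, using Python's `xml.etree.ElementTree` as a
stand-in for a grove implementation (an assumption made purely for
illustration; an element tree is far simpler than a real grove). The sample
memo and both function names are hypothetical. One edit builds a fresh tree
and the other mutates the existing one, yet the resulting documents are
identical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical source document.
SOURCE = "<memo><to>list</to><draft>remove me</draft></memo>"


def edit_by_rebuilding(root):
    """Build an entirely new tree, leaving the input untouched."""
    new_root = ET.Element(root.tag, dict(root.attrib))
    new_root.text = root.text
    for child in root:
        if child.tag != "draft":
            new_root.append(copy.deepcopy(child))
    return new_root


def edit_in_place(root):
    """Remove nodes from the tree we already have."""
    for child in list(root):
        if child.tag == "draft":
            root.remove(child)
    return root


rebuilt = ET.tostring(edit_by_rebuilding(ET.fromstring(SOURCE)),
                      encoding="unicode")
mutated = ET.tostring(edit_in_place(ET.fromstring(SOURCE)),
                      encoding="unicode")
```

Serialized, `rebuilt` and `mutated` are the same character string, so by
the definition used in this message both are the same new document.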
Remember also that unless you're talking about SED scripts or Perl hacks,
it's not meaningful to talk about operations on XML documents--it's only
meaningful to talk about operations on abstractions of XML documents, i.e.,
groves. This is why both the DSSSL and HyTime standards are defined in
terms of operations on groves, not operations on SGML documents.
If we define "editing" as the process by which the abstraction of a
document is modified and a new document is created (here using the term
"document" as it's defined by SGML and XML, that is, a character string
conforming to the syntax defined by the standard), then *any process* that
creates a new document is an editor. The only question then is whether or
not the editor is interactive or batch, which is really a question of user
interface, not functionality.
All editing languages must include a query language because you must be
able to examine the properties of the objects the editor is manipulating,
but I think that it is confusing to call an editing language a query
language just because SQL is incorrectly called a query language.
Or said another way: given a robust query mechanism, such as SDQL, it is
possible to create an infinite number of editing languages that provide the
appropriate interaction and convenience characteristics needed for a
particular editing application. When the tasks of querying and editing are
kept separate, it becomes clear that it is not necessary to bind them
together (although doing so may have advantages in some environments).
Thus, the argument that SDQL is insufficient for complete XML processing
and is thus not useful misses the point that what was asked for was not a
query language at all, but an editing scripting language, which SDQL is
not. However, SDQL could be of service to any number of scripting
languages by providing a ready-made syntax and set of semantics that can be
used directly.
I can easily imagine creating a simple set of DSSSL expression language
functions that provide the grove manipulation actions needed: delete node,
add node, set property, delete property. Implementing these would be easy
enough to do once you had code that managed groves (i.e., a DOM-based
read-write browser), which we have in both Netscape and IE4 and will likely
have in SGML/XML editors in the near future.
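The four actions might look something like the following. This is a sketch
in Python, not DSSSL, and it treats `xml.etree.ElementTree` elements and
attributes as a loose stand-in for grove nodes and properties; the function
names and the sample document are made up for the example.

```python
import xml.etree.ElementTree as ET


def add_node(parent, tag, text=None):
    """Append a new child node and return it."""
    node = ET.SubElement(parent, tag)
    node.text = text
    return node


def delete_node(parent, node):
    """Remove a child node from its parent."""
    parent.remove(node)


def set_property(node, name, value):
    """Set (or overwrite) a named property on a node."""
    node.set(name, value)


def delete_property(node, name):
    """Remove a named property if it is present."""
    node.attrib.pop(name, None)


def demo():
    # Hypothetical document: replace the paragraph and update the metadata.
    root = ET.fromstring('<doc status="draft"><p>old</p></doc>')
    delete_node(root, root.find("p"))
    add_node(root, "p", "new")
    set_property(root, "rev", "2")
    delete_property(root, "status")
    return ET.tostring(root, encoding="unicode")
```

Locating the nodes to act on (here a bare `root.find("p")`) is where a
query mechanism such as SDQL would plug in; the editing primitives
themselves stay this small.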
Cheers,
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From eliot at isogen.com Mon Nov 17 17:49:19 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971117113804.00bd0010@swbell.net>
At 04:20 PM 11/15/97 -0500, Paul Prescod wrote:
>Graydon Hoare wrote:
>> What would it mean to take a form
>> flow object and render it through a TeX backend? The "interactive" nature
>> is gone. What happens to a combo-box?
>
>About the same as the printed rendition of a link or scroll flow object.
>It would be completely useless. Stylesheets are tied to a particular
>medium. Online stylesheets should have elements (link, input, scroll)
>that allow interactivity and print-oriented stylesheet languages should
>have elements that describe pages etc.
There are many very useful static representations of forms, not least of
which is to document the design thereof. My first exposure to SGML was
writing a process to generate printed specifications for an online
application of several hundred (if not thousands of) interactive panels,
all created in SGML using a now-defunct language IBM developed for use in
OS/2 (it may still live in CICS, I'm not sure--it was also used there for a
while). Because the documents that defined the panels included references
to variables, described branching and control structures, and so on, I was
able to generate both pictures of the panels (using character-based
graphics, no less) and lots of information about the panels. By
doing this, we eliminated the need to do screen snaps to document the
panels, which we estimated saved a minimum of two calendar weeks per rev of
the spec (that being the amount of time it would take to make the snaps and
assemble the document).
Likewise, hyperlinks can be represented in print in any number of ways
(witness the SGML Handbook). The interactivity of hyperlinks is not what
distinguishes them; it is the relationship they represent. There are many
ways to present and make useful such relationships, of which interactive
traversal is only one (and not necessarily the most useful).
Cheers,
E.
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From eliot at isogen.com Mon Nov 17 18:08:36 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971117115001.00bc8a74@swbell.net>
At 12:17 PM 11/17/97 -0500, Joe Lapp wrote:
>"Martin Bryan" wrote:
>>This presumes that the "document" is the thing you want to remove. What if:
>>
>>a) the document was built from a set of entities?
>>b) only part of the document consisted of updatable data fields?
>>
>>The key factor is "what proportion of the data needs to be modified?"
>I think I've just discovered that we are both arguing for the same
>thing. My point is exactly that the _document_ is not the smallest
>unit we care to change. I just meant to point out that because we
>care for finer granularity, and because currently no standard exists
>for updating at arbitrary granularity, we need a standard.
A standard *does* exist for defining the objects you might want to update:
the SGML property set (possibly reflected through the DOM). Given this
definition, defining operations on it is a simple matter of programming.
Or said another way, you don't need a standard for the control language
(although it's useful to have one) if you have a standard for the data
model to be controlled.
Cheers,
E.
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From jlapp at acm.org Mon Nov 17 19:04:04 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
In-Reply-To: <01bcf373$8dcc8230$4a15b5cd@sim.allaire.com>
Message-ID: <3.0.3.32.19971117140421.0095ab80@pop.access.digex.net>
"Simeon Simeonov" wrote:
>[...]
>I would speculate that, if a standard does not emerge by the next browser
>releases, vendors will move to provide their own query mechanisms. Why?
>Because they would like to make the consumption of arbitrary XML from within
>HTML as easy as possible. (IE4 DSOs are a move in the right direction.)
I think there is another reason why we wouldn't need to wait for the
DOM spec to complete. I'm preparing an argument for why DOM cannot be
extended to do the job given the way it is currently architected.
Look for my upcoming post on the subject (I'm not done yet).
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From papresco at technologist.com Mon Nov 17 21:13:02 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
References: <3.0.32.19971117115001.00bc8a74@swbell.net>
Message-ID: <3470B42D.3A1FB5A9@technologist.com>
W. Eliot Kimber wrote:
> Or said another way, you don't need a standard for the control language
> (although it's useful to have one) if you have a standard for the data
> model to be controlled.
I would think that there are major optimization benefits to having a
standard query language. Each database vendor can take a complete query
describing a node and choose the quickest way to find the node, vs.
passively waiting for each query component (e.g. get this
node-list....now reverse it...now find the first node of type
element....now check its GI etc. etc.).
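The contrast can be sketched in Python, with the standard library's
xml.etree.ElementTree standing in for a document store. This is only an
illustration under that assumption: the sample document and both function
names are made up, and ElementTree itself does no query optimization. The
point is the shape of the two interfaces: in the second form the engine
sees the whole request at once and is free to choose how to satisfy it.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<doc><title>T</title><para>one</para><para>two</para></doc>"
)


def last_para_stepwise(root):
    """Client-driven navigation: one small request per step."""
    nodes = list(root)            # get this node-list
    nodes.reverse()               # now reverse it
    for node in nodes:            # now find the first element...
        if node.tag == "para":    # ...and check its GI
            return node
    return None


def last_para_query(root):
    """The whole request as a single expression handed to the engine."""
    return root.find("para[last()]")


stepwise_result = last_para_stepwise(doc)
query_result = last_para_query(doc)
```

Both calls locate the same element; only the second gives the store a
complete description of what is wanted before any work starts.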
Paul Prescod
From bmhughes at ozemail.com.au Mon Nov 17 21:58:01 1997
From: bmhughes at ozemail.com.au (Baden Hughes)
Date: Mon Jun 7 16:58:55 2004
Subject: ot.xml
Message-ID: <3.0.1.32.19971118083440.006cc664@ozemail.com.au>
Can someone tell me where I can pick up ot.xml ?
Thanks
Baden
From jwrobie at mindspring.com Mon Nov 17 22:00:42 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <1.5.4.32.19971117220012.00a49368@pop.mindspring.com>
I'm not sure whether I understand the range of things that will be queried.
I would think that we would want to be able to do queries of at least the
following kinds:
1. Queries of non-markup data, with the goal of creating mark-up from
databases, e.g. relational or object-oriented databases. The end result is
to return a grove, but there's a lot that has to be defined in-between.
2. Full-text searches which return groves.
3. Structured document queries which return groves. There's an interesting
discussion of queries that need to be supported here:
"http://www.ceth.rutgers.edu/programs/TEI97/SESSIONS/GREGORY/search.sgm.html"
Standard database query languages like OQL and SQL are not very useful for
queries of type 3 unless we know the actual names of the data structures
used in a particular implementation. For instance, in a relational database,
what are the names of the tables and columns that must be used to create a
query for a given document structure?
Standard database query languages like OQL and SQL do not have full-text
search operators to allow them to do queries of type 2, though some people
have defined full-text operators as extensions of such languages. When it
comes to the return type for such a query, we have the same problem
mentioned in the previous paragraph.
I don't know much about SDQL. It is part of the DSSSL standard - is it
scheme based? Is it procedural? Is it based on SGML/XML document structure?
Can it be used for queries of types 1 and 2?
Jonathan
From eliot at isogen.com Mon Nov 17 22:40:57 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:55 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971117163812.00bd28c0@swbell.net>
At 05:00 PM 11/17/97 -0500, Jonathan Robie wrote:
>I don't know much about SDQL. It is part of the DSSSL standard - is it
>scheme based? Is it procedural? Is it based on SGML/XML document structure?
>Can it be used for queries of types 1 and 2?
SDQL is simply that part of the larger DSSSL expression language that
enables the accessing of properties of nodes in groves and the navigation
of groves. It uses the same syntax as the rest of DSSSL, that is a Scheme
variant. It is based on the basic grove data model (nodes and their
properties) but has some built-in functions related to SGML (e.g., "gi",
"att-string", etc.). All the built-in functions are or can be defined in
terms of primitives (e.g., node-property). It includes some basic
string-matching functions but does not attempt to provide any sort of
complete full-text facility (which would be outside the stated scope of
DSSSL in any case).
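To make the layering concrete, here is a small Python sketch (not DSSSL
syntax) of convenience functions defined in terms of one primitive. The
dictionary-based node layout and the property names "attributes" and
"content" are assumptions made for the example; the function names only
echo the ones mentioned above and do not reproduce their real DSSSL
signatures.

```python
def node_property(node, name, default=None):
    """Hypothetical primitive: return one named property of a grove node."""
    return node.get(name, default)


def gi(node):
    """Generic identifier of an element node, via the primitive."""
    return node_property(node, "gi")


def att_string(node, att_name):
    """String value of a named attribute, or None if it is absent."""
    attributes = node_property(node, "attributes", {})
    return attributes.get(att_name)


def children(node):
    """Navigation: the ordered content of a node, via the primitive."""
    return node_property(node, "content", [])


# Usage: a two-level grove held as plain dictionaries.
para = {"gi": "para", "attributes": {"id": "p1"}, "content": []}
doc = {"gi": "doc", "attributes": {}, "content": [para]}
first_child_gi = gi(children(doc)[0])
first_child_id = att_string(children(doc)[0], "id")
```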
Note, however, that the syntax is largely arbitrary: what's important are
the semantics of grove access. Thus, you can expect XSL to include the
functional equivalent (more or less) of SDQL even though it may provide an
alternative syntax.
Cheers,
E.
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From bmhughes at ozemail.com.au Tue Nov 18 00:48:28 1997
From: bmhughes at ozemail.com.au (Baden Hughes)
Date: Mon Jun 7 16:58:56 2004
Subject: ot.xml
In-Reply-To: <199711172204.OAA01491@mehitabel.eng.sun.com>
Message-ID: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au>
>I moved a copy over into a directory where you can get it:
...
>Perhaps someone can mirror a copy down in your part of the world.
Thanks to Murray Altheim for his reply ...
For those interested, the ot.xml file is now also online at:
http://fdnet.com.au/bmhughes/otxml.zip
Baden
From Jon.Bosak at eng.Sun.COM Tue Nov 18 01:37:37 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:56 2004
Subject: ot.xml
In-Reply-To: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au> (message from Baden Hughes on Tue, 18 Nov 1997 11:46:08 +1100)
Message-ID: <199711180136.RAA14414@boethius.eng.sun.com>
[Baden Hughes:]
| >I moved a copy over into a directory where you can get it:
| ...
| >Perhaps someone can mirror a copy down in your part of the world.
|
| Thanks for Murray Altheim for his reply ...
|
| For those interested, the ot.xml file is now also online at:
|
| http://fdnet.com.au/bmhughes/otxml.zip
I released that file into the world a long time ago, so I have no
legal claim over it, but as a courtesy I would appreciate it if people
would keep the set together. I went to the trouble of marking up the
Old Testament, the New Testament, the Book of Mormon, and the Quran
because I did not wish to be associated with a project that preferred
the scriptures of any particular religion over those of any other. At
the time (1992) I could not find any other scriptures in electronic
form, or I would have included them as well. I still feel this way.
If you think that I have contributed something useful, you would be
doing me a favor if you distributed only the entire set, which can be
found at
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip
along with its mate,
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip
Thanks.
Jon
From bmhughes at ozemail.com.au Tue Nov 18 04:57:29 1997
From: bmhughes at ozemail.com.au (Baden Hughes)
Date: Mon Jun 7 16:58:56 2004
Subject: [2] ot.xml
In-Reply-To: <199711180136.RAA14414@boethius.eng.sun.com>
References: <3.0.1.32.19971118114608.006ebf54@ozemail.com.au>
Message-ID: <3.0.1.32.19971118142939.00690024@ozemail.com.au>
Thanks to Jon for his contribution with regard to text markup and his
recent followup note ...
As per Jon's request, the entire file set of religion.1.02.xml.zip can be
found at:
http://fdnet.com.au/bmhughes/religion.1.02.xml.zip
Baden
From richard at light.demon.co.uk Tue Nov 18 08:18:02 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:56 2004
Subject: Data manipulation languages for XML (was Query Languages ...)
In-Reply-To: <3.0.32.19971117115001.00bc8a74@swbell.net>
Message-ID:
In message <3.0.32.19971117115001.00bc8a74@swbell.net>, "W. Eliot
Kimber" writes
>>[Joe Lapp:]
>>I think I've just discovered that we are both arguing for the same
>>thing. My point is exactly that the _document_ is not the smallest
>>unit we care to change. I just meant to point out that because we
>>care for finer granularity, and because currently no standard exists
>>for updating at arbitrary granularity, we need a standard.
>
>A standard *does* exist for defining the objects you might want to update:
>the SGML property set (possibly reflected through the DOM). Given this
>definition, defining operations on it is a simple matter of programming.
>
>Or said another way, you don't need a standard for the control language
>(although it's useful to have one) if you have a standard for the data
>model to be controlled.
Said another way again: since we have a good, conceptually clear
standard for describing the objects we want to update, we are well-
placed to 'go the extra mile' and define a standard for updating those
objects.
May we return to SQL, as a precedent for the type of language Joe was
originally asking about? SQL's primary purpose is to support the use,
and updating, of distributed database information by multiple users in
real time. Surely that is a reasonable expectation for XML information,
too?
If so, we need mechanisms to specify changes to existing documents. I
don't really buy the model that says that every change to an XML
document produces a completely new document. You will certainly have a
hard time selling that idea to an end-user who changes one word in a
document, or to a database vendor who has to take back the complete
document and work out for themselves what (if anything) has changed, in
order to update the relevant nodes.
Also, in the real world you need access control (cf. GRANT in SQL).
The very nature of XML documents means that this control needs to be at
the node rather than the document level, if only to deal with entities.
Also, you need to know which parts of the document you are allowed to
change as you start editing - it is not good enough to be told some time
afterwards that certain changes should not have been made!
I agree that you can perfectly well define changes to an XML document
via its representation as a grove, but this grove needs to be linked
back to the physical objects that gave rise to it. For example, if you
edit a phrase that happens to be within an entity that is referenced
more than once within the document you are editing, then perform an
UPDATE, in principle _all_ references to that entity should be updated.
Richard Light.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From papresco at technologist.com Tue Nov 18 11:44:11 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:56 2004
Subject: Query Languages for XML
References: <3.0.3.32.19971116143342.0095e950@pop.access.digex.net>
Message-ID: <3471801E.6F148BD6@technologist.com>
Joe Lapp wrote:
>
> Paul Prescod wrote:
> >[...] As I understand it, you can create a
> >"HTML form element" flow object and an "HTML input element" flow object
> >within it. [...]
> >DSSSL has no provisions for adding flow object types in DSSSL code. So
> >we are essentially talking about the DSSSL implementation language (Java
> >or C, probably) [...]
>
> I'm not sure that this approach addresses the need to have a standard
> mechanism by which (server-side) XML documents are updated.
It certainly does not. It wasn't intended to. My point was merely that
there is no real relationship between the need to be able to make form
elements and other interactive elements (hyperlinks, collapsible trees)
and the need to be able to make modifications to a document through a
query language. They are both good ideas -- they are just not
necessarily related ideas. We already have forms, and they don't require
an updatable SDQL.
Paul Prescod
From jlapp at acm.org Tue Nov 18 14:55:55 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:56 2004
Subject: Data manipulation languages for XML (was Query Languages ...)
In-Reply-To:
References: <3.0.32.19971117115001.00bc8a74@swbell.net>
Message-ID: <3.0.3.32.19971118095544.00955100@pop.access.digex.net>
Richard Light suggests that we use the term "Data Manipulation
Language" when talking about this query/edit language in order
to avoid further confusion.
In the computer security industry ("rainbow series" books), we
use the term "access" to denote any kind of interaction with
information objects. For example, we say, "Read access" or
"Write access." The term "DAC" ("Discretionary Access Control")
uses "access" in this sense to describe the security policy
that may be in place to protect information objects.
I like the term "Data Access Language" or just "Access Language"
a bit more. This is partly because of my security background
and partly because it is quite a bit shorter. Besides, in my mind the
term "manipulation" conjures images of editing and not querying.
Just a suggestion.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From eliot at isogen.com Tue Nov 18 15:22:16 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:56 2004
Subject: Data manipulation languages for XML (was Query Languages ...)
Message-ID: <3.0.32.19971118091934.009f7260@swbell.net>
At 08:14 AM 11/18/97 +0000, Richard Light wrote:
>If so, we need mechanisms to specify changes to existing documents. I
>don't really buy the model that says that every change to an XML
>document produces a completely new document. You will certainly have a
>hard time selling that idea to an end-user who changes one word in a
>document, or to a database vendor who has to take back the complete
>document and work out for themselves what (if anything) has changed, in
>order to update the relevant nodes.
I think you're misunderstanding my use of the term "document" and my
reference to the *abstract* processing model of DSSSL and groves, as
opposed to how an implementation might work or how a user might perceive
the result.
By "document" I mean what SGML and XML mean by document: a character string
conforming to the rules of the standard. Identity for documents is defined
by no differences in the character string. If I change one character *I
have a new document*. However, when using the term "document" to mean "an
abstraction of a container for information", which is the usual everyday
meaning of "document", then the document is not a new document, unless the
user considers it to be one.
Note the difference: I'm talking about the mechanics of data manipulation
as related to the formal definition of SGML and XML, whereas users are
thinking about the abstractions of information creation. These are two
different domains.
For the purpose of thinking about standards for defining document
processing, it is a very useful simplification to think of every change as
creating a new *grove* (which, if used to generate an SGML or XML character
string, would result in a new SGML or XML document). Obviously, in an
implementation, you would probably not literally create an entirely new
grove, but would simply modify the one you have and, presumably, remember
the actions that transformed grove[0] to grove[1]. But that implementation
approach doesn't change the truth of the abstract model, which is that
grove[1] *is a different grove* from grove[0]. That's all I'm getting at.
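The abstract model can be sketched in a few lines of Python. This is an
illustration only: the dictionary-based grove, the (path, property, value)
action format, and the function name are all assumptions, and the deep copy
is deliberately the naive "literally create a new grove" approach rather
than what a real implementation would do.

```python
import copy


def apply_action(grove, action):
    """Return a new grove with one property changed; the input is untouched.

    action is (path, name, value): path is a tuple of child indexes leading
    from the root to the target node.
    """
    path, name, value = action
    new_grove = copy.deepcopy(grove)
    node = new_grove
    for index in path:
        node = node["content"][index]
    node[name] = value
    return new_grove


# grove[0], then one edit producing grove[1], with the action remembered.
groves = [{"gi": "doc", "content": [{"gi": "para", "text": "helo"}]}]
log = []

action = ((0,), "text", "hello")
groves.append(apply_action(groves[0], action))
log.append(action)

# Replaying the remembered actions against grove[0] reproduces grove[1].
replayed = groves[0]
for remembered in log:
    replayed = apply_action(replayed, remembered)
```

An implementation that edits in place and keeps only the log is observably
equivalent, which is why the abstraction costs nothing to adopt.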
>Also, in the real world you need access control (c.f. GRANT in SQL).
>The very nature of XML documents means that this control needs to be at
>the node rather than the document level, if only to deal with entities.
Not a problem. Remember that we're talking about *editing* here, which
*can only happen* on groves, which consist of nodes, which can therefore be
individually locked if your editor provides that function. There is
nothing in the definition of groves or the DSSSL expression language that
precludes node-level access control within an editor. That's an editing
issue, which is outside the scope of SGML, XML Lang, or DSSSL (as they are
only data representation languages, not editor specifications).
>Also, you need to know which parts of the document you are allowed to
>change as you start editing - it is not good enough to be told some time
>afterwards that certain changes should not have been made!
Again, not a problem as long as your editor provides some system for
associating access policies with nodes, either directly (by addressing
individual nodes) or by algorithm (e.g., elements in context). Again, this
is an editor design issue, not a data representation issue.
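Both styles of policy can be sketched briefly in Python. Everything here
is hypothetical (the node layout, the node ids, the element names, and the
function names); it only shows that a check made before editing begins can
combine direct node addressing with an "elements in context" rule.

```python
# Direct policy: individually addressed nodes that may not be edited.
locked_ids = {"n7"}


def in_context(gi, ancestor_gi):
    """Build a rule matching elements named gi inside an ancestor_gi."""
    def rule(node):
        if node["gi"] != gi:
            return False
        parent = node.get("parent")
        while parent is not None:
            if parent["gi"] == ancestor_gi:
                return True
            parent = parent.get("parent")
        return False
    return rule


# Algorithmic policy: paragraphs inside a legal notice are read-only.
read_only_rules = [in_context("para", "legal-notice")]


def may_edit(node):
    """Answer, before editing starts, whether this node may be changed."""
    if node["id"] in locked_ids:
        return False
    return not any(rule(node) for rule in read_only_rules)


notice = {"id": "n1", "gi": "legal-notice", "parent": None}
fixed_para = {"id": "n2", "gi": "para", "parent": notice}
body = {"id": "n3", "gi": "body", "parent": None}
free_para = {"id": "n4", "gi": "para", "parent": body}
locked_para = {"id": "n7", "gi": "para", "parent": body}
```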
>I agree that you can perfectly well define changes to an XML document
>via its representation as a grove, but this grove needs to be linked
>back to the physical objects that gave rise to it. For example, if you
>edit a phrase that happens to be within an entity that is referenced
>more than once within the document you are editing, then perform an
>UPDATE, in principle _all_ references to that entity should be updated.
What's your point? A grove that includes information about the text
entities used to organize it has enough information to correlate references
to entities to their content. How could it be otherwise? A grove has to
enable *complete* representation of the original document. In a complete
grove (one that includes all the properties defined in the property set),
the original document can be recreated byte for byte because the original
document string is stored as part of the grove (using the so-called
"markup" properties).
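A toy Python sketch of that correlation follows. The EntityRef class, the
entity name, and the content layout are invented for illustration and are
far simpler than the real property set; the sketch only shows that when
references and replacement text are kept separately, one update to the
entity is seen at every point of reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityRef:
    """Hypothetical stand-in for an entity reference kept in the grove."""
    name: str


# Replacement text is stored once, keyed by entity name.
entities = {"product": "WidgetPro"}

# Document content: literal strings interleaved with entity references.
content = [
    "Install ", EntityRef("product"),
    ". Then start ", EntityRef("product"), ".",
]


def expand(items, entity_table):
    """Produce the character string with every reference resolved."""
    return "".join(
        entity_table[item.name] if isinstance(item, EntityRef) else item
        for item in items
    )


before = expand(content, entities)
entities["product"] = "WidgetPro 2"   # a single edit to the entity
after = expand(content, entities)
```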
I'm afraid I don't see how using groves as the fundamental abstraction for
editing is inconsistent with satisfaction of any of the requirements. All
that's needed on top of what DSSSL provides are functions that represent
the editing actions needed (as opposed to modeling editing as a transform,
which is probably not a useful approach). If SQL provides a useful model
for defining such functions, we should use it.
Cheers,
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From jlapp at acm.org Tue Nov 18 20:15:42 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:56 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.3.32.19971118151445.0093beb0@pop.access.digex.net>
I have been thinking intensely about several issues these
past few days, and I've been trying to put them all together
into a coherent whole. So far I'm not succeeding, so I'm
initiating a series of discussions to help me make sense of
things. Here's the first...
We would like clients to be able to remotely manage documents
residing on servers. Clients need to be able to both query
and edit those documents. This might be done via OMG CORBA
interfaces, or it might be done via a human-readable query
language. Whatever the mechanism, I'd like to call the
mechanism a "document access language" or just an "access
language" for purposes of this discussion. In this posting
I explore three different access language paradigms.
It seems to me that so far the W3C has focused on using DOM
as the language by which clients remotely access documents.
Under DOM, clients view documents through CORBA interfaces
that make the document look like a tree of XML objects.
Once the W3C has established all of the necessary interfaces,
a client will have full control over a document's contents,
subject to DTD and access control constraints.
More recently, we have discussed possibly supplementing the
DOM approach with a human-readable access language. A
streamable access expression would be shipped to the server,
and the server would provide a streamed response. Document
content would have to transfer between client and server,
and the form of the content would be constrained by the DTD
that defines the document. The syntax of the human-readable
language is undecided. It might be OQL, or it might be SDQL with
extensions, or it might be XML with embedded content.
I'd like to present still another form of access language.
This approach is based on a different way of thinking about
documents. Instead of asking document repositories to look
like XML documents to the external world, we only ask that
the repositories speak XML with the external world. DTDs
would be defined for the protocols that repositories might
care to speak. The DTDs would define the structure of the
protocol messages rather than the structures of documents.
One repository might speak several protocols (e.g. 'Patient
Records Protocol V.152' or 'Bank Transaction Protocol 2A').
If the repository were capable of containing arbitrary XML
documents, the repository might speak a specific protocol
called 'XML Document Protocol V.1.0'.
Under the third approach, XML documents would appear less
often as persistent repositories and more often as transient
messages between clients and servers. It would still be
necessary to define the base DTD for all of these protocols
since one server port must be able to parse them all well
enough to identify the protocol. It may even be possible
to define the syntax for queries, insertions, and updates,
so that the individual protocols have less inventing to do.
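To show what "parse them all well enough to identify the protocol" might
look like, here is a minimal Python sketch using the standard library's
xml.etree.ElementTree. The envelope element "message", its "protocol"
attribute, the body elements, and the handler functions are all invented
for the example; no such base DTD has been defined.

```python
import xml.etree.ElementTree as ET


def handle_xml_document(body):
    """Placeholder handler for the generic document protocol."""
    return "document request: " + body.tag


def handle_patient_records(body):
    """Placeholder handler for a specialized protocol."""
    return "patient request: " + body.tag


# One server port, several protocols, selected by the shared envelope.
HANDLERS = {
    "XML Document Protocol V.1.0": handle_xml_document,
    "Patient Records Protocol V.152": handle_patient_records,
}


def dispatch(message_text):
    """Read only the common envelope, then hand the body to its protocol."""
    envelope = ET.fromstring(message_text)
    if envelope.tag != "message":
        raise ValueError("not a protocol message")
    handler = HANDLERS.get(envelope.get("protocol"))
    if handler is None:
        raise LookupError("unsupported protocol")
    body = next(iter(envelope), None)
    if body is None:
        raise ValueError("message has no body")
    return handler(body)


request = (
    '<message protocol="XML Document Protocol V.1.0">'
    '<query select="para"/>'
    "</message>"
)
response = dispatch(request)
```

Only the envelope is common to every protocol here; whether the query,
insertion, and update elements could be shared as well is the open
question raised above.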
Briefly consider the benefits of the third approach. The
most significant benefit is that it completely frees the
repository from having to conform to an XML object model.
We could expose a legacy database to the world through one
of the protocols with only a thin wrapper around the
database. New databases could restrict the protocols they
support and specialize their structures according to the
kind of data they care to represent. They could be based
on custom object-oriented schemas or relational schemas.
This approach also lowers the entry level into the data
repository server world. We could think of servers more
as information warehouses than as virtual documents.
The most significant drawback of this approach is that it
doesn't give us a single access language. It probably
gives us a different access language for each protocol.
(Somebody please let me know whether this need not be so.)
One of those access languages would be defined in the
'XML Document Protocol,' and this is the language that we
have been looking for so far. Ideally, the access
languages for all of the protocols would have the same
syntactic substrate, so that the only new additions to
each protocol would be elements that are specific to the
information being represented. However, it is not
immediately apparent to me that this will be possible.
Yet, there are so many ways to represent data in XML and
in other formats such as relational and persistent OO.
The database vendor should not be constrained to use an
architecture that will export the repository as something
that looks like XML (such as DOM). For example, many
different DTDs can be invented to represent a given set
of data, and no standard should constrain a vendor to use
a specific DTD for organizing the information. A standard
should exist for how to query and update information and
for how to represent the data of concern (e.g. patient
records or transactions) -- that's what the DTDs should
define. Hence, I came to the protocol proposal.
Now it's time to talk about SQL and OQL. To a large degree
these languages expose the representation underlying the
database. SQL exposes tables and columns, while OQL
exposes the persistent classes and their methods. These
access languages are defined based on the schemas, so that
once the schemas are defined, voila, so are the access
languages. We save ourselves a lot of time.
The SQL and OQL approach has one extremely significant
drawback: compatible databases have identical schemas.
Where are the clients that speak 'Patient Record Schema
V.2.1,' and where are all the databases that are
compliant with this schema standard? Everybody uses
generic database backends, and no little guys can come
in to compete by specializing for a given standard. If
we had based these older query languages on protocols,
it wouldn't have been much of a problem for object-
oriented vendor X to come in and replace relational
vendor Y's server implementation of a standard; there
would have been no need to replace the clients.
Shouldn't we be building that sort of flexibility into
our new XML-compliant databases now, so that we will be
able to accommodate tomorrow's unexpected architectures?
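
The vendor-replacement argument can be sketched as two storage back ends answering the same protocol message. Everything named below (the get-name request, the patient-id attribute, both back-end classes, and the sample data) is hypothetical and chosen only to show that the client-visible messages need not change when the server's architecture does.

```python
import xml.etree.ElementTree as ET

class RelationalBackend:
    """Keeps rows as tuples, roughly as a relational server might."""
    def __init__(self):
        self.rows = [(7, "Ada"), (8, "Grace")]

    def patient_name(self, patient_id):
        for pid, name in self.rows:
            if pid == patient_id:
                return name
        return None

class ObjectBackend:
    """Keeps one object (here a dict) per patient."""
    def __init__(self):
        self.patients = {7: {"name": "Ada"}, 8: {"name": "Grace"}}

    def patient_name(self, patient_id):
        patient = self.patients.get(patient_id)
        return patient["name"] if patient else None

def answer(backend, raw_request):
    """Serve the hypothetical <get-name patient-id="..."/> request."""
    request = ET.fromstring(raw_request)
    name = backend.patient_name(int(request.get("patient-id")))
    reply = ET.Element("name")
    reply.text = name or ""
    return ET.tostring(reply, encoding="unicode")

request = '<get-name patient-id="7"/>'
print(answer(RelationalBackend(), request))
print(answer(ObjectBackend(), request))
```

Both calls produce the same reply message, which is the only thing the client ever sees.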
I do not believe that it is necessary for an access
language to expose the database's architecture. In our
case, I do not believe an access language must assume
that the database is architected in a way that allows it
to appear externally as an XML document. It might be
desirable to do this, since it could keep us from having
to extend the query language for each protocol, but I do
not think that it is necessary. It is only necessary
that the client and the server agree on the structure
and the meanings of messages sent between them. We ought
not place constraints on our servers that need not be
there. I think DTDs for persistent documents are going
to be over-constraining.
I have more issues to discuss regarding DOM and the
required nature of an XML-document query language.
Everything seems related to everything else, but I'll
end this topic here just to get things started.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From lauren at sqwest.bc.ca Tue Nov 18 20:29:43 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 16:58:56 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971116194534.0068a6b0@pophost.arbortext.com>; from "lauren" at Tue Nov 18 12:29:05 1997
Message-ID:
% From: Joe Lapp
%
%
% It seems to me that so far the W3C has focused on using DOM
% as the language by which clients remotely access documents.
% Under DOM, clients view documents through CORBA interfaces
% that make the document look like a tree of XML objects.
% Once the W3C has established all of the necessary interfaces,
% a client will have full control over a document's contents,
% subject to DTD and access control constraints.
You should not confuse the use of OMG IDL to describe interfaces
with requiring implementations to use CORBA interfaces. The DOM
specification is quite clear that CORBA is not needed. OMG IDL
is simply used as a language.
From the DOM spec:
"The Object Management Group Interface Definition Language
(OMG IDL) was chosen as it was designed for specifying language
and implementation-neutral interfaces. Various other IDLs could
be used; the use of OMG IDL does not imply a requirement to use a
specific object binding runtime."
From the DOM FAQ, at http://www.w3.org/DOM/faq.html:
"We expect that the DOM can be implemented using CORBA,
COM, or Java Virtual Machine runtime bindings."
Lauren
--
Lauren Wood, SoftQuad, Inc.
Chair, W3C DOM Activity
Lauren
From gannon at commerce.net Tue Nov 18 21:17:17 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun 7 16:58:56 2004
Subject: Three Access Language Paradigms
Message-ID: <01BCF423.4B86F620@arrow-d23.sierra.net>
Joe,
A very interesting view you present. Let me comment from the Internet commerce view of similar research efforts that may overlap with some of your suggestions.
----------
From: Joe Lapp[SMTP:jlapp@acm.org]
Sent: Tuesday, November 18, 1997 12:14 PM
To: xml-dev@ic.ac.uk
Subject: Three Access Language Paradigms
Whatever the mechanism, I'd like to call the
mechanism a "document access language" or just an "access
language" for purposes of this discussion. In this posting
I explore three different access language paradigms.
Within the Information Access Portfolio of CommerceNet, we are also exploring methods to provide a common "language" and common "protocol" by which web-based entities (browsers, agents, directories, registries, other catalogs) can access and exchange product information, whether that information is stored in a web "document" (HTML or XML) or in a database.
I'd like to present still another form of access language.
This approach is based on a different way of thinking about
documents. Instead of asking document repositories to look
like XML documents to the external world, we only ask that
the repositories speak XML with the external world. DTDs
would be defined for the protocols that repositories might
care to speak. The DTDs would define the structure of the
protocol messages rather than the structures of documents.
One repository might speak several protocols (e.g. 'Patient
Records Protocol V.152' or 'Bank Transaction Protocol 2A').
If the repository were capable of containing arbitrary XML
documents, the repository might speak a specific protocol
called 'XML Document Protocol V.1.0'.
Under CommerceNet's eCo architecture, we see the use of marketplace "registries" that "know" about web objects (who they are, what products they make/sell, what data structure they employ) and have access to the business rules and data mapping (possibly through DTDs) to provide "seamless" access to the source "documents" (i.e. product catalogs). A developing Common Business Language would define some of the protocols you are suggesting. For instance, we are defining a Product Information eXchange (PIX) platform as a framework for how some of these protocols can be easily developed in an open, interoperable way.
Your following suggestions are quite interesting and make some valid points in terms of learning from past efforts of developing query languages and underlying data structures.
Under the third approach, XML documents would appear less
often as persistent repositories and more often as transient
messages between clients and servers. It would still be
necessary to define the base DTD for all of these protocols
since one server port must be able to parse them all well
enough to identify the protocol. It may even be possible
to define the syntax for queries, insertions, and updates,
so that the individual protocols have less inventing to do.
Briefly consider the benefits of the third approach. The
most significant benefit is that it completely frees the
repository from having to conform to an XML object model.
We could expose a legacy database to the world through one
of the protocols with only a thin wrapper around the
database. New databases could restrict the protocols they
support and specialize their structures according to the
kind of data they care to represent. They could be based
on custom object-oriented schemas or relational schemas.
This approach also lowers the entry level into the data
repository server world. We could think of servers more
as information warehouses than as virtual documents.
The most significant drawback of this approach is that it
doesn't give us a single access language. It probably
gives us a different access language for each protocol.
(Somebody please let me know whether this need not be so.)
One of those access languages would be defined in the
'XML Document Protocol,' and this is the language that we
have been looking for so far. Ideally, the access
languages for all of the protocols would have the same
syntactic substrate, so that the only new additions to
each protocol would be elements that are specific to the
information being represented. However, it is not
immediately apparent to me that this will be possible.
Yet, there are so many ways to represent data in XML and
in other formats such as relational and persistent OO.
The database vendor should not be constrained to use an
architecture that will export the repository as something
that looks like XML (such as DOM). For example, many
different DTDs can be invented to represent a given set
of data, and no standard should constrain a vendor to use
a specific DTD for organizing the information. A standard
should exist for how to query and update information and
for how to represent the data of concern (e.g. patient
records or transactions) -- that's what the DTDs should
define. Hence, I came to the protocol proposal.
Now it's time to talk about SQL and OQL. To a large degree
these languages expose the representation underlying the
database. SQL exposes tables and columns, while OQL
exposes the persistent classes and their methods. These
access languages are defined based on the schemas, so that
once the schemas are defined, voila, so are the access
languages. We save ourselves a lot of time.
The SQL and OQL approach has one extremely significant
drawback: compatible databases have identical schemas.
Where are the clients that speak 'Patient Record Schema
V.2.1,' and where are all the databases that are
compliant with this schema standard? Everybody uses
generic database backends, and no little guys can come
in to compete by specializing for a given standard. If
we had based these older query languages on protocols,
it wouldn't have been much of a problem for object-
oriented vendor X to come in and replace relational
vendor Y's server implementation of a standard; there
would have been no need to replace the clients.
Shouldn't we be building that sort of flexibility into
our new XML-compliant databases now, so that we will be
able to accommodate tomorrow's unexpected architectures?
I do not believe that it is necessary for an access
language to expose the database's architecture. In our
case, I do not believe an access language must assume
that the database is architected in a way that allows it
to appear externally as an XML document. It might be
desirable to do this, since it could keep us from having
to extend the query language for each protocol, but I do
not think that it is necessary. It is only necessary
that the client and the server agree on the structure
and the meanings of messages sent between them. We ought
not place constraints on our servers that need not be
there. I think DTDs for persistent documents are going
to be over-constraining.
I have more issues to discuss regarding DOM and the
required nature of an XML-document query language.
Everything seems related to everything else, but I'll
end this topic here just to get things started.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
Patrick Gannon, Executive Director
Information Access Portfolio, CommerceNet
mailto:gannon@commerce.net
http://www.commerce.net/services/portfolios/
------------------------------------------------------
865 Tahoe Blvd., Suite 211, Incline Village, NV 89451
702-831-2251 702-831-3925 (Fax)
From jwrobie at mindspring.com Tue Nov 18 21:40:43 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:56 2004
Subject: Query Languages for XML
Message-ID: <1.5.4.32.19971118214011.00a4b2b4@pop.mindspring.com>
At 04:38 PM 11/17/97 -0600, W. Eliot Kimber wrote:
>SDQL is simply that part of the larger DSSSL expression language that
>enables the accessing of properties of nodes in groves and the navigation
>of groves. It uses the same syntax as the rest of DSSSL, that is a Scheme
>variant. It is based on the basic grove data model (nodes and their
>properties) but has some built-in functions related to SGML (e.g., "gi",
>"att-string", etc.). All the built-in functions are or can be defined in
>terms of primitives (e.g., node-property). It includes some basic
>string-matching functions but does not attempt to provide any sort of
>complete full-text facility (which would be outside the stated scope of
>DSSSL in any case).
In the database world, what you describe would not be called a query
language; at least, not if I understand you correctly. Certainly, something
like SDQL is useful, but it doesn't seem to be a query language, nor does it
seem to eliminate the need for a query language. I think we can learn
something from the history of databases - and if we do, we will not be
condemned to repeat this history!
1. Navigational databases (hierarchical and network) allowed complex data
structures, including hierarchical structures, and used navigation to
retrieve data. Indexes on certain fields could allow a kind of random access
to records. Advantages: complex data structures possible, records always
express their relationships to other records, good run-time efficiency.
Disadvantages: dependent on physical format of records, dependent on the
exact way that records are threaded together, minor changes in the database
produced significant changes to the algorithms used to process them,
difficult to write code for general-purpose queries, queries are dependent
on the programming language used to implement them, query optimization is
virtually impossible.
Your description of SDQL makes me think that it is analogous to navigational
databases, and would probably have these disadvantages: (A) query
optimization is very difficult, because the query is procedural, and tells
precisely how the data is to be retrieved - even if a particular repository
or database has a faster way of retrieving the data, it can not do so,
because the query tells how to find it, not what to find; (B) language
dependence - there is no way to formulate a query string that will work for
any implementation of SDQL, regardless of language (and for now, you have to
formulate SDQL in Scheme); (C) physical dependence - if the manner in which
the data is structured changes, the algorithms no longer work. I'm not
saying that SDQL isn't useful, I'm saying merely that it doesn't do what
query languages do.
2. Relational databases introduced the concept of real query languages, and
of logical independence - the operation of a database should not be
dependent on its physical layout. Advantages: significantly easier to change
and maintain databases, queries can be formulated as simple strings, query
language is independent of implementation language, logical independence.
Another, non-technical advantage is that an awful lot of the data we want to
retrieve from databases is currently stored in relational databases.
Disadvantages: logical independence only works as long as you *think* that
everything is a two dimensional table, complex data structures can not be
expressed (and SGML documents can not be managed efficiently using two
dimensional tables), relationships are not supported directly and must be
reestablished at run-time via primary/foreign key pairs, the results of a
query do not always maintain the original relationships among data.
Relational databases won't be a useful way to store structured documents,
but they do contain lots of data that we might want to import into our
structured documents. If we ignore relational databases, we are leaving out
a lot of important functionality.
3. Object-relational and object oriented databases are fairly diverse, so I
have to make some qualifications before I can say anything. The fundamental
difference between object-relational and object-oriented databases has to do
with persistence, a way of automatically storing programming-language
objects; this is something that object-oriented databases do, and
object-relational databases don't. More relevant for us is the underlying
data model, which is very similar for SQL 3, object-relational databases
like Illustra and UniSQL, or object-oriented databases like POET, O2,
Versant, and the ODMG standard for object databases (I am intentionally
omitting ObjectStore, which is largely a navigational database with object
persistence). These databases combine the rich data structures of
navigational databases with the logical independence and query languages of
relational databases. Objects can have complex relationships or complex
structure, and both the structure and relationships can be used as the basis
for queries.
Because hierarchical structures and their relationships are easily used in
queries, this makes a lot of sense for SGML and XML documents. For instance,
here is an OQL query that finds all SECT1 elements that have an ID attribute
and at least one PARA sub-element:
select e
from e in SGMLElement,
a in e.attributes,
s in e.subElements
where e.tagName = "SECT1"
and a.tagName = "ID"
and s.tagName = "PARA";
This kind of query is very useful - it can be understood fairly easily, the
system that performs the query can make its own decisions about the most
efficient way to perform such a query, and the query can explicitly
reference subelements, reflecting the hierarchical structure of SGML and XML.
And fortunately, the major relational database vendors are also moving
towards object-relational databases; soon, we will be able to do this kind
of query in SQL-3. One SGML repository vendor has also added a fulltext
operator to allow fulltext queries to be formulated as part of a structured
OQL query - this is really cool because structured queries and fulltext
queries can be combined in the conditions of a query.
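
For readers without an OQL system at hand, the same selection (SECT1 elements that have an ID attribute and at least one PARA sub-element) can be approximated over a parsed document with Python's standard ElementTree module. This is only a sketch: the sample document is invented, and the code walks an in-memory tree rather than letting a query optimizer choose the access path, which is the advantage claimed above for OQL.

```python
import xml.etree.ElementTree as ET

# Invented sample document for illustration.
SAMPLE = """
<DOC>
  <SECT1 ID="intro"><PARA>First.</PARA></SECT1>
  <SECT1><PARA>No ID attribute.</PARA></SECT1>
  <SECT1 ID="empty"><TITLE>No paragraphs</TITLE></SECT1>
</DOC>
"""

def sect1_with_id_and_para(root):
    """SECT1 elements that carry an ID attribute and have a PARA child."""
    return [
        e for e in root.iter("SECT1")
        if "ID" in e.attrib and e.find("PARA") is not None
    ]

root = ET.fromstring(SAMPLE)
matches = [e.get("ID") for e in sect1_with_id_and_para(root)]
print(matches)
```

Only the first SECT1 satisfies both conditions; the second lacks the attribute and the third lacks a PARA child.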
Another advantage of object databases is that the results are presented as a
grove - when it is returned as part of a query, each element maintains its
relationships to the other elements of the grove. Cool, eh?
But there are also some problems here:
a. There is no support for hierarchical queries or for transitive closure, a
fancy term for "if you keep going this way, you get there eventually". It is
nice to be able to say that you want SECT1 elements that have at least one
PARA element somewhere below them, or ask for those elements which have ID
attributes and which are somewhere below some particular element. Some
research database systems like semantic network databases have supported
these kinds of operations, but they are not widespread.
b. The form of the query depends on the data structures used to implement
the database. I modified the names for my query to make them friendly - no
real repository would allow you to use exactly those names. On the other
hand, it might not be unreasonable to create standard names to describe the
grove structure, specify how queries can be created using those names, and
have individual vendors map this abstraction onto their own implementations.
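
The hierarchical query described in point (a), SECT1 elements with a PARA "somewhere below them", can be sketched with ElementTree's limited path syntax, where ".//PARA" matches descendants at any depth. The sample document is invented; this illustrates the kind of query meant and makes no claim about how any repository implements it.

```python
import xml.etree.ElementTree as ET

# Invented sample: the PARA in the first section is three levels down.
SAMPLE = """
<DOC>
  <SECT1 ID="deep"><LIST><ITEM><PARA>Nested.</PARA></ITEM></LIST></SECT1>
  <SECT1 ID="none"><TITLE>No paragraphs at all</TITLE></SECT1>
</DOC>
"""

def sect1_with_para_below(root):
    """SECT1 elements that have a PARA descendant at any depth."""
    return [s for s in root.iter("SECT1") if s.find(".//PARA") is not None]

root = ET.fromstring(SAMPLE)
deep = root.find("SECT1")

# A child-only test misses the nested paragraph ...
print(deep.find("PARA") is None)
# ... while the descendant test finds it.
print([s.get("ID") for s in sect1_with_para_below(root)])
```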
4. Some SGML databases have an SGML aware query syntax that is
non-procedural. I am thinking particularly of Texcel and LT-XML, which have
similar query languages. For instance, here is a Texcel query that finds
title elements with a parent of section with an ancestor of appendix whose
type attribute is "informational" and that has a descendant of introduction:
title { -- section { -* appendix { type =
"informational" && +* introduction }}}
This query language, like LT-XML's, directly supports hierarchical queries
and transitive closure, and is designed to support queries on SGML and XML
documents. It is non-procedural, setting no constraints on the system that
will implement the query or the language to be used to carry it out. It
would be interesting to add fulltext operators to a language like this. As I
understand it, DSSSL/SDQL could be used fairly easily to implement queries
designed in a query language like this.
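
To make the semantics of the example query concrete, here is a procedural Python sketch of what it asks for. It assumes, from the prose description alone, that the query's operators mean parent, ancestor, and descendant respectively; the sample document is invented. ElementTree keeps no parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# Invented sample: only the first appendix satisfies every condition.
SAMPLE = """
<book>
  <appendix type="informational">
    <introduction/>
    <chapter><section><title>Kept</title></section></chapter>
  </appendix>
  <appendix type="normative">
    <introduction/>
    <section><title>Wrong type</title></section>
  </appendix>
  <appendix type="informational">
    <section><title>No introduction</title></section>
  </appendix>
</book>
"""

def ancestors(elem, parent_of):
    """Yield elem's ancestors, nearest first."""
    node = parent_of.get(elem)
    while node is not None:
        yield node
        node = parent_of.get(node)

def find_titles(root):
    """title elements whose parent is a section lying under a matching appendix."""
    parent_of = {child: parent for parent in root.iter() for child in parent}
    hits = []
    for title in root.iter("title"):
        section = parent_of.get(title)
        if section is None or section.tag != "section":
            continue
        for anc in ancestors(section, parent_of):
            if (anc.tag == "appendix"
                    and anc.get("type") == "informational"
                    and anc.find(".//introduction") is not None):
                hits.append(title)
                break
    return hits

root = ET.fromstring(SAMPLE)
titles = [t.text for t in find_titles(root)]
print(titles)
```

The one-line declarative query leaves all of this traversal to the system, which is the point being made about non-procedural languages.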
I would think that solutions like this might be useful for queries on
SGML/XML documents, fulltext searches, and queries that combine the two.
This does *not* address the need to use data from non-document databases to
create markup, e.g. to bring data from relational or object-oriented
databases into a dynamic document.
I apologize for the length of this document - I hope it contains enough
useful information to be worth reading.
Jonathan
________________________________
Jonathan Robie
Email: jonathan@texcel.no
Texcel Research, Inc. ("http://www.texcel.no")
From jwrobie at mindspring.com Tue Nov 18 21:57:56 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971118215738.00aaefdc@pop.mindspring.com>
At 03:14 PM 11/18/97 -0500, Joe Lapp wrote:
>I'd like to present still another form of access language.
>This approach is based on a different way of thinking about
>documents. Instead of asking document repositories to look
>like XML documents to the external world, we only ask that
>the repositories speak XML with the external world. DTDs
>would be defined for the protocols that repositories might
>care to speak. The DTDs would define the structure of the
>protocol messages rather than the structures of documents.
>One repository might speak several protocols (e.g. 'Patient
>Records Protocol V.152' or 'Bank Transaction Protocol 2A').
>If the repository were capable of containing arbitrary XML
>documents, the repository might speak a specific protocol
>called 'XML Document Protocol V.1.0'.
This is an interesting idea, and would allow queries to be defined in an
SGML/XML-aware syntax. For instance, if we want to get "billables" from a
patient record system, we could ask an external system like a relational
database for this information using a query defined in an SGML-aware language:
billable { patient_id = 7537053 }
The external system would have to have a mapping between the DTD structure
that defines the abstract model for this protocol and the internal data
structures used on that particular system. In this case, it would have to
know what a "billable" is, where to find it, and how to find those
"billables" that belong to the patient with this particular patient id.
Offhand, this seems like a reasonable amount of effort to ask people to do
in order to interface their databases to document management systems.
On the repository side, one query could be used to support any external
system that uses this particular DTD, and general-purpose techniques could
be used to manage any virtual document. On the database / external system
side, each DTD abstraction would be a separate programming project, but I
don't really see any way around that.
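
As a rough illustration of the mapping an external system would have to supply, the sketch below translates the "billable" query into SQL against an in-memory SQLite database. The table name, column names, sample rows, and the tiny query grammar accepted by the regular expression are all assumptions made for this example; a real system would define them from the protocol's DTD.

```python
import re
import sqlite3

# Hypothetical mapping from protocol names to this system's own schema.
TABLES = {"billable": "billing_items"}
COLUMNS = {"patient_id"}

# Accepts only the simple form: element { field = integer }
QUERY = re.compile(r"^\s*(\w+)\s*\{\s*(\w+)\s*=\s*(\d+)\s*\}\s*$")

def run_query(conn, text):
    """Translate one protocol-level query into SQL and run it."""
    match = QUERY.match(text)
    if match is None:
        raise ValueError("unsupported query: %r" % text)
    element, field, value = match.groups()
    if element not in TABLES or field not in COLUMNS:
        raise ValueError("unknown element or field in: %r" % text)
    # Names come from the whitelists above; the value is bound as a parameter.
    sql = "SELECT description, amount FROM %s WHERE %s = ?" % (
        TABLES[element], field)
    return conn.execute(sql, (int(value),)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE billing_items "
    "(patient_id INTEGER, description TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO billing_items VALUES (?, ?, ?)",
    [(7537053, "X-ray", 120.0), (1, "Checkup", 40.0)])

rows = run_query(conn, "billable { patient_id = 7537053 }")
print(rows)
```

The client never sees the table or column names, only the protocol-level vocabulary.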
I'll have to think about it, but at first blush, I like it.
Jonathan
________________________________
Jonathan Robie
Email: jonathan@texcel.no
Texcel Research, Inc. ("http://www.texcel.no")
From bernerd.anderson at exchange.pnl.gov Tue Nov 18 22:05:49 1997
From: bernerd.anderson at exchange.pnl.gov (Anderson, Bernerd J)
Date: Mon Jun 7 16:58:57 2004
Subject: HELP - XML to Oracle (and back)!; Thanks!
Message-ID: <7A8CF1DC6A9DD0118EA400A024BF29DA0121F047@pnlmse2.pnl.gov>
All -
Just wanted to say 'Thank You' for all of the response that you all have
collectively provided to the questions that I originally posted on this
server.
I haven't found exactly what I'm looking for yet, but at least feel that
"I'm in the ballpark". Thanks also for suggesting resources to help me
with the XML learning curve!
Best regards,
Bern Anderson
(509) 375-2483 * bj.anderson@pnl.gov
Battelle Pacific Northwest National Laboratory
P.O. Box 999, MSIN: K7-63, Richland WA 99352
From ddb at criinc.com Tue Nov 18 22:37:21 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971118143751.00a76eb0@mailhost.criinc.com>
At 03:14 PM 11/18/97 -0500, Joe Lapp wrote:
>We would like clients to be able to remotely manage documents
>residing on servers. Clients need to be able to both query
>and edit those documents.
and later,
>This approach is based on a different way of thinking about
>documents. Instead of asking document repositories to look
>like XML documents to the external world, we only ask that
>the repositories speak XML with the external world. DTDs
>would be defined for the protocols that repositories might
>care to speak. The DTDs would define the structure of the
>protocol messages rather than the structures of documents.
>One repository might speak several protocols (e.g. 'Patient
>Records Protocol V.152' or 'Bank Transaction Protocol 2A').
>If the repository were capable of containing arbitrary XML
>documents, the repository might speak a specific protocol
>called 'XML Document Protocol V.1.0'.
I am not sure that the term "document" is clearly defined for your usage.
The problem I see is that a user might change a "Document" which then
affects a number of other "Documents" because they are all just abstracted
views of a database. This breaks my own intuitive definition of "document"
which I would normally use to interpret your first statement. There are no
"Documents residing on servers", but only documents which are generated as
part of an interchange protocol. Or do you mean that there are documents
(A) (which may not be XML) and then there are XML "documents" (B) which are
generated as part of the protocol to interchange the documents (A)?
I am in complete agreement about the use of XML for information
interchange. XML helps solve a number of the problems which CORBA users
are facing currently, esp. in situations demanding high levels of
information content in each query. CORBA is great for a simple (to
formulate & express) query which a server has to think hard about and can
eventually deliver a simple (to express) answer. XML is excellent for
situations where either the query or the response is not so easily
simplified, and structured data interchange is desired. An example I have
worked on is a Java applet which presents an expandable tree
view of some data. The full tree is _very_ large, and the network
connection may not be fast, so we deliver segments of the tree, using XML,
to the applet, as requested. Thus the user does not need to wait for
information they do not need. In a production system the server could
analyse the network connection to determine approximately optimal packet
sizes. CORBA is horrible for this type of thing, relative to our
implementation, since we can deliver N nodes of a tree-graph in one network
transaction, while CORBA would require one transaction for each node. (Yes,
there are work-arounds, but the XML solution is the simplest and most
versatile I have seen yet.)
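
A minimal sketch of the segment-delivery idea, assuming a hypothetical server-side tree and an invented reply format (a segment element holding node elements): one request returns up to N children of a node in a single XML message, with a flag telling the client which of them can be expanded further. None of these names come from the applet described above.

```python
import xml.etree.ElementTree as ET

# Hypothetical tree held on the server: node id -> (label, child ids).
TREE = {
    "root": ("Root", ["a", "b"]),
    "a": ("Alpha", ["a1", "a2"]),
    "b": ("Beta", []),
    "a1": ("Alpha one", []),
    "a2": ("Alpha two", []),
}

def segment(node_id, limit):
    """Return at most `limit` children of node_id as one XML message."""
    _label, children = TREE[node_id]
    reply = ET.Element("segment", parent=node_id)
    for child_id in children[:limit]:
        child_label, grandchildren = TREE[child_id]
        ET.SubElement(
            reply, "node",
            id=child_id,
            label=child_label,
            expandable="yes" if grandchildren else "no")
    return ET.tostring(reply, encoding="unicode")

# One round trip carries every child the client asked for.
print(segment("root", 10))
```

The `limit` argument stands in for the packet-size tuning mentioned above; a server could choose it per connection.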
Another thing which XML solves when used as a protocol is the problem of
adding information to an existing protocol without breaking existing
implementations. This is a serious concern. Try and load a Word97
document into Word95 and you will have a number of problems. Same with
different versions of PDF. With NAMESPACES, or some careful DTD and
implementation design, it is possible to use XML so that this is no longer
a problem. For example, you have a NAME field, which is currently
interchanged via a NAME element like this:
If the implementations are designed to ignore unknown element tags, then
you might have (in the next version of the protocol)
which can handle both the old protocol format and a new format with more
"meta-info". This strongly appeals to me, since it applies not only to
protocols but also to configuration info, etc.; the applications are virtually
unlimited. Suddenly I can share information amongst tools without having to
succumb to the least-common-denominator problem. With regard to the
protocol issue, we now have a MIME-ish thing with extensibility!
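
The "ignore unknown element tags" rule can be sketched as follows. The PERSON wrapper, the NAME-INFO element, and its attribute are hypothetical names invented for this illustration; only the NAME element is mentioned in the message. A reader written for the first version of the protocol keeps what it knows and silently skips the rest, so a newer message still parses.

```python
import xml.etree.ElementTree as ET

# Elements the version-1 reader understands.
KNOWN_V1 = {"NAME"}

def read_v1(raw):
    """Version-1 reader: keep the text of known elements, skip all others."""
    record = {}
    for child in ET.fromstring(raw):
        if child.tag in KNOWN_V1:
            record[child.tag] = (child.text or "").strip()
        # Unknown tags are ignored rather than treated as errors.
    return record

old_message = '<PERSON><NAME>Derek</NAME></PERSON>'
new_message = '<PERSON><NAME>Derek</NAME><NAME-INFO lang="en"/></PERSON>'

print(read_v1(old_message))
print(read_v1(new_message))
```

Both messages yield the same record for the old reader, while a newer reader could add NAME-INFO to its known set and use the extra information.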
So the point of my response is that some of the ideas in your original
post strike a significant chord with my own ideas, but that the language for
a discussion of these topics is not clear.
There is also a problem in that the real requirements for what you (Joe) are
trying to do are extremely fuzzy at this point. A clearer language for
talking about this is needed (clarify some terms) and the requirements of
what you are trying to do need to be specified more clearly.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From peter at ursus.demon.co.uk Tue Nov 18 23:00:36 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:57 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To:
References: <3.0.1.16.19971115184507.1fafd4be@pop3.demon.co.uk>
<3.0.5.32.19971114094153.00917d30@pop.access.digex.net>
Message-ID: <3.0.1.16.19971118234002.332fb736@pop3.demon.co.uk>
At 12:28 17/11/97 -0500, David G. Durand wrote:
>I want to respond to the "meta-proposal" a bit, because I disagree with
>some of the axioms, and the proposed procedures. I don't have time or
>energy right now to respond to the specific proposal, though I may well do
>so later (based on my own, somewhat divergent, axioms).
^^^^^^^^Good!^^^^^^^^
They may be preferable to mine :-)
>
>At 6:45 PM -0000 11/15/97, Peter Murray-Rust wrote:
>>
>> In any of these cases there is no general solution acceptable to
>>everyone
>>
>> If no attempt is made to address these problems we shall either
>>end up with a Babel of incompatible solutions, or wait feebly for some
>>powerful autonomous entities to dictate a limited set of actions.
>>
>
>Not necessarily. In fact, for many problems the correct response is to
>ensure that the stylesheet and processing specification languages can
>_implement_ each of the specific solutions desired, _without_ forcing the
>specific solutions on which divergence of opinion may exist. More on this
>with the "PI" axioms.
I have thought about this and have taken it to heart. I agree that
stylesheets are usually preferable to PIs. I shall therefore mentally look
for stylesheet-based solutions (or attribute-based solutions) before
PI-based solutions.
The XSL stylesheet proposal is still very young. I have read the proposal
carefully and tried to understand how to implement it - I've got about
halfway. The problem (for me) is that it is very paper-oriented (or oriented
to paper-like screen displays) and has no easy mechanism for implementing
BEHAVIOR (which an XLL processor should have). It also doesn't specify
anything about transformation (i.e. XML2XML). It will also be a little
while before it's out fully.
>
>> We have to be careful to avoid the 'only processable with
>>software X' syndrome
>
>Yes. The way to do this is to _avoid_ PIs as much as possible. PIs that are
>required to interpret a document correctly are _inherently_
>anti-portability, since the rule for PIs is that _any application_ should
>be free to ignore them without changing the meaning of the document. The use
>of SGML's PI syntax in XML is not a good model for the use of PIs in
>general, since they are being used in XML as a syntactic "escape hatch" for
>compatibility with SGML. It would not be necessary (or desirable) if XML
>were not (to some very small extent) changing SGML facilities (as with
>specifying the character encoding of entities in PIs, rather than an SGML
>declaration).
>
>If XML had been able to add declarations to SGML, that would have been done
>instead of using the PI syntax.
If I understand this, you are saying that PIs are required to get XML to
work (e.g. , namespace etc.) but they are too dangerous for normal
mortals. I can go along with that view, but the spec-authors should
restrict the PI-targets to XML. The message the current spec gives is:
- Here are PIs. Use them if you want.
What (perhaps) they should say is:
- PIs should be reserved for things we (the XML-WG) can't do in XML any
other way. Using them otherwise can seriously damage your readers' health.
>
>> There is a critical mass of readers of this list who feel the
>>need to address the problem.
>
>Without a problem statement I'm not sure how to judge this, but it may well
>be true.
>
>> Anyone can use any PIs they like in their documents for whatever
>>purposes they like without breaking the spirit of XML.
>
>This is assuredly incorrect. PIs are intended for use in the case where a
>practical _use_ of a document with _particular software_ requires
>additional information that _should not_ have been indicated in a structural
>description of the content. A paradigmatic example is the occasional need
>to insert a page or column break in order to get acceptable formatting in a
>particular processing situation (including: software, stylesheet, output
>device). This is not information that _should_ be encoded in the abstract
>representation of a document, but _may be essential_ for "getting the thing
>to print right".
Understood. Even in TeX you have to fudge it occasionally. But quite a lot
of XML applications won't have any pages. IMO XML is not yet prepared for
the non-document-oriented applications. We shall want to do other things
with XML documents than read them. :-) Stylesheets are very highly oriented
to typesetting on paper.
>
>> That processing software need not (and so far won't) take any
>>notice of these (or perhaps any) PIs
>>
>
>This is certainly essential. If you are saying something about your document
>that you can imagine being useful to some software that you aren't using
>right now -- then it should probably be in the markup. PIs are for things
>that can be ignored without changing the interpretation of a document.
Yes. Actually that was true of my proposals as well. The PI was modifying
the production of porridge. If the document had gone to a Postscript
formatter instead, it wouldn't have changed the meaning of the document,
just not cooked any porridge.
>
>> If a few people find a way of doing something that works for
>>them, and isn't against the spirit of the XML specs, then flaming their
>>ideas is pointless.
>
>Even this is not necessarily true -- attacking the dissemination of false
>or bad ideas is _never_ pointless, in that dissemination of bad information
>(even if it serves a local purpose adequately well) can seriously mislead
>people. For instance the use of styles in word-processing programs is
>usually a very good idea. The fact that in some instances direct formatting
>may work out, or even work better, should not stop people from quarreling
>with public assertions about the utility of stylesheets based on those
>situations.
>
>To the extent that these axioms seem to be intended to rule out
>disagreement of the merits of future proposals, I must take immediate and
>strong exception to them. It's not possible for a responsible discussant
>who disagrees with a public proposal of working practice to remain silent
>on the topic. "Flaming" is usually not responsible discussion, but
>principled disagreements should be expressed so that the issues are clear
>to all.
Good. The axiom might benefit from revision or deletion - we'll see...
>
>>
>>The proposal I really want to address is, like Monty Python's joke,
>>so potentially dangerous that I dare not reveal it yet. The proposal here
>>is also important to me - perhaps to others - and I hope serves as a
>>useful example. It is NOT in a finalised form, but as can be seen from the
>>meta-proposal, there is a method for referring to a 'pseudo-final' form
>>that is, at least, usable.
>>
>
>This makes me nervous
Wasn't meant to. I am more nervous of implementations which take place
without any discussion at all.
>
>>
>>That a PI of the form is 'reserved' by members of this list for
>>PI-based proposals on this list. [We cannot use XML-DEV as 'XML' is rightly
>>reserved.]
>
>We can certainly do this -- but as I said above, there are good reasons to
>oppose the use of PIs for _any_ use that affects the semantics of
>documents.
That the characters XDEV be used in places such as Attribute names, values,
elements, namespaces and (in the last resort) PIs where they serve to
clarify the semantics by referring to discussions on this list
>
>For example, even the proposed namespace PI would be vulnerable on this
>account, except for the facts that:
>
> 1. It's intended for use in _experiment_ with a proposed _extension_ of
>XML. (In other words, the PI, should it be generally accepted for use with
>all interested XML applications, would become part of XML).
Understood. And I am experimenting with it. It does great things for me and
JUMBO.
>
> 2. The prefix can be processed (and thus, the semantic information
>accessed) _without_ software having to be aware of the namespace PI. In
>other words, the PI can be treated as equivalent to a comment describing
>the proposed intent of the tags that share a prefix. (In other words, you
>can ignore the namespace PI, and still detect the semantic distinctions in
>the document)
I don't understand this. My understanding of the namespace proposal is that:
identifies a namespace FOO used as FOO:xyz in certain names, etc. The HREF
points to a 'schema' for some undefined purpose. When a processor (not a
parser) finds an element of type it can:
- treat it as semantically void
- realise from the PI that bar.xml might say something useful about it
(this is what JUMBO does)
- realise that it knows privately about the FOO namespace and look up
FOO:plugh
- match the action with FOO:plugh in a stylesheet
If you are treating the PI as a comment, but relying on a stylesheet, why
use the PI at all? (JUMBO uses the PI, because it can't use stylesheets for
some of the things it wants to do.)
>In the long run this may (or for a number of reasons may not) be true.
>However, bad ideas that are initially plausible but unworkable in the long
>term (e.g., from a related, but different domain, the creation and
>management of large structured information corpora in raw HTML) would get
>an artificial (and community-harmful) boost if an effective social
>convention forbidding disagreement were in effect.
Perhaps. The idea was to create spaces on this list where people with a
common vision can devise approaches to which they can make semantic
reference. At present I'm asking whether that's a good idea. If enough
people think it is, then I would hope the discussion would be ignored by
those not interested. Those who object to it can start their own parallel
discussion - no harm in that.
As you can see - and I'll elaborate later - I think there is virtue in
trying out new ideas in public, even if they have potential flaws or
limitations. HTML is a good example; it was designed to be tolerant of
broken systems, implemented to be even more tolerant. Even in XML,
everything will not be gold plated.
>
>I agree that polite, reasoned disagreement is better than flaming
>(impolite, ad-hominem disagreement) but in the intellectual world the unfit
>perish faster under the lash of criticism.
We'll find somewhere in the middle :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Nov 18 23:01:22 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:57 2004
Subject: and BEHAVIOR: a meta-proposal and a proposal
In-Reply-To: <199711160305.OAA25372@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971118222745.3c0f82e0@pop3.demon.co.uk>
At 14:09 16/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust
>
>>
>> That two attribute values for XML-LINK's BEHAVIOR attribute be recognised
>> through an XDEV PI:
>> BEHAVIOR="DisplayStandAlone"
>> BEHAVIOR="DisplayInContext"
>> That for the second option an additional attribute CONTEXTREF is required,
>> whose value is a valid URL and points to the XML element providing the
>> display context of the current element.
>> The actual details of display are application (and possibly stylesheet)
>> dependent.
>>
>
>Another approach might be to use the name prefix XDEV: on attribute
>values, e.g.
>
> BEHAVIOUR="XDEV:DisplayStandAlone"
>
>and the contextref attribute you suggest, e.g.
>
> BEHAVIOUR="XDEV:DisplayInContext"
> XDEV:CONTEXTREF="saltmines.xml#DESCENDANT(1,ORGCHART)"
>
Rick,
I have now realised (forgive my slow thinking) that this provides exactly
what is needed and I was too hasty in my earlier reply.
As you say, the attribute value simply needs to be unique and the XDEV:
mechanism provides that (to a certain extent). It can even be linked to a
namespace if that is allowed when the namespace proposal is finalised.
So, in the first proposal there is no need for PIs. Rick's suggestion
meets my needs, so it can be bolted in very easily. The result is that an
XML-LINK-aware processor may, but need not, recognise BEHAVIOR attribute
values prefixed by XDEV, and one or more additional attributes with names
prefixed by XDEV.
An XDEV-unaware processor will give a graceful message saying it doesn't
understand the XDEV: attribute and the BEHAVIOR value. [At present it will
say it doesn't understand *any* BEHAVIOR values except by private
negotiation, because none have been suggested. I'll write more later...]
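A minimal sketch of how a processor might treat the XDEV: prefix on BEHAVIOR
values, in Python. It works on a plain dictionary of attribute names and
values rather than on parsed XML; the function name, the returned action
labels and the xdev_aware flag are assumptions made for illustration, not
part of XML-LINK or of any proposal on this list.

```python
XDEV_PREFIX = "XDEV:"

# Assumption: the BEHAVIOR values an XDEV-aware processor understands,
# taken from the two values proposed in this thread.
KNOWN_XDEV_BEHAVIORS = {"DisplayStandAlone", "DisplayInContext"}


def interpret_behavior(attributes, xdev_aware=True):
    """Return an (action, detail) pair for a link element's attributes.

    `attributes` is a dict mapping attribute names to values.
    """
    value = attributes.get("BEHAVIOR", "")
    # An XDEV-unaware processor, or any unprefixed or unknown value,
    # degrades gracefully instead of failing.
    if not xdev_aware or not value.startswith(XDEV_PREFIX):
        return ("unrecognised", value)
    name = value[len(XDEV_PREFIX):]
    if name not in KNOWN_XDEV_BEHAVIORS:
        return ("unrecognised", value)
    if name == "DisplayInContext":
        # The second option requires the additional CONTEXTREF attribute.
        context = attributes.get("XDEV:CONTEXTREF")
        if context is None:
            return ("error", "DisplayInContext requires XDEV:CONTEXTREF")
        return ("display-in-context", context)
    return ("display-stand-alone", None)


link = {
    "BEHAVIOR": "XDEV:DisplayInContext",
    "XDEV:CONTEXTREF": "saltmines.xml#DESCENDANT(1,ORGCHART)",
}
print(interpret_behavior(link))
print(interpret_behavior(link, xdev_aware=False))
```

The same attributes produce a usable action in an XDEV-aware processor and
an "unrecognised" report in an unaware one, without either treating the
document as invalid.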
This could be a good time for those more expert than me to suggest BEHAVIOR
values. [I have asked at regular intervals whether the XML-LINK attributes
would have suggested values (i.e. for ROLE, BEHAVIOR and more guidance on
CONTENT-ROLE, etc.). I think the current idea is to keep it semantically
neutral. That's why I'm raising it here...]
An XDEV-aware processor will be able to do lots of wonderful things with
the BEHAVIOR values... especially when coupled to equipment...
David,
You are rightly concerned about the meta-proposal - I'll reply in more
detail, but say that PIs are now not an essential part of the meta-proposal
(though they may be required sometimes). Your comments are very useful and
I will certainly make sure that I stress standard mechanisms (stylesheets,
for example) where possible. [I am trying to code them into JUMBO, but am
still trying to work out how closely they are coupled to a page-like output
or whether they can be used more generally.] I do not think that
stylesheets can do everything, although if XSL included a transformation
language that might help in some places.
I shall not unleash the Monty Python proposal until we have addressed the
meta proposal a bit more. :-)
Cheers,
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ht at cogsci.ed.ac.uk Tue Nov 18 23:04:47 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: Jonathan Robie's message of Tue, 18 Nov 1997 16:57:38 -0500
References: <1.5.4.32.19971118215738.00aaefdc@pop.mindspring.com>
Message-ID:
Um, why doesn't XLL address all the goals of this thread and then
some?
ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
From jlapp at acm.org Wed Nov 19 01:49:26 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To:
References: <3.0.32.19971116194534.0068a6b0@pophost.arbortext.com>
Message-ID: <3.0.3.32.19971118204952.00968c30@pop.access.digex.net>
lauren@sqwest.bc.ca (Lauren Wood) wrote:
>You should not confuse the use of OMG IDL to describe interfaces
>with requiring implementations to use CORBA interfaces. The DOM
>specification is quite clear that CORBA is not needed. OMG IDL
>is simply used as a language.
Thanks for the correction. Shows you how much I know about CORBA.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From jwrobie at mindspring.com Wed Nov 19 02:04:02 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119020338.00ab71dc@pop.mindspring.com>
At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>Um, why doesn't XLL address all the goals of this thread and then
>some?
If I remember what I learned in high school rhetoric, I think the burden of
proof is on the affirmative!
Jonathan
From jlapp at acm.org Wed Nov 19 03:04:43 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971118143751.00a76eb0@mailhost.criinc.com>
Message-ID: <3.0.3.32.19971118220503.0095a4f0@pop.access.digex.net>
Derek Denny-Brown wrote:
>I am not sure that the term "document" is clearly defined for your usage.
I guess I did overuse and under-define the word 'document.' I'll try to
convey what I intended to mean. I understand that pretty much any well-
formed construct can serve as a document in XML. I also understand that
we might want to talk about a single document that consists of multiple
documents that are linked together. However, in my post I intended the
word 'document' to mean a single XML file or any system that makes itself
appear as if it were analogous to an XML file, such as a database that
exposes DOM IDL :-) interfaces. That's the meaning I was using, though
I realize that it's probably not the best definition to work with.
In light of your response, I see that this is kind of a constraining
definition. However, I think the language of my posting can be amended
so that it still has general applicability. The word 'document' might
be taken in its most general sense, so that it applies to anything you
might think of. Next, everywhere I talk about the DTD of the document,
we'd have to modify that to talk about the set of DTDs and structure
of links by which documents of those DTDs are intended to be linked.
>[...] There are no
>"Documents residing on servers", but only documents which are generated as
>part of a interchange protocol. Or do you mean that there are documents
>(A) (which may not be XML) and then there are XML "documents" (B) which are
>generated as part of the protocol to interchange the documents (A)?
Boy I really was being quite inconsistent. When I talk about the protocol
messages being documents I was talking about a single serializable stream
of well-formed XML. I guess I really was quite confusing.
>[...] CORBA is great for a simple (to
>formulate & express) query which a server has to think hard about and can
>eventually deliver a simple (to express) answer. XML is excellent for
>situations where either the query or the response is not so easily
>simplified, and structured data interchange is desired.
XML seems to remove the client's responsibility for constructing complex
objects from primitive ones. The object arrives complex already. I agree.
Another side-benefit is that complex requests and responses can be batched
and transmitted over single short-lived connections.
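A minimal sketch of the batching idea, in Python: several requests are
packed into one XML document so that they can travel over a single
connection. The BATCH and GET element names and the NODE attribute are
hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def batch_requests(node_ids):
    """Build one XML message that asks for several nodes at once."""
    batch = ET.Element("BATCH")
    for node_id in node_ids:
        # One GET element per requested node (hypothetical vocabulary).
        ET.SubElement(batch, "GET", {"NODE": str(node_id)})
    return ET.tostring(batch, encoding="unicode")


def split_batch(message):
    """Receiver side: recover the requested node ids from one message."""
    return [get.get("NODE") for get in ET.fromstring(message).iter("GET")]


message = batch_requests([1, 2, 3])
print(message)
print(split_batch(message))
```

The sender makes one transmission however many nodes are requested; the
receiver unpacks the individual requests from the single document.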
>[...]
>Another thing which XML solves when used as a protocol is the problem of
>adding information to an existing protocol without breaking existing
>implementations. This is a serious concern.
I didn't even think of that.
>[...] With regard to the
>protocol issue, we now have a MIME-ish thing with extensibility!
Nor did I think of that, but this may be because I'm more familiar
with mimes that play charades than mail-protocol MIME.
>So the point of my response is that some of the ideas in your original
>post strike a significant chord with my own ideas, but that the language for
>a discussion of these topics is not clear.
The language does need to be cleaned up, and I'd certainly appreciate
any help I can get. Let me know whether this post clears things up any
or whether it further muddies the waters.
>There is also a problem that the real requirements for what you (Joe) are
>trying to do are extremely fuzzy at this point. A clearer language for
>talking about this is needed (clarify some terms) and the requirements of
>what you are trying to do need to be specified more clearly.
I know. They are fuzzy in my brain too. I'm working on that one.
I've got something up there, but it is proving to be a very slippery
beast (with fangs and horns and a ferocious roar!).
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From ricko at allette.com.au Wed Nov 19 06:05:16 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <199711190602.RAA03174@jawa.chilli.net.au>
> From: Joe Lapp
> Derek Denny-Brown wrote:
> >I am not sure that the term "document" is clearly defined for your usage.
>
> I guess I did overuse and under-define the word 'document.'
Another very useful terminological distinction is between "document" and
"publication". A publication is one or more documents rendered for some
medium.
After the XML document has been parsed and groved, and auto links embedded,
and transformations and stylesheets applied, and then sent to some output
device, that is the publication.
Rick Jelliffe
From tbray at textuality.com Wed Nov 19 07:54:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca>
At 09:03 PM 18/11/97 -0500, Jonathan Robie wrote:
>At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>>Um, why doesn't XLL address all the goals of this thread and then
>>some?
>If I remember what I learned in high school rhetoric, I think the burden of
>proof is on the affirmative!
Let me rephrase Henry's comment: I suggest that those who are proposing
brave new query language worlds go have a look at XLL. It *may* be the
case that XLL xpointers hit a good 80-20 point in terms of what we'd
like in a query language and in ease of implementation. -Tim
From peter at ursus.demon.co.uk Wed Nov 19 08:24:10 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:57 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971119092048.2c6f8b6c@pop3.demon.co.uk>
At 23:54 18/11/97 -0800, Tim Bray wrote:
[...]
>
>Let me rephrase Henry's comment: I suggest that those who are proposing
>brave new query language worlds go have a look at XLL. It *may* be the
>case that XLL xpointers hit a good 80-20 point in terms of what we'd
>like in a query language and in ease of implementation. -Tim
>
I support this. I have found TEI Xpointers in XLL *extremely* useful - they
have revolutionised my thinking about XML documents. Essentially, in many
cases, the 'document is the database' (for smallish applications). I also
use them inside JUMBO for navigating within known structures (e.g. seeing
whether an element has certain relatives and, if so, taking some action.)
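A minimal sketch, in Python, of the kind of navigation described above:
finding the n-th descendant of a given element type, loosely modelled on a
location term such as DESCENDANT(1,ORGCHART). It assumes that descendants
are counted in document order and covers only positive instance numbers;
the sample document and the helper name are invented for illustration, and
this is not an implementation of the TEI or XLL syntax.

```python
import xml.etree.ElementTree as ET


def descendant(element, n, element_type):
    """Return the n-th (1-based) descendant of `element` whose tag is
    `element_type`, in document order, or None if there is none."""
    # Element.iter() also yields the starting element itself, which is
    # not one of its own descendants, so it is filtered out.
    matches = [e for e in element.iter(element_type) if e is not element]
    return matches[n - 1] if 1 <= n <= len(matches) else None


# Hypothetical document used only to exercise the helper.
DOC = ET.fromstring(
    "<COMPANY>"
    "<ORGCHART ID='a'><DEPT><ORGCHART ID='b'/></DEPT></ORGCHART>"
    "<ORGCHART ID='c'/>"
    "</COMPANY>"
)

first = descendant(DOC, 1, "ORGCHART")
if first is not None:
    # "Seeing whether an element has certain relatives and, if so,
    # taking some action."
    print("first ORGCHART has ID", first.get("ID"))
```

Returning None rather than raising an error lets the caller test for the
presence of a relative before acting on it.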
For certain purposes the TEI Xpointer is limited. Initial discussions
suggested:
- SPACE (for coordinate systems such as images, tables)
- some sort of regexp
- FOREIGN for adding your own methods on.
- and I'd be happy to see something for numeric and other typed values
I was in favour of these (I have to use them somehow in JUMBO), but it was
made clear that Xpointers were intended as an addressing scheme and not a
query language. I respect this distinction, but it would be very nice to be
able to extend TEI syntax to allow this.
As I understand it, TEI syntax (a la XLL) is confined to use in HREFs
within elements with XML-LINK attributes. Any of the proposed extensions is
(rightly) illegal there. But it would be possible to extend TEI for use
*elsewhere* (e.g. in querying documents) and I would be very happy to see
keywords of the sort above added *for query purposes*.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From mtbryan at sgml.u-net.com Wed Nov 19 09:30:03 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:57 2004
Subject: Query Languages for XML
Message-ID:
Paul Prescod wrote
> We already have forms, and it doesn't require
>an updatable SDQL.
Where in XML do we have forms, or any statement that tells anyone what will
happen to data placed into an editable field?
-----------------------------------------------------------------
Martin Bryan, 29 Oldbury Orchard, Churchdown, Glos GL3 2PU, UK
Phone/Fax: +44 1452 714029 E-mail: mtbryan@sgml.u-net.com
For more information about The SGML Centre contact
http://www.sgml.u-net.com
For more information about the European Commission's
Open Information Interchange (OII) initiative contact
http://www.echo.lu/oii/en/oiistand.html
From papresco at technologist.com Wed Nov 19 10:34:12 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:58 2004
Subject: Query Languages for XML
References:
Message-ID: <3472C157.BE6EE6FD@technologist.com>
Martin Bryan wrote:
>
> Where in XML do we have forms,
"To reduce the initial barriers to adoption, a core set of HTML flow
objects is recommended in addition to the core DSSSL flow objects. The
HTML/CSS formatting model is somewhat different from the DSSSL model,
and the inclusion of the HTML/CSS flow objects will make it possible to
use XSL with HTML and CSS. It simplifies the targeting of HTML as the
output format, and retains consistency of the object model and dynamic
behaviors."
- http://www.w3.org/TR/NOTE-XSL.html
Included in the list are:
"FORM
INPUT
SELECT
TEXTAREA"
> or any statement that tells anyone what will
> happen to data placed into an editable field?
This is specified in the HTML 4.0 proposed recommendation which has
provisions for interactive processing on either the client or server
sides. If and when someone standardizes an updateable document data
manipulation language, it can be accessed from these forms just as SQL
and ODQL are today.
Paul Prescod
From mtbryan at sgml.u-net.com Wed Nov 19 11:40:20 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:58:58 2004
Subject: Query Languages for XML
Message-ID:
Paul Prescod wrote
>> Where in XML do we have forms,
>
>"To reduce the initial barriers to adoption, a core set of HTML flow
>objects is recommended in addition to the core DSSSL flow objects. The
>HTML/CSS formatting model is somewhat different from the DSSSL model,
>and the inclusion of the HTML/CSS flow objects will make it possible to
>use XSL with HTML and CSS. It simplifies the targeting of HTML as the
>output format, and retains consistency of the object model and dynamic
>behaviors."
> - http://www.w3.org/TR/NOTE-XSL.html
>
>Included in the list are:
>
>"FORM
> INPUT
> SELECT
> TEXTAREA"
>
>
>> or any statement that tells anyone what will
>> happen to data placed into an editable field?
>
>This is specified in the HTML 4.0 proposed recommendation which has
>provisions for interactive processing on either the client or server
>sides. If and when someone standardizes an updateable document data
>manipulation language, it can be accessed from these forms just as SQL
>and ODQL are today.
>
So we are constrained to using the types of form objects defined in HTML
using the processes defined in HTML 4.0, and can add no new functionality
via XSL?
Martin Bryan
From jwrobie at mindspring.com Wed Nov 19 12:03:57 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119120312.00ac60a0@pop.mindspring.com>
At 11:54 PM 11/18/97 -0800, Tim Bray wrote:
>At 09:03 PM 18/11/97 -0500, Jonathan Robie wrote:
>>At 11:04 PM 11/18/97 +0000, Henry S. Thompson wrote:
>>>Um, why doesn't XLL address all the goals of this thread and then
>>>some?
>>
>>If I remember what I learned in high school rhetoric, I think the burden of
>>proof is on the affirmative!
>
>Let me rephrase Henry's comment: I suggest that those who are proposing
>brave new query language worlds go have a look at XLL. It *may* be the
>case that XLL xpointers hit a good 80-20 point in terms of what we'd
>like in a query language and in ease of implementation. -Tim
I agree - XLL pointers may be a good starting point for a query language,
and this would have the advantage of reducing the number of things that
people have to learn. It really *is* a nonprocedural query language,
independent of the implementation language, etc., and it is easy to read.
I am not sure, however, that it "addresses all the goals of this thread and
then some". I'll have to take a closer look at it, and ask myself what it
would take if, at some point, the other 20% needed to be added to it.
Jonathan
From richard at light.demon.co.uk Wed Nov 19 12:53:32 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:58 2004
Subject: Data manipulation languages for XML (was Query Languages ...)
In-Reply-To: <3.0.32.19971118091934.009f7260@swbell.net>
Message-ID:
In message <3.0.32.19971118091934.009f7260@swbell.net>, "W. Eliot
Kimber" writes
>
>I'm afraid I don't see how using groves as the fundamental abstraction for
>editing is inconsistent with satisfaction of any of the requirements. All
>that's needed on top of what DSSSL provides are functions that represent
>the editing actions needed (as opposed to modeling editing as a transform,
>which is probably not a useful approach). If SQL provides a useful model
>for defining such functions, we should use it.
I'm perfectly happy with this idea too, and agree that we wouldn't need
to add much to DSSSL/SDQL to allow the abstract representation of an
editing process. SQL can act as a touchstone for us to check the
completeness of the set of additional functions - I'm not sure it is a
useful model as such.
However, what I am really arguing is that once we have done this, there
is still a case for going on to define a more user-friendly SQL-like
syntax for specifying data manipulations. This syntax would have
exactly the same relationship to SDQL as XSL does: it would be a simple
front-end into a subset of SDQL's functionality.
Richard.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From jlapp at acm.org Wed Nov 19 14:48:01 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.32.19971118230246.00b6cc98@pop.intergate.bc.ca>
Message-ID: <3.0.3.32.19971119094813.0095d780@pop.access.digex.net>
Tim Bray wrote:
>Let me rephrase Henry's comment: I suggest that those who are proposing
>brave new query language worlds go have a look at XLL. It *may* be the
>case that XLL xpointers hit a good 80-20 point in terms of what we'd
>like in a query language and in ease of implementation. -Tim
I assume that XLL is what the XML-LINK document describes. If so, then
for starters, what sort of editing mechanisms does XLL have?
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From Patrice.Bonhomme at loria.fr Wed Nov 19 14:54:33 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:58:58 2004
Subject: msxml 1.6 : ID without DTD Declaration
Message-ID: <199711191453.PAA04490@chimay.loria.fr>
I am developing an XLL package using the msxml parser, and I wondered whether we
can use an ID attribute without any DTD declaration. MSXML uses the method
DTD.findID(Name name) to retrieve an Element with the attribute ID=name, but
without a DTD declaration I can't call DTD.findID(Name name)!
Is there a way to get round this ? A kind of :
Pat.
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
From jwrobie at mindspring.com Wed Nov 19 15:52:58 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>
At 11:54 PM 11/18/97 -0800, Tim Bray wrote:
>It *may* be the case that XLL xpointers hit a good 80-20 point
>in terms of what we'd like in a query language and in ease of
>implementation. -Tim
Tim,
In XLL, is there a way to combine conditions with boolean operators? Say I
am using XL7, and I need to do a query for those billable items for a
particular patient number AND for a particular physician. Can I do this with
XLL? If there are boolean operators, is there a way to specify precedence?
Jonathan
From ricko at allette.com.au Wed Nov 19 16:20:15 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:58 2004
Subject: msxml 1.6 : ID without DTD Declaration
Message-ID: <199711191617.DAA14394@jawa.chilli.net.au>
>From: Patrice Bonhomme
>I am developing an XLL package using the msxml parser, and I wondered whether we
>can use an ID attribute without any DTD declaration. MSXML uses the method
>DTD.findID(Name name) to retrieve an Element with the attribute ID=name, but
>without a DTD declaration I can't call DTD.findID(Name name)!
>Is there a way to get round this ? A kind of : #IMPLIED>
The current enhancements to SGML allow pretty much exactly what you suggest.
I am not sure when this will be added into XML. If it is not in
XML 1.0 then you should lobby for it to go into XML 1.1 (if such
a thing comes).
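In the meantime, one workaround can be sketched as follows. This is a minimal illustration in Python rather than the msxml Java API: with no DTD to say which attribute has type ID, the application simply adopts the convention that any attribute named ID is an identifier and builds its own lookup table. The function name build_id_index and the naming convention are assumptions made for the example; they are not part of msxml or of the XML draft.

```python
import xml.etree.ElementTree as ET


def build_id_index(root, id_attr="ID"):
    """Map each value of the conventionally named ID attribute to its element."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        # A real ID must be unique within the document, so reject duplicates.
        if value in index:
            raise ValueError(f"duplicate ID value: {value!r}")
        index[value] = elem
    return index


doc = ET.fromstring('<doc><sec ID="s1"><p ID="p1">text</p></sec><sec ID="s2"/></doc>')
ids = build_id_index(doc)
print(ids["p1"].tag)  # prints "p"
```

The cost of this approach is that the convention lives in the application, not in the document, so two tools may disagree about which attribute is the identifier.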
Rick Jelliffe
From richard at light.demon.co.uk Wed Nov 19 17:11:10 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
In-Reply-To: <3.0.3.32.19971119094813.0095d780@pop.access.digex.net>
Message-ID:
In message <3.0.3.32.19971119094813.0095d780@pop.access.digex.net>, Joe
Lapp writes
>I assume that XLL is what the XML-LINK document describes. If so, then
>for starters, what sort of editing mechanisms does XLL have?
None - we are still talking read-only access here.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From richard at light.demon.co.uk Wed Nov 19 17:43:56 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
In-Reply-To: <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>
Message-ID:
In message <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>,
Jonathan Robie writes
>
>In XLL, is there a way to combine conditions with boolean operators? Say I
>am using XL7, and I need to do a query for those billable items for a
>particular patient number AND for a particular physician. Can I do this with
>XLL? If there are boolean operators, is there a way to specify precedence?
No. An XLL expression supports a chain of locators, each of which
starts from the last place you got to in the target document's
structure. You can have a second chain, pointing to somewhere else, in
which case the XPointer is deemed to point to the span within the
document whose end-points are the two elements or characters you have
specified by your locators.
I've just checked over the original TEI Extended Pointer mechanism on
which XPointers are based, and there is nothing in that to support
boolean logic either.
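A rough sketch of that chaining model, in Python and deliberately not in XPointer syntax: each locator is a function that starts from wherever the previous one stopped, and a span is just two chains resolved independently. The helper names child and resolve, and the sample document, are invented for illustration.

```python
import xml.etree.ElementTree as ET


def child(n, tag):
    """Locator: the n-th (1-based) child of the current element with this tag."""
    def step(current):
        matches = [c for c in current if c.tag == tag]
        return matches[n - 1]
    return step


def resolve(root, chain):
    """Apply each locator in turn, starting from where the previous one ended."""
    current = root
    for step in chain:
        current = step(current)
    return current


doc = ET.fromstring(
    "<book><chap><p>one</p><p>two</p></chap><chap><p>three</p></chap></book>"
)
# Two chains give the two end-points of a span.
start = resolve(doc, [child(1, "chap"), child(2, "p")])
end = resolve(doc, [child(2, "chap"), child(1, "p")])
print(start.text, end.text)  # prints "two three"
```

Note that every chain yields exactly one location; nothing in this model combines conditions or returns more than one match.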
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From jwrobie at mindspring.com Wed Nov 19 18:18:08 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119181728.00a3f010@pop.mindspring.com>
At 04:41 PM 11/19/97 +0000, Richard Light wrote:
>In message <1.5.4.32.19971119155058.00a6527c@pop.mindspring.com>,
>Jonathan Robie writes
>>
>>In XLL, is there a way to combine conditions with boolean operators? Say I
>>am using XL7, and I need to do a query for those billable items for a
>>particular patient number AND for a particular physician. Can I do this with
>>XLL? If there are boolean operators, is there a way to specify precedence?
>
>No. An XLL expression supports a chain of locators, each of which
>starts from the last place you got to in the target document's
>structure. You can have a second chain, pointing to somewhere else, in
>which case the XPointer is deemed to point to the span within the
>document whose end-points are the two elements or characters you have
>specified by your locators.
That's pretty much what I had thought when I read the XLL spec. Personally,
in evaluating the 80/20 mix for a query language, I would think that boolean
operators, boolean functions, and precedence would be pretty important.
Another significant limitation of XPointers as a query language is that each
term specifies *one* location, if I understand the spec correctly. It
doesn't seem to be set up to allow result sets, e.g. the set of patient
records that satisfy a particular requirement, the set of catalog entries
that specify a particular requirement, etc. I would think that result sets
are pretty important for query languages.
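To make the two gaps concrete, here is a minimal sketch in Python (not XLL syntax) of the kind of query meant: a boolean AND over two conditions, returning a set of matches rather than a single location. The element and attribute names, the sample data, and the select helper are all invented for the example.

```python
import xml.etree.ElementTree as ET

RECORDS = """
<billing>
  <item patient="1001" physician="D7">X-ray</item>
  <item patient="1001" physician="D9">Consultation</item>
  <item patient="2002" physician="D7">Blood test</item>
  <item patient="1001" physician="D7">Cast</item>
</billing>
"""


def select(root, tag, predicate):
    """Return every element with the given tag that satisfies the predicate."""
    return [elem for elem in root.iter(tag) if predicate(elem)]


root = ET.fromstring(RECORDS)
# Both conditions must hold; parentheses would settle precedence if OR were mixed in.
hits = select(
    root,
    "item",
    lambda e: e.get("patient") == "1001" and e.get("physician") == "D7",
)
print([e.text for e in hits])  # prints ['X-ray', 'Cast']
```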
I really like the simplicity, readability, and design cohesiveness of XLL,
and I do think that the functionality it contains should be present in a
query language for SGML/XML documents. It is not clear to me whether there
is a good, orthogonal way to add in some of this other functionality to XLL;
if so, XLL could be used as the basis for a query language. Using the same
primitives would be nice, since anybody working with XML is going to have to
learn XLL, and we don't want every poor schmo to have to learn 50 different
ways to do a query.
Jonathan
________________________________
Jonathan Robie
Email: jonathan@texcel.no
Texcel Research, Inc. ("http://www.texcel.no")
From ddb at criinc.com Wed Nov 19 19:15:17 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971119111601.00a73eb0@mailhost.criinc.com>
At 10:05 PM 11/18/97 -0500, Joe Lapp wrote:
>Derek Denny-Brown wrote:
>>I am not sure that the term "document" is clearly defined for your usage.
>
>... in my post I intended the
>word 'document' to mean a single XML file or any system that makes itself
>appear as if it were analogous to an XML file, such as a database that
>exposes DOM IDL :-) interfaces. That's the meaning I was using, though
>I realize that it's probably not the best definition to work with.
that helps.
>>[...] There are no
>>"Documents residing on servers", but only documents which are generated as
>>part of an interchange protocol. Or do you mean that there are documents
>>(A) (which may not be XML) and then there are XML "documents" (B) which are
>>generated as part of the protocol to interchange the documents (A)?
>
>Boy I really was being quite inconsistent. When I talk about the protocol
>messages being documents I was talking about a single serializable stream
>of well-formed XML. I guess I really was quite confusing.
this really helps. (for me at least)
So this gives us two separate issues, which you are talking about.
1) XML used as an interchange abstraction (your discussion of protocols)
2) XML used as a data modeling abstraction (your XML on the server and XML
document query discussions)
Regarding (1)
my last post had a number of my comments on using XML as a foundation for
interchange (i.e. as a layer in a protocol implementation) so I won't go
into it much more, other than to say that I see this as one of XML's
greatest potentials. One real need though is some good, simple, free
software that people can use to make this easy. LT-XML is a good step, but
what I think is needed is a GPL version of something similar. One really
good way to get XML into regular use beyond HTML-NextGeneration would be to
get it into some GNU projects... just my 2 cents... If I had more time
that I could devote to freeware projects, I would already be working on this.
Regarding (2)
XML with XLL provides all the pieces, but is almost too flexible to be used
as a general purpose data-modelling abstraction. I think what is needed is
something somewhere between XML, RDF, and Tim Bray's typed data extensions
to XML. The problem is that XML is all about marking up text. For it to be
used as a general data-modeling tool, you need some further mechanisms to
constrain the actual data/document instances. That would take some basic
work to add more typing information to XML and to place some limits on
element content models for parts of the document which are not really just
text streams.
At least in my mind there is a significant difference between:
DerekDenny-Brownddb@criinc.comblah.. blah .blah
and
DerekDenny-Brown
is contactable via email at
ddb@criinc.com
or via the more traditional postal services at
blah.. blah .blah
they contain the same info, but one is a very tightly constrained structure
which enforces some nice rules (like you can have only one current NOMEN,
though it might provide for alternate (non-preferred) NOMENs etc..) while
the second is good for pulling the information to build the first from a
free form document. The second would be much better if it included the
first and all the PERSON-INFO blocks were just references to the PERSON
block to pull the appropriate structures.
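A minimal sketch of the distinction, in Python, using hypothetical markup: apart from NOMEN and PERSON, the element names (EMAIL, P) are invented for illustration, and is_record is an assumed helper, not part of any parser. The check accepts only the tightly constrained form, with exactly one of each required child and no free text mixed in.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: a record-style block and the same data inside prose.
RECORD = "<PERSON><NOMEN>Derek Denny-Brown</NOMEN><EMAIL>ddb@criinc.com</EMAIL></PERSON>"
PROSE = (
    "<P><NOMEN>Derek Denny-Brown</NOMEN> is contactable via email at "
    "<EMAIL>ddb@criinc.com</EMAIL></P>"
)


def is_record(elem, required=("NOMEN", "EMAIL")):
    """True if elem has exactly one of each required child and no mixed text."""
    has_text = bool((elem.text or "").strip()) or any(
        (c.tail or "").strip() for c in elem
    )
    counts = {tag: len(elem.findall(tag)) for tag in required}
    return not has_text and all(n == 1 for n in counts.values())


print(is_record(ET.fromstring(RECORD)), is_record(ET.fromstring(PROSE)))  # prints "True False"
```

Both fragments are well-formed XML and carry the same information; only an extra layer of constraints tells them apart.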
My general point being that XML is _too_ flexible for use as a general
purpose data modelling tool, without some additional information. If I
really wanted to use XML as a data modeling tool, I would require all sorts
of data-type meta-info and content modeling constraints to allow XML to be
used as a sort of snapshot of a data-set which straddled the relational and
object oriented data modeling worlds. Used this way it provides a kind of
object oriented (with some relational capabilities) database view, with
strong support for dynamic queries.
Then again if what you are really after is a marked up text stream, then
XML is a better tool than most, if only because so many people seem to like
it. Java and Microsoft (independently) have helped show the world that
mass marketing and the "boardroom sell" can take something a lot farther
than it might ever have gotten on its own.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From ddb at criinc.com Wed Nov 19 19:17:45 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971119111803.00932b70@mailhost.criinc.com>
At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote:
>After the XML document has been parsed and groved, and auto links
>embedded, and transformations and stylesheets applied, and then
>sent to some output device, that is the publication.
what if the output device is a network interface for sending it to a client
for interpretation? i.e. it is never intended to be rendered on screen or
paper? One place I would like to use XML is for application configuration
files. When does that become a publication?
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From neil at bradley.co.uk Wed Nov 19 20:36:59 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:58:58 2004
Subject: CSS2 and XML tables
Message-ID: <199711192036.UAA02763@andromeda.ndirect.co.uk>
The CSS2 proposal mentions XML (thank goodness) several times,
and even CSS1 had the capability to specify in-line and block
styles, and list and list item styles for arbitrary XML elements.
When I saw that CSS2 had additional features for handling tables,
I immediately thought there would be property types for use with
XML. Maybe they are there, and I cannot find them. If not, can
they be added to the Display property, as in 'table',
'head-row', 'body-row' and 'cell' shown below:
Property name:
'display'
Value:
block | inline | list-item | run-in | compact |
none | table | head-row | body-row | cell
Initial:
block
Applies to:
all elements
If something like this is not done, I fear that rendering XML
tables will only be achievable if HTML element names are used, or
some other nasty technique is adopted.
Neil.
-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk
From paul at arbortext.com Wed Nov 19 21:04:17 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971119144056.0068dddc@pophost.arbortext.com>
At 13:17 1997 11 19 -0500, Jonathan Robie wrote:
> Using the same
>primitives would be nice, since anybody working with XML is going to have to
>learn XLL. . .
I sure hope it is not the case that anybody working with XML is going
to have to learn XLL.
paul
From jwrobie at mindspring.com Wed Nov 19 21:23:26 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:58 2004
Subject: Three Access Language Paradigms
Message-ID: <1.5.4.32.19971119212307.00a70970@pop.mindspring.com>
At 04:03 PM 11/19/97 -0500, Paul Grosso wrote:
>At 13:17 1997 11 19 -0500, Jonathan Robie wrote:
>> Using the same
>>primitives would be nice, since anybody working with XML is going to have to
>>learn XLL. . .
>
>I sure hope it is not the case that anybody working with XML is going
>to have to learn XLL.
Oops! I guess that was a bit of an overstatement, wasn't it ;->
Jonathan
From markb at iosphere.net Wed Nov 19 21:56:32 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:58:58 2004
Subject: XML and Bean serialization
Message-ID:
I've recently proposed to Javasoft, via their public RMI-USERS mailing
list, that they adopt XML as the serialization format for Beans and
JARs. I see this as a critical move in unifying the "web" and
"object" implementations of the distributed future (their respective
*visions* are already practically identical).
Interest, what little there has been so far, has been very positive. But
unfortunately, Javasoft themselves have not yet responded.
I'm trying to drum up public interest so that we might be able to push a
little harder on this, perhaps even constructing a prototype two-way
Bean/XML serializer to demonstrate our case (somewhat similar to
Netscape's JavaScript Beans).
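As a very rough sketch of what "two-way" means here, written in Python as a stand-in for Java: an object with a no-argument constructor and plain properties is written out as XML and rebuilt from it. The bean and property element names, the to_xml and from_xml helpers, and the restriction to integer properties are all assumptions made for the example; this is not a proposed format.

```python
import xml.etree.ElementTree as ET


class Point:
    """Stand-in for a Bean: a no-argument constructor plus plain properties."""

    def __init__(self):
        self.x = 0
        self.y = 0


def to_xml(obj):
    """Write the object's properties out as one <property> element each."""
    root = ET.Element("bean", {"class": type(obj).__name__})
    for name, value in sorted(vars(obj).items()):
        prop = ET.SubElement(root, "property", {"name": name})
        prop.text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text, classes):
    """Rebuild an object; only integer properties are handled in this sketch."""
    root = ET.fromstring(text)
    obj = classes[root.get("class")]()
    for prop in root.findall("property"):
        setattr(obj, prop.get("name"), int(prop.text))
    return obj


p = Point()
p.x, p.y = 3, 4
copy = from_xml(to_xml(p), {"Point": Point})
print(copy.x, copy.y)  # prints "3 4"
```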
Thanks.
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From norbert at datachannel.com Wed Nov 19 22:08:18 1997
From: norbert at datachannel.com (Norbert Mikula)
Date: Mon Jun 7 16:58:59 2004
Subject: XML and Bean serialization
References:
Message-ID: <34736334.65CAE8CB@datachannel.com>
Mark Baker wrote:
> I see this as a critical move in unifying the "web" and
> "object" implementations of the distributed future (their respective
> *visions* are already practically identical).
I think you might also be interested in some interesting thoughts
by John Tigue.
"XML Enabled Mechanisms for Distributed Computing on the Web"
http://www.datachannel.com/channelworld/feature.htm
--
Norbert H. Mikula
Sr. Online Information Architect
Norbert@DataChannel.com
DataChannel, 155 108th Avenue NE Ste 400, Bellevue, WA 98004
Phone: 425.455.5450 Fax: 425.637.1192 http://www.datachannel.com
From peter at ursus.demon.co.uk Thu Nov 20 00:39:09 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:59 2004
Subject: Query Languages for XML
In-Reply-To:
Message-ID: <3.0.1.16.19971120000132.09f72e3c@pop3.demon.co.uk>
At 11:02 19/11/97 -0000, Martin Bryan wrote:
>Paul Prescod wrote
>>> Where in XML do we have forms,
>>
>>"To reduce the initial barriers to adoption, a core set of HTML flow
>>objects is recommended in addition to the core DSSSL flow objects. The
>>HTML/CSS formatting model is somewhat different from the DSSSL model,
>>and the inclusion of the HTML/CSS flow objects will make it possible to
>>use XSL with HTML and CSS. It simplifies the targeting of HTML as the
>>output format, and retains consistency of the object model and dynamic
>>behaviors."
>> - http://www.w3.org/TR/NOTE-XSL.html
>>
>>Included in the list are:
>>
>>"FORM
>> INPUT
>> SELECT
>> TEXTAREA"
>>
>>
>>> or any statement that tells anyone what will
>>> happen to data placed into an editable field?
>>
>>This is specified in the HTML 4.0 proposed recommendation which has
>>provisions for interactive processing on either the client or server
>>sides. If and when someone standardizes an updateable document data
>>manipulation language, it can be accessed from these forms just as SQL
>>and ODQL are today.
>>
>So we are constrained to using the types of form objects defined in HTML
>using the processes defined in HTML 4.0, and can add no new functionality
>via XSL?
The XSL/HTML4.0 looks an exciting place to start from (which I had
overlooked). It would seem to be the most appropriate way to think about
forms in XML (rather than developing them from scratch)
Currently XSL (IMO) seems to derive almost entirely from a paper based
metaphor. Although 'screen' is mentioned (just) under SCROLL flowobjects,
these are little more than inanimate chunks of pixels. XML does not address
how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but
it starts to look a bit kludgy.
I am much more concerned with the potential interactive properties of XSL
than laying out text to the nearest micron. I am not disparaging that -
it's very important - but it seems to be the main philosophy behind XSL.
I'd like to see an interactive component built in.
P.
>
>Martin Bryan
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ddb at criinc.com Thu Nov 20 02:28:16 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:58:59 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971119182940.00a66b80@mailhost.criinc.com>
At 12:01 AM 11/20/97, Peter Murray-Rust wrote:
>The XSL/HTML4.0 looks an exciting place to start from (which I had
>overlooked). It would seem to be the most appropriate way to think about
>forms in XML (rather than developing them from scratch)
>
>Currently XSL (IMO) seems to derive almost entirely from a paper based
>metaphor. Although 'screen' is mentioned (just) under SCROLL flowobjects,
>these are little more than inanimate chunks of pixels. XML does not address
>how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but
>it starts to look a bit kludgy.
>
>I am much more concerned with the potential interactive properties of XSL
>than laying out text to the nearest micron. I am not disparaging that -
>it's very important - but it seems to be the main philosophy behind XSL.
>I'd like to see an interactive component built in.
I think there is some real potential for an extension to XSL to allow
something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it
should necessarily be in XSL 1.0, and it could be really hairy if people
are using XSL-grove interface to the XML and DOM interface to the output.
I have not quite figured out how to factor in DOM into XSL without making
things really confusing...
But, I tend to agree, XSL allows people to get to Netscape 3.0/IE 3.0 level
from XML, but not the full "4.0" range that people are (justifiably) going
wild over.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From lauren at sqwest.bc.ca Thu Nov 20 02:39:32 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 16:58:59 2004
Subject: Query Languages for XML
In-Reply-To: <3.0.32.19971119182940.00a66b80@mailhost.criinc.com>; from "lauren" at Wed Nov 19 18:38:51 1997
Message-ID:
Derek Denny-Brown wrote:
% I think there is some real potential for an extension to XSL to allow
% something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it
% should necessarily be in XSL 1.0, and it could be really hairy if people
% are using XSL-grove interface to the XML and DOM interface to the output.
% I have not quite figured out how to factor in DOM into XSL without making
% things really confusing...
I'm confused by this. The idea of the DOM is to standardize the object
model part of "dynamic HTML" (whatever that might mean; the definition
seems to change with the application that supports it, the person talking
about it, and probably the phase of the moon as well). So what sort of
extension to XSL do you mean? I also don't understand why the XML
would have an XSL-grove interface, and the "output" (what does
output mean?) would have a DOM interface, when the DOM should
be an interface to an XML document...
cheers,
Lauren
From ricko at allette.com.au Thu Nov 20 05:20:25 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:58:59 2004
Subject: Three Access Language Paradigms
Message-ID: <199711200517.QAA03826@jawa.chilli.net.au>
> From: Derek Denny-Brown
> At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote:
> >After the XML document has been parsed and groved, and auto links
> >embedded, and transformations and stylesheets applied, and then
> >sent to some output device, that is the publication.
>
> what if the output device is a network interface for sending it to a client
> for interpretation? i.e. it is never intended to be rendered on screen or
> paper?
If you need a second word for this, then "publication" is available. If you
don't, "document" is fine. But usually "publication" refers to (the result of)
the processing chains that end at some computer interaction medium
(e.g. a printer or screen).
Rick Jelliffe
From jlapp at acm.org Thu Nov 20 14:28:02 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:58:59 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net>
I have been searching for the properties that a repository access
language must have. Here I present an argument for why an access
language must be tied to a repository's architecture in a manner
analogous to how SQL and OQL are tied to database schemas. I
infer what this implies for XML DTDs and then ask a question whose
answer I think has important repercussions.
Let's say that a "repository" is any software object that contains
information and that provides a way for clients to read, write,
and modify that information. A client must know how to talk to
the repository in order to get the repository to do anything.
We'll call the language that the client must speak the "access
language." The client uses this language to submit requests and
to understand responses. The server uses this language to make
sense of requests and to submit responses. Both the client and
the repository must house knowledge of this access language.
(The access language may use distinct subset languages for requests
and responses, but both software objects would still have to contain
knowledge of both subset languages. For simplicity, I assume that
requests and responses use the same language, but my argument should
hold even if they are different.)
The access language must convey information in two directions. In
order for the information to be comprehensible, it must be conveyed
in recognizable units. Both the client and the repository must
know how to generate and parse these units. Hence, a standard must
exist to which both sides conform. This standard says what kind of
information units there are and what they look like.
Information units usually have relationships with one another. A
client often cares about accessing units that have a particular
relationship with some other unit. For example, a client might
care to retrieve all liens on a particular property. The access
language must allow a client to select units according to their
relationships with other units. In particular, a client must be
able to identify the relationships of concern. Both the client
and the repository must now be in agreement about the kinds of
relationships that may exist among information units. We find we
also need a standard that says what kinds of relationships there
are and what kinds of information units participate in them.
It seems that the standard has quite a bit to say. It says what
kinds of information units there are, what kinds of information
they contain, what kinds of relationships there are, and what
information units participate in those relationships. What we
have is an object model. This is the kind of thing that OMT and
UML are very good at expressing. We have learned that both the
client and the repository must have knowledge of the same object
model. Moreover, in the spirit of object-oriented design, each
side should harbor some representation of this model. That is,
both sides have components that share a common architecture.
In retrospect, this makes sense. Were the two sides working with
different models we'd have a case of the infamous impedance
mismatch. We normally think of impedance mismatch as occurring
between an object-oriented application and a relational database,
but it can also occur between two object-oriented applications.
One organization may decide that liens are not useful entities in
themselves and so bottle them up with their associated properties
(i.e. properties would be aggregates containing liens, and liens
would not be classes of the schema). Another organization may
want to store liens separately so that they can select all liens
that meet a given criterion (i.e. properties would be associated
with liens, and liens would be classes of the schema). When the
second organization decides to hook its client up to the first
organization's database, the client can neither select among
liens nor properly interpret property objects.
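To make the mismatch concrete, here is a minimal Python sketch of the two models. All class, field, and function names are hypothetical and chosen only for illustration; they do not come from any real schema.

```python
from dataclasses import dataclass, field


# Organization A: liens are bottled up inside properties (aggregation).
# A lien is an anonymous record here, not a class of the schema.
@dataclass
class AggregateProperty:
    address: str
    liens: list[dict] = field(default_factory=list)


# Organization B: liens are first-class objects associated with properties.
@dataclass
class Property:
    address: str


@dataclass
class Lien:
    holder: str
    amount: float
    on_property: Property


def liens_over(liens: list[Lien], minimum: float) -> list[Lien]:
    """B's client selects among liens directly."""
    return [lien for lien in liens if lien.amount >= minimum]


def liens_over_in_a(properties: list[AggregateProperty], minimum: float) -> list[dict]:
    """Against A's model the same request has to walk every property."""
    return [
        lien
        for prop in properties
        for lien in prop.liens
        if lien["amount"] >= minimum
    ]
```

A client written against `Lien` cannot be pointed at a store of `AggregateProperty` objects without a translation layer like `liens_over_in_a`, which is the mismatch in miniature.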
Okay, so we've established the need for industries to standardize
on object models. These standard object models would only say
what the repositories need to look like through an access
language. Any given repository is free to transparently translate
that model into a more suitable internal one. We've also
established the need for access languages to reflect these object
models. SQL and OQL conform to this requirement by having clients
use the language of the database's persistent storage schema. XML
introduces another way to model information, a way that is
distinct from the relational approach but somewhat similar to the
object-oriented approach. XML repositories have schemas too, and
these schemas are defined by the DTDs.
Before concluding I'd like to ask a question whose answer may
have significant repercussions. It seems that by asking an XML
repository to manage information for a particular industry, we
are asking ourselves to create DTDs that model the industry. The
question is this: to what extent are DTDs to specify the object
model of a given industry? More specifically, do we intend for
the following capabilities to fully implement an object model:
(1) the ability of a repository to ensure that the information it
contains is always in conformance with the DTDs, and (2) the
ability of the clients to properly interpret the informational
units and the relationships that the DTDs declare?
In conclusion, it seems that an access language must impose
architectural constraints on at least a component of a repository
and that these architectural constraints will apply to all
repositories that conform to a particular industry standard. In
particular, it does not seem possible to create individual access
language protocols that won't to some degree constrain the
architectures of the repositories. Such languages are probably
feasible only when we can think of a repository as a flat file
of unrelated information units. Since an object model will have
to be developed for each industry, we might as well standardize
on a way to access object models in general. This way we won't
be asking industries to perform the additional work of inventing
an access language for each object model.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From papresco at technologist.com Thu Nov 20 14:41:09 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:59 2004
Subject: Query Languages for XML
References:
Message-ID: <34744CC7.E233E27C@technologist.com>
> So we are constrained to using the types of form objects defined in HTML
> using the processes defined in HTML 4.0, and can add no new functionality
> via XSL?
We can add functionality in XSL, but I think that it should be in the
spirit of these basic form elements, in other words XSL should leave the
interactive processing of user interface elements to languages that are
explicitly designed to do it, such as ECMA(Java)script, TCL and Python.
Paul Prescod
From papresco at technologist.com Thu Nov 20 14:42:48 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:58:59 2004
Subject: Query Languages for XML
References: <3.0.1.16.19971120000132.09f72e3c@pop3.demon.co.uk>
Message-ID: <34744D2A.3609B868@technologist.com>
> XML does not address
> how to add buttons, etc. This *could* be done by ECMAScript, I suppose, but
> it starts to look a bit kludgy.
I don't think that it is a generic markup language's role to address how
to add buttons. ECMAScript is a good language for creating scriptable code
components. XSL and DSSSL are good languages for specifying which
scriptable code component should be used to represent which XML object.
> I am much more concerned with the potential interactive properties of XSL
> than laying out text to the nearest micron. I am not disparaging that -
> it's very important - but it seems to be the main philosophy behind XSL.
> I'd like to see an interactive component built in.
I don't think that that is its job. XSL specifies a mapping from
structured document nodes to (perhaps interactive) graphical components.
I think it is going too far to ask it to also script those components.
I would expect to make a tree control in DSSSL like this:
(make component system-id: "http://www.controls.are.us.com/tree.js"
parameters: '(()) )
Of course if a huge number of stylesheets needed a tree control, then
it would be a good idea to make a tree control flow object:
(make tree-control width: height: ...)
Then the behaviour would be implicit in the flow object.
Putting the code for the control inside the stylesheet would be, in my
mind, rather ugly and confusing. Perhaps it wouldn't be too bad if the
code snippet is very short:
(make button onClick: "doit()")
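The division of labour argued for here can be sketched in Python: the "stylesheet" is only a mapping from element type to component, and the behaviour lives in the component. This is an analogy under stated assumptions, not DSSSL or XSL; every name below is hypothetical.

```python
from typing import Callable


# Hypothetical component constructors. In the message these would be
# scriptable components written in ECMAScript or a similar language.
def make_tree(element_name: str) -> str:
    return f"tree-control for <{element_name}>"


def make_button(element_name: str) -> str:
    return f"button for <{element_name}>"


# The "stylesheet": a pure mapping from element type to component.
# It says which component represents which element and nothing more.
STYLESHEET: dict[str, Callable[[str], str]] = {
    "toc": make_tree,
    "submit": make_button,
}


def render(element_name: str) -> str:
    """Look up the component for an element; fall back to plain text."""
    make = STYLESHEET.get(element_name)
    return make(element_name) if make else f"plain text for <{element_name}>"
```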
Paul Prescod
From jwrobie at mindspring.com Thu Nov 20 16:03:52 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:59 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120160239.00ab6714@pop.mindspring.com>
At 09:26 AM 11/20/97 -0500, Joe Lapp wrote:
>I have been searching for the properties that a repository access
>language must have. Here I present an argument for why an access
>language must be tied to a repository's architecture in a manner
>analogous to how SQL and OQL are tied to database schemas.
Ideally, the logical model exposed by an SGML repository should be the
structure of the document itself, not the implementation details used for a
particular repository architecture. An SGML DTD defines structures in the
same way that the table declarations do for SQL, and in the same way that
the class declarations do for object databases that use OQL.
This is in keeping with the fundamental idea behind object persistence in
object oriented databases: if you use an object oriented database with C++,
your C++ class declarations are your schema. In the same way, if you use a
repository with SGML or XML, the logical model is declared by the DTD.
>A client must know how to talk to
>the repository in order to get the repository to do anything.
>We'll call the language that the client must speak the "access
>language." The client uses this language to submit requests and
>to understand responses. The server uses this language to make
>sense of requests and to submit responses. Both the client and
>the repository must house knowledge of this access language.
If we're talking traditional databases, that means that both sides must know
SQL, or both sides must know OQL, or whatever. Since we are talking SGML or
XML repositories, that means that both sides must know SGML or both sides
must know XML.
>The access language must convey information in two directions. In
>order for the information to be comprehensible, it must be conveyed
>in recognizable units. Both the client and the repository must
>know how to generate and parse these units. Hence, a standard must
>exist to which both sides conform. This standard says what kind of
>information units there are and what they look like.
For an SGML repository, these recognizable units are SGML elements.
Of course, for any particular SGML application, there would also be a DTD
that defines the schema for the applications, and the clients may well have
knowledge of this schema. The server might not need to have this knowledge
in some cases, as long as it knows how to manage SGML in general. And there
may be some clients that do not need this knowledge, either - e.g. a general
purpose querying and browsing client should be written to work for any DTD,
as should a formatting and printing engine, etc.
In order to make general-purpose clients possible, clients must have some
way of asking the repository for the schema - either the DTD schema or the
structure of a particular document.
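As a rough sketch of the second option (asking for the structure of a particular document rather than its DTD), a general-purpose client could derive the containment structure from the document itself. The example assumes Python's standard `xml.etree.ElementTree` module; the function name and the sample element names are hypothetical.

```python
import xml.etree.ElementTree as ET


def document_structure(xml_text: str) -> dict[str, set[str]]:
    """Map each element name to the set of child element names seen under it."""
    root = ET.fromstring(xml_text)
    structure: dict[str, set[str]] = {}
    for element in root.iter():
        children = structure.setdefault(element.tag, set())
        for child in element:
            children.add(child.tag)
    return structure
```

This recovers only what one instance happens to contain; it says nothing about optional or repeatable content, which is what the DTD would add.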
>Information units usually have relationships with one another. A
>client often cares about accessing units that have a particular
>relationship with some other unit. For example, a client might
>care to retrieve all liens on a particular property. The access
>language must allow a client to select units according to their
>relationships with other units. In particular, a client must be
>able to identify the relationships of concern.
The relationships among objects often express much of the semantics of any
system - "it's not what you know, it's who you know". SGML/XML has two kinds
of relationships: containment and links. Queries should be able to handle
both. This has proven invaluable in OQL and SQL-3.
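The two kinds of relationship can be illustrated with a small Python sketch using the standard `xml.etree.ElementTree` module. The document, its element names, and the `on` attribute used as a link are all hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """
<registry>
  <property id="p1"><address>1 Main St</address></property>
  <property id="p2"><address>2 Oak Ave</address></property>
  <lien on="p1"><holder>Bank</holder></lien>
  <lien on="p1"><holder>City</holder></lien>
</registry>
"""

root = ET.fromstring(DOC)

# Containment query: every address contained in a property element.
addresses = [a.text for a in root.findall("./property/address")]

# Link query: every lien whose "on" attribute refers to property p1.
holders = [lien.findtext("holder") for lien in root.findall("./lien[@on='p1']")]
```

The first query follows the element hierarchy; the second follows a reference between elements that are siblings in the hierarchy.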
>We find we
>also need a standard that says what kinds of relationships there
>are and what kinds of information units participate in them.
But this can be quite general, e.g. the definition of SGML/XML. Again, this
is analogous to using C++ or Java to define schemas in object oriented
databases.
>It seems that the standard has quite a bit to say. It says what
>kinds of information units there are, what kinds of information
>they contain, what kinds of relationships there are, and what
>information units participate in those relationships. What we
>have is an object model.
An object model of the kind you discuss here seems like the object model of
a particular application.
>Moreover, in the spirit of object-oriented design, each
>side should harbor some representation of this model. That is,
>both sides have components that share a common architecture.
In the spirit of object oriented systems, metadata is the way one system
finds out about another system, unless they belong to the same application,
in which case they share class declarations. The same should hold for
SGML/XML repositories: programs that are part of the same application may
have knowledge of the DTD, but metadata is the way to write general purpose
programs, and writing general purpose software as much as possible is
usually a big win.
>We normally think of impedance mismatch as occurring
>between an object-oriented application and a relational database,
>but it can also occur between two object-oriented applications.
>One organization may decide that liens are not useful entities in
>themselves and so bottle them up with their associated properties
>(i.e. properties would be aggregates containing liens, and liens
>would not be classes of the schema). Another organization may
>want to store liens separately so that they can select all liens
>that meet a given criterion (i.e. properties would be associated
>with liens, and liens would be classes of the schema). When the
>second organization decides to hook its client up to the first
>organization's database, the client can neither select among
>liens nor properly interpret property objects.
That depends, of course, on how the programs function. As long as I have
access, I can log into anybody's database, browse it, formulate queries to
find information, etc., because I use a general-purpose browsing and query
facility. If I have programs dependent on the classes defined in a
particular schema, then my programs do need to know the schema, e.g. the DTD.
One of the great advantages of architectural forms is that they make it
possible to write programs that work only on an agreed-upon abstract
representation of the schema, and each individual organization can build on
that abstraction to build documents that meet their own needs. This is a
real strength of the HL7 Kona proposal for medical record attachments, which
would allow parties to interchange information based on a set of
well-defined architectural forms, yet allow freedom for each party to
implement their own DTDs based on these architectural forms in order to
accommodate their own needs. This is, of course, analogous to the "design
patterns" approach of object oriented design, which strongly encourages
writing programs that use the abstract base classes which define the
interfaces rather than write programs that use the concrete classes that
implement them.
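The object-oriented side of that analogy can be sketched in Python. The abstract class stands in for an agreed-upon architectural form and the concrete class for one party's own DTD; the names are hypothetical and are not taken from the HL7 proposal.

```python
from abc import ABC, abstractmethod


class Attachment(ABC):
    """The agreed-upon abstraction (stands in for an architectural form)."""

    @abstractmethod
    def patient_id(self) -> str: ...

    @abstractmethod
    def summary(self) -> str: ...


class HospitalAReport(Attachment):
    """One party's concrete form, free to carry extra local detail."""

    def __init__(self, mrn: str, findings: str, ward: str) -> None:
        self._mrn, self._findings, self.ward = mrn, findings, ward

    def patient_id(self) -> str:
        return self._mrn

    def summary(self) -> str:
        return self._findings


def index_line(attachment: Attachment) -> str:
    """Written against the abstraction only, so it works for any party's form."""
    return f"{attachment.patient_id()}: {attachment.summary()}"
```

`index_line` never sees `ward`, so a second party can add or drop local fields without breaking it.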
Jonathan
________________________________
Jonathan Robie
Email: jonathan@texcel.no
Texcel Research, Inc. ("http://www.texcel.no")
From lauren at sqwest.bc.ca Thu Nov 20 16:19:23 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 16:58:59 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net>
Message-ID:
> From: Joe Lapp
> Okay, so we've established the need for industries to standardize on
> object models. These standard object models would only say what the
> repositories need to look like through an access language. Any
> given repository is free to transparently translate that model into
> a more suitable internal one. We've also established the need for
> access languages to reflect these object models.
A nice summary of what the principles of the DOM are all about -
defining an interface in a language-independent way that clients and
hosts can implement without necessarily implementing any given
underlying representation of the information. So the DOM is not
really properly named, since it's really the specification of the
interface rather than the object model that we are concerned with.
> Before concluding I'd like to ask a question whose answer may
> have significant repercussions. It seems that by asking an XML
> repository to manage information for a particular industry, we are
> asking ourselves to create DTDs that model the industry. The
> question is this: to what extent are DTDs to specify the object
> model of a given industry? More specifically, do we intend for the
> following capabilities to fully implement an object model: (1) the
> ability of a repository to ensure that the information it contains
> is always in conformance with the DTDs, and (2) the ability of the
> clients to properly interpret the informational units and the
> relationships that the DTDs declare?
One example (though not the only possible) is in the DOM work, which
has three parts.
1) core - this contains the general methods, functions, definitions
which are applicable to HTML and XML documents, e.g., what is a Node,
how is an element represented, how does an attribute relate to the
element it is attached to, etc.
2) HTML - this knows the HTML DTD and therefore can build on top of
the DOM core with functions specific to that DTD
3) XML - this contains the stuff that HTML doesn't need that is in
XML, such as CDATA sections
I could imagine industry-specific versions of part 2), that build on
the DOM core to add DTD-specific functionality for that industry.
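A minimal sketch of that layering, assuming Python's `xml.dom.minidom` (a later DOM implementation, used here only for illustration): a generic core helper, plus a hypothetical industry-specific class built on top of it. The `PropertyRecord` class and the sample markup are invented for the example.

```python
from xml.dom.minidom import parseString, Element


# Core level: generic tree access that works for any document.
def child_elements(node) -> list[Element]:
    return [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]


# Hypothetical industry-specific layer built on the core calls.
class PropertyRecord:
    def __init__(self, element: Element) -> None:
        self._element = element

    def lien_holders(self) -> list[str]:
        return [
            lien.getAttribute("holder")
            for lien in child_elements(self._element)
            if lien.tagName == "lien"
        ]


doc = parseString('<property><lien holder="Bank"/><note/><lien holder="City"/></property>')
record = PropertyRecord(doc.documentElement)
```

Only `PropertyRecord` knows what a lien is; `child_elements` would work unchanged on a document from any other DTD.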
> In conclusion, it seems that an access language must impose
> architectural constraints on at least a component of a repository
> and that these architectural constraints will apply to all
> repositories that conform to a particular industry standard. In
> particular, it does not seem possible to create individual access
> language protocols that won't to some degree constrain the
> architectures of the repositories. Such languages are probably
> feasible only when we can think of a repository as a flat file of
> unrelated information units. Since an object model will have to be
> developed for each industry, we might as well standardize on a way
> to access object models in general. This way we won't be asking
> industries to perform the additional work of inventing an access
> language for each object model.
I think it is possible to build a general API for XML documents, so
if one of your imposed requirements on a repository is that it be in
XML, and a general solution would not require that, then I agree. I
do not agree that an object model must be developed for each industry
- if the access method is standard, then whichever underlying model
of the information a given tool uses doesn't really matter. It will
have implications in performance etc, but it should be possible to
implement the interfaces if they have been reasonably defined.
cheers,
Lauren
From jlapp at acm.org Thu Nov 20 16:56:49 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To:
References: <3.0.3.32.19971120092659.0093cad0@pop.access.digex.net>
Message-ID: <3.0.3.32.19971120115219.0093b740@pop.access.digex.net>
"Lauren Wood" wrote:
>[...]
>I think it is possible to build a general API for XML documents, so
>if one of your imposed requirements on a repository is that it be in
>XML, and a general solution would not require that, then I agree. I
>do not agree that an object model must be developed for each industry
>- if the access method is standard, then whichever underlying model
>of the information a given tool uses doesn't really matter. It will
>have implications in performance etc, but it should be possible to
>implement the interfaces if they have been reasonably defined.
From reading Jonathan's and Lauren's responses, it looks like I need
to throw in a quick clarification. I agree that a repository need not
have any knowledge of the semantics of a particular industry. We
could use a general SGML repository to store any kind of document,
where the repository's only knowledge of the document is its DTD.
Relational databases (for example) give us this sort of approach, since
they need not understand what is meant by the schemas that are stored
within them. Elements are the informational units of an SGML/XML
repository in the same way that tables and columns and rows are the
informational units of relational databases.
However, each domain does have information units that are specific to
that domain, and they exist as units regardless of the more fundamental
units from which they are constructed. An RDBMS's schema specifies
these domain-specific units, as does an XML-document's DTD. Hence, the
DTD does intend to capture the object-model of a particular domain,
even if this object model is expressed in the language of a more general
object model. I'm asking a question about what we expect our DTD
schemas to accomplish for these domain-specific object models. Do we
expect general SGML/XML repositories to be powerful enough to allow
them to represent almost any domain-specific object model?
BTW, I agree that IDL interfaces are another kind of access language to
a repository and that DOM in particular satisfies the property of
access languages I was arguing for. It provides fundamental constructs
from which domain-specific information units can be built.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From jwrobie at mindspring.com Thu Nov 20 17:11:37 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120171103.00a78064@pop.mindspring.com>
At 11:52 AM 11/20/97 -0500, Joe Lapp wrote:
>Do we expect general SGML/XML repositories to be powerful
>enough to allow them to represent almost any domain-specific
>object model?
Yes. There are at least three SGML/XML repositories that claim to be able to
import any SGML document, and which also support XML. To my knowledge, none
of them currently supports queries that take advantage of the relationships
expressed in links, but at least two of them support queries that combine
structure and content in at least some form, and which support queries based
on containment relationships.
Jonathan
jonathan@texcel.no
Texcel - http://www.texcel.no
From jlapp at acm.org Thu Nov 20 17:20:24 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120115219.0093b740@pop.access.digex.net>
References:
<3.0.3.32.19971120092659.0093cad0@pop.access.digex.net>
Message-ID: <3.0.3.32.19971120122026.0096c830@pop.access.digex.net>
Joe Lapp wrote:
>[...] Do we
>expect general SGML/XML repositories to be powerful enough to allow
>them to represent almost any domain-specific object model?
I don't like how I worded the question here. Let's try again: What
sorts of object models do we want to be able to represent in SGML/XML?
An answer that says "whatever SGML/XML can represent as it is currently
defined" doesn't help me here. I care about what we intend to do with
these future repositories and what it's going to take to do it.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From ddb at criinc.com Thu Nov 20 17:57:07 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:00 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971120095847.00a8ce80@mailhost.criinc.com>
At 06:38 PM 11/19/97 -0800, Lauren Wood wrote:
> The idea of the DOM is to standardize the object
>model part of "dynamic HTML" (whatever that might mean; the definition
>seems to change with the application that supports it, the person talking
>about it, and probably the phase of the moon as well). So what sort of
>extension to XSL do you mean? I also don't understand why the XML
>would have an XSL-grove interface, and the "output" (what does
>output mean?) would have a DOM interface, when the DOM should
>be an interface to an XML document...
I am not necessarily saying that DOM = dynamic HTML, but rather it is my
expectation that dynamic HTML will depend on the DOM model, which from
what little I have glimpsed (admission of a failure to properly look into
it on my part), is quite different from the SGML/XML Grove model. I
envision that a number of the initial XSL implementations which use the
HTML/CSS flow objects, will be based on existing HTML display engines.
These engines, assuming they have any real "dynamic" HTML potential, will be
using javascript/jscript/vbscript and something at least DOMish to provide
the "dynamic" part of the dynamic HTML. Thus I would expect that an XSL
implementation that did more than build a static page would need to work
with these engines using a DOMish interface.
This means in the case of some XSL document, that the 'input' is an XML
document (and an XSL stylesheet) and the output is the screen via this
HTML-based display engine which allows some "dynamic" behaviour via a
DOMish interface. That means that the XSL stylesheet (assuming it is using
some XSL extensions to talk DOMishness with the display engine) is talking
Grove-speak to the original XML document (because that is how XSL was
defined, at least in how I read the spec) and DOMishness to the display
engine (beyond the initial flow-object creation). Having only limited
experience with DSSSL, I really don't have a complete picture of how
XSL/DSSSL could work in a "dynamic" output media environment.
What I mean by "dynamic" in the above paragraphs is that the display engine
has some means to change the (existing) rendering, on the fly. I click the
"Verify" button and all the text fields which have invalid entries become
some nuclear-neon pink, so that I know where my errors are, as an example.
Or even better, I can insert some new flow-objects or remove existing flow
objects from the displayed flow-object stream. My classic example of what
I want from a "dynamic" HTML rendering engine is that I can build a "tree"
using the built-in list/list-item flow-objects, where I can expand/collapse
portions of that tree at runtime, without reloading the document.
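As a rough illustration of the expand/collapse case, here is a minimal
sketch. It assumes nothing about XSL, DSSSL, or the DOM: the ListItem class
and the render function are hypothetical names made up for this example, and
the "display" is just a list of text lines.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ListItem:
    """A hypothetical list-item flow object that can hide its children."""
    label: str
    children: List["ListItem"] = field(default_factory=list)
    expanded: bool = True

    def toggle(self) -> None:
        # Flip between expanded and collapsed without touching the source document.
        self.expanded = not self.expanded


def render(item: ListItem, depth: int = 0) -> List[str]:
    """Return the lines currently visible for this item and its subtree."""
    # "+" marks a collapsed item that has hidden children; "-" marks everything else.
    marker = "-" if item.expanded or not item.children else "+"
    lines = ["  " * depth + f"{marker} {item.label}"]
    if item.expanded:
        for child in item.children:
            lines.extend(render(child, depth + 1))
    return lines


tree = ListItem("Book", [
    ListItem("Chapter 1", [ListItem("Section 1.1"), ListItem("Section 1.2")]),
    ListItem("Chapter 2"),
])

before = render(tree)
tree.children[0].toggle()   # collapse "Chapter 1" in place
after = render(tree)
```

The point of the sketch is only that the rendering changes in response to a
toggle on the already-built flow-object tree, with no reload of the source.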
I hope this explains a bit. I realize my original post was a (wee bit)
cryptic, and I left out some of my in-between thought processes (as an
exercise to the reader, of course.)
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From ddb at criinc.com Thu Nov 20 18:06:01 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:00 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971120100709.00a8c5b0@mailhost.criinc.com>
At 09:46 AM 11/20/97 -0500, Paul Prescod wrote:
>Of course if a huge number of stylesheets needed a tree control, then
>it would be a good idea to make a tree control flow object:
One of the things that I see as a potential problem is that HTML etc as it
is used now has 2 (as I count them this side of the morning) relatively
distinct uses.
1) as an alternate form of (relatively) static information.
2) as a (very-basic) cross-platform (g)ui.
XSL and DSSSL are focusing rather hard on (1), but not on (2). That may
not be a bad thing if it is made clear that from the designers' point of
view (2) is better left to Java, which it would be if the browser people
could better integrate Java into their browsers. The problem is that (2)
often spends a lot of time trying to do a lot of the stuff that the display
engine for (1) already has figured out.
Hmm... so maybe what I am looking for is a "standard" way to extend an XSL
processing/display engine with new flow-object types at run-time. Paul,
was it you who talked about this some months ago? Someone did, so it isn't
a new idea.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From jwrobie at mindspring.com Thu Nov 20 18:15:23 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120181453.00ad6a4c@pop.mindspring.com>
At 12:20 PM 11/20/97 -0500, Joe Lapp wrote:
>Joe Lapp wrote:
>>[...] Do we
>>expect general SGML/XML repositories to be powerful enough to allow
>>them to represent almost any domain-specific object model?
>
>I don't like how I worded the question here. Let's try again: What
>sorts of object models do we want to be able to represent in SGML/XML?
Any object model, but with some limitations on the extent of the representation.
The following properties of object models are easily represented in SGML/XML:
o Identity
o State
o Type
These properties are not easily represented:
o Behavior (except for in languages that allow methods to be
represented as data, e.g. Java)
o Encapsulation constraints
There are indirect methods for describing inheritance in SGML/XML, but they
are different from the inheritance mechanisms in OO languages.
SGML/XML can represent the data and identity portion of any object model
expressed in C++, Java, CORBA, etc., including the reference network.
Jonathan
From fussellm at alumni.caltech.edu Thu Nov 20 18:56:32 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:00 2004
Subject: Generalizing the SGML/XML information model and Releasing MONDO
Message-ID:
[This is a long email so I will also put it online at
"http://www.chimu.com/projects/mondo/" ]
Some recent discussions on xml-dev and c.t.sgml have included query
languages, encoding complex information (trees, graphs, etc.), object
serialization, and extended metamodeling. I recommend enlarging the
scope of these discussions and thinking about aligning SGML/XML with
other disciplines that can help accomplish these tasks. This aligning
would take advantage of the tools and techniques that are already
available in other industries: not just by duplication of design but by
actually merging with more general capabilities. Although alignment has
been successfully done in some areas of SGML/XML I think it is
conspicuously lacking in a crucial place: SGML's information model. By
improving this particular weakness in SGML by taking advantage of
well-established industries, an abundance of other needs become much
more easily satisfied.
Generalizing the SGML/XML information model
-------------------------------------------
The desired applications of SGML/XML have grown beyond the original
focus on documents towards working with much more general information
and processing. SGML is a combination of encoding technology and an
information modeling language. But that modeling language (DTDs and
Groves) is very weak and is constrained by being focused on
document-oriented information. It is also esoteric and not equivalent
to any of the mainstream information modeling approaches.
I recommend considering modeling separately from encoding technology.
For modeling I think object-oriented information models can subsume
SGML's document-oriented models and provide the ability to handle much
more advanced models. Object-oriented information models can be very
general, expressive, and understandable. This allows them to model many
types of information equally well: both document-oriented and more
general information. The strength of object-oriented information
modeling has resulted in an abundance of good analysis, patterns, and
specific models being built using it.
This last point is the most important. If SGML/XML aligns with the
information modeling industry, many more tools will immediately become
available. For describing models you can use the Unified Modeling
Language (UML) and tools such as Rational Rose (and several other
techniques and tools). Implementing models can be done very easily with
most OO languages (with or without generic frameworks), and the
resulting implementation can be far more knowledgeable about the
semantics of the information it is working with. There are many
products that provide persistence and UI presentation that are designed
to work with OO DomainModels. There are standard query languages
(OQL/SQL) and interface languages (CORBA/IDL). The information modeling
industry provides an extensive list of high-quality technologies,
standards, and techniques.
There has been a lot of great work done with SGML/XML in both modeling
(DTDs) and technologies (e.g. HyTime). If this quality work is
integrated into the common environment of OO information modeling and OO
technologies then it will be available to a larger audience. It will
also frequently become easier to understand and more capable because it
can take advantage of the inherent abilities of OO models. For example,
much of HyTime addressing is very easily and flexibly described in terms
of object associations. HyTime becomes more powerful in the general
object context.
This isn't to say everything is easy. There are still the issues of how
to work with different information models on different technologies
(e.g. how smart the objects are) and what additional technologies need
to be provided to reproduce expected SGML functionality (e.g. like
HyTime or extending (through object-methods) OQL with
containment-closure abilities). And some tools would never be
generalized because the SGML DTD & Grove models are sufficient for the task
or the tool is of too high a quality to risk moving (e.g. Jade).
Overall, I think the benefits will be enormous.
MONDO
-----
I have been working on a project (called MONDO) to prove the benefits of
this alignment and to provide an architecture and the frameworks to
support it. MONDO is primarily an architecture: it describes the
components (e.g. ObjectBuilder, DomainModel, ObjectEncoder), their
responsibilities, and the interfaces among those components. It is
meant to be open and language neutral.
MONDO will also have a reference implementation in Java (prototypes were
in Java, Perl, and Smalltalk). The current reference implementation
includes frameworks and tools for the normal document-oriented tasks and
also for some more general or object-oriented capabilities. As an
example of the latter, MONDO can serialize and deserialize Java objects
to human-readable (XML or OML) encodings.
I have been working on MONDO for quite a while and been producing
tangibles (i.e. designs, documentation, and code) off and on for a bit
more than a year. This is the first time I am releasing them openly.
The WWW site currently has some FAQ's, some references (extracted from
the design document), and placeholders and timelines for expected
additions. The references may be especially useful because they provide
a sampling of the integration from these multiple fields. I hope to
have the design document (first pass is about 80 pages) up on the web
site by early next week and will start putting up the reference code
shortly thereafter.
The MONDO WWW site is at:
http://www.chimu.com/projects/mondo/
As an example (teaser ;-) of the MONDO design, I have included a couple
(non-sequential but related) paragraphs below.
======
ObjectBuilder
The responsibility of the ObjectBuilder is to build all or part of the
Objectbase from an external source. Generally this source will be a
human-readable text file, but there are several stages to ObjectBuilding
which can each have different approaches (e.g. we could read from a
binary file instead). Assuming we have a textual file-based approach,
ObjectBuilding would go through three stages:
Read from the text file and produce a stream of text
Parse the text and turn it into a recipe (what objects to build and
what ingredients to use)
Build the recipe and construct objects within the DomainModel
-------
Recipes for building objects
A recipe describes how to build a collection of associated objects. All
the information that is placed into the DomainModel by MONDO is the
result of building recipes. By formalizing recipes we separate the
encoding of information (e.g. whether it is human readable and how to
parse it) from what information is in the encoding. MONDO uses that
information to construct the knowledge in a form we want to work with,
the Objectbase.
======
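The three ObjectBuilder stages and the recipe idea in the excerpt above can
be sketched roughly as follows. This is not MONDO code: the class names
(RecipeStep, DomainObject), the function names, and the toy one-line-per-
object text encoding are all assumptions made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecipeStep:
    """One entry in a recipe: which object to build and with what ingredients."""
    type_name: str
    ingredients: Dict[str, str]


@dataclass
class DomainObject:
    """A stand-in for an object living in the domain model."""
    type_name: str
    state: Dict[str, str] = field(default_factory=dict)


def read_text(source: str) -> List[str]:
    # Stage 1: turn the external source into a stream of non-empty text lines.
    return [line.strip() for line in source.splitlines() if line.strip()]


def parse_recipe(lines: List[str]) -> List[RecipeStep]:
    # Stage 2: parse each line ("Type key=value key=value") into a recipe step.
    steps = []
    for line in lines:
        type_name, *pairs = line.split()
        ingredients = dict(pair.split("=", 1) for pair in pairs)
        steps.append(RecipeStep(type_name, ingredients))
    return steps


def build(recipe: List[RecipeStep]) -> List[DomainObject]:
    # Stage 3: construct objects from the recipe, independent of the encoding.
    return [DomainObject(step.type_name, dict(step.ingredients)) for step in recipe]


source = """
Person name=Ada role=author
Document title=Notes
"""
objects = build(parse_recipe(read_text(source)))
```

Because build() only sees recipe steps, swapping the text parser for one
that reads XML or a binary file would leave the last stage unchanged, which
is the separation of encoding from content that the excerpt describes.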
Any feedback on MONDO or these concepts is appreciated and I hope they
contribute to some of the topics that have been addressed recently. I
will let people know when the main design document is on line and when
the code to work with is downloadable. If you are interested in MONDO
for your application or want to help with the project, let me know.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From elm at arbortext.com Thu Nov 20 19:16:30 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:00 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971120141133.00b17780@village.doctools.com>
At 12:21 AM 11/20/97 -0500, Rick Jelliffe wrote:
>
>
>> From: Derek Denny-Brown
>
>> At 05:06 PM 11/19/97 +1100, Rick Jelliffe wrote:
>> >After the XML document has been parsed and groved, and auto links
>> >embedded, and transformations and stylesheets applied, and then
>> >sent to some output device, that is the publication.
>>
>> what if the output device is a network interface for sending it to a client
>> for interpretation? i.e. it is never intended to be rendered on screen or
>> paper?
>
>If you need a second word for this, then "publication" is available. If you
>don't "document" is fine. But usually "publication" refers to (the result of)
>the processing chains that end at some computer interaction medium
>(e.g. a printer or screen).
Other names for this that I've heard (and sometimes used):
o Deliverable
o Presentation instance (not as good for non-rendered information)
Eve
From jlapp at acm.org Thu Nov 20 19:27:31 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <1.5.4.32.19971120181453.00ad6a4c@pop.mindspring.com>
Message-ID: <3.0.3.32.19971120142812.00972360@pop.access.digex.net>
Jonathan Robie wrote:
>These properties are not easily represented:
>
>o Behavior (except for in languages that allow methods to be
>represented as data, e.g. Java)
>o Encapsulation constraints
I'm not sure what you mean by "encapsulation constraints." OMT uses
a variety of constraints, but none go by that name. Poring over the
UML documentation I only see the term "constraint" being used in a
general way.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From jwrobie at mindspring.com Thu Nov 20 19:35:48 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120193521.00aa7580@pop.mindspring.com>
At 02:28 PM 11/20/97 -0500, Joe Lapp wrote:
>Jonathan Robie wrote:
>>These properties are not easily represented:
>>
>>o Behavior (except for in languages that allow methods to be
>>represented as data, e.g. Java)
>>o Encapsulation constraints
>
>I'm not sure what you mean by "encapsulation constraints." OMT uses
>a variety of constraints, but none go by that name. Pouring over the
>UML documentation I only see the term "constraint" being used in a
>general way.
I'm thinking of public/protected/private access in languages like C++, i.e.
the constraints on access to encapsulated data.
Jonathan
From fussellm at alumni.caltech.edu Thu Nov 20 20:40:39 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID:
Jonathan Robie wrote
> The following properties of object models are easily represented in
> SGML/XML:
> o Identity
> o State
> o Type
I would disagree that even these items can be easily represented in
SGML/XML (for example, State is more complicated than a particular set of
attribute values). I think it is more the other way around: SGML/XML has
a particular model of Identity, State, and Type which an object model can
easily represent.
But in any case, these items are (mostly) the core concept of OO (i.e.
Objects) instead of being properties of object models. Objects have
Identity, State, and Behavior where the implementation of both state and
behavior is encapsulated. Object models describe the possible objects
and structures that can exist in a system. This will include describing[1]:
Types: The interfaces (methods, associations, and abstract
state) that objects can have.
Associations: The possible relationships between objects
Operations: The messages an object can respond to
State Models: The possible state transitions for an object
Attributes: The simple associations (to basic value types) of an
object
Inheritance: The similarities/relationships among types
DTDs can describe some of this modeling information, but not particularly
well and really only for a limited set of object models. Examples of
weaknesses are: only one true association (content) which is a pure
containment, all other attributes must be basic data types, limited
cardinality control, likelihood of arbitrary ordering, inability (or
difficulty) to express Type relationships, inability (or difficulty) for
an Object to support more than one type. These are weaknesses compared
to the most basic modeling abilities of common modeling techniques (UML,
Booch, HOOD, Syntropy, OORAM).
Thought about another way, DTDs are good models for textual input of
information (what rules must be satisfied by the encoding) but this
should be considered only a view onto the true information model.
SGML/XML describes a construction view of an information model and
provides the front-end to instantiating an Objectbase from that model.
Using SGML/XML to try to describe any information model (via DTDs) will
be over extending its abilities into areas where other tools/techniques
are much better qualified.
--Mark
mark.fussell@chimu.com
[1] An implementation of an object model (or an implementation model
developed from a conceptual model) also uses classes, methods, and
instance variables to satisfy the above descriptions within a particular
system. I am trying to use the most established and main-stream
definitions of all these terms, but you may also want to see the
references at the MONDO site for possible different definitions (e.g.
Dictionary of Object Technology [Fireside+E 95]).
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From jwrobie at mindspring.com Thu Nov 20 21:31:42 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971120213117.00ae2538@pop.mindspring.com>
At 12:40 PM 11/20/97 -0800, Mark L. Fussell wrote:
>I would disagree that even these items can be easily represented in
>SGML/XML (for example, State is more complicated than a particular set of
>attribute values). I think it is more the other way around: SGML/XML has
>a particular model of Identity, State, and Type which an object model can
>easily represent.
Our basic difference here is that I am thinking primarily in terms of the
network of objects available in object oriented systems at run-time, with
their metadata (if available), and you seem to be thinking of abstractions
used to create object oriented systems. For instance, the state of an object
is precisely equivalent to the set of attribute values associated with that
object. Either of these can be referred to as an object model, but they are
not the same thing. Also, you may be inferring that I am trying to say that
SGML can be a replacement for CORBA or other distributed object
architectures. No way.
In fact, at this point I am not advocating anything concrete, except that I
think there should be some kind of query language that SGML/XML systems can
use to access data in foreign systems like relational or object oriented
databases, and at present, it makes sense to me that such a query language
should be defined in terms of SGML/XML structure. And I think that SGML/XML
is probably powerful enough for that - at least, it is if we are using it
only for retrieval of information, and not for modification of information;
for instance, everything that is stored in an object oriented database can
be stored in SGML - the object ids can be turned into IDs, containers can be
expressed either through containment or sets of IDREFs, etc. As long as
access is read-only, you aren't losing much. However, you wouldn't want to
modify it through such an interface, since you have lost encapsulation,
polymorphic references, type safety of references, etc.
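A minimal sketch of that read-only view, assuming a toy pair of classes
(Author and Book are hypothetical, as are the element and attribute names).
Object ids become id attributes and references become a whitespace-separated
list of ids in the style of IDREFS. Note that ElementTree treats these as
plain attributes; a DTD would be needed to declare them as ID/IDREFS.

```python
import xml.etree.ElementTree as ET


class Author:
    def __init__(self, oid, name):
        self.oid, self.name = oid, name


class Book:
    def __init__(self, oid, title, authors):
        self.oid, self.title, self.authors = oid, title, authors


def to_xml(books):
    """Write a read-only XML view of the object network."""
    root = ET.Element("objects")
    seen = set()
    for book in books:
        for author in book.authors:
            if author.oid not in seen:          # emit each shared object once
                seen.add(author.oid)
                ET.SubElement(root, "author", id=author.oid, name=author.name)
        # References are expressed as a list of ids rather than by containment.
        ET.SubElement(root, "book", id=book.oid, title=book.title,
                      authors=" ".join(a.oid for a in book.authors))
    return root


def authors_of(root, book_id):
    """Follow the id references back to the author elements."""
    by_id = {el.get("id"): el for el in root}
    book = by_id[book_id]
    return [by_id[ref].get("name") for ref in book.get("authors").split()]


ada = Author("a1", "Ada")
books = [Book("b1", "First", [ada]), Book("b2", "Second", [ada])]
doc = to_xml(books)
names = authors_of(doc, "b2")
```

The shared Author appears once and is referenced twice, so the reference
network survives, but nothing in the XML stops a writer from pointing the
authors attribute at a book id, which is the lost type safety noted above.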
This is analogous, in some ways, to ODBC access for object oriented
databases, which allows a view on the data in the model, but does not
encompass the full semantics of the object database. Such interfaces are
great for read-only access, but certainly do not replace the need for an
object database, and are not really very good for write access.
Jonathan
From jlapp at acm.org Thu Nov 20 21:42:48 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:00 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To:
Message-ID: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>
"Mark L. Fussell" wrote:
>[...] Object models describe the possible objects
>and structures that can exist in a system. [...]
>DTDs can describe some of this modeling information, but not particularly
>well and really only for a limited set of object models.
I do think that "object model" is too broad a term for what an SGML/XML
repository can accomplish. I think the SGML/XML repository offers a new
way of looking at objects. Clients may process SGML/XML constructs as
raw document information, or clients may process the constructs by
interpreting them (adding semantic value not provided by the repository).
I'm guessing that most clients that go about interpreting repository
data will create objects that contain those data, and those objects will
have behavior. We have a single object's data living as sibling objects
on many client machines. I'm guessing that these objects (instantiated
on client machines) will all have behavior and any other property we
ascribe to objects of an object model. The object model of an SGML/XML
repository is schizophrenic.
When we evaluate the capabilities of SGML/XML to support object models,
I think we need to take client behavior into account. The repository is
acting more like a file system for the state information of objects, and
the clients are more like applications that use the file system.
This seems like a different model for designing systems, and I wonder
how far we can take it.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From peter at ursus.demon.co.uk Thu Nov 20 21:50:17 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:00 2004
Subject: Generalizing the SGML/XML information model and Releasing MONDO
In-Reply-To:
Message-ID: <3.0.1.16.19971120211323.1e97f602@pop3.demon.co.uk>
At 10:56 20/11/97 -0800, Mark L. Fussell wrote:
>
>[This is a long email so I will also put it online at
>"http://www.chimu.com/projects/mondo/" ]
>
Mark,
thanks very much for this posting. One of the main goals in setting up
XML-DEV was for the shared development of software and (although I'm
posting this before visiting your site) this looks very valuable.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From markb at iosphere.net Thu Nov 20 22:12:58 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>
Message-ID:
On Thu, 20 Nov 1997, Joe Lapp wrote:
> When we evaluate the capabilities of SGML/XML to support object models,
> I think we need to take client behavior into account. The repository is
> acting more like a file system for the state information of objects, and
> the clients are more like applications that use the file system.
No, I think that's what we should be trying to stay away from.
XML is self-describing structured storage - for anything you want to shove
in it. Implementation, state, properties, events, behavioural semantics,
whatever.
Any object I have can be entirely serialized into an XML document and
back again without information loss. The XML document *is* the object.
All I need is a framework to transparently activate documents. Or in
other words, reserialize it from XML into RAM.
So, there are no 'clients' per se. There's browsers, and then there's
serialized objects streaming themselves into them.
My Javasoft proposal mentions Beans specifically, but for those of you not
familiar with Beans, *every* Java object is automatically a Bean in JDK
1.1. So, my proposal to Javasoft isn't a niche idea - it's meant to
apply to all objects.
Now it's also apparently implemented in MONDO. Bonus. Thanks Mark!
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From tbray at textuality.com Thu Nov 20 22:32:18 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:01 2004
Subject: Three Access Language Paradigms
Message-ID: <3.0.32.19971120074053.00bac08c@pop.intergate.bc.ca>
At 10:50 AM 19/11/97 -0500, Jonathan Robie wrote:
>In XLL, is there a way to combine conditions with boolean operators?
No; no booleans. -Tim
From jlapp at acm.org Thu Nov 20 22:44:59 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To:
References: <3.0.3.32.19971120164327.00976670@pop.access.digex.net>
Message-ID: <3.0.3.32.19971120174538.0093b9e0@pop.access.digex.net>
Mark Baker wrote:
>On Thu, 20 Nov 1997, Joe Lapp wrote:
>> When we evaluate the capabilities of SGML/XML to support object models,
>> I think we need to take client behavior into account. The repository is
>> acting more like a file system for the state information of objects, and
>> the clients are more like applications that use the file system.
>
>No, I think that's what we should be trying to stay away from.
>[...]
>Any object I have can be entirely serialized into an XML document and
>back again without information loss. The XML document *is* the object.
>All I need is a framework to transparently activate documents. Or in
>other words, reserialize it from XML into RAM.
I think we are in agreement (I disagree, we agree). An XML document is
capable of representing any object and all aspects of that object. But
an XML document isn't the object it represents. You have to deserialize
that document back into an object before you have the fully featured
object again. An XML repository could store those objects (in their
XML document representation) and even keep the relationships among those
objects, but it does not animate those objects. The objects are alive
when they are deserialized on the clients. To get a repository to
animate the objects you'd have to make the repository a bit more than
just a repository. For one thing, you'd also need a JVM.
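The distinction can be sketched as a round trip. This is an illustration
only: the Counter class, the KNOWN_TYPES registry, and the choice of mapping
type to element name and state to attributes are assumptions for the
example, not anyone's proposed format.

```python
import xml.etree.ElementTree as ET


class Counter:
    """A class whose behaviour lives in code, not in the XML."""
    def __init__(self, count=0):
        self.count = count

    def increment(self):
        self.count += 1


# Registry the client uses to map element names back to classes (hypothetical).
KNOWN_TYPES = {"Counter": Counter}


def serialize(obj) -> str:
    # Type becomes the element name, state becomes attributes.
    el = ET.Element(type(obj).__name__,
                    {k: str(v) for k, v in vars(obj).items()})
    return ET.tostring(el, encoding="unicode")


def deserialize(text: str):
    el = ET.fromstring(text)
    cls = KNOWN_TYPES[el.tag]           # behaviour comes from the client's code
    return cls(count=int(el.get("count")))


stored = serialize(Counter(2))          # what a repository would hold
live = deserialize(stored)              # the object "comes alive" on the client
live.increment()
```

The stored string carries type and state, but increment() only exists once
a client with the Counter class has deserialized it.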
>[...]
>My Javasoft proposal mentions Beans specifically, but for those of you not
>familiar with Beans, *every* Java object is automatically a Bean in JDK
>1.1. So, my proposal to Javasoft isn't a niche idea - it's meant to
>apply to all objects.
Oh, I fully agree here, too. Actually, I was thinking of your proposal
to JavaSoft when I wrote that previous post. I intended to mention that
XML repositories could serve as databases for serialized Java objects.
Your idea to use XML to represent serialized Java objects is intriguing.
As a side note, you mention that in JDK 1.1 every object is a bean. I
thought beans had to be serializable. Are you saying that in JDK 1.1
every Java object that ever gets created is serializable?
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From ricko at allette.com.au Fri Nov 21 01:58:13 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <199711210155.MAA03460@jawa.chilli.net.au>
> From: Jonathan Robie
> The following properties of object models are easily represented in SGML/XML:
>
> o Identity
> o State
> o Type
>
> These properties are not easily represented:
>
> o Behavior (except for in languages that allow methods to be
> represented as data, e.g. Java)
> o Encapsulation constraints
I think you miss what is perhaps *THE* most important thing that SGML content
models represent: sequence.
This is one of the essential distinguishing features of SGML.
If I have
Refer also to
XML draftathttp://www.w3c.org/TR for more info.
then the sequence of elements and data in the citation element
is vitally important. In many cases sequence is not an artifact of
formatting, but is as intrinsic to the data as encapsulation
and so on.
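A small sketch of why sequence matters in mixed content follows. The markup of the example above did not survive in the archive, so the tag names used here (citation, title, url) are assumptions, not the original ones.

```python
import xml.etree.ElementTree as ET

# The tag names are assumptions; the markup of the original example was lost.
SAMPLE = (
    "<citation>Refer also to <title>XML draft</title> at "
    "<url>http://www.w3c.org/TR</url> for more info.</citation>"
)


def ordered_content(elem):
    """Return the element's text chunks and child tags in document order."""
    items = []
    if elem.text:
        items.append(("text", elem.text))
    for child in elem:
        items.append((child.tag, child.text or ""))
        if child.tail:
            items.append(("text", child.tail))
    return items


def as_unordered_record(elem):
    """An order-free view: children keyed by tag; interleaved text is dropped."""
    return {child.tag: child.text for child in elem}


root = ET.fromstring(SAMPLE)
sequence = ordered_content(root)
record = as_unordered_record(root)
print(sequence)
print(record)
```

The ordered view can reproduce the sentence; the record view, which is how a plain attribute-style object model would see the data, cannot.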
The problem I see with so many discussions of the virtues of object-oriented
inheritance systems is that they fail to discuss how inheritance works with
sequence. It seems to be an issue tucked aside.
For example, if the content model of the above is
and I want to use the citation element type as a supertype, and derive a
new element type with the following content model
so I can say
Refer also to XML draftedited byMcQueen, Bray, Paoliathttp://www.w3c.org/TR for more info.
This kind of adding element types in particular points in sequences is,
as I say, one of the most basic requirements for any real work.
I am very interested in seeing inheritance-based models that address
this issue: that would be great.
The best idea I have come up with is the following: to allow a new keyword
#OTHER (or #ANY)
in content models, to represent any one unambiguous element type.
This gives the creator of the original content model the ability to
declare points in content models which are publicly available for extension
by derived element types (declared or undeclared).
I currently think that any inheritance-based declaration system must presuppose
such explicit inheritance points. I think it is merely a matter of strong typing
and interface control.
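One way to picture such explicit inheritance points is the rough sketch below. It is a simplification under stated assumptions: content models are reduced to flat sequences of element type names, the names themselves (title, editors, url) are hypothetical because the original content models were lost, and the functions derive and matches are invented for illustration rather than taken from any SGML or XML tool.

```python
OTHER = "#OTHER"


def derive(base_model, new_type):
    """Build a derived content model by filling the base model's #OTHER slot."""
    if base_model.count(OTHER) != 1:
        raise ValueError("base model must declare exactly one #OTHER slot")
    return [new_type if item == OTHER else item for item in base_model]


def matches(model, children):
    """Check a child-element sequence against a model.

    An unfilled #OTHER slot accepts any single element type not otherwise
    named in the model (a simplification of "one unambiguous element type").
    """
    if len(model) != len(children):
        return False
    named = {item for item in model if item != OTHER}
    for item, child in zip(model, children):
        if item == OTHER:
            if child in named:
                return False
        elif item != child:
            return False
    return True


base = ["title", OTHER, "url"]        # hypothetical supertype model
derived = derive(base, "editors")     # hypothetical derived element type
print(derived)
```

A base model without an #OTHER slot cannot be extended at all in this sketch, which is the strong-typing and interface-control point: the supertype's author decides where derived types may add elements.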
Rick Jelliffe
From markb at iosphere.net Fri Nov 21 02:05:43 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
In-Reply-To: <3.0.3.32.19971120174538.0093b9e0@pop.access.digex.net>
Message-ID:
On Thu, 20 Nov 1997, Joe Lapp wrote:
> I think we are in agreement (I disagree, we agree). An XML document is
> capable of representing any object and all aspects of that object. But
> an XML document isn't the object it represents. You have to deserialize
> that document back into an object before you have the fully featured
> object again.
That's one way of doing it of course, and very useful for some
applications, such as dynamic binding of data to behaviour à la compound
document frameworks (and the new beans activation framework). Think of
this as serializing a class.
But in *many* cases, you just want to make the *object* persist simply,
perhaps even on the machine with the browser. This is especially
suitable for agent systems; you bring the ability to persist along with
you instead of attempting to store it "behind" you. It's a move away from
TP-monitor style ACID transactions, and towards a more "make forward
progress" means of distributed computing. Object groups are a good
example of this.
Certainly though, both tools should be available to us. We shouldn't
try to shoehorn everything into a single solution when that solution
isn't general enough for all of our needs.
But, I've got the feeling that we'll be doing a lot more of one than the
other before too long. YMMV. 8-)
>An XML repository could store those objects (in their
> XML document representation) and even keep the relationships among those
> objects, but it does not animate those objects. The objects are alive
> when they are deserialized on the clients. To get a repository to
> animate the objects you'd have to make the repository a bit more than
> just a repository. For one thing, you'd also need a JVM.
Which isn't too difficult nowadays, especially when so much is being done
with the browser (as it should). And a JVM is no different than requiring a
script interpreter.
> As a side note, you mention that in JDK 1.1 every object is a bean. I
> thought beans had to be serializable. Are you saying that in JDK 1.1
> every Java object that ever gets created is serializable?
You're right of course. But you'll find that anything that "makes sense"
to serialize, can be.
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From jwrobie at mindspring.com Fri Nov 21 02:07:10 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971121020612.00b21624@pop.mindspring.com>
At 12:53 PM 11/21/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Jonathan Robie
>
>> The following properties of object models are easily represented in SGML/XML:
>>
>> o Identity
>> o State
>> o Type
>>
>> These properties are not easily represented:
>>
>> o Behavior (except for in languages that allow methods to be
>> represented as data, e.g. Java)
>> o Encapsulation constraints
>
>I think you miss what is perhaps *THE* most important thing that SGML content
>models represent: sequence.
>
>This is one of the essential distinguishing features of SGML.
The purpose of my message was to describe what SGML/XML-based interfaces to
object systems can represent, not to propose that SGML/XML should have the
same inheritance mechanisms as object oriented systems. Whether or not they
should, I think it is pretty clear that they don't.
Jonathan
From ricko at allette.com.au Fri Nov 21 02:20:19 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <199711210215.NAA04666@jawa.chilli.net.au>
> From: Mark Baker
> On Thu, 20 Nov 1997, Joe Lapp wrote:
> > When we evaluate the capabilities of SGML/XML to support object models,
> > I think we need to take client behavior into account. The repository is
> > acting more like a file system for the state information of objects, and
> > the clients are more like applications that use the file system.
>
> No, I think that's what we should be trying to stay away from.
>
> XML is self-describing structured storage - for anything you want to shove
> in it. Implementation, state, properties, events, behavioural semantics,
> whatever.
I think this is a good point.
XML/SGML is a markup language (it is concerned with the mechanics of
constraining, labelling and pointing to user-defined hierarchical information)
not a data modeling language. This neutrality is its weakness, in that
it may well be suboptimal for any specific job, compared to what you might
do if you have all the resources and brains to tailor a specific notation
and train everyone up in it.
However, most people can only learn a small handful of languages, so
having a standard markup language frees people's brains to concentrate
on the distinguishing specifics of their information, rather than juggling
many different notations in their brains.
This neutrality also explains why XML's content model system is so simple.
SGML has a more complex content model system (inherited inclusions and exclusions,
and a "required anywhere" connector "&"), but they have been found in practice
to complicate matters more than seems warranted. So I think it is useful
to not think of the "poverty" of XML content models, but rather their "modesty"
and "neutrality".
Rick Jelliffe
From ht at cogsci.ed.ac.uk Fri Nov 21 10:37:11 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 16:59:01 2004
Subject: New release of xslj
Message-ID: <3590.199711211036@grogan.cogsci.ed.ac.uk>
Version 0.3 of xslj, my XSL-to-DSSSL translator, is now available.
This version includes a number of bug fixes (thanks for reports) and
much improved HTML output when the CSS/HTML flow objects are used.
See http://www.ltg.ed.ac.uk/~ht/xslj.html for information on access, etc.
ht
-----------
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
From fussellm at alumni.caltech.edu Fri Nov 21 12:39:07 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID:
Jonathan Robie wrote:
> In fact, at this point I am not advocating anything concrete, except that I
> think there should be some kind of query language that SGML/XML systems can
> use to access data in foreign systems like relational or object oriented
> databases, and at present, it makes sense to me that such a query language
> should be defined in terms of SGML/XML structure. And I think that SGML/XML
> is probably powerful enough for that - at least, it is if we are using it
> only for retrieval of information, and not for modification of information;
> for instance, everything that is stored in an object oriented database can
> be stored in SGML - the object ids can be turned into IDs, containers can be
> expressed either through containment or sets of IDREFs, etc. As long as
> access is read-only, you aren't losing much.
Which would make SGML/XML a presentation model (i.e. similar to a
reporting view) on more sophisticated information bases. This would
inherently be worthwhile if it provided a very understandable model to
the user: more understandable than the underlying database.
One of the nice things about relational databases is the capability of
defining "views" on the data. However simple SQL is compared to (say)
C++, very few end-users can do anything more than a simple join. After
that things get a bit murky and even if the query produces results the
end-user may have no idea (or the wrong idea) of what the answer
means[1]. There are many examples of this (see C.J. Date's writing
especially). But views and reporting tools (and general UI applications)
come to the rescue and provide a simple useful view of the complexity
below them.
SGML/XML could provide a very sophisticated version of this "reporting"
but I think it could be trapped between the ultra-simple HTML and the
more sophisticated information models and would rarely be used outside of
niches (just use an HTML builder on top of a database). So I would
rather see SGML/XML go upward and provide a more accessible interface to
"complete" information models than stay in the middle. By going upward
it immediately gains the rewards that you mentioned earlier in the week:
benefiting from the history/mistakes/knowledge of the database
community.
Actually, I think in concrete terms I would like to be able to change
your suggested OQL from:
select e
from e in SGMLElement,
a in e.attributes,
s in e.subElements
where e.tagName = "SECT1"
and a.tagName = "ID"
and s.tagName = "PARA";
to something like:
select section
from section in Sections
children in section.allChildren
where section.level > 1
and section.title.beginsWith("MONDO")
and children.text.contains("ChiMu")
But still use SGML/XML/OML technology and be working from the same
original encoding.
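The difference between the two query styles can be sketched roughly as follows. This is not OQL; it only imitates the two queries over a tiny made-up document. The sample document, the Section class, and both function names are hypothetical, and the section.level condition of the second query is left out because the sample has no nesting.

```python
import xml.etree.ElementTree as ET

# Made-up sample data for illustration only.
DOC = ET.fromstring(
    "<doc>"
    "<SECT1 ID='s1'><TITLE>MONDO overview</TITLE><PARA>About ChiMu.</PARA></SECT1>"
    "<SECT1><TITLE>Other</TITLE><PARA>Nothing here.</PARA></SECT1>"
    "</doc>"
)


def generic_query(root):
    """First style: select by tag names and attribute names only."""
    return [
        e for e in root.iter("SECT1")
        if "ID" in e.attrib and e.find("PARA") is not None
    ]


class Section:
    """Hypothetical domain-level view over a SECT1 element."""

    def __init__(self, elem):
        self.elem = elem
        self.title = elem.findtext("TITLE", default="")

    def all_text(self):
        # Concatenated text of the section and everything it contains.
        return "".join(self.elem.itertext())


def domain_query(root):
    """Second style: select using domain vocabulary (title, contained text)."""
    sections = (Section(e) for e in root.iter("SECT1"))
    return [
        s for s in sections
        if s.title.startswith("MONDO") and "ChiMu" in s.all_text()
    ]


print([e.get("ID") for e in generic_query(DOC)])
print([s.title for s in domain_query(DOC)])
```

Both functions read the same encoding; the second simply hides tag names behind a view that speaks in terms of sections, titles and text.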
--Mark
mark.fussell@chimu.com
[1] Part of the problem is because SQL is flawed compared to relational
theory, but it would still be a problem with a better query language.
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From jlapp at acm.org Fri Nov 21 15:56:17 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:01 2004
Subject: Integrity in the Hands of the Client
Message-ID: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net>
In this posting I'm going to be a little bold and propose that both
the XML and DOM specifications are flawed. The existence of these
flaws rides on the assumption that we care to use SGML/XML to create
domain models for data where the data evolves over time. I'm also
assuming that it is unacceptable for the client objects of a document
to maintain the integrity of the document.
In order for me to most convincingly convey the point, I need you to
bear with me as I explore an example of how we might use XML. I do
not directly suggest how to correct the XML specification, but I
think I end up implying a few different solutions. However, it seems
that the correction to DOM is a bit more straightforward, so I make
the obvious suggestions.
Suppose we want to create a document that contains information about
books and about the authors of those books, and suppose we require
that whenever the document has a book, it also has information about
the author of the book. The document will reside on a server, and
one or more administrators will populate the document from their
clients. Other users will be free to browse the document.
We need to design the DTD for this document. Here is our first pass:
]>
To get a better feel for what we've designed, we create a little sample
document:
Text goes here.Text goes here.Text goes here.Text goes here.Text goes here.
This seems to work. It stores information about books and authors,
and it is not possible to add a book without associating it with
the description of some author. But we can see that it breaks as
soon as we add any other kind of element that has an ID. We know
that every book will eventually have an ID, because we'll soon want
to have an element whose content elements reference the New York
Times Bestsellers. Once we do that, nothing prevents an administrator
(or the client program he or she is using) from indicating that the
author of a book is another book. This DTD will not suffice.
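The weakness can be illustrated with a short sketch. The DTD and sample document above did not survive in the archive, so the element and attribute names below (library, author, book, id, author) are assumptions. The first check mimics what IDREF validity gives you (the reference names some existing ID); the second is the stronger, type-aware check that a DTD cannot express.

```python
import xml.etree.ElementTree as ET

# Element and attribute names are assumptions; the original DTD was lost.
DOC = ET.fromstring(
    "<library>"
    "<author id='A1'><name>Some Author</name></author>"
    "<book id='B1' author='A1'><title>First</title></book>"
    "<book id='B2' author='B1'><title>Second</title></book>"
    "</library>"
)


def index_ids(root):
    """Map every id value to its element, as a single ID namespace."""
    return {e.get("id"): e for e in root.iter() if e.get("id") is not None}


def idref_errors(root):
    """What IDREF validity checks: each reference names some existing ID."""
    ids = index_ids(root)
    return [b.get("id") for b in root.iter("book") if b.get("author") not in ids]


def typed_ref_errors(root):
    """The stronger check asked for here: the target must be an author."""
    ids = index_ids(root)
    return [
        b.get("id") for b in root.iter("book")
        if ids.get(b.get("author")) is None or ids[b.get("author")].tag != "author"
    ]


print(idref_errors(DOC))
print(typed_ref_errors(DOC))
```

Book B2 names book B1 as its author. The ID-only check accepts this; only the type-aware check, which has to live in application code, rejects it.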
It seems that we might have to use links, but let's look at other
approaches first. We entertain the idea that an author's books
belong to the content of the author. We quickly throw that one out
when we realize that a book can have more than one author.
Now we consider having authors belong to the content of a book,
but we throw that idea out because authors may author many books.
It is possible to put author information in the content of each book,
but then we'd be duplicating the lengthy bio and wasting disk space
as well as introducing the headache of managing duplicate copies.
The same problem arises if we were to duplicate book information
under each of the authors of the book, especially since each book has
a lengthy book description.
So now we ask whether links can do the job. Links allow us to use
URLs and XPointers to reference other elements. For the moment,
consider trying to accomplish our task using a single DTD, so that
all element IDs have the same scope. In this case, the URL of any
link references the document that contains the link, so all of our
distinguishing information resides in the XPointers. The ID()
location term looks useful, but this term cannot constrain the
element type of the element that it references. Using ID() as the
first locator term would not be sufficient to distinguish between
books and authors.
Suddenly a brilliant idea comes to mind. We'll use a locator term
to specify the element and then follow that with the ID()
term to select the ID of the particular element. But
this idea has a problem: when the ID() term appears, it must appear
as the first locator term.
Another idea comes to mind. We could use the following combination
of locator terms:
CHILD(1,authors)(1,author,id,'A3')
Here 'A3' is the identifier of the author. We know that we cannot
try to match the author's name, because more than one author may
have the same name. IDs are guaranteed to be unique.
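A rough approximation of how such a chain of CHILD terms might be resolved is sketched below. It is not an XPointer implementation: the child function is invented for illustration, it only imitates the "Nth child of this type with this attribute value" reading of each term, and the sample document uses assumed element names because the original sample was lost.

```python
import xml.etree.ElementTree as ET

# Element names and ids are assumptions; the original sample document was lost.
DOC = ET.fromstring(
    "<library>"
    "<authors>"
    "<author id='A1'><name>First Author</name></author>"
    "<author id='A3'><name>Third Author</name></author>"
    "</authors>"
    "<books><book id='B1' author='A3'><title>A Book</title></book></books>"
    "</library>"
)


def child(elem, instance, tag, attr=None, value=None):
    """Approximate one CHILD step: the Nth matching child, or None."""
    if elem is None:
        return None
    matches = [
        c for c in elem
        if c.tag == tag and (attr is None or c.get(attr) == value)
    ]
    return matches[instance - 1] if 0 < instance <= len(matches) else None


# Imitates CHILD(1,authors)(1,author,id,'A3')
target = child(child(DOC, 1, "authors"), 1, "author", "id", "A3")
# The same path cannot land on a book, because the element type is named.
wrong = child(child(DOC, 1, "authors"), 1, "author", "id", "B1")
print(target.findtext("name"), wrong)
```

Because each step names an element type, the locator itself can only ever reach an author, which is what the bare ID() term could not guarantee.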
That seems to work. Something similar could have been accomplished
by separating books and authors into different documents and then
using the URL portion of the href to specify the document that
contains the target element.
However, these link solutions all have one problem: nothing in the
link specification allows a link element declaration to constrain
the kind of resource to which a link links. WD-XML-LINK-970731
indicates that an href is a URL, and that when the URL references
another XML document, XPointer locator terms may be appended to
the URL. I do not see any mechanism by which a link element can
constrain the kind of element that the link references.
I have not been able to find a way to have the document server force
clients to ensure that whenever they add a book, that book is
associated with some author. Clients are given the responsibility
of maintaining the integrity of the document.
The problem grows more complicated when we also ask that no author
exist in the document unless at least one book is
associated with the author. A solution to the first problem would
not be a sufficient change to specifications in order to guarantee
a solution that handles this additional requirement. By having
constraints operate in both directions we now require that every
change to a document occur within a transaction, so that the
document is validated against the DTD only at transaction boundaries.
(If every book had to have at least one author and every author
had to have at least one book, then when it comes time to add a
new book by a new author, the document will not validate against
the DTD after we add one and before we add the other.)
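The need for transaction boundaries can be sketched as follows. Everything here is hypothetical: the Document and Transaction classes, their methods, and the in-memory representation stand in for a document server and are not part of XML, DOM, or any existing API.

```python
class ConstraintError(Exception):
    """Raised when a commit would leave the document inconsistent."""


class Document:
    """Hypothetical server-side document: books name authors by id."""

    def __init__(self):
        self.authors = set()    # author ids
        self.books = {}         # book id -> author id

    def violations(self):
        """Both directions: every book has an author, every author a book."""
        problems = []
        for book, author in self.books.items():
            if author not in self.authors:
                problems.append(f"book {book} has no author entry")
        for author in self.authors:
            if author not in self.books.values():
                problems.append(f"author {author} has no book")
        return problems


class Transaction:
    """Apply edits to a copy; validate and publish only at commit."""

    def __init__(self, doc):
        self.doc = doc
        self.authors = set(doc.authors)
        self.books = dict(doc.books)

    def add_author(self, author_id):
        self.authors.add(author_id)

    def add_book(self, book_id, author_id):
        self.books[book_id] = author_id

    def commit(self):
        draft = Document()
        draft.authors, draft.books = self.authors, self.books
        problems = draft.violations()
        if problems:
            raise ConstraintError("; ".join(problems))
        # Publish copies so later edits on this transaction cannot leak in.
        self.doc.authors = set(self.authors)
        self.doc.books = dict(self.books)


library = Document()
tx = Transaction(library)
tx.add_author("A1")          # alone, this would violate the constraints
tx.add_book("B1", "A1")      # together, the two edits are consistent
tx.commit()
print(library.books)
```

Neither edit is valid on its own, so validation can only happen at commit, once both have been staged.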
The example I have given here may seem trivial. Surely we can find
a way to live with books that don't have associated author entries
and authors that don't have associated book entries. However, in
general, constraints between elements will be important. For
example, it would not be acceptable to store away an account
deduction entry without having an associated account entry or to
have an account entry that does not have at least one associated
account-owner entry. It seems to me that there are very few domains
that can be represented without these kinds of constraints.
I think the solution to this problem resides partly in the XML
specification and partly in the document access language. A DTD
needs to be able to express these kinds of constraints among
elements, so that the document server can enforce the constraints.
We would then not be relying on the proper behavior of all the
clients that wish to add to or modify the document. (Let me know
if you need an argument for why clients should not hold this
responsibility; I'm assuming we agree on this point.) The access
language also needs to reflect the solution because in order for
a server to implement constraints, all document update operations
must be couched in the language of transactions. That is, every
document update operation must be associated with a transaction.
The DOM model allows us to manage documents from a client, so long
as clients assume part of the responsibility for maintaining object
model constraints. However, if we decide that the document server
is responsible for maintaining these constraints, then the DOM
model as it is currently architected will not suffice, since its
document-update operations are not architected around transactions.
Moreover, I do not see a way to extend the current DOM design so
that it can safely support transactions. One way to correct DOM
is to redesign it so that it submits query/edit objects to the server,
where each query/edit object is submitted via a transaction object.
Another way to correct DOM is to add a transaction parameter to all
document-update method signatures. I don't think of this latter
approach as an extension to DOM, since the corrected DOM would not
be backwards-compatible with the current DOM.
I think the XML specification as it currently stands is extremely
well-suited for describing data that does not change over time, but
that it is lacking in specifying how documents are to evolve.
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From paul at arbortext.com Fri Nov 21 16:15:50 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun 7 16:59:01 2004
Subject: Query Languages for XML
Message-ID: <97Nov21.111319est.18823@thicket.arbortext.com>
At 21:38 1997 11 19 -0500, Lauren Wood wrote:
>Derek Denny-Brown wrote:
>
>% I think there is some real potential for an extension to XSL to allow
>% something akin to Microsoft's dHTML (dynamic HTML). I am not sure that it
>% should necessarily be in XSL 1.0, and it could be really hairy if people
>% are using XSL-grove interface to the XML and DOM interface to the output.
>% I have not quite figured out how to factor in DOM into XSL without making
>% things really confusing...
>
>I'm confused by this. The idea of the DOM is to standardize the object
>model part of "dynamic HTML" (whatever that might mean; the definition
>seems to change with the application that supports it, the person talking
>about it, and probably the phase of the moon as well). So what sort of
>extension to XSL do you mean? I also don't understand why the XML
>would have an XSL-grove interface, and the "output" (what does
>output mean?) would have a DOM interface, when the DOM should
>be an interface to an XML document...
Not only do I share all of Lauren's confusion, but I'd like to
add that all this discussion about extensions to XSL is quite
premature. There is no XSL to extend. No one can know what XSL
is at this point. If there is any discussion about XSL, it would
be more appropriate to be one of requirements and goals, not one
about details.
From jwrobie at mindspring.com Fri Nov 21 16:42:20 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:01 2004
Subject: Access Languages are Tied to Schemas
Message-ID: <1.5.4.32.19971121164151.00b2cd9c@pop.mindspring.com>
At 04:38 AM 11/21/97 -0800, Mark L. Fussell wrote:
>Which would make SGML/XML a presentation model (i.e. similar to a
>reporting view) on more sophisticated information bases. This would
>inherently be worthwhile if it provided a very understandable model to
>the user: more understandable than the underlying database.
Precisely. This would be equivalent to defining a "document view" for a
database using a DTD.
>SGML/XML could provide a very sophisticated version of this "reporting"
>but I think it could be trapped between the ultra-simple HTML and the
>more sophisticated information models and would rarely be used outside of
>niches (just use an HTML builder on top of a database). So I would
>rather see SGML/XML go upward and provide a more accessible interface to
>"complete" information models than stay in the middle.
The ability to define "document views" for external systems is important
whether or not anything more sophisticated is done. I'm not sure exactly
what you mean by "a more accessible interface to 'complete' information
models". Could you spell that out for me?
I see a big difference between using SGML/XML to create information models
and using SGML/XML to simulate information models that are actually defined
in another paradigm. I think it is important to recognize that the object
oriented model and the SGML/XML document model are significantly different.
SGML/XML can be used as an exchange format or view model for object data,
but it is not an object oriented system. Similarly, SGML/XML can be used as
an exchange format or view model for other kinds of systems, such as
relational databases.
Of course, SGML/XML is a data model in its own right. The data defined by
XL7, for instance, may be defined in documents, but it is the kind of data
traditionally managed in databases, and complex relationships among this
data are possible. I guess what I am saying is that (1) documents are not
just substitutes for objects in object systems, (2) documents can be used to
manage rich data, (3) SGML/XML does not need to be changed into an object
oriented system to make this possible, (4) architectural forms allow great
flexibility in this kind of system.
>Actually, I think in concrete terms I would like to be able to change
>your suggested OQL from:
>
> select e
> from e in SGMLElement,
> a in e.attributes,
> s in e.subElements
> where e.tagName = "SECT1"
> and a.tagName = "ID"
> and s.tagName = "PARA";
>
>to something like:
> select section
> from section in Sections
> children in section.allChildren
> where section.level > 1
> and section.title.beginsWith("MONDO")
> and children.text.contains("ChiMu")
>
>But still use SGML/XML/OML technology and be working from the same
>original encoding.
Let me give a little background: the first query is slightly modified from
an actual query for an object oriented database that contains SGML data. The
only modification that I made was to change some of the names of the classes
used to store the data, which is basically like changing the names of the
tables in a relational database. My query assumes that there is not a new
database type or a new table for each element type, but that the data model
for the relational or object oriented database is quite simple, representing
elements, their children, and their attributes. In an object oriented
database, your query would require that each element type be registered as a
separate class in the class dictionary for the database. I think that it
will probably be easier to implement queries of the first kind in existing
object-oriented and object-relational databases.
But I think we are pretty much in agreement that full-text and other text
operators would be useful, that boolean operators are important (as well as
precedence), that path expressions of some kind are important to allow
queries to utilize the structure of SGML containment (and, if possible,
references), etc.
Jonathan
Jonathan Robie
jonathan@texcel.no
Texcel Research http://www.texcel.no
From papresco at technologist.com Fri Nov 21 16:47:41 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:01 2004
Subject: Query Languages for XML
References: <3.0.32.19971120100709.00a8c5b0@mailhost.criinc.com>
Message-ID: <3475BBE8.9676A18A@technologist.com>
Derek Denny-Brown wrote:
>
> One of the things that I see as a potential problem is that HTML etc as it
> is used now has 2 (as I count them this side of the morning) relatively
> distinct uses.
> 1) as an alternate form of (relatively) static information.
> 2) as a (very-basic) cross-platform (g)ui.
>
> XSL and DSSSL are focusing rather hard on (1), but not on (2).
I'm not sure what you mean by that. XSL as currently proposed has access
to all of the form features of HTML, just as it has access to all of the
static display features of HTML. It is correct to argue that we are
spending more effort on *improving* HTML's static display features than
improving its form features, but I think that that is probably
appropriate considering the market's interest in better static pages,
SGML's particular strengths in that area and Java's suitability for
forms.
> hmm... so maybe what I am looking for is a "standard" way to extend a XSL
> processing/display engine with new flow-object types at run-time. Paul,
> was it you who talked about this some months ago?
Yes, I looked into this, and will talk about it at SGML/XML 97. I was
more interested in compound "heavy weight" flow objects like "title",
"section", "table of contents" and so forth. There are some tricky
issues with even these simple compound objects and the issues get
trickier when you want to talk about new primitives (how do they
negotiate real estate? how much information do they need to negotiate
properly? what about line breaking?).
Paul Prescod
From tbray at textuality.com Fri Nov 21 17:01:52 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:01 2004
Subject: Integrity in the Hands of the Client
Message-ID: <3.0.32.19971121085250.00bbce94@pop.intergate.bc.ca>
At 10:56 AM 21/11/97 -0500, Joe Lapp wrote:
>In this posting I'm going to be a little bold and propose that both
>the XML and DOM specifications are flawed.
Mr. Lapp has discovered one of the well-known shortcomings of SGML,
inherited by XML; namely, the typing and constraint mechanisms supplied
by DTDs are well-known to be insufficiently rich to allow their use
for purposes which we have come to expect of database schemas.
More obviously, if, in Mr. Lapp's example, I wanted to give prices
for the books, I might want to be able to say that this has to be
a number, with 2 digits right of the decimal point. SGML doesn't
help you here either.
Yes; we need a new and richer form of schema. No boldness is
required. -Tim
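The price example can be made concrete. What follows is only a sketch of an application-level check layered on top of parsing, not anything defined by XML or SGML; the `catalog`, `book` and `price` element names and the exact pattern are assumptions chosen for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical rule: digits, a decimal point, then exactly two digits.
PRICE_PATTERN = re.compile(r"\d+\.\d{2}")

def invalid_prices(xml_text):
    """Return the text of every <price> element that breaks the rule."""
    root = ET.fromstring(xml_text)
    bad = []
    for price in root.iter("price"):
        text = (price.text or "").strip()
        if not PRICE_PATTERN.fullmatch(text):
            bad.append(text)
    return bad

# Illustrative document; a DTD could declare <price> but not its format.
SAMPLE = """
<catalog>
  <book><title>First</title><price>19.95</price></book>
  <book><title>Second</title><price>twenty</price></book>
  <book><title>Third</title><price>7.5</price></book>
</catalog>
"""

print(invalid_prices(SAMPLE))
```

A DTD would accept all three prices as character data; only the extra check separates the well-typed one from the others.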
From papresco at technologist.com Fri Nov 21 17:07:13 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:01 2004
Subject: Integrity in the Hands of the Client
References: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net>
Message-ID: <3475C074.BEEB9A03@technologist.com>
Joe Lapp wrote:
>
> Once we do that, nothing prevents an administrator
> (or the client program he or she is using) from indicating that the
> author of a book is another book. This DTD will not suffice.
...
> However, these link solutions all have one problem: nothing in the
> link specification allows a link element declaration to constrain
> the kind of resource to which a link links.
...
Neither SGML nor XML DTDs are meant to, nor will ever be able to express
all interesting semantic constraints. SGML/XML cannot even express all
interesting *syntactic* constraints (try to make an attribute that
allows only valid DOS filenames). The question of what is the right
balance of simplicity and constraint expression is an interesting one,
and one that should be rethought from time to time. But the inability to
express a *particular* constraint is not evidence that the language is
fundamentally flawed. The only language that could express all
interesting constraints would be a Turing-complete one.
I've toyed with the idea of a DSSSL subset (DSSSL-Check?) that would
return a list of error messages, or the empty list if the document was
conforming. The DTD would express simpler constraints and the
DSSSL-Check Spec would express the more complex ones. In a graphical
editor, the DTD constraints would probably be checked in real time and the
DSSSL-Check constraints would be checked periodically (since they could
conceivably be quite slow).
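The shape of such a checker can be sketched in a general-purpose language. This is not DSSSL and not a proposed syntax; it only illustrates the "list of error messages, or the empty list" contract, applied to the book/author constraint from the quoted message. The `library`, `book` and `author` names and the `author` reference attribute are assumptions.

```python
import xml.etree.ElementTree as ET

def check(xml_text):
    """Return a list of error messages; an empty list means the document conforms."""
    root = ET.fromstring(xml_text)
    # Map each ID to the element type that carries it.
    kinds = {el.get("id"): el.tag for el in root.iter() if el.get("id")}
    errors = []
    for book in root.iter("book"):
        ref = book.get("author")
        if ref is None:
            errors.append(f"book {book.get('id')!r} has no author")
        elif kinds.get(ref) != "author":
            errors.append(
                f"book {book.get('id')!r}: author {ref!r} is not an author element"
            )
    return errors

# Hypothetical documents: the second one names a book as the author of a book.
GOOD = '<library><author id="a1"/><book id="b1" author="a1"/></library>'
BAD = ('<library><author id="a1"/><book id="b1" author="a1"/>'
       '<book id="b2" author="b1"/></library>')

print(check(GOOD))
print(check(BAD))
```

An editor could run the cheap DTD validation on every keystroke and call something like `check` only on save.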
RDF may be a useful system in-between these two extremes. It is more
concerned with semantics (and probably less with syntax) than SGML, but
is not Turing complete.
> However, in
> general, constraints between elements will be important. For
> example, it would not be acceptable to store away an account
> deduction entry without having an associated account entry or to
> have an account entry that does not have at least one associated
> account-owner entry. It seems to me that there are very few domains
> that can be represented without these kinds of constraints.
It is worth noting that SQL does not provide a complete system for
expressing all interesting constraints in relational databases. That's
why "business logic" often resides in proprietary stored procedures or
on completely separate application servers.
> The access
> language also needs to reflect the solution because in order for
> a server to implement constraints, all document update operations
> must be couched in the language of transactions. That is, every
> document update operation must be associated with a transaction.
Please explain this point.
Paul Prescod
From markb at iosphere.net Fri Nov 21 17:12:55 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:01 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net>
Message-ID:
On Fri, 21 Nov 1997, Joe Lapp wrote:
> In this posting I'm going to be a little bold and propose that both
> the XML and DOM specifications are flawed.
Bold's good. I like bold.
But I'm going to be just as bold and suggest that it is your use of
XML/DOM that is giving you problems, not the specs themselves.
>The existence of these
> flaws ride on the assumption that we care to use SGML/XML to create
> domain models for data where the data evolves over time.
Okay, so let's investigate how XML (and a couple words on DOM) are, IMO,
just fine for this.
> I'm also
> assuming that it is unacceptable for the client objects of a document
> to maintain the integrity of the document.
Amen. Once you've done encapsulation and data-hiding, there's no going back.
> Suppose we want to create a document that contains information about
> books and about the authors of those books, and suppose we require
> that whenever the document has a book, it also has information about
> the author of the book. The document will reside on a server, and
> one or more administrators will populate the document from their
> clients. Other users will be free to browse the document.
>
> We need to design the DTD for this document. Here is our first pass:
Ok, let me stop you right there.
A DTD is a fixed statement of structure. If you use one, you better be
darned sure that that structure isn't going to change anytime soon. As
we see from your example, you were struggling to define that structure
(as anybody would have given the same task).
So, what to do?
Go finer-grained. Ask yourself what doesn't change over time. In this
example, you know that you have books and authors. So why not give each
of those their own document type?
Furthermore, the relationship itself between a book and an author might
also be treated as a document type.
Sound too funky? Consider that that's exactly what is done in
loosely coupled structural OO work, or before that, first-normal-form
entity/relationship schemas.
CORBA has the Relationship service for just this kind of functionality
for objects. Objects can create, destroy, type, and navigate directed
relationships at runtime.
Maybe for this example, it's a bit heavy-weight. I'm not sure. But
with just an author DTD, a book DTD, and XML-Links, you could get the
same job done - perhaps not quite as flexibly (since dependencies are
introduced within the documents themselves), but just as functionally
capable.
BTW, this is the same reason that a stream of serialized-to-XML Java
objects won't have a DTD. The structure of a set of objects is only
guaranteed to be known at runtime. But these streams will still be
well-formed.
> I have not been able to find a way to have the document server force
> clients to ensure that whenever they add a book, that book is
> associated with some author. Clients are given the responsibility
> of maintaining the integrity of the document.
The OMG's OMA has a place holder for a "Rules Facility" that does exactly
this. It allows arbitrary rules (including structural) to be hung off
the ORB as objects/documents, and the ORB is responsible for enforcing these
rules.
See, for example:
http://www.jeffsutherland.org/oopsla97/rouvellou.html
> The DOM model allows us to manage documents from a client, so long
> as clients assume part of the responsibility for maintaining object
> model constraints.
That depends who the 'client' is. If it's a traditional application,
then yes, that's bad. But it might be something on another "level"
(hopefully you'll understand what I mean by that by these examples),
such as a Rules Facility or Persistence service, in which case it's ok -
because their job is to maintain the internal integrity of the object.
>However, if we decide that the document server
> is responsible for maintaining these constraints, then the DOM
> model as it is currently architected will not suffice, since its
> document-update operations are not architected around transactions.
I don't see the need for two reasons. First, I would never use DOM
(or any other mechanism) to try and break the encapsulation of my
documents. Second, as I stated in my last message, transactions are an
overrated means of reasoning about distributed systems. They try and
make distributed processing look like local processing, when we now know
how impractical that view is.
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From howardk at paradigmdev.com Fri Nov 21 17:18:51 1997
From: howardk at paradigmdev.com (Howard Katz)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
Message-ID: <57B675B21506D1118BAB0060081C295D168029@VSERVER>
Mark, would you mind expanding just a bit on the following paragraph?
I'm not seeing what your point is:
> BTW, this is the same reason that a stream of serialized-to-XML Java
> objects won't have a DTD. The structure of a set of objects is only
> guaranteed to be known at runtime. But these streams will still be
> well-formed.
Thanks,
Howard Katz
From ser at javalab.uoregon.edu Fri Nov 21 17:42:18 1997
From: ser at javalab.uoregon.edu (Sean Russell)
Date: Mon Jun 7 16:59:02 2004
Subject: XML on the web: docproc 2
Message-ID: <3475C95A.23524B2@javalab.uoregon.edu>
Hiho,
I'm just now getting around to announcing docproc 2, an XML + XSL
document processor. This is a beta release, and I welcome feedback.
docproc is currently installed on javalab.uoregon.edu and is functioning
as a Servlet.
The URL for the docproc documentation and distribution site is:
http://javalab.uoregon.edu/ser/software/docproc_2/docs/index.xml
There are several pages on Javalab which have been XML-ized, as test
cases. I have spent most of my time working on the docproc package, and
the style sheets for most of these pages are not particularly clever.
The "document" style sheet is, however, rather complex, and it is this
stylesheet which the docproc documentation page uses. One test page
URL, which will lead you to other test pages, is:
http://javalab.uoregon.edu/vlab/select.xml
To retrieve and view the XML source of any given XML page on javalab,
replace "javalab" in the URL with "jersey." Jersey is running Apache,
without docproc, and has NFS access to the same documents as Javalab.
Please be aware that Javalab is a testbed, and that you may experience
delays or periods of downtime. In particular, the JavaWebServer on
Javalab has been having problems processing and delivering non-XML
documents. This has nothing to do with docproc.
Thank you, and again, please send me your feedback.
--- SER
From papresco at technologist.com Fri Nov 21 18:06:15 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:02 2004
Subject: Inheritance (was: Access Languages are Tied to Schemas)
References: <199711210155.MAA03460@jawa.chilli.net.au>
Message-ID: <3475CE52.2321C593@technologist.com>
Rick Jelliffe wrote:
> The best idea I have come up with is the following: to allow a new keyword
> #OTHER (or #ANY)
> to be allowed in content models, to represent any one unambiguous element type.
> This allows the creator of the original content model the ability to
> declare points in content models which are publicly available for extension
> by derived element types (declared or undeclared).
>
> I currently think that any inheritance-based declaration system must presuppose
> such explicit inheritance points. I think it is merely a matter of strong typing
> and interface control.
Strong typing and interface control are issues of subclassing, not
inheritance. Inheritance is just a code reuse mechanism. Unlike
subclassing, it does not allow more expressive DTDs to be created (which
is, presumably, what you are talking about). I think that we must keep
these ideas separate in our mind if we are to make progress on either
front. Their conflation is (IMO) just a historical mistake driven by
early compiler limitations and performance considerations that do not
apply to SGML. Both concepts are useful in SGML, but they should be
separate, just as they are in most modern OO programming languages (C++,
Java, CLOS, Python, etc.), even those which conflate them in the syntax.
I described the difference in:
http://www.lists.ic.ac.uk/hypermail/xml-dev/9710/0077.html
Anyhow, you can emulate OTHER using subclassing without a first class
OTHER construct.
Now URLs can go in CITATIONS after the date.
Paul Prescod
From markb at iosphere.net Fri Nov 21 18:38:43 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <57B675B21506D1118BAB0060081C295D168029@VSERVER>
Message-ID:
On Fri, 21 Nov 1997, Howard Katz wrote:
> Mark, would you mind expanding just a bit on the following paragraph?
Of course not.
> I'm not seeing what your point is:
>
> BTW, this is the same reason that a stream of serialized-to-XML Java
> objects won't have a DTD. The structure of a set of objects is only
> guaranteed to be known at runtime. But these streams will still be
> well-formed.
Picture a container Bean (i.e. the GlasgowSpec - a BeanContext). When
you design that container, you only know that it will hold other Beans -
not necessarily which other Beans. Your container may publish services
for use by contained Beans. It might, and likely will, contain Beans that
were developed after it was developed. Some of those Beans might also be
containers.
Now, imagine serializing that container at runtime. Can you tell me its
structure *now* (I mean *right* now, as you're reading this - aka design
time)? If not, then you can't use a DTD. The stream itself will be
responsible for describing the structure implicitly, not some separate
static DTD.
Isn't this what well-formed XML documents were meant to address? That
you could still create self-describing documents even when you didn't
know the structure a priori? Based on some of the discussions I've read
on the list archives, I do get the impression that this capability of
XML isn't being used to its fullest potential.
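As a rough illustration of the point, here is a sketch in Python rather than Java. The `Bean` class is a hypothetical stand-in and has nothing to do with the actual JavaBeans or BeanContext APIs; the element names are simply whatever the objects report at runtime.

```python
import xml.etree.ElementTree as ET

class Bean:
    """Hypothetical stand-in for a component; not the JavaBeans API."""
    def __init__(self, kind, children=()):
        self.kind = kind
        self.children = list(children)

def to_element(bean):
    """Serialize a bean and whatever it happens to contain at runtime."""
    el = ET.Element(bean.kind)
    for child in bean.children:
        el.append(to_element(child))
    return el

# The container's contents are only decided here, at runtime.
container = Bean("Container", [Bean("Clock"), Bean("Container", [Bean("Chart")])])
stream = ET.tostring(to_element(container), encoding="unicode")
print(stream)

# The receiver recovers the structure from the stream alone; no DTD is consulted.
parsed = ET.fromstring(stream)
print([el.tag for el in parsed.iter()])
```

The stream is well-formed, so any XML parser can rebuild the tree, even though nobody could have written a content model for `Container` in advance.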
> Thanks,
My pleasure.
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From ricko at allette.com.au Sat Nov 22 06:32:46 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711220630.RAA28483@jawa.chilli.net.au>
> From: Joe Lapp
> This seems to work. It stores information about books and authors,
> and it is not possible to add a book without associating it with
> the description of some author. But we can see that it breaks as
> soon as we add any other kind of element that has an ID. We know
> that every book will eventually have an ID, because we'll soon want
> to have an element whose content elements reference the New York
> Times Bestsellers. Once we do that, nothing prevents an administrator
> (or the client program he or she is using) from indicating that the
> author of a book is another book. This DTD will not suffice.
The SGML standard explicitly says that the SGML markup declarations only
form part of the definition of a document type. So you are being no
more bold than the SGML standard. (The contraction DTD is actually
the "Document Type Definition" not the "Document Type Declarations"
by the way, as further evidence of this distinction.)
People expect XML/SGML to provide a way to do everything, then get
surprised that it doesn't. It does not intend to. It is not a format
for modeling data; it is a language for marking up data with enough
information that your clever programs can make use of it. XML/SGML's
validation only extends to very simple content models and to making
sure that IDs are unique, just for this purpose.
The problem you describe above is very simply dealt with. Make an "application
requirement" that all IDs for books start with one prefix, and that
all IDs for authors start with another. This is very common practice in
the industry. You can write simple external validating code to enforce
it, and it only requires a single line of plain English to document it.
It is almost universal practice among experienced DTD writers to specify
unique prefixes for IDs of different types. I recommend it to anyone
writing XML systems. The simplest way is to just use a contracted form
of the element type name (or the current element or its distinguishing
container) as the prefix.
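The "simple external validating code" might look something like the sketch below. The `bk-` and `au-` prefixes, the element names and the `author` reference attribute are all assumptions made up for the example; any real application would document its own.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: each element type has its own ID prefix.
ID_PREFIX = {"book": "bk-", "author": "au-"}

def prefix_errors(xml_text):
    """Return messages for IDs and author references that break the convention."""
    root = ET.fromstring(xml_text)
    errors = []
    for el in root.iter():
        prefix = ID_PREFIX.get(el.tag)
        if prefix and not el.get("id", "").startswith(prefix):
            errors.append(f"{el.tag} id {el.get('id')!r} lacks prefix {prefix!r}")
        # A book's author reference must carry the author prefix.
        if el.tag == "book" and not el.get("author", "").startswith(ID_PREFIX["author"]):
            errors.append(
                f"book {el.get('id')!r}: author ref {el.get('author')!r} is not an author ID"
            )
    return errors

# Illustrative document: the second book names another book as its author.
SAMPLE = ('<library><author id="au-smith"/>'
          '<book id="bk-1" author="au-smith"/>'
          '<book id="bk-2" author="bk-1"/></library>')

print(prefix_errors(SAMPLE))
```

The check never has to resolve the reference: the prefix alone says what kind of element an IDREF is allowed to point at.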
There is an ISO standard way (part of the SGML Extended Facilities of HyTime'97
which is on the WWW) to mark this up. The Lexical Definition annex lets
you give (in one fixed attribute) a POSIX regular expression to constrain the
format of another attribute. So you can specify that IDs and IDREFs have
a common prefix, for particular element types. (Of course, your software
then needs to implement this standard to be able to use the information, but
that is no different from any other markup.)
It is just false that SGML (the family of technologies: ISO 8879, ISO 10744,
ISO 9070, etc) does not provide a way to use regular expressions (or any
other syntax you choose) to provide models for data. The lexical typing
facilities have been on the books for 5(?) years now, and have just been
overhauled in HyTime '97 standard. However, because SGML systems do not
have to provide it to be conforming, few have, as part of their standard
configuration, so far. XML has taken exactly the same road as SGML
and left more useful data validation to the application to take care of.
Rick Jelliffe
From fussellm at alumni.caltech.edu Sat Nov 22 09:06:36 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:02 2004
Subject: Recipes for Information
Message-ID:
This is somewhat related to the recent threads on Integrity and
Inheritance. It is again a bit long so it will be duplicated at MONDO
(www.chimu.com/projects/mondo).
========
I suggest that SGML/XML be perceived as a markup language to describe how
to build information instead of describing (and modeling) the information
itself. This may appear to be a subtle distinction but it has a lot of
implications.
I will start with a recent concrete example from Rick Jelliffe:
This says a citation is composed of (through its content) a title, text,
and url. But do not view that as the information model of a citation;
consider it a recipe for a citation. We can build a citation if we
supply the three (named) ingredients: title, text, and url. The detail
of the resulting information (which I will call an object) is unknown.
It is likely that the citation object will have these three attributes,
but it could have more or it could even discard some of them (in which
case the recipe included information that the model did not need).
If we have a different element that requires more information we could
have a different recipe:
The object that results from this recipe might be the same type as a
citation object, a subtype of the citation object (i.e. treatable as a
citation object but has more capabilities), or even an unrelated type of
object. For the moment we will abstain on discussing anything about the
objects resulting from the DetailedCitation and the Citation recipes [why
I started capitalizing will be explained later too].
What about combining the two recipes into a single element? We could
combine them as:
This would be ambiguous (in SGML terms) for the first two but all of
them are bad recipes. They are bad because we (or the computer) must
look at all the content to know which version we are using. This is
analogous to reading a whole recipe before we can be sure what we are
trying to make. It would be better to more clearly separate the options
from the requirements if you choose that option. Our original version
separated the recipes through the elements:
We could also do this with:
or:
In these forms it is explicit what we are trying to build (or at least
the complexity is dramatically reduced). We do not have to look into the
details of the information itself.
RECIPES
=======
Now I will ask for a leap of faith.
Consider separating ELEMENTs between Recipes that build objects and
Parameters that name the ingredients that are required for a particular
recipe. As an architectural-form it would look like this:
Although in the content model parameters are sequential, their order is
insignificant semantically. Each parameter must have a unique name, so
consider them to be and-ed together instead of seq-ed. Sort of like:
or like required element attributes.
As a convention I will capitalize the Recipes and keep parameters in
lowercase. Now returning to our example, to build a Citation required
three parameters:
The original ordering of the parameters is irrelevant to the
informational content because each parameter is uniquely named, it is
only a presentation/encoding restriction to have them be sequential.
Also, the parameters do not describe the Types of the ingredients, just
the Role of them in building the recipe. All of 'title', 'text', and
'url' could be simple strings:
Or any of them could have a more complex type. By separating the two
types of elements we can
Be very explicit about what we are constructing
Have a great deal of flexibility for reuse of elements
Use very simple content models that produce complex structures
Note that although the '&' is considered complex to implement, this
particular use of it has the same form as attributes: Parameters are
unordered and possibly required.
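To make the Recipe/Parameter idea a little more tangible, here is a sketch in Python. It assumes a hypothetical instance in which a capitalized `Citation` element contains lowercase `title`, `text` and `url` parameter elements holding plain strings; the `Citation` class and `build_citation` function are likewise invented for the illustration.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

@dataclass
class Citation:
    """The object a Citation recipe builds (a hypothetical domain class)."""
    title: str
    text: str
    url: str

def build_citation(xml_text):
    """Follow the Citation recipe: collect named parameters, ignoring their order."""
    root = ET.fromstring(xml_text)
    params = {child.tag: (child.text or "") for child in root}
    missing = {"title", "text", "url"} - params.keys()
    if missing:
        raise ValueError(f"missing parameters: {sorted(missing)}")
    return Citation(params["title"], params["text"], params["url"])

# Same ingredients, two different encoding orders.
a = build_citation(
    "<Citation><title>XML draft</title><text>Refer also to</text>"
    "<url>http://www.w3c.org/TR</url></Citation>"
)
b = build_citation(
    "<Citation><url>http://www.w3c.org/TR</url><title>XML draft</title>"
    "<text>Refer also to</text></Citation>"
)
print(a == b)
```

Because each parameter is looked up by name, the two encodings yield equal objects; the order in the markup is a property of the encoding, not of the information.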
Shortcuts
---------
You might have noticed that String cheats: a String does not follow the
required Recipe pattern of having only parameters in content. This is a
convenience shortcut Recipe [OK, and an insanity prevention device],
which makes putting strings of text into this format more easily.
Similarly we will probably need to have a shortcut for Lists (sequences)
of objects:
With these additions we have to modify our original description of the
architectural-form of Recipes to:
Recipes, DTDs, and DomainModels
-------------------------------
Each Recipe builds an object. What is the type of this object and how
does it relate to the ELEMENT content model? I propose (and agree with
others proposing) that there should be no required connection between the
rules of a recipe (the DTD) and the rules of the DomainModel objects
built from that recipe. Objects can have far more complex relationship
rules than DTDs can describe and the DTD will either over-constrain or
under-constrain the built objects.
Instead consider the DTD as similar to a UI Form. You may want to place
things in a particular order and group them together:
Person
    FirstName  LastName
    SSN
    Children
        FirstName  LastName
But this is a presentation of the (view independent) information model
that has a person with several attributes and associations in no
particular order (even children do not need to be explicitly ordered for
orderings can be derived from [for example] the child's birthdate). The
UI/DTD can place constraints (like a SSN has a 123-45-6789 format) but it
should be very careful about these constraints (what about 99- SSNs) or
really delegate the responsibility of validation to the DomainModel. But
simplified views are still useful.
DTDs can still be used to produce an information model but it should be
possible to unlink the information model and have it start a more robust
life of its own (or the dependency reversed). The Recipes should still
be useful because they encode the knowledge required to build the
information independently of how precisely or extensively it is modeled
(up to a point). The recipes can live on as the model grows.
And, in a strange circularity, information models are also (obviously)
information so they can again be encoded as recipes in SGML/XML and used
as metadata for the domain model. So although DTDs are not good
information models, there is nothing stopping SGML/XML from being a good
encoding for good information models.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Sat Nov 22 09:34:02 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:02 2004
Subject: Sequence Access Languages ...
Message-ID:
Rick Jelliffe wrote:
> I think you miss what is perhaps *THE* most important thing that SGML
> content models represent: sequence.
> This is one of the essential distinguishing features of SGML.
> If I have
>
>
> Refer also to
>
> XML draft
> at
> http://www.w3c.org/TR
> for more info.
>
> then the sequence of elements and data in the citation element
> are vitally critical. Sequence is not an artifact of formatting,
> in many cases, but as intrinsic to the data as encapsulation
> and so on.
[SNIP to possible Content Model]
>
I think your example shows the opposite. There is no information change
between any of the orderings within the citation: vs.
vs. etc. You may consider the
desired presentation and encoding order to be only the first but that
would be a view onto the information and not a property of the
information itself. You could alternatively define an attribute that
says citations look good in English in that particular order. Or maybe
the 'at' should be derived and the content model is simply:
This works well with your next example too:
>
becomes:
Depending on whether the editor is included or not, different text would
be generated at presentation.
The generated text could still be encoded in SGML but as separate
information:
atedited by
I am not saying sequence is unimportant, but I think SGML is overly
focused on it (from an IM perspective) because it comes from a
paper/linear background. Information is rarely linear: it is only time
that is, which has caused some media [and the humans who use them] to be
(mostly) linear also. It can be difficult to break that linear
assumption when it doesn't apply if your tools keep reinforcing it.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From peter at ursus.demon.co.uk Sat Nov 22 09:50:25 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:02 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <199711220630.RAA28483@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk>
At 17:26 22/11/97 +1100, Rick Jelliffe wrote:
[...]
>It is just false that SGML (the family of technologies: ISO 8879, ISO 10744,
>ISO 9070, etc) does not provide a way to use regular expressions (or any
>other syntax you choose) to provide models for data. The lexical typing
>facilities have been on the books for 5(?) years now, and have just been
>overhauled in HyTime '97 standard. However, because SGML systems do not
>have to provide it to be conforming, few have, as part of their standard
>configuration, so far. XML has taken exactly the same road as SGML
>and left more useful data validation to the application to take care of.
We are at a very exciting, but critical, time in the development of XML and
I am very heartened by the quality and amount of debate on this list. I
sense that there is a steady influx of people who have had little or no
exposure to 'traditional' SGML and are discovering its power and
limitations in an empirical manner :-) [If so, I have particular empathy,
as I come from outside the SGML community and have never created an SGML
document for 'production' purposes.]
XML will be used by vastly more people than currently practise SGML. That is
both liberating and a cause for concern. It's certainly likely that useful
methods already developed in SGML will often not be used simply because
people don't know about them. Similarly there are often standards in other
disciplines which map directly onto XML problems. Where possible they
should be used.
In many cases the XML specs (including XLL and XSL) deliberately do not say
how something should be done - only what syntax should be used. The WG has
(often rightly) taken the view that it should not prescribe ways of doing
things. But we are now at - or very near to - the time when people will
start doing things and there is a danger that we shall end up with serious
inconsistencies. For example, when Britain first invented and developed
railways there were two gauges (4' 8.5", and 8') and Baker Street station
in London had both. Australia had (?5) and I gather is only now
rationalising them (Rick?). As an example, if we use DATEs in XML I think
we need a good reason not to use ISO 8601.
It is clear that there is overwhelming demand for some datatyping in XML.
For example, I am now extending JUMBO as an authoring tool and I want to be
able to control the type and validity of both attribute values and PCDATA
content. Obviously I can invent my own rules, but I'd prefer to use
something that other people have already agreed on. I can't do this in a
DTD, but I think I *can* do it consistently with (and in the spirit of)
SGML. [Very simply - I'll expand later - I am developing a per-element
'schema' in XML syntax which encapsulates the DTD approach and enhances it.
As is my spirit, I'm keeping it simple - not adding the complexities of
inheritance as in the XML-data approach.] At present my datatypes are:
STRING
INTEGER
FLOAT (or synonym)
DATE
URL
MIMETYPE
and I'd value comments. [Any new items need code to be written, so they
don't come free :-)]
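A rough sketch, in Python, of what lexical checks for the six datatypes listed above might look like. This is not JUMBO's code: the CHECKERS table, the is_valid function, and the specific patterns are assumptions made for illustration, and DATE is read as an ISO 8601 calendar date, in line with the suggestion earlier in this message.

```python
import re
from datetime import date
from urllib.parse import urlparse

# Assumed lexical forms; a real schema would pin these down more carefully.
INTEGER_RE = re.compile(r"[+-]?[0-9]+")
FLOAT_RE = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")
MIMETYPE_RE = re.compile(r"[A-Za-z0-9.+-]+/[A-Za-z0-9.+-]+")


def _is_date(text):
    # Accept ISO 8601 calendar dates such as 1997-11-22.
    try:
        date.fromisoformat(text)
        return True
    except ValueError:
        return False


def _is_url(text):
    # Crude check: requires a scheme and a host, so mailto: URLs fail.
    try:
        parts = urlparse(text)
    except ValueError:
        return False
    return bool(parts.scheme and parts.netloc)


CHECKERS = {
    "STRING": lambda text: True,
    "INTEGER": lambda text: INTEGER_RE.fullmatch(text) is not None,
    "FLOAT": lambda text: FLOAT_RE.fullmatch(text) is not None,
    "DATE": _is_date,
    "URL": _is_url,
    "MIMETYPE": lambda text: MIMETYPE_RE.fullmatch(text) is not None,
}


def is_valid(datatype, text):
    """Check text (attribute value or PCDATA) against a named datatype."""
    return CHECKERS[datatype](text.strip())
```

Adding a datatype means adding one entry to the table, which matches the remark that new items need code to be written.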
This almost inevitably leads on to data validation and I'd like to know
what syntax people already have for expressing this. Obviously it would be
nice for it to be XML-compatible.
P.
I have had some positive feedback on the idea of XDEV and I shall try to
reformulate my ideas. It's very clear that we need a way of discussing the
'land beyond syntax'. I liked the phrase 'when ontologies collide' which I
saw recently (I think from a pointer from Robin Cover's page) and this
seems to me an area where XML-DEV can play an important role. At least we
may be able to identify the ontologies :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From fussellm at alumni.caltech.edu Sat Nov 22 10:37:54 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:02 2004
Subject: Inheritance
Message-ID:
Paul,
No argument with your posting (I decided not to post a similar statement
after rereading yours), but could you change your terms slightly?
Although the OO terms themselves were definitely conflated during the
eighties they have by now settled down to:
Type: The declaration of the interface of any set of [objects] that
conforms to this common protocol. Any set of objects or values
with similar behavior... [Firesmith+E 95]
Class: A class is the realization of a type. [UML] The idea of
class is closely linked...with the description of implementation details
of software objects [Cook+D 94].
Type vs. Class: Types classify objects according to a common
interface; classes classify objects according to a common implementation.
[Firesmith+E 95]
Subtyping: The incremental definition of a new type in terms of one
or more existing types, whereby the subtype conforms to all of its
supertypes [an is-kind-of relationship] [Firesmith+E 95]
And subclassing implies implementation-inheritance (i.e. code reuse),
exactly what you were trying to avoid implying.
So I would suggest rewriting your example to:
> > Anyhow, you can emulate OTHER using subtyping without an explicit
> > OTHER construct.
> >
> >
> >
> >
> >
> >
Which makes it use the standard terminology.
So, ELEMENTs would be the leaves of a tree/digraph of Types with ANY as
the root. Note that ISA should formally be IS-A-KIND-OF but that is an
annoyingly long keyword. (My dog is-a Dog which is-a-kind-of Mammal vs:
My dog is-a Dog which is-a Mammal).
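To make the is-a / is-a-kind-of distinction concrete, here is a small Python sketch. The SUPERTYPES table and the function names are hypothetical, and the type graph is assumed to be acyclic with ANY as its root.

```python
# Hypothetical type graph: each type maps to its direct supertypes.
SUPERTYPES = {
    "ANY": [],
    "Mammal": ["ANY"],
    "Dog": ["Mammal"],
}


def is_kind_of(subtype, supertype, graph=SUPERTYPES):
    """True if subtype conforms to supertype (reflexive and transitive)."""
    if subtype == supertype:
        return True
    return any(is_kind_of(parent, supertype, graph)
               for parent in graph.get(subtype, []))


def is_a(instance_type, wanted_type, graph=SUPERTYPES):
    """An instance is-a T when its own type is-a-kind-of T."""
    return is_kind_of(instance_type, wanted_type, graph)
```

Here my dog (an instance of type "Dog") is-a Mammal because Dog is-a-kind-of Mammal; nothing about implementation inheritance is implied.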
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From peter at ursus.demon.co.uk Sat Nov 22 12:15:56 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:02 2004
Subject: XDEV proposals (was Re: Recipes for Information)
In-Reply-To:
Message-ID: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
At 01:06 22/11/97 -0800, Mark L. Fussell wrote:
>
>This is somewhat related to the recent threads on Integrity and
>Inheritance. It is again a bit long so it will be duplicated at MONDO
>(www.chimu.com/projects/mondo).
Thanks Mark - extremely valuable.
[... long insightful and stimulating discussion snipped ...]
I think I understand the wishes of Mark and an increasing number of
XML-DEVers and hope the following is useful...
I come from a non-SGML background and have discovered the formal
limitations of SGML/XML in what I want to do. [The DTD doesn't map onto my
problems, no datatyping, no easy extensibility through inheritance, etc.]
As Mark says, the background of XML is paper-based (and this is reflected
in XSL which is essentially paper-based with very small concessions to
paper-like screen display). Nevertheless the DTD-based approach is
extremely powerful in the right cases and in the right hands.
The problems I address have the following generic operations.
- I want to author XML. Ideally this should be human- and
machine-readable. I want this process to be controlled by software/data to
make it both flexible and rigorous. [This is tough, but I'm starting to
address it in JUMBO. Practical help will be appreciated :-)].
- I wish to be able to re-use other people's information objects. This is
almost certainly going to break any DTD, but it is implicit in most of the
current W3C activity. (RDF, MathML, XSL may have some sort of DTDs, but
they will probably be used as components of larger documents, which cannot
have DTDs)
- I wish to be able to manage distributed and multicomponent objects. I
think XML and related disciplines will solve this very well and excitingly.
- I want to be able to validate XML 'objects'. XML can do this
syntactically, but not semantically. For this I need additional 'recipes'
and code
- I want to be able to transform XML objects into other XML objects. XSL
is tantalisingly close to being able to do this but I believe - at present
- that W3C XML-transformation activity is 'undefined'.
- I want to be able to send XML objects to other people *with* a prior
contract as how these are to be used. XML can partially solve this at
present using DTDs, controlled prose and vocabularies and *bespoke
applications* (i.e. a different application for each DTD.) This is as far
as X*L goes. Much of the X*L prose stresses that particular activity is
left to the *application*. This means that XML documents often need to be
authored, knowing what application is going to be used to process them.
This is, presumably, the way that CDF is designed - you have to have a 'CDF
processor'. However it does not support *generic* applications (or even
generic components of applications).
- I wish to be able to send hypermedia. XLL specifically declines to add
any semantics to the syntax, other than an (implied) HTML-like behaviour
for some of the SIMPLE links.
- I wish to send objects to other people who will print them out and read
them. XSL solves this.
- I wish to be able to send XML objects to people who I don't know exist,
have never heard of me or my domain. [Example, a supermarket may need to
hyperlink to molecular information in labelling its food products.] They
need to access my semantics in (a) human-readable and (b) machine-readable
form. For this a *generic* XML processor (or processing component) is
required. This *is* achievable (through XSL) if the processing activity
consists of producing 2D human-readable objects. I, and I suspect many
others, want to be able to create generic XML applications. [JUMBO is a
*generic* XML application - it can process any XML document. The degree of
added value depends on the components made available by the document's
author or domain.]
Most of these issues are not being addressed, and probably will not be
addressed by the current XML activity. [Not a criticism - they are doing a
fantastic job. Their time is taken with deciding on precise syntax,
procedures, meaning of components in XML documents, etc. More difficult
than I think a lot of people realise.]
This is where XML-DEV has a role to play. Not formally - this list has no
standing other than the high quality of its postings. Since many of these
areas will give rise to 'colliding ontologies' (i.e. strongly held views on
how to do things and what things mean) there are no single solutions.
However, if we treat this in the spirit of a biological system, 'fit'
solutions should arise.
To be 'fit' a solution must:
- reproduce readily. IOW it must be relatively easy to understand what
it's about. Simplicity is very valuable here :-)
- be useful.
- have a modest degree of flexibility. Too much variation kills off
complex organisms.
- be aware of its environment. If it's competing in a niche which is
already filled, it will have a hard time. i.e. if you haven't looked in
other disciplines, you will probably reinvent something.
[The biological metaphor isn't worth elaborating :-)]
My hope, therefore, is that we can identify and systematise certain areas
which are useful to a group of people. [There may be multiple and
incompatible solutions - so long as they are identifiable that need not be
a problem.] Among the *simple* ideas that might be tractable as XDEV
proposals are:
- parser APIs and the generic behaviour of applications. Whatever happened
to Xapi-J?
- datatyping
- re-usable elements, probably with machine-readable schemas.
- transformation language (this *might* spare us from my Monty Python
proposal :-).
- behaviour for XLL-based applications
If you do take these ideas up, please use simple subject lines :-)
P.
>
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Sat Nov 22 13:44:05 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:02 2004
Subject: Is XDEV useful? (was re: XDEV proposals)
In-Reply-To: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
References:
<3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
Message-ID: <199711221344.IAA00417@unready.microstar.com>
Peter Murray-Rust writes:
> The problems I address have the following generic operations.
> - I want to author XML. Ideally this should be human- and
> machine-readable. I want this process to be controlled by
> software/data to make it both flexible and rigorous. [This is
> tough, but I'm starting to address it in JUMBO. Practical help will
> be appreciated :-)].
I think that the latest version of Adept supports XML editing, and I
announced some patches to PSGML a couple of months ago.
> - I wish to be able to re-use other people's information
> objects. This is almost certainly going to break any DTD, but it is
> implicit in most of the current W3C activity. (RDF, MathML, XSL may
> have some sort of DTDs, but they will probably be used as
> components of larger documents, which cannot have DTDs)
Actually, this turns out not to be the case -- this is actually very
simple with XML in its current form, if you use XML as a data content
notation.
In the internal DTD subset:
In the external or internal DTD subset:
In the document instance:
Here is a reusable RDF object:
Here is a reusable MathML object:
Here is a reusable XSL object:
Whenever your processing software finds an external data entity with
the XML notation, it can simply call the parser recursively.
You could also take an HTML-like approach (especially in a DTD-less
document), and simply do something like
Again, just have your processing software call your parser recursively.
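The recursive call can be sketched in Python roughly as follows. The markup of the example above did not survive in this archive, so the element name embed and the attribute src are assumptions, as are the function name parse_with_embeds and the fetch callback that maps a reference to XML text.

```python
import xml.etree.ElementTree as ET


def parse_with_embeds(xml_text, fetch, depth=0, max_depth=10):
    """Parse xml_text and attach each embedded document under its <embed>.

    fetch(src) must return the XML text for a reference; max_depth guards
    against documents that embed themselves.
    """
    if depth > max_depth:
        raise ValueError("embedding nested too deeply (possible cycle)")
    root = ET.fromstring(xml_text)
    # Materialise the list first so newly attached subtrees are not revisited.
    for node in list(root.iter("embed")):
        src = node.get("src")
        if src is not None:
            # The recursive step: the same parser handles the embedded object.
            node.append(parse_with_embeds(fetch(src), fetch,
                                          depth + 1, max_depth))
    return root
```

A usage example with an in-memory store standing in for the network:

```python
docs = {"math.xml": "<math><embed src='sym.xml'/></math>",
        "sym.xml": "<sym>x</sym>"}
tree = parse_with_embeds("<doc><embed src='math.xml'/></doc>", docs.__getitem__)
```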
> - I wish to be able to manage distributed and multicomponent
> objects. I think XML and related disciplines will solve this very
> well and excitingly.
Exactly -- this is where the entity structure of full SGML and XML is
a big win.
> - I want to be able to validate XML 'objects'. XML can do this
> syntactically, but not semantically. For this I need additional
> 'recipes' and code
And you always will, no matter how XDEV is designed. I've implemented
SQL-based data management systems, and SQL's type checking is _never_
enough (or even close). Certainly we could modify XML so that parsers
could perform validations like
- the contents of this element must be a number
- the contents of this element must not be empty
but we'd just make the parsers bigger and wouldn't help much anyway.
After all, in real-world applications you always need to perform
validations along these lines:
- the contents of the element must be the name of an American city
with a population over 500,000
- the contents of the element must be a name mentioned in a list in
a different XML document
- the contents of the element must be a valid Internet domain name
I think that XML and SGML were smarter to leave all of this to the
application-specific processing software in the first place.
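As an illustration of why such checks end up in application code, here is a Python sketch of the second example, a name that must appear in a list held in a different XML document. The element names name and author and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def names_in(list_xml):
    """Collect the text of every <name> element in the list document."""
    return {(el.text or "").strip()
            for el in ET.fromstring(list_xml).iter("name")}


def unknown_authors(doc_xml, allowed):
    """Return the <author> values in doc_xml that are not in allowed."""
    return [(el.text or "").strip()
            for el in ET.fromstring(doc_xml).iter("author")
            if (el.text or "").strip() not in allowed]
```

No parser-level datatype could express this rule, since it depends on the current contents of a second document.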
> - I want to be able to transform XML objects into other XML
> objects. XSL is tantalisingly close to being able to do this but I
> believe - at present - that W3C XML-transformation activity is
> 'undefined'.
Architectural forms will bring you part-way there. For one proposal, see
http://home.sprynet.com/sprynet/dmeggins/xml-arch.html
> - I want to be able to send XML objects to other people *with* a
> prior contract as how these are to be used. XML can partially solve
> this at present using DTDs, controlled prose and vocabularies and
> *bespoke applications* (i.e. a different application for each DTD.)
> This is as far as X*L goes. Much of the X*L prose stresses that
> particular activity is left to the *application*. This means that
> XML documents often need to be authored, knowing what application
> is going to be used to process them. This is, presumably, the way
> that CDF is designed - you have to have a 'CDF processor'. However
> it does not support *generic* applications (or even generic
> components of applications).
As, I think, Paul Prescod has noted, nothing but a Turing-complete
language could do this. XML is a method for creating applications --
it is not an application itself, and each application will need its
own conventions, etc.
> - I wish to be able to send hypermedia. XLL specifically declines to add
> any semantics to the syntax, other than an (implied) HTML-like behaviour
> for some of the SIMPLE links.
Are notations not suitable for specifying this information?
> - I wish to send objects to other people who will print them out
> and read them. XSL solves this.
Yes, it may. I wonder if document-viewing will end up being a major
XML application, when most of the effort right now seems to be going
into transactions and meta-data.
> - I wish to be able to send XML objects to people who I don't know exist,
> have never heard of me or my domain. [Example, a supermarket may need to
> hyperlink to molecular information in labelling its food products.] They
> need to access my semantics in (a) human-readable and (b) machine-readable
> form. For this a *generic* XML processor (or processing component) is
> required. This *is* achievable (through XSL) if the processing activity
> consists of producing 2D human-readable objects. I, and I suspect many
> others, want to be able to create generic XML applications. [JUMBO is a
> *generic* XML application - it can process any XML document. The degree of
> added value depends on the components made available by the document's
> author or domain.]
Here, again, architectural forms will help. As long as you use a DTD,
and the DTD implements a "food information" base architecture, the
supermarket will be able to incorporate your molecular information
automatically.
> Most of these issues are not being addressed, and probably will not be
> addressed by the current XML activity. [Not a criticism - they are doing a
> fantastic job. Their time is taken with deciding on precise syntax,
> procedures, meaning of components in XML documents, etc. More difficult
> than I think a lot of people realise.]
I agree.
> This is where XML-DEV has a role to play. Not formally - this list has no
> standing other than the high quality of its postings. Since many of these
> areas will give rise to 'colliding ontologies' (i.e. strongly held views on
> how to do things and what things mean) there are no single solutions.
> However, if we treat this in the spirit of a biological system, 'fit'
> solutions should arise.
[remainder omitted]
XML-DEV would provide simple solutions to a few additional simple
problems, but in the end (as with SQL), people will still have to do a
lot of work in the middleware. I cannot usefully dump the SQL tables
from my database and send them to someone else without a lot of
integration and customisation work, unless we planned our tables
together from the start.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ricko at allette.com.au Sat Nov 22 15:44:19 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:03 2004
Subject: Sequence Access Languages ...
Message-ID: <199711221542.CAA07290@jawa.chilli.net.au>
> From: Mark L. Fussell
> I am not saying sequence is unimportant, but I think SGML is overly
> focused on it (from an IM perspective) because it comes from a
> paper/linear background. Information is rarely linear: it is only time
> that is, which has caused some media [and the humans who use them] to be
> (mostly) linear also. It can be difficult to break that linear
> assumption when it doesn't apply if your tools keep reinforcing it.
But do you think HTML would have become a popular markup language if its DTD
was like this?
This is reductio ad absurdum of what you are saying. A DTD where all
sequence information is made explicit.
In such a DTD, all the elements would have IDs, and either some
external specification to set the sequence/containment, or
a "next" IDREF attribute.
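To show what a consumer of such a DTD would have to do, here is a Python sketch that rebuilds reading order from "next" links. The attribute names id and next and the function name in_sequence are assumptions, since the DTD itself did not survive in this archive.

```python
import xml.etree.ElementTree as ET


def in_sequence(xml_text):
    """Return the root's children in the order given by their next links."""
    root = ET.fromstring(xml_text)
    by_id = {el.get("id"): el for el in root}
    targets = {el.get("next") for el in root if el.get("next")}
    # The first element is the only one that nothing points to.
    starts = [ident for ident in by_id if ident not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one starting element")
    order, seen, current = [], set(), starts[0]
    while current is not None:
        if current in seen or current not in by_id:
            raise ValueError("broken or cyclic next chain")
        seen.add(current)
        order.append(by_id[current])
        current = by_id[current].get("next")
    return order
```

Every reader of the document needs this extra pass just to recover what plain document order would have given for free.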
SGML is not overly focused on sequence. Sequence is such a basic
property of text that having to always mark it up explicitly is just
bizarre.
Of course boilerplate text can be removed and added. And of course
chunks in one part can be usefully reflected into another part.
But sequence is important because it is a prime property of language.
Databases contain words and pictures and various fragments. However
SGML/XML must be a format to allow these to be placed as cohesive
language-mediating documents.
If people just want a database dump format for nice relational tables,
comma-delimiter formats are available and attractive. But when they have
text which they don't want to have desequenced, SGML/XML can be useful.
I think the other big trouble with trying to view SGML/XML as a poor database
dump format, is that when you get too far from a markup paradigm, you
have to involve programmers rather than writers. Just folks can write
HTML, at a pinch. If you go too much to a database mentality, you move
to requiring custom-tools for data entry, rather than simple text-editors.
Rick Jelliffe
From peter at ursus.demon.co.uk Sat Nov 22 15:59:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:03 2004
Subject: Is XDEV useful? (was re: XDEV proposals)
In-Reply-To: <199711221344.IAA00417@unready.microstar.com>
References: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
<3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>
At 08:44 22/11/97 -0500, David Megginson wrote:
Thanks very much David,
You pose - but do not answer - a question :-).
>Peter Murray-Rust writes:
>
[...]
>
>I think that the latest version of Adept supports XML editing, and I
>announced some patches to PSGML a couple of months ago.
Indeed. I have no doubt that there are and will be some excellent commercial
tools. My problem, which I think is not unique, is that I cannot persuade
my (often conservative) colleagues in science to start using a new
discipline if there is a significant entry cost in terms of tools. [I do
not remember how much Adept is, but many SGML tools are beyond the reach of
impecunious individuals :-)]. I also want to be able to customise the tools
I work with, and - for example - to link in the conversion of legacy data
'on the fly'.
I did indeed note your posting on EMACS/pSGML, and thought about
downloading it. But there were such dire warnings about 'if you aren't fully
familiar with major modes of EMACS don't try this...' that I didn't :-)
>
> > - I wish to be able to re-use other people's information
> > objects. This is almost certainly going to break any DTD, but it is
> > implicit in most of the current W3C activity. (RDF, MathML, XSL may
> > have some sort of DTDs, but they will probably be used as
> > components of larger documents, which cannot have DTDs)
>
>Actually, this turns out not to be the case -- this is actually very
>simple with XML in its current form, if you use XML as a data content
>notation.
>
>In the internal DTD subset:
>
>
>
>
>
>In the external or internal DTD subset:
>
> SYSTEM "http://www.w3.org/XML/">
>
> doc ENTITY #REQUIRED>
>
>In the document instance:
>
> Here is a reusable RDF object:
>
> Here is a reusable MathML object:
>
> Here is a reusable XSL object:
>
>
>Whenever your processing software finds an external data entity with
>the XML notation, it can simply call the parser recursively.
This is very clever! Thanks for pointing this out. I wouldn't have thought
of it.
It does, however, require that each ENTITY consistently uses just one DTD.
>
>You could also take an HTML-like approach (especially in a DTD-less
>document), and simply do something like
>
>
>
>
This is indeed what I do at present - but using XML-LINK specifically.
(although the semantics of EMBED - just like SRC - may not be universally
agreed.)
>
>
> > - I want to be able to validate XML 'objects'. XML can do this
> > syntactically, but not semantically. For this I need additional
> > 'recipes' and code
>
>And you always will, no matter how XDEV is designed. I've implemented
>SQL-based data management systems, and SQL's type checking is _never_
>enough (or even close). Certainly we could modify XML so that parsers
>could perform validations like
>
> - the contents of this element must be a number
> - the contents of this element must not be empty
>
>but we'd just make the parsers bigger and wouldn't help much anyway.
>After all, in real-world applications you always need to perform
>validations along these lines:
>
> - the contents of the element must be the name of an American city
> with a population over 500,000
> - the contents of the element must be a name mentioned in a list in
> a different XML document
> - the contents of the element must be a valid Internet domain name
My approach to this is to write Element-specific code which is activated at
various processing times, e.g. Atom.process(). [I also have an
Atom.display()] This, of course, implies that the validation (or display)
of the element is context-independent, but I'm optimistic that - for the
sort of things I'm interested in - that will be true. I can easily see:
Float.validate();
Molecule.validate();
Table.validate();
URL.validate();
being standalone functions and re-usable in different environments. They
can also easily be overridden at the same stages as stylesheets. Your first
two examples are admittedly context-dependent.
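A sketch of this per-element dispatch, written in Python rather than JUMBO's Java. The registry DEFAULT_VALIDATORS, the function validate_tree, and the override mechanism are assumptions about how such standalone validators could be wired together; they are not JUMBO's actual design.

```python
import xml.etree.ElementTree as ET


def validate_float(el):
    """Context-independent check: the element's text parses as a float."""
    try:
        float((el.text or "").strip())
        return True
    except ValueError:
        return False


def validate_url(el):
    """Deliberately crude check: the text contains a scheme separator."""
    return "://" in (el.text or "")


# Hypothetical defaults, keyed by element name.
DEFAULT_VALIDATORS = {"Float": validate_float, "URL": validate_url}


def validate_tree(root, overrides=None):
    """Return elements that fail their validator; overrides replace defaults."""
    validators = {**DEFAULT_VALIDATORS, **(overrides or {})}
    return [el for el in root.iter()
            if el.tag in validators and not validators[el.tag](el)]
```

Passing overrides plays the role of swapping in a different stylesheet: the same document can be checked under stricter or looser rules without touching the defaults.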
>I think that XML and SGML were smarter to leave all of this to the
>application-specific processing software in the first place.
Agreed. I think one role of XML-DEV is to see what agreement(s) are
possible for the next step.
>
> > - I want to be able to transform XML objects into other XML
> > objects. XSL is tantalisingly close to being able to do this but I
> > believe - at present - that W3C XML-transformation activity is
> > 'undefined'.
>
>Architectural forms will bring you part-way there. For one proposal, see
>
> http://home.sprynet.com/sprynet/dmeggins/xml-arch.html
I have read - and appreciated - this. I think that, without having an
AF-aware processor to hand, and a friendly guru, it's too difficult for
*me*. And certainly for my community. But I know there are a lot of
devotees of AFs on this list, and perhaps they can come to a communal view
as to whether there is agreement as to how they are to be used in XML and
what software is required (because they do need software).
>
> > - I want to be able to send XML objects to other people *with* a
> > prior contract as how these are to be used. XML can partially solve
> > this at present using DTDs, controlled prose and vocabularies and
> > *bespoke applications* (i.e. a different application for each DTD.)
> > This is as far as X*L goes. Much of the X*L prose stresses that
> > particular activity is left to the *application*. This means that
> > XML documents often need to be authored, knowing what application
> > is going to be used to process them. This is, presumably, the way
> > that CDF is designed - you have to have a 'CDF processor'. However
> > it does not support *generic* applications (or even generic
> > components of applications).
>
>As, I think, Paul Prescod has noted, nothing but a Turing-complete
>language could do this. XML is a method for creating applications --
>it is not an application itself, and each application will need its
>own conventions, etc.
Well, I'm probably mad. But I still feel that (at least parts of) an
XML-processor can be document-independent.
> > - I wish to be able to send hypermedia. XLL specifically declines to add
> > any semantics to the syntax, other than an (implied) HTML-like behaviour
> > for some of the SIMPLE links.
>
>Are notations not suitable for specifying this information?
I don't know :-). I have never used NOTATION. Seeing your example above
suggested that it may be useful. Maybe it will add type information to the
thing pointed at?
XLL states that there is an attribute 'BEHAVIOR' but says nothing about
what it is for. It would be valuable (as I have already posted) if there
is some consensus about the values and their meaning.
>
> > - I wish to send objects to other people who will print them out
> > and read them. XSL solves this.
>
>Yes, it may. I wonder if document-viewing will end up being a major
>XML application, when most of the effort right now seems to be going
>into transactions and meta-data.
I think the definition of 'document' will effectively broaden. I see no
reason why non-textual objects cannot be regarded primarily as 'documents'.
>
[...]
>
>Here, again, architectural forms will help. As long as you use a DTD,
>and the DTD implements a "food information" base architecture, the
>supermarket will be able to incorporate your molecular information
>automatically.
Ah - but this is the problem. I have no idea who will use my information
and that is why I think that AFs are limited in my area. In Java classes,
for example, I can use the Date class without the authors knowing I exist.
I hope that others can use my Molecule class/element in the same way.
>
>XML-DEV would provide simple solutions to a few additional simple
>problems, but in the end (as with SQL), people will still have to do a
>lot of work in the middleware. I cannot usefully dump the SQL tables
>from my database and send them to someone else without a lot of
>integration and customisation work, unless we planned our tables
>together from the start.
No question. Maybe I think there will be a lot of newcomers with simple
problems to which there will be simple solutions. Just as there were with
HTML. Maybe I'm wrong :-), and most of the problems will have to map
onto very thoroughly worked out solutions on a per-problem basis. We'll see.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From nelson at media.mit.edu Sat Nov 22 17:04:43 1997
From: nelson at media.mit.edu (Nelson Minar)
Date: Mon Jun 7 16:59:03 2004
Subject: Integrity in the Hands of the Client
In-Reply-To: <3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk>
References: <199711220630.RAA28483@jawa.chilli.net.au>
<3.0.1.16.19971122104401.2a87a84a@pop3.demon.co.uk>
Message-ID: <199711221704.MAA27293@pinotnoir.media.mit.edu>
>We are at a very exciting, but critical, time in the development of XML
Yes, definitely. The next six months are when XML stops being a small
research effort and starts being used by people who don't care about
how to structure documents, but just want to publish. If XML is rolled
out correctly, we can make it easy for them.
>XML will be used by vastly more people than currently practise SGML.
And by a lot of people who have never heard of SGML and don't care
about it.
>In many cases the XML specs (including XLL and XSL) deliberately do
>not say how something should be done - only what syntax should be
>used. The WG has (often rightly) taken the view that it should not
>prescribe ways of doing things. But we are now at - or very near to -
>the time when people will start doing things and there is a danger
>that we shall end up with serious inconsistencies.
The danger is more than inconsistencies. XML is complicated, and it is
hard to understand how to use it well. In order to help the people who
just want to publish, examples and tools need to be developed to help
people build not just legal XML, but *good* XML. That's hard, both
because you have to encapsulate a practice of good XML authoring and,
even worse, because you have to decide what we mean by "good" in the
first place.
I'm reminded of what happened in the first few months of 1994, when a
lot of people suddenly learned HTML. One of the most useful documents
(for me) of that period was Eric Tilton's essay "Composing Good HTML"
(since turned into a book, "Web Weaving", with Carl Steadman and Tyler
Jones). It was a short essay, but it laid out many of the basics of
writing HTML well - issues beyond syntax. Style issues like "don't say
'click here' in a document, integrate the anchor text into the
narrative". Structural issues like "don't misuse headers" and "try to
do logical formatting, not physical". And meta information
recommendations, like "put your name on documents" and "put a last
modified date on documents if it makes sense". For me, that essay made
HTML make sense and gave some order to the varied capabilities of the syntax.
I tried to do my bit back then by writing an HTML editor tool (an
emacs mode) that made it easier to write good HTML. Indenting the HTML
source to show the document structure, providing simple templates to
get basic well-formedness, automating last modified footers. And I
think it was reasonably successful - pages written with my editor were
at least a little better than pages written with nothing.
XML needs similar style guidelines and tools if people are going to
use it well. The problem for XML is harder than with HTML since XML is
more powerful. I think XML will be most successful for casual document
writers when there are standard well-established DTDs combined with
style sheets that are simple to use and very well documented as to
what the tags mean and how to use them. I don't know how to smooth the
process of helping people develop their own DTDs.
From ak117 at freenet.carleton.ca Sat Nov 22 19:09:49 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:03 2004
Subject: Is XDEV useful? (was re: XDEV proposals)
In-Reply-To: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>
References: <3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
<199711221344.IAA00417@unready.microstar.com>
<3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>
Message-ID: <199711221910.OAA00315@unready.microstar.com>
Peter Murray-Rust writes:
> Thanks very much David,
> You pose - but do not answer - a question :-).
Perhaps as I move into my mid-30's I'm assuming the modesty and
humility of old age (though those who have endured a conversation with
me may have their doubts).
> >Whenever your processing software finds an external data entity with
> >the XML notation, it can simply call the parser recursively.
>
> This is very clever! Thanks for pointing this out. I wouldn't have thought
> of it.
> It does, however, require that each ENTITY consistently uses just one DTD.
Each entity can have its own DOCTYPE declaration -- simply start a new
invocation of your parser.
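A rough sketch of the recursive-parser idea, in Python with the standard expat binding, for illustration only. The notation name "XML", the entity names, the file names and the in-memory DOCS table are all assumptions made for this example. For simplicity it starts the nested parser after the outer parse has finished, not at the point where the entity is referenced.

```python
import xml.parsers.expat

# Hypothetical in-memory "file system"; a real application would fetch
# each system identifier from disk or the network instead.
DOCS = {
    "book.xml": b"""<!DOCTYPE book [
  <!NOTATION XML SYSTEM "xml">
  <!NOTATION GIF SYSTEM "gif">
  <!ENTITY chap1 SYSTEM "chapter1.xml" NDATA XML>
  <!ENTITY logo SYSTEM "logo.gif" NDATA GIF>
]>
<book/>""",
    # The included entity carries its own DOCTYPE declaration.
    "chapter1.xml": b"""<!DOCTYPE chapter [
  <!ELEMENT chapter (title)>
  <!ELEMENT title (#PCDATA)>
]>
<chapter><title>One</title></chapter>""",
}

def parse_entity(system_id, depth=0, seen=None):
    """Parse one entity, then recurse into data entities whose notation is XML."""
    seen = [] if seen is None else seen
    pending = []

    def on_start(name, attrs):
        seen.append((depth, name))

    def on_unparsed_entity(entity_name, base, sys_id, pub_id, notation_name):
        # Only entities declared with the XML notation are parsed again;
        # others (the GIF here) are left to some other handler.
        if notation_name == "XML":
            pending.append(sys_id)

    parser = xml.parsers.expat.ParserCreate()  # a fresh parser per entity
    parser.StartElementHandler = on_start
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(DOCS[system_id], True)

    for sys_id in pending:
        parse_entity(sys_id, depth + 1, seen)
    return seen
```

Calling parse_entity("book.xml") visits the book element at depth 0 and the chapter and title elements at depth 1. The GIF entity is never handed to an XML parser.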
> I have read - and appreciated this. I think that, without having an
> AF-aware processor to hand, and a friendly guru, it's too difficult for
> *me*. And certainly for my community. But I know there are a lot of
> devotees of AFs on this list, and perhaps they can come to a communal view
> as to whether there is agreement as to how they are to be used in XML and
> what software is required (because they do need software).
The simplest approach to AF does not require an architectural engine
at all; instead, simply look at attribute values instead of element
type names; i.e., instead of
IF element_name = "FOO" DO
do_a_foo()
END
try
IF attribute_name("MYARCH") = "FOO" DO
do_a_foo()
END
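The second form can be sketched in runnable Python as below, for illustration only. The attribute name MYARCH and the function do_a_foo come from the pseudocode above; the HANDLERS table, the process() helper and the sample element names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

def do_a_foo(element, log):
    # Stand-in for the real processing; it just records which element it saw.
    log.append(element.tag)

# Hypothetical dispatch table: architectural form name -> handler.
HANDLERS = {"FOO": do_a_foo}

def process(root, arch_attr="MYARCH"):
    """Dispatch on the value of the architectural attribute, not on the tag."""
    log = []
    for element in root.iter():
        handler = HANDLERS.get(element.get(arch_attr))
        if handler is not None:
            handler(element, log)
    return log

doc = ET.fromstring(
    '<doc><para MYARCH="FOO"/><note MYARCH="FOO"/><FOO/><other MYARCH="BAR"/></doc>'
)
```

Here process(doc) handles para and note, because both declare MYARCH="FOO". The element that is literally named FOO is ignored, since its element type name is no longer what drives the dispatch.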
> > > - I wish to be able to send hypermedia. XLL specifically declines to add
> > > any semantics to the syntax, other than an (implied) HTML-like behaviour
> > > for some of the SIMPLE links.
> >
> >Are notations not suitable for specifying this information?
>
> I don't know :-). I have never used NOTATION. Seeing your example above
> suggested that it may be useful. Maybe it will add type information to the
> thing pointed at?
That is exactly its purpose -- the notation informs the processing
software of a binary entity's type:
(I have to admit that I have no idea what to do with system
identifiers for notations in XML -- in full SGML, I just leave them
out).
[...]
> >Here, again, architectural forms will help. As long as you use a DTD,
> >and the DTD implements a "food information" base architecture, the
> >supermarket will be able to incorporate your molecular information
> >automatically.
>
> Ah - but this is the problem. I have no idea who will use my information
> and that is why I think that AFs are limited in my area. In Java classes,
> for example, I can use the Date class without the authors knowing I exist.
> I hope that others can use my Molecule class/element in the same way.
How could they possibly use your information automatically if you
weren't using some kind of shared standard? How would they know what
information applied to what food, for example, unless you had somehow
encoded that information in advance for them?
All the best, and thanks for an interesting discussion,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Sat Nov 22 20:03:39 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:03 2004
Subject: Is XDEV useful? (was re: XDEV proposals)
In-Reply-To: <199711221910.OAA00315@unready.microstar.com>
References: <3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>
<3.0.1.16.19971122130407.475fef92@pop3.demon.co.uk>
<199711221344.IAA00417@unready.microstar.com>
<3.0.1.16.19971122162310.0b3796b0@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971122205518.0b37d2c8@pop3.demon.co.uk>
At 14:10 22/11/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > Thanks very much David,
> > You pose - but do not answer - a question :-).
>
>Perhaps as I move into my mid-30's I'm assuming the modesty and
Well, so am I (but not in the decimal system :-) Age is unimportant on
XML-DEV (except as datatype, of course :-)
[...]
>
>Each entity can have its own DOCTYPE declaration -- simply start a new
>invocation of your parser.
>
This raises a common problem I have. If I have an 'include file' (e.g. a
chapter) I can 'include' by the following mechanisms:
- declare it as an entity and use &chapter1; In this case it should not
have any doctypes, or other header info
- reference it by XML-LINK="SIMPLE" HREF="chapter1.xml"
- use your NOTATION trick
The advantage of the last two is that they are standalone XML files and can
be validated independently, and so I'm leaning towards them in general.
They also have the merit that you load the TOC and then look at whatever
chapters you want. This takes less memory and is faster.
The advantage of the first is you have a single object in memory which can
be searched (e.g. Xpointers).
Any comments?
>
>The simplest approach to AF does not require an architectural engine
>at all; instead, simply look at attribute values instead of element
>type names; i.e., instead of
>
> IF element_name = "FOO" DO
> do_a_foo()
> END
>
>try
>
> IF attribute_name("MYARCH") = "FOO" DO
> do_a_foo()
> END
Oh dear! Like the man who didn't realise he had been using prose all his
life. This is exactly what I do for most of my stuff at present :-) Its
advantage is that it makes the DTD much more forgiving :-)
>
>
> > >Here, again, architectural forms will help. As long as you use a DTD,
> > >and the DTD implements a "food information" base architecture, the
> > >supermarket will be able to incorporate your molecular information
> > >automatically.
> >
> > Ah - but this is the problem. I have no idea who will use my information
> > and that is why I think that AFs are limited in my area. In Java classes,
> > for example, I can use the Date class without the authors knowing I exist.
> > I hope that others can use my Molecule class/element in the same way.
>
>How could they possibly use your information automatically if you
>weren't using some kind of shared standard? How would they know what
>information applied to what food, for example, unless you had somehow
>encoded that information in advance for them?
No :-) I produce something I think other people would value and just
produce it with (hopefully) good documentation. Thus I have a class
RealSquareMatrix
in JUMBO. I may make an out of it. I would then document it with
what I felt were ReallyUseful properties of RealSquareMatrices. If people
want to use it, they're welcome. This is the way that we use java.* and
other classes.
So, if I produce I will document what it is, what its components
are, and then offer Molecule.java as something that will
display()/validate() it. For example, a Molecule can have Atoms without
Bonds, but not Bonds without Atoms. If a food manufacturer reads my
documentation, they can decide for themselves whether it's useful. [I have
had interest from those involved in submission of drugs - e.g. pharmaceutical
companies and regulatory agencies.]
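A minimal sketch of the Atoms/Bonds rule as a standalone check, in Python for illustration only. The element names MOLECULE, ATOM and BOND and the function validate_molecule are assumptions made for this example; this is not the code in Molecule.java.

```python
import xml.etree.ElementTree as ET

def validate_molecule(molecule):
    """Return a list of problems; an empty list means the element passed."""
    problems = []
    atoms = molecule.findall("ATOM")
    bonds = molecule.findall("BOND")
    # Atoms without bonds are acceptable; bonds without atoms are not.
    if bonds and not atoms:
        problems.append("BOND elements present without any ATOM elements")
    return problems
```

A recipient who knows nothing about the author can run the same check on any MOLECULE element they receive, which is the sense in which the element and its code travel together.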
The users then have to satisfy themselves whether is robust,
future-proofed, etc.
In the same way I shall take on trust. I shall create MATHML
objects (possibly with TeX or symbolic algebra) and use them for chemistry.
The original authors of MathML need never know what I am doing (although I
have actually met some and am very excited about what they are doing).
>
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From jlapp at acm.org Sat Nov 22 21:36:47 1997
From: jlapp at acm.org (Joe Lapp)
Date: Mon Jun 7 16:59:03 2004
Subject: Integrity in the Hands of the Client
In-Reply-To:
References: <3.0.3.32.19971121105652.0095e600@pop.access.digex.net>
Message-ID: <3.0.3.32.19971122163645.00965aa0@pop.access.digex.net>
Mark Baker wrote:
>But in *many* cases, you just want to make the *object* persist simply,
>perhaps even on the machine with the browser. This is especially
>suitable for agent systems; you bring the ability to persist along with
>you instead of attempting to store it "behind" you. It's a move away from
>TP-monitor style ACID transactions, and towards a more "make forward
>progress" means of distributed computing. Object groups are a good
>example of this.
And in a subsequent posting he wrote:
>[...] transactions are an
>overrated means of reasoning about distributed systems. They try and
>make distributed processing look like local processing, when we now know
>how impractical that view is.
I find these statements very thought-provoking. I'm not quite sure what
you mean by them, at least not in the context of our discussion. It
sounds like you are proffering a very important perspective that I'm going
to need to carry around in my back pocket. In particular, I'm curious
about the implications for data that is shared among many users. Are you
saying that there is a model that accomplishes the same thing as sharing
data but that does not require a central (or a partitioned and replicated
but still synchronized) repository?
--
Joe Lapp (Java Apps Developer/Consultant)
Unite for Java! - http://www.javalobby.org
jlapp@acm.org
From fussellm at alumni.caltech.edu Sat Nov 22 21:38:00 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:03 2004
Subject: Sequence Access Languages ...
In-Reply-To: <199711221542.CAA07290@jawa.chilli.net.au>
Message-ID:
On Sun, 23 Nov 1997, Rick Jelliffe wrote:
> > From: Mark L. Fussell
> > I am not saying sequence is unimportant, but I think SGML is overly
> > focused on it (from an IM perspective) because it comes from a
> > paper/linear background. Information is rarely linear: it is only time
> > that is, which has caused some media [and the humans who use them] to be
> > (mostly) linear also. It can be difficult to break that linear
> > assumption when it doesn't apply if your tools keep reinforcing it.
>
> But do you think HTML would have become a popular markup language if its DTD
> was like this?
>
> ( h1*, h2*, h3*, p*, I*, table*, tr*, td*, th*)>
>
>
> This is reductio ad absurdum of what you are saying. A DTD where all
> sequence information is made explicit.
I certainly wasn't trying to say that sequencing should be removed but
just that it can be difficult to see when it doesn't apply. Sometimes
information is (at least dominantly) organized as a sequence: Ordered
Sections contain Ordered Paragraphs. Sometimes information does not
inherently need to be sequenced but the application would like it to be so
it does not need to worry about ordering it at presentation (I am thinking
of a list of citations where there is a natural ordering [by one of the
columns/attributes of a citation]). And a variation of this case is:
sometimes it is just easier for people to take direct control than to do
informational markup. I think HTML and Word Processors represent this end
of the spectrum.
> If people just want a database dump format for nice relational tables,
> comma-delimiter formats are available and attractive. But when they have
> text which they don't want to have desequenced, SGML/XML can be useful.
Well, I guess I have larger visions of what SGML/XML can do, and I think
it is within (or at most a mild extension of) the original vision.
Requoting [Goldfarb 90, A.2.40]:
---
Generalized markup is based on two novel postulates:
a) Markup should describe a document's structure and other attributes
rather than specify processing to be performed on it, as descriptive
markup need be done only once and will suffice for all future processing.
b) Markup should be rigorous so that the techniques available for
processing rigorously-defined objects like programs and databases can be
used for processing documents as well.
---
SGML is designed to describe information, and although the original vision
may have been focused on describing documents I believe that was just
because it was the particular task at hand.
> ... Just folks can write
> HTML, at a pinch. If you go too much to a database mentality, you move
> to requiring custom-tools for data entry, rather than simple text-editors.
No argument that HTML is easier for novices to directly write than more
structured information, but that also applies to any of the more
sophisticated DTDs. The benefit of a human-readable and
human-understandable encoding like SGML/XML is that people can progress
from simple DTDs like HTML to more complex ones and still understand what
is going on. I have done this with web-site development where content
writers now use a "real" DTD that allows generation of different HTML
views (and more sophisticated linking... etc.)
And I do agree that accurately modeled information (e.g. normalizing in a
RDB context) can make it too hard (for the desired writers) to enter data
directly. It is likely that some SGML/XML DTDs will be designed to
contain all the necessary information with explicitly desired redundancy
and artificial sequencing but with the assumption that the processing will
later remove them on the way to the information model. This is almost
exactly what UI Forms and relational views are doing.
So I don't want to get rid of sequence, I just believe people should think
twice about it and ascertain whether it is really part of the information
and is the best way to represent that information.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From ddb at criinc.com Sat Nov 22 23:05:07 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:03 2004
Subject: Query Languages for XML
Message-ID: <3.0.32.19971122150639.00a71d60@mailhost.criinc.com>
At 11:50 AM 11/21/97 -0500, Paul Prescod wrote:
>Derek Denny-Brown wrote:
>>
>> One of the things that I see as a potential problem is that HTML etc as it
>> is used now has 2 (as I count them this side of the morning) relatively
>> distinct uses.
>> 1) as an alternate form of (relatively) static information.
>> 2) as a (very-basic) cross-platform (g)ui.
>>
>> XSL and DSSSL are focusing rather hard on (1), but not on (2).
>
>I'm not sure what you mean by that. XSL as currently proposed has access
>to all of the form features of HTML, just as it has access to all of the
>static display features of HTML. It is correct to argue that we are
>spending more effort on *improving* HTML's static display features than
>improving its form features, but I think that that is probably
>appropriate considering the market's interest in better static pages,
>SGML's particular strengths in that area and Java's suitability for
>forms.
I have a difficult time understanding how a "call-back" would work in XSL,
since the processing model does not include any mechanism for such
callbacks. This makes sense given that XSL is (at least to this point) about
transforming XML to a displayable view. The problem is when that
displayable view is interactive. I am not talking about FORMs, where the
interaction is between the browser and the server, but rather interaction
between the user and the browser, a la JScript/JavaScript, onMouseOver,
etc... What if I have two (or more) possible ways to view (i.e. differing
applications of a stylesheet) some content? I am not looking to reprocess
the whole page, but rather to reprocess just a small portion and swap out
a portion of the existing/original flow-objects for the newly generated
flow-objects. Or, as a simpler case, to just modify the attributes of
existing flow objects. Using HTML forms as an example, it would be really
nice if you could perform some sanity checks on the contents of the form
before it was sent to the server. One of the problems with most current
schemes which do this is that they provide no clear indication of what
they think is wrong when the sanity checking is done. It would be better
if they could use color or some such thing to indicate which fields have
data which they consider invalid.
The way all this (and most of JavaScript) is done now is through
call-backs. You register a function as a call-back in the case a specific
event happens (such as the user pushing the submit button, or the user's
mouse moving over a specific image). I am unclear how XSL could handle
such a call-back. For what I would want, it really needs a queryable model
of the flow-objects which were created originally, and some way to modify
those flow-objects. Now your XSL style sheet has two portions, one which
takes the XML document and builds a complete flow-object stream/tree. The
other handles callbacks from user-generated events regarding flow-objects,
and modifies the flow-objects. This second part is what "dynamic" HTML is
all about. (Either through Netscape's JavaScript or Microsoft's dHTML,
though dHTML is more like what I am talking about.)
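A small sketch of the two-part arrangement, in Python for illustration only. The FlowObject class, its on()/fire() methods and the field names are assumptions made for this example; nothing like them is defined by XSL or DSSSL. The first part builds the flow objects once. The second part reacts to user events by changing attributes of existing flow objects, without reprocessing the document.

```python
class FlowObject:
    """A displayable object with attributes and per-event callbacks."""

    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.callbacks = {}

    def on(self, event, callback):
        """Register a callback to run when the named event fires."""
        self.callbacks.setdefault(event, []).append(callback)

    def fire(self, event):
        """Simulate the browser delivering a user-generated event."""
        for callback in self.callbacks.get(event, []):
            callback(self)

def build_form():
    # Part one: stands in for the stylesheet building the initial flow objects.
    return {
        "age": FlowObject("age", value="", color="black"),
        "email": FlowObject("email", value="", color="black"),
    }

# Part two: callbacks that only touch attributes of existing flow objects.
def check_age(field):
    field.attrs["color"] = "black" if field.attrs["value"].isdigit() else "red"

def check_email(field):
    field.attrs["color"] = "black" if "@" in field.attrs["value"] else "red"

form = build_form()
form["age"].on("change", check_age)
form["email"].on("change", check_email)

form["age"].attrs["value"] = "thirty"
form["age"].fire("change")
form["email"].attrs["value"] = "someone@example.org"
form["email"].fire("change")
```

After these events the age field is marked red and the email field stays black, so the user can see which field failed the sanity check before anything is sent to the server.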
If you have looked at some of Microsoft's MSXML samples which use DSO (I
think that is what they call it...), that is kind of in line with what I am
talking about, though in that case the original flow-objects were directly
HTML, not generated from XML...
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From dgd at cs.bu.edu Sun Nov 23 01:23:53 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:03 2004
Subject: Integrity in the Hands of the Client
Message-ID:
From: Joe Lapp
In this posting I'm going to be a little bold and propose that both
the XML and DOM specifications are flawed. The existence of these
flaws rides on the assumption that we care to use SGML/XML to create
domain models for data where the data evolves over time. I'm also
assuming that it is unacceptable for the client objects of a document
to maintain the integrity of the document.
I've not been following this thread closely, so I apologize if I get
something wrong. I'll stop first, too, to note that when
interconverting data formats we can rarely represent every validity
constraint in the new format -- if I dump a DB record to tabbed files
I lose referential (and all other) integrity checks, but I may have
much better luck moving to a competing vendor's system.
When using XML, we may reasonably expect that the richer formalism
will give us more control (and for hierarchical data, that
expectation is well, if not perfectly, met). We may also expect that
other properties can be preserved (e.g. IDrefs eliminate broken
pointers, but don't allow typed references), but some probably won't be.
We need to design the DTD for this document. Here is our first pass:
]>
To get a better feel for what we've designed, we create a little sample
document:
Text goes here.Text goes here.Text goes here.Text goes
here.Text goes
here.
This seems to work. It stores information about books and authors,
and it is not possible to add a book without associating it with
the description of some author. But we can see that it breaks as
soon as we add any other kind of element that has an ID. We know
that every book will eventually have an ID, because we'll soon want
to have an element whose content elements reference the New York
Times Bestsellers. Once we do that, nothing prevents an administrator
(or the client program he or she is using) from indicating that the
author of a book is another book. This DTD will not suffice.
The problem with this is that it uses database style "joins" on ID
values. XML's most powerful constraints are tree constraints, based on
containment. For example the following structure does not have this
problem:
David BrinwhateverThe Postman whatever
other books go here. If we have more than one author:
...etc
Note that you do have to pick a "by author" or "by book" hierarchy to
use this technique. I also moved title and author into elements:
titles frequently contain markup, and names can be complex enough that
it's often a good idea to be prepared for the eventual need for
markup. Consider Chinese names where the order of family and personal
names is different than it is in most European cultures.
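A sketch of the containment approach, in Python for illustration only. The element names library, author, name, book and title and the function orphan_books are assumptions made for this example. With a DTD whose content model only allows book inside author, a validating parser enforces this constraint on its own; the hand-written check below just shows what that constraint amounts to.

```python
import xml.etree.ElementTree as ET

NESTED = """
<library>
  <author>
    <name>David Brin</name>
    <book><title>The Postman</title></book>
  </author>
</library>
"""

FLAT = """
<library>
  <author><name>David Brin</name></author>
  <book><title>The Postman</title></book>
</library>
"""

def orphan_books(root):
    """Return every book element that is not directly contained in an author."""
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    return [
        book
        for book in root.iter("book")
        if parent_of.get(book) is None or parent_of[book].tag != "author"
    ]
```

In the nested form no book can exist without an author, because there is nowhere else to put one. The flat form needs a join on ID values to recover the same relationship, and orphan_books reports the book that sits outside any author.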
It seems that we might have to use links, but let's look at other
approaches first. We entertain the idea that an author's books
belong to the content of the author. We quickly throw that one out
when we realize that a book can have more than one author.
Or take an alternative approach (as I sketched above).
I have not been able to find a way to have the document server force
clients to ensure that whenever they add a book, that book is
associated with some author. Clients are given the responsibility
of maintaining the integrity of the document.
No. Servers that want to impose non-XML integrity constraints (such as
you are demanding) must impose those constraints themselves. XML, like
traditional databases (which seem to be your starting point),
represents some things well, and some things very badly. Attempting to
create relational schemas for XML documents produces the same kind of
hairy, unnatural specifications and requires similar extra integrity
checks on update to represent typical document information.
Basically, I think that the flaw of not providing what you ask for is
in fact no flaw, but an artifact of different tools being targeted to
different purposes. There is a difference -- since XML is a data
format and _not_ a processing technology the way a database is, it may
be useful as a way to represent and transport data that is best
_manipulated_ in non-XML ways. You get a rich language of structures for free by
using an XML parser, and that may save some time in writing data
transporters -- for instance, a DTD for the transport of complete RDB
table sets would be easy to write -- but checking those tables for
semantic correctness would not be one of the things you get for free.
I think the XML specification as it currently stands is extremely
well-suited for describing data that does not change over time, but
that it is lacking in specifying how documents are to evolve.
You overstate the case here. It's suited for describing how the data
whose integrity constraints correspond to XML validity should evolve.
These constraints are not theoretically justified, but are
pragmatically justified by the fact that people can get useful
document management work done using them.
It is the same with relational databases -- all those theorems
about normal forms and algebra merely show that the system is well
defined -- the fact that tables are useful for many kinds of data is
still a pragmatic one, and not a theoretical one. The world is still
full of things that don't fit the relational model very well.
I know that our current data-manipulation-savior is OO databases, but
once we have experience with them we'll grow to understand the ways in
which they fall short of perfection as well.
Nevertheless, future versions of XML might have small improvements
that will help cases like this. The provision of multiple ID spaces
(ability to have typed IDs and typed IDrefs) is one that has been
suggested a number of times. It would also be very useful in
documents, since (begin example) only would have "fignum"
attributes, and so the user of "figref" attributes will be prevented
from referring instead to a paragraph of random text.
Small suggestions like this that also offer a lot of leverage may get
considered for XML 1.1. (Small in the sense that little syntax is
required to support it, and little processing beyond that already
required for ID/IDREF processing).
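As a rough illustration of what typed IDs and typed IDrefs would buy, here
is a minimal Python sketch of the check an application has to do by hand
today. Only the attribute names "fignum" and "figref" come from the text
above; the element names (figure, para, xref), the TYPED_REFS table and the
check_typed_refs function are assumptions made up for the example, not part
of any XML specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical convention: each reference attribute may only point at
# elements that carry the matching identifier attribute.
TYPED_REFS = {"figref": "fignum"}

def check_typed_refs(root, typed_refs=TYPED_REFS):
    """Return error strings for references with no target in their own ID space."""
    errors = []
    for ref_attr, id_attr in typed_refs.items():
        # Collect the ID space for this type, e.g. every fignum value.
        ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
        for el in root.iter():
            target = el.get(ref_attr)
            if target is not None and target not in ids:
                errors.append(f"<{el.tag}> {ref_attr}='{target}' has no matching {id_attr}")
    return errors

# Invented sample document: the second xref points at a paragraph, not a figure.
DOC = """<doc>
  <figure fignum="f1"/>
  <para id="p1">Some random text.</para>
  <xref figref="f1"/>
  <xref figref="p1"/>
</doc>"""

errors = check_typed_refs(ET.fromstring(DOC))
```

With a single untyped ID space the reference to "p1" would resolve; with
separate ID spaces it is reported as an error.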
To my mind, such suggestions are compelling to the extent that they
are useful in _document_ management (as well as general data
management) because that really describes the primary focus of XML
design. XML may well be useful beyond that area, but I think it should
stay away from bidding on the "universal data format of the ages"
title, that may well be impossible to ever attain.
-- David
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From papresco at technologist.com Sun Nov 23 16:57:31 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:03 2004
Subject: Integrity in the Hands of the Client
References:
Message-ID: <3478614E.239F097A@technologist.com>
David G. Durand wrote:
> To my mind, such suggestions are compelling to the extent that they
> are useful in _document_ management (as well as general data
> management) because that really describes the primary focus of XML
> design. XML may well be useful beyond that area, but I think it should
> stay away from bidding on the "universal data format of the ages"
> title, that may well be impossible to ever attain.
This is such an important point I felt I had to emphasize it.
We could legally mandate every single byte that is stored on a computer
hard drive must be in XML and the world would not be a better place. We
would still have incompatibilities between software, we would still have
trouble storing documents in relational databases and relational
information in documents and so forth. Unifying notation is merely a
convenience. It doesn't automatically buy a perfect world of seamless
interoperability as some seem to believe.
"Sometimes the actual claims for markup-based systems are overstated;
the claim that SGML results in portable documents, for example,
falls afoul of the observation that it is possible to put angle
brackets around troff tags, supply a simple document type descriptor,
and thereby achieve an SGML-compliant document, without gaining
any portability or descriptiveness for the information. True
portability requires not only that information be transportable
from one machine to another, but that the semantics of that
information be the same on either machine. SGML, in particular, claims to
transfer no semantics, so it surely cannot guarantee portability."
[1]
Given this fact, we should focus on making the best notations we can for
the data types we have to represent, rather than trying to stuff all
data into the same notation, or worse, making a single notation that is
adapted for all kinds of data. Putting angle brackets around troff does
not make troff into a serialization of a Java Bean and the fact that
Java Beans and Troff might share a notation does not make it easier to
create troff files from Java or to render them IN Java.
Paul Prescod
[1] "Markup Reconsidered" http://www.sil.org/sgml/raymmark.ps
From papresco at technologist.com Sun Nov 23 17:29:00 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:04 2004
Subject: Sequence Access Languages ...
References:
Message-ID: <347868B1.7604BDDD@technologist.com>
Mark L. Fussell wrote:
>
> SGML is designed to describe information, and although the original vision
> may have been focused on describing documents I believe that was just
> because it was the particular task at hand.
I think that you have this backwards. SGML was designed to represent
documents and insofar as documents share properties with some other
types of information, SGML can represent other information. I see no
reason to believe that a single notation could efficiently represent all
forms of information. If we take this to an extreme then most people
seem to agree: how soon do you expect we will represent bitmapped
graphics in XML?
My personal rule of thumb is that it is okay to represent some
non-document data type in SGML/XML if it is convenient to do so without
extending SGML/XML in a way that would make it less appropriate for
dealing with documents. Suboptimal extensions would be those that
confuse the organizational principles of SGML or make it more
complicated to implement or understand (such as complex validity
constraints).
Paul Prescod
From mrc at allette.com.au Sun Nov 23 21:22:36 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:04 2004
Subject: Sequence Access Languages ...
References:
Message-ID: <34789E7C.F1E4FAB6@allette.com.au>
Mark L. Fussell wrote:
> SGML is designed to describe information, and although the original vision may
> have been focused on describing documents I believe that was just because it was
> the particular task at hand.
Actually, the task at hand has always been to capture the information as cleanly
and thoroughly as possible with as little regard to the downstream applications as
possible. Several years ago we converted to SGML a substantial amount of military
data with an anticipated lifespan of fifty years. At the time, there were no
satisfactory SGML repositories, yet we are not precluded from uploading to one when
they arrive even if it does mean (an easy) SGML to XML conversion of the data.
Similarly, and with all due respect to the idea of a repository, by the time this
data reaches its twilight there will be some very different mechanisms for
managing data and I daresay the repository will be long gone.
We know that XML/SGML won't cover everything at once - the quickest path to
failure is to try to make it do so. Stage one, capture the data as best you can
anticipate and hopefully in a way that also works for authors, stage two, convert
it for use in specific applications and write semantic support mechanisms.
Disregard of any type of application is the greatest strength of XML/SGML - we
must not be tempted to lose sight of that no matter how tempting the siren's call.
It's a long game...
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From terje at in-progress.com Mon Nov 24 03:49:30 1997
From: terje at in-progress.com (terje@in-progress.com)
Date: Mon Jun 7 16:59:04 2004
Subject: Join the Document Interchange Initiative
Message-ID:
The Document Interchange Initiative (DII) is a campaign to make the
content of a website more easily interchangeable between different
software. To reach this goal, the campaign promotes adherence to markup
standards as an alternative to proprietary markup extensions. In a way, the
campaign is a public relations effort to increase the use of XML and
related technologies.
The Document Interchange Initiative promotes markup that conforms to the
established standards both when it comes to syntax and semantics. The
following location is updated with information about the campaign, and new
information is added on a regular basis. Feel free to make a link to it:
http://interaction.in-progress.com/interchange
You are invited to join the Document Interchange Initiative, by adding your
name or organization to a list of those that support the goals of
interchangeable documents through adherence to markup standards. There are
no obligations whatsoever, but your name or organization on the list will
help to get the necessary attention for the campaign.
Please send an email to or directly to me to be
listed as supporter or become an activist, or if you have any questions
related to the campaign.
-- Terje | Media Design in*Progress
C a s c a d e...
a comprehensive Cascading Style Sheets editor for Mac
http://interaction.in-progress.com/cascade
Make your Web Site a Social Place with Interaction -
The Most Powerful Companion to a Mac Web Server!
http://interaction.in-progress.com
From ricko at allette.com.au Mon Nov 24 05:20:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711240518.QAA09900@jawa.chilli.net.au>
> From: Paul Prescod
>
>
> "Sometimes the actual claims for markup-based systems are overstated;
> the claim that SGML results in portable documents, for example,
> falls afoul of the observation that it is possible to put angle
> brackets around troff tags, supply a simple document type descriptor,
> and thereby achieve an SGML-compliant document, without gaining
> any portability or descriptiveness for the information. True
> portability requires not only that information be transportable
> from one machine to another, but that the semantics of that
> information be the same on either machine. SGML, in particular, claims to
> transfer no semantics, so it surely cannot guarantee portability."
>
> [1] "Markup Reconsidered" http://www.sil.org/sgml/raymmark.ps
Without wishing to disagree in any way with Paul, the quote is perhaps not
quite true, I think.
Sticking angle brackets on troff code may give you a document that is
syntactically *valid* SGML but, to the extent that it uses elements
to mark up processing instructions, the document does not *conform* to
SGML. Such conformance cannot be judged mechanically, but only by looking at the
definitions in ISO 8879 for processing instructions and elements.
People often seem to think "SGML is a grammar; I can markup all sorts of
sloppy things; therefore SGML is a bad grammar". But SGML is more than
a queer grammar, it is a language: the terms "element" and
"processing instruction" (etc) have broad but useable meanings.
I think one problem with XML is that these definitions of what an element,
etc., actually mean are not present. XML *is* just a grammar, more or less.
But to convert it to a useful language, we often have to plug in SGML's
definitions.
And again, we shouldn't then think that in all cases "SGML conformance=good;
SGML non-conformance=bad". But that is separate from "do I need
SGML validity? do I need XML well-formedness? do I need a custom syntax?".
Rick Jelliffe
From papresco at technologist.com Mon Nov 24 09:01:06 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:04 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
References: <3.0.2.32.19971124001024.00930d10@pop.iosphere.net>
Message-ID: <3479431B.BA9E33EE@technologist.com>
Mark Baker wrote:
>
> At 12:01 PM 23/11/97 -0500, Paul Prescod wrote:
> >Putting angle brackets around troff does
> >not make troff into a serialization of a Java Bean
>
> What if that troff document contained a link to an implementation of a
> troff formatter? What if that implementation described its interface using
> XML?
What if it didn't? What if it described its interface using CORBA or
some proprietary language that is more powerful than CORBA? You don't
lose any flexibility or expressive power, you just have to write another
parser for CORBA or your proprietary language.
The hard part of writing a troff implementation is not writing the
parser, but in writing the formatter. So XML can only make a marginal
difference in implementation time or effort. The hard part of writing an
interface to a troff implementation is writing the interface, not
publishing it (in my experience, anyway) so XML can only make a marginal
difference there either. The same goes for writing an SGML DTD parser.
The difficulty there is in keeping track of all of those elements,
attributes and entities, not in parsing the syntax. So again you only
get a marginal benefit from using XML as the representation language.
Now if a marginal benefit is enough to tip you into profitability, then
I'm glad we were able to help you. But there are costs associated with
that marginal benefit. You will beat your head against the wall trying
to express constraints that SGML cannot express directly. You will find
that your files are much larger than they would be in an optimized
notation. You will notice redundancy in places that you don't really
need it.
On the other hand, there is a huge benefit to using SGML/XML *for
documents* because SGML is the international standard for representing
structured documents. Thus you get the benefit of hundreds of tools,
books and experts, almost all of them specialized for document markup.
You do not get that benefit when you ignore CORBA (the real object
interface standard) to use XML instead. You do not get that benefit when
you ignore TeX or troff to use XML as a page description language. You
do not get that benefit when you ignore the existing DTD syntax to
invent a new XML instance syntax.
When you use XML to replace an existing standard, you are, for a period
at least, actually working against open standards and promoting a
proprietary alternative, even if it is expressed in the standard
notation of SGML/XML. This might be a good idea if there is a problem
with the existing standard in a given area, but more often it is a
better idea to work with the people who control the standard to improve
it rather than striking out on your own (for all of the usual reasons).
Paul Prescod
From papresco at technologist.com Mon Nov 24 14:33:51 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
References: <199711240518.QAA09900@jawa.chilli.net.au>
Message-ID: <34793F19.2039C768@technologist.com>
Rick Jelliffe wrote:
> Sticking angle brackets on troff code may give you a document that is
> syntactically *valid* SGML but, to the extent that it uses elements
> to mark up processing instructions, the document does not *conform* to
> SGML. Such conformance cannot be judged mechanically, but only by looking at the
> definitions in ISO 8879 for processing instructions and elements.
"Element: A component of the hierarchical structure defined by a
document type definition;"
> People often seem to think "SGML is a grammar; I can markup all sorts of
> sloppy things; therefore SGML is a bad grammar".
I would have thought that that flexibility makes SGML a *good* grammar.
SGML would be a GOOD encoding for (e.g.) a typesetting language. In
fact, it already is used in this way for SPDL.
Paul Prescod
From ricko at allette.com.au Mon Nov 24 15:47:52 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:04 2004
Subject: Integrity in the Hands of the Client
Message-ID: <199711241545.CAA29767@jawa.chilli.net.au>
> From: Paul Prescod
> "Element: A component of the hierarchical structure defined by a
> document type definition;"
As distinct from
"Processing Instruction: markup consisting of system-specific
data that controls how a document is to be processed."
Rick Jelliffe
From cskerr at geocities.com Mon Nov 24 15:48:59 1997
From: cskerr at geocities.com (Charles Kerr)
Date: Mon Jun 7 16:59:04 2004
Subject: MS XML parser only works with IE...
References: <01bcf8a9$98d09820$0a08bdcc@infinity>
Message-ID: <3479A42C.A8BE480B@geocities.com>
(For those of you reading this on the xml-dev mailing list, the article
referred to in this letter is at http://www.javalobby.org/jn001.htm#xml
and WORA == Write Once, Run Anywhere)
> Stating that MSXML is "Arguably the best XML parser for Java today" I think
> is in error and inconsistent with the stated views of the Java Lobby and our
> commitment to WORA. MSXML is not 100% pure and the DSO (Data Source Object)
> applet only works with MS IE 4.0 browsers.
>
> These imports are hidden in the source
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;
>
> Not only that, but with this release of MSXML it appears that Microsoft is
> attempting to fragment the XML community by encouraging the use of
> non-standard > end tags and other things like the inline '&' that break
> XML. I can only conclude that Microsoft is giving MSXML away for free --once
> again-- in order to fragment the emerging XML standard. This is not a Java
> application, and MSXML is something we should NOT endorse or support.
The MSXML dependencies on Windows are apparently trivial and fixable.
Equally important, the MSXML EULA grants the right to redistribute such
modified code. See the three letters (from the xml-dev mailing list)
that I include at the end of this letter for more information. What I'd
like to see is someone post these fixes so that each person wanting
the portable version doesn't have to make the changes by hand. If
anyone does this (clovett, you listening? :) and lets me know, I'll
write it up in the news.
Regarding MSXML's break with the XML spec, I was unaware of the > and
& notation -- it was discussed in the xml-dev mailing list right before
I joined. It looks like Microsoft is, for once, interested in hearing
constructive feedback. In particular Chris Lovett
(clovett@microsoft.com) has encouraged such feedback. Anyone interested
in this topic should send him polite mail requesting that MS stick to
the spec.
I can understand why you would be upset about this. The splintering
by Microsoft of a great new technology is something that we Java
programmers seem mysteriously sensitive to. ;) Nevertheless, I'll stand
by my statement that MSXML is arguably the best XML parser for Java
today.
There are other choices, such as Lark (http://www.textuality.com/Lark/)
and NXP (http://www.edu.uni-klu.ac.at/~nmikula/NXP/preview/).
I'm cc'ing this to the xml-dev mailing lists in the hopes that it
will rekindle the discussion on the importance of sticking to the
specs.
Charles
cskerr@geocities.org
Unite for Java! http://www.javalobby.org/
--
Excerpts from three letters on the xml-dev mailing list
regarding XML's ties to Windows
[1]
> Windows dependency of MSXML is minimal. All you have to do is the following:
> 1. remove com.ms.xml.dso package.
> Delete the class files from the jar and/or comment it out of the makefile.
> DSO is accessed by some of the samples but none of the other MSXML packages.
> 2. remove dependency on com.ms.xml.xmlstream package.
> Latest version of MSXML includes an alternate XMLInputStream class located
> inside the 'make' directory. Replace com.ms.xml.util.XMLInputStream with
> the alternate version to remove dependency on com.ms.xml.xmlstream package.
> With the above two changes, you will end up with a pure-Java version of MSXML.
> MSXML is the most complete XML parser available right now and you get the
> source code on top of it. I would be smiling by now if I were you :-)
[2]
> The parser uses a newly-defined Interface to a stream library that is
> specific to XML. The parser does not use the implementations of streams
> provided in the JDK 1.1 packages for the internet. I believe that this has
> to do with byte-ordering problems in those implementations. I have not
> checked this for myself.
> The interface per se has no platform dependencies. It is shipped with two
> implementations. One implementation is specific to Windows, the other is
> generic Java using JDK packages. Neither has the byte-order flaw. You may
> use whichever one you prefer. Both work. The generic one has lower
> performance.
> --Andrew Layman AndrewL@microsoft.com
[3]
> How 'bout that! Microsoft's EULA even grants us the right to redistribute
> such modified code. Quite generous of them, I must say. Microsoft just
> went up a point in my rating system. I am indeed smiling now. :-)
> My apologies to the MSXML team.
> --
> Joe Lapp (Java Apps Developer/Consultant)
> Unite for Java! - http://www.javalobby.org
> jlapp@acm.org
From ddb at criinc.com Mon Nov 24 18:06:55 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:04 2004
Subject: Sequence Access Languages ...
Message-ID: <3.0.32.19971124100758.00936af0@mailhost.criinc.com>
>On Sun, 23 Nov 1997, Rick Jelliffe wrote:
>> If people just want a database dump format for nice relational tables,
>> comma-delimiter formats are available and attractive. But when they have
>> text which they don't want to have desequenced, SGML/XML can be useful.
It really depends on the requirements. For data with a long expected
life-time, XML may actually be a better choice than a comma/tab delimited
file _because_ it is so verbose. If the original architects choose tag
names which are clear, then when someone approaches the data 10 years
later, and the original authors are long gone, the chance of this new-comer
understanding the data format increases significantly. This is what Steven
Newcomb calls self-descriptive documents. (Steven/Peter, did I get that
right?)
I have been bitten by the problem where I write a quick and dirty data-dump tool
which dumps out to a tab-delimited file and then, a year later, I can't
remember exactly what all the fields were. XML can help. It is not a
perfect solution, but it beats re-engineering software (esp if you don't
have source any more....)
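To make the contrast concrete, here is a small Python sketch that dumps the
same record both ways. The record and its field names (part_number,
quantity, unit_price) are invented for illustration; nothing here is taken
from a real data-dump tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are invented for illustration.
record = {"part_number": "A-1001", "quantity": "12", "unit_price": "3.50"}

# Tab-delimited dump: compact, but the meaning of each column lives
# only in the program (or the memory of the person) that wrote it.
tsv_line = "\t".join(record.values())

# XML dump: each value carries its field name alongside it.
item = ET.Element("item")
for name, value in record.items():
    ET.SubElement(item, name).text = value
xml_text = ET.tostring(item, encoding="unicode")

print(tsv_line)
print(xml_text)
```

The XML form is several times longer, but someone reading it years later
can still tell which number is the quantity and which is the price,
provided the names were chosen clearly.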
But again, it all goes back to your requirements. If your data is only
going to be used by 3 programs you wrote, and the data has a short life
expectancy, then tab-delimited files are a good choice.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From mecom-gmbh at mixx.de Mon Nov 24 18:23:16 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:04 2004
Subject: Inheritance
References:
Message-ID: <3479C712.985B01D9@mixx.de>
greetings,
sorry to start in the middle of this thread, but as an xml novice i'm
wondering why one is at all concerned to extend a language intended to
mark up "structure" in order to encode "behaviour". (this being the
distinction made by separating 'class' and 'type').
why is it not sufficient to accept that an dtd form
encodes the structure of one class only, and to encode the type and/or
class relations in marked-up data, instead of adding new elements to the
definition language? (eg ELEMTYPE).
for example
ANOTHER-TYPEANY
would encode the same information. what advantage do the special forms
and the additional processing mechanisms offer?
why, for instance, isn't the generic dt-element definition
a typea model
? why does there need to be a BNF for document type definitions?
granted, i have gathered only that sgml background which i need to
vaguely understand XML's origins, but, in the process of writing an
XML 'processor', i couldn't help but wonder why or whether all the
special forms are required by anything other than historical
contingency.
(in point of fact, since it's possible to structure processors which
transform all forms to a uniform intermediate representation, i doubt
that the syntactic distinctions are necessary.)
which brings me to ask why one would want to add more. for whatever
reason.
and, in passing, where it is noted
>And subclassing implies implementation-inheritance (i.e. code reuse),
>exactly what you were trying to avoid implying.
be careful not to conflate subclassing, through "implementation
inheritance", with code reuse. that applies only in languages which
identify class/structure-implementation with
behaviour-implementation. for a 'generic-function' language (eg. CLOS,
DYLAN) specifications for code reuse are in terms of the type relations,
not the class relations.
bye,
james anderson,
From markb at iosphere.net Mon Nov 24 18:38:32 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:04 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
In-Reply-To: <3479431B.BA9E33EE@technologist.com>
Message-ID:
On Mon, 24 Nov 1997, Paul Prescod wrote:
> > What if that troff document contained a link to an implementation of a
> > troff formatter? What if that implementation described its interface using
> > XML?
>
> What if it didn't? What if it described its interface using CORBA or
> some proprietary language that is more powerful than CORBA? You don't
> lose any flexibility or expressive power, you just have to write another
> parser for CORBA or your proprietary language.
My point is that if it did, then no longer are clients responsible for
interpreting the semantics of the data - a contained/referenced
implementation is.
In comp doc frameworks, when a new stream of data is introduced into a
container, the framework decides the type of the data and then attempts
to find an editor based on that type. The editor knows what to do with
that data, and negotiates with the container for the real-estate for its
presentation.
For XML docs, the "type" doesn't have to be a DTD, though that might
still be useful. The "type" could just as easily be a tag (so a single
document would contain many embedded types).
So if a well-formed document comes streaming into our container, the
framework would start parsing it, come across a tag called 'troff', and
then proceed to try and discover and install a chunk of code that knows
how to parse/render troff. Or the document could provide its own ref(s)
(more likely for scalability purposes). Either way, it's not the
container (the client) that's responsible for interpreting the semantics
of the data. It's the document itself that is responsible.
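The dispatch-by-tag idea described above might be sketched in Python as
follows. This is only an illustration under stated assumptions: the
HANDLERS registry, the handler decorator, and the render_troff stand-in are
hypothetical names, and the step where a real framework would discover and
install code for an unknown tag is left as a comment.

```python
import xml.etree.ElementTree as ET

# Hypothetical registry mapping element names to handler functions.
HANDLERS = {}

def handler(tag):
    """Register a function as the handler for elements named `tag`."""
    def register(func):
        HANDLERS[tag] = func
        return func
    return register

@handler("troff")
def render_troff(element):
    # Stand-in for a real troff formatter, which is the hard part.
    return f"[troff: {len(element.text or '')} chars]"

def process(root):
    """Walk the document and hand each recognized element to its handler."""
    results = []
    for el in root.iter():
        func = HANDLERS.get(el.tag)
        if func is None:
            continue  # a real framework might try to locate and install a handler here
        results.append(func(el))
    return results

DOC = "<doc><para>intro</para><troff>.B bold</troff></doc>"
results = process(ET.fromstring(DOC))
```

The container only knows how to map a tag name to some code; what the
content of the element means is left entirely to the handler.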
> When you use XML to replace an existing standard, you are, for a period
> at least, actually working against open standards and promoting a
> proprietary alternative, even if it is expressed in the standard
> notation of SGML/XML.
In the example above, how might we implement that framework without
assuming a data format?
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From cskerr at geocities.com Mon Nov 24 18:58:26 1997
From: cskerr at geocities.com (Charles Kerr)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID: <3479D099.310B46C1@geocities.com>
(For those of you reading this on the xml-dev mailing
list, the article referred to in this letter is at
http://www.javalobby.org/jn001.htm#xml. WORA == Write
Once, Run Anywhere)
> Stating that MSXML is "Arguably the best XML parser for Java
> today" I think is in error and inconsistent with the stated
> views of the Java Lobby and our commitment to WORA.
> MSXML is not 100% pure and the DSO (Data Source Object)
> applet only works with MS IE 4.0 browsers.
>
> These imports are hidden in the source
> import com.ms.com.*;
> import com.ms.com.IUnknown;
> import com.ms.com.Variant;
> import com.ms.osp.*;
> import netscape.javascript.JSObject;
>
> Not only that, but with this release of MSXML it appears
> that Microsoft is attempting to fragment the XML community
> by encouraging the use of non-standard </> end tags and other
> things like the inline '&' that break XML. I can only conclude
> that Microsoft is giving MSXML away for free --once again-- in
> order to fragment the emerging XML standard. This is not a Java
> application, and MSXML is something we should NOT endorse
> or support.
The MSXML dependencies on Windows are apparently trivial and
fixable. Equally important, the MSXML EULA grants the right
to redistribute such modified code. See the three letters
(from the xml-dev mailing list) that I include at the end of
this letter for more information. What I'd like to see is
someone post these fixes so that each person wanting the
portable version doesn't have to make the changes by hand.
If anyone does this (clovett, you listening? :) and lets me
know, I'll write it up in the news.
Regarding MSXML's break with the XML spec, I was unaware of
the </> and & notation -- it was discussed in the xml-dev
mailing list right before I joined. It looks like Microsoft
is, for once, interested in hearing constructive feedback.
In particular Chris Lovett (clovett@microsoft.com) has
encouraged such feedback. Anyone interested in this topic
should send him polite mail requesting that MS stick to the spec.
I can understand why you would be upset about this.
The splintering by Microsoft of a great new technology is
something that Java programmers seem mysteriously sensitive
to. ;) And once you've written a body of code to work with
the MSXML API it will be a nuisance to rewrite it if MS
diverges even further from the Java or XML specs in the
future. Nevertheless, I'll stand by my statement that MSXML
is arguably the best XML parser for Java today. I commend MS
for their great work and challenge others to add some
competition. Lark (http://www.textuality.com/Lark/) is
one promising alternative to MSXML but doesn't have as many
features.
I'm cc'ing this to the xml-dev mailing lists in the hopes
that it will rekindle the discussion of the importance of
sticking to the specs.
Charles
cskerr@geocities.org
--
Unite for Java! http://www.javalobby.org/
------------------------------------------------------
Excerpts from three letters on the xml-dev mailing list
regarding MSXML's ties to Windows
[1]
> Windows dependency of MSXML is minimal. All you have to do
> is the following:
> 1. remove com.ms.xml.dso package.
> Delete the class files from the jar and/or comment it out of
> the makefile. DSO is accessed by some of the samples but none
> of the other MSXML packages.
> 2. remove dependency on com.ms.xml.xmlstream package.
> Latest version of MSXML includes an alternate XMLInputStream
> class located inside the 'make' directory. Replace
> com.ms.xml.util.XMLInputStream with the alternate version to
> remove dependency on com.ms.xml.xmlstream package.
> With the above two changes, you will end up with a pure-Java version
> of MSXML. MSXML is the most complete XML parser available
> right now and you get the source code on top of it. I would
> be smiling by now if I were you :-)
[2]
> The parser uses a newly-defined Interface to a stream library
> that is specific to XML. The parser does not use the
> implementations of streams provided in the JDK 1.1 packages for
> the internet. I believe that this has to do with byte-ordering
> problems in those implementations. I have not checked this
> for myself. The interface per se has no platform dependencies.
> It is shipped with two implementations. One implementation
> is specific to Windows, the other is generic Java using JDK
> packages. Neither has the byte-order flaw. You may
> use whichever one you prefer. Both work. The generic one has
> lower performance.
> --Andrew Layman AndrewL@microsoft.com
[3]
> How 'bout that! Microsoft's EULA even grants us the right
> to redistribute such modified code. Quite generous of them,
> I must say. Microsoft just went up a point in my rating system.
> I am indeed smiling now. :-)
> My apologies to the MSXML team.
> --
> Joe Lapp (Java Apps Developer/Consultant)
> Unite for Java! - http://www.javalobby.org
> jlapp@acm.org
From digitome at iol.ie Mon Nov 24 19:02:26 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
Message-ID: <199711241902.TAA09040@GPO.iol.ie>
Don't forget the DTD - the key difference between SGML/XML and other interchange
formats IMHO.
The DTD is at once rough sketch, formal blueprint, test-bed and QA check
for perhaps gigabytes of data.
>For data with a long expected
>life-time, XML may actually be a better choice than comma/tab delimited
>file _because_ it is so verbose. If the original architects choose tag
>names which are clear, then when someone approaches the data 10 years
>later, and the original authors are long gone, the chance of this new-comer
>understanding the data format increases significantly. This is what Steven
>Newcomb calls self-descriptive documents. (Steven/Peter, did I get that
>right?)
>
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com
From elm at arbortext.com Mon Nov 24 20:13:06 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:05 2004
Subject: Integrity in the Hands of the Client
Message-ID: <3.0.32.19971124150739.0096d670@village.doctools.com>
At 12:04 PM 11/22/97 -0500, Nelson Minar wrote:
...
>I'm reminded of what happened in the first few months of 1994, when a
>lot of people suddenly learned HTML. One of the most useful documents
>(for me) of that period was Eric Tilton's essay "Composing Good HTML"
>(since turned into a book, "Web Weaving", with Carl Steadman and Tyler
>Jones). It was a short essay, but it laid out many of the basics of
>writing HTML well - issues beyond syntax. Style issues like "don't say
>'click here' in a document, integrate the anchor text into the
>narrative". Structural issues like "don't misuse headers" and "try to
>do logical formatting, not physical". And meta information
>recommendations, like "put your name on documents" and "put a last
>modified date on documents if it makes sense". For me, that essay made
>HTML make sense, gave some order to the varied capabilities of the syntax.
>
>I tried to do my bit back then by writing an HTML editor tool (an
>emacs mode) that made it easier to write good HTML. Indenting the HTML
>source to show the document structure, providing simple templates to
>get basic well formedness, automating last modified footers. And I
>think it was reasonably successful - pages written with my editor were
>at least a little better than pages written with nothing.
>
>
>XML needs similar style guidelines and tools if people are going to
>use it well. The problem for XML is harder than with HTML since XML is
>more powerful. I think XML will be most successful for casual document
>writers when there are standard well-established DTDs combined with
>style sheets that are simple to use and very well documented as to
>what the tags mean and how to use them. I don't know how to smooth the
>process of helping people develop their own DTDs.
I agree that XML needs similar guidelines; there's technology, and then
there are the techniques with which you apply it. It's ideal if new users
can get started with good habits as soon as possible.
I would say the problem for XML is harder because XML is more "meta" (and
it derives its extra power from that). Each DTD and DTD fragment will need
its own user/style guide -- many of the established DTDs already have user
guides, and for some there are even courses that teach you how to use them.
If I may, I'd like to suggest that budding XML DTD writers check out my
book, "Developing SGML DTDs: From Text to Model to Markup" (ISBN
0-13-309881-8, published by Prentice Hall Professional Technical
Reference). It contains a system for doing the requirements analysis for,
designing, implementing, and testing DTDs, and has a lot of technique
advice in it (as well as some psychological advice for dealing with the
shock of migration :-).
Its focus is on publishing applications and corporate SGML use, but my
co-author, Jeanne El Andaloussi, and I have used the basic methodology to
create many DTDs for many different situations, and it seems to hold up
very well. Also, the analysis and design phases can be completed with
little detailed knowledge of SGML/XML language syntax.
We wrote the book precisely to "smooth the process of helping people
develop their own DTDs" for SGML; I'm certainly hoping that new XML users
will find it helpful too.
Best regards,
Eve
From mrc at allette.com.au Mon Nov 24 23:15:53 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
References: <3.0.32.19971124100758.00936af0@mailhost.criinc.com>
Message-ID: <347A0A8A.989C0D87@allette.com.au>
Derek Denny-Brown wrote:
> It really depends on the requirements. For data with a long expected
> life-time, XML may actually be a better choice than comma/tab delimited file
> _because_ it is so verbose. If the original architects choose tag names which
> are clear, then when someone approaches the data 10 years later, and the
> original authors are long gone, the chance of this new-comer understanding the
> data format increases significantly.
Or, if you would prefer, you could use shortref in SGML and parse the comma
delimited files, making your input both a database dump and a valid SGML instance
and your output valid XML. The point is, those of us coming to XML from SGML have
experienced, grappled with, partially solved or lived with a lot of issues that
those from other backgrounds may regard as being imperatives. The current
discussion is a natural result of diverse and intelligent opinions, but a natural
enemy of moderation and controlled change. I hope XML is allowed to settle in
before anyone tries to fix anything, as I doubt if anyone has clear and complete
perspective from all sides of this very large baby.
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From donpark at quake.net Tue Nov 25 00:14:27 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID: <01bcf936$a360edc0$0100007f@localhost>
>The MSXML dependencies on Windows are apparently trivial and fixable.
>Equally important, the MSXML EULA grants the right to redistribute such
>modified code. See the three letters (from the xml-dev mailing list)
>that I include at the end of this letter for more information. What I'd
>like to see is someone post these fixes so that each person wanting
>the portable version doesn't have to make the changes by hand. If
>anyone does this (clovett, you listening? :) and lets me know, I'll
>write it up in the news.
FYI, after writing the first of the three letters mentioned above, I
contacted Andrew Layman at MS and offered to help make MSXML completely
portable without performance sacrifices. Both he and Chris Lovett liked the
idea and we worked hard to make it happen over a weekend. There was never
any hesitation from them about this effort and I am convinced that there was
absolutely no ill will from them regarding peculiar 'features' of MSXML.
They thought they were neat features and got their ears chewed off for it.
All they needed was a gentle reminder instead of the slap they got. Let us
not mix conspiracy theory with our judgement.
A WORA version of MSXML is coming soon from Microsoft. It will compile and
run on any Java platform. It will take advantage of native libraries if
available without recompilation. It's WORA without sacrifices. It's WORA-FOW
(Write Once, Run Anywhere - Faster On Windows ;-p).
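One common way to "take advantage of native libraries if available without recompilation" is to load the platform-specific class by name at run time and fall back to a portable one. The sketch below shows that general technique only; it is not the MSXML code, and StreamFactory, XmlByteSource, PortableSource and the class name "example.NativeSource" are all made up for illustration.

```java
/** Hypothetical sketch: use a platform-specific implementation when present, else pure Java. */
public class StreamFactory {
    /** Minimal stand-in for a stream interface with more than one implementation. */
    public interface XmlByteSource {
        String describe();
    }

    /** Portable fallback that depends only on the standard library. */
    public static class PortableSource implements XmlByteSource {
        public String describe() { return "portable"; }
    }

    /**
     * Tries to load the named implementation reflectively, so there is no
     * compile-time dependency on it; returns the portable version if the class
     * is missing, cannot be linked or instantiated, or is of the wrong type.
     */
    public static XmlByteSource create(String preferredClassName) {
        try {
            Class<?> cls = Class.forName(preferredClassName);
            return (XmlByteSource) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return new PortableSource();
        }
    }

    public static void main(String[] args) {
        // "example.NativeSource" is a made-up class name that will not be found,
        // so the portable implementation is selected.
        System.out.println(create("example.NativeSource").describe());
    }
}
```

Because the choice is made at run time, the same compiled classes run everywhere and simply pick the faster implementation where one is installed.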
Don
From ddb at criinc.com Tue Nov 25 00:22:59 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
Message-ID: <3.0.32.19971124162449.00a7b100@mailhost.criinc.com>
At 10:15 AM 11/25/97 +1100, Marcus Carr wrote:
>Derek Denny-Brown wrote:
>> It really depends on the requirements. For data with a long expected
>> life-time, XML may actually be a better choice than comma/tab delimited
>> file _because_ it is so verbose. If the original architects choose tag
>> names which are clear, then when someone approaches the data 10 years
>> later, and the original authors are long gone, the chance of this
>> new-comer understanding the data format increases significantly.
>Or, if you would prefer, you could use shortref in SGML and parse the comma
>delimited files, making your input both a database dump and a valid SGML
>instance and your output valid XML.
But using shortref would defeat the whole point of helping the documents to
be "self-describing". I agree that in some cases, SHORTREF is not a bad
idea, but I believe it should be used sparingly. (Unless you are using it
as a trick to import existing data... in which case all rules are off)
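To make the contrast concrete, here is a rough sketch of turning one comma-delimited record into explicitly tagged XML, so that each value carries its own label. The class RecordToXml, its method names, and the sample column names are assumptions for illustration; the split is deliberately naive and does not handle quoted fields that contain commas.

```java
/** Hypothetical sketch: wrap the fields of a comma-delimited record in named elements. */
public class RecordToXml {
    /** Escapes the characters that are markup-significant in XML character data. */
    public static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Wraps each comma-separated field in an element named after its column. */
    public static String toXml(String recordName, String[] columns, String line) {
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        if (fields.length != columns.length) {
            throw new IllegalArgumentException("expected " + columns.length + " fields");
        }
        StringBuilder out = new StringBuilder("<" + recordName + ">");
        for (int i = 0; i < columns.length; i++) {
            out.append('<').append(columns[i]).append('>')
               .append(escape(fields[i]))
               .append("</").append(columns[i]).append('>');
        }
        return out.append("</").append(recordName).append('>').toString();
    }

    public static void main(String[] args) {
        String[] columns = {"name", "quantity", "colour"};
        // The bare line "widget,4,blue" says nothing about what each field means;
        // the tagged form does.
        System.out.println(toXml("item", columns, "widget,4,blue"));
    }
}
```

The tagged output is several times longer than the input line, which is exactly the verbosity being defended: the column names travel with the data instead of living in someone's head.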
>The point is, those of us coming to XML from SGML have experienced,
>grappled with, partially solved or lived with a lot of issues that those
>from other backgrounds may regard as being imperatives. The current
>discussion is a natural result of diverse and intelligent opinions, but a
>natural enemy of moderation and controlled change. I hope XML is allowed
>to settle in before anyone tries to fix anything, as I doubt if anyone has
>clear and complete perspective from all sides of this very large baby.
There really is need of a good book, along the lines of what Nelson
Minar was talking about when he referred to
>I'm reminded of what happened in the first few months of 1994, when a
>lot of people suddenly learned HTML. One of the most useful documents
>(for me) of that period was Eric Tilton's essay "Composing Good HTML"
and the need for something with XML. Such a task is much harder for XML
since XML can be used for many purposes.
I fail to understand how "the current discussion" is an "enemy of
moderation and controlled change". Which current discussion? In general,
there has been very little talk about the need for things to change, and
the one significant comment to that effect (by Joe Lapp) has resulted in
one of the better discussions on how an application architect should plan
to incorporate XML into their application without "fixing" the standard.
It has produced a number of good, concise explanations of how to get the
most out of XML, and of what the parser should do vs. the application.
It is amazing how trying to teach someone what you think you know can help
you understand the material even better. I am hoping that is true for a
group (XML-Dev) as well as for the individual...
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From mrc at allette.com.au Tue Nov 25 02:52:39 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:05 2004
Subject: Sequence Access Languages ...
References: <3.0.32.19971124162449.00a7b100@mailhost.criinc.com>
Message-ID: <347A3D57.2002F7FE@allette.com.au>
Derek Denny-Brown wrote:
> >Or, if you would prefer, you could use shortref in SGML and parse the comma
> >delimited files, making your input both a database dump and a valid SGML
> >instance and your output valid XML.
>
> But using shortref would defeat the whole point of helping the documents to be
> "self-describing".
As part of the document, the DTD would act as a formal centralised reference
rather than having to infer the structure by examination of the instances; I was
alluding to data handling generally rather than the point you were making about
self-describing documents.
> There really is need of a good book, along the lines of what Nelson Minar was
> talking about when he referred to
> >I'm reminded of what happened in the first few months of 1994, when a
> >lot of people suddenly learned HTML. One of the most useful documents
> >(for me) of that period was Eric Tilton's essay "Composing Good HTML"
> and the need for something with XML. Such a task is much harder for XML since
> XML can be used for many purposes.
I agree, a flood of good books will be useful. I suspect that the diversity you
mention will lead to smaller publications dealing with single or fairly
tightly-grouped applications of XML.
> I fail to understand how "the current discussion" is an "enemy of moderation and
> controlled change".
Sorry, that does read very badly. What I mean is that answers to XML issues should
be given a fair chance to evolve naturally. HTML was allowed to be just a way to
present documents while people figured out how to extend it in various directions.
Although many lessons have been learned from HTML that XML can springboard from, I
think there is some danger in the perception that XML is the best way to do almost
everything. XML will be supplementary to what a number of organisations have been
doing for a long time - for many it will just be a way of putting SGML on the web
without converting to HTML first. "The current discussion" should of course go on
- I also read it with interest.
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From ricko at allette.com.au Tue Nov 25 03:31:16 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID: <199711250329.OAA07558@jawa.chilli.net.au>
> From: Don Park
> FYI, after writing the first of the three letter mentioned above, I
> contacted Andrew Layman at MS and offered to help make MSXML completely
> portable without performance sacrifices. Both he and Chris Lovett liked the
> idea and we worked hard to make it happen over a weekend. There was never
> any hesitation from them about this effort and I am convinced that there was
> absolutely no ill will from them regarding peculiar 'features' of MSXML.
> They thought they were neat features and got their ears chewed off for it.
> All they needed was a gentle reminder instead of the slap they got. Let us
> not mix conspiracy theory with our judgement.
The other point is that floating "&" is required in SGML (even with the
WebSGML adaptations, which have been accepted and are now being wordsmithed).
Short tagging "</>" is an optional feature that can be enabled.
If MSXML chooses to support some convenient SGML features on top of XML,
I don't see what there is to complain of. It seems a bonus to me. One of
SGML's main attractive features is that it does not attempt to enforce
policy in many areas: it provides a toolkit and gives the user the choice.
This makes it more complex of course. XML is a choice of particular
features by various boffins and experts, and so XML will inevitably be
suboptimal for some uses.
And there is a lot of old SGML material. If having some clearly labelled
SGML extensions makes MSXML handle other kinds of SGML as well as
XML, great! In fact, the fuller the SGML implementation that MSXML provides,
the better, IMHO. Give us more, Chris and Andrew! Allow entities to have
attributes like SGML does. Allow tag omission like SGML and HTML do!
It is the nature of software to have experiments. It is futile, but still
good, to try to freeze syntax. I think this is why in the future
we will end up with a range of markup languages from XML to SGML '97. If this
is an alarming option (and it is), then the discipline is for XML developers
(not parser makers) to only use XML features in their systems. I am sure
that everyone who has been through SGML will agree that it is difficult
not to wish all the time for your favorite enhancements. And, if you bite
the bullet and decide to go with the standard, you may then get flak for
being an unthinking sheep :-)
The problem is not with Microsoft for making their XML parser also handle
SGML better, the problem will be with users of the parser in software if they
use these features over the web rather than inhouse. I.e. the problem is
"us" not "them".
Rick Jelliffe
From SimonStL at classic.msn.com Tue Nov 25 14:09:48 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID:
>The other point is that floating "&" is required in SGML (even with the
>WebSGML adaptations, which have been accepted and are now being wordsmithed).
>Short tagging "</>" is an optional feature that can be enabled.
I think we would do well to remember that XML is NOT SGML and should not be
allowed to fall prey to the incredible number of 'options' that have made SGML
worthless to a large number of developers. Short tagging is NOT an optional
feature of XML, and should NOT be a feature of MSXML either. If it is allowed
to be an optional feature, then my XYZ parser is either going to have to
accept Microsoft's 'extensions' or reject a lot of documents created by people
who only tested on the Microsoft tools.
>XML is a choice of particular
>features by various boffins and experts, and so XML will inevitably be
>suboptimal for some uses.
Fine. Let's start off suboptimal and get a standard that works instead of a
standard that can be embraced and extended by any software company that thinks
it has a new grand idea.
>Give us more, Chris and Andrew! Allow entities to have
>attributes like SGML does. Allow tag omission like SGML and HTML do!
Do not give us more, Chris and Andrew, if you really like XML. If you want to
kill it quickly, add lots of extra SGML parts.
>The problem is not with Microsoft for making their XML parser also handle
>SGML better, the problem will be with users of the parser in software if they
>use these features over the web rather than inhouse. I.e. the problem is
>"us" not "them".
The problem is an incompatibility between the "us"es and "them"s of the world.
Keep XML as clean as possible, at least for now. Forget everything you knew
about SGML's intricacies and focus on what XML, not SGML, can do for the
world, and with any luck, the world might take XML seriously.
While working on XML: A Primer, I used the Alpha 1.0 MSXML to test my code,
aware of many of its difficulties. As I discovered when 1.6 came out, it had
let me wander outside the spec in a number of key places (mixed declarations,
for one) that took my code outside of valid XML. I've fixed it all now, but
the experience has left me extremely wary of tools that go beyond the
standard, intentionally or accidentally.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From andrewl at microsoft.com Tue Nov 25 16:14:43 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID: <7BB61B44F197D011892800805FD4F79201CD6639@red-03-msg.dns.microsoft.com>
I think a little more grace and courtesy is called for here. Microsoft has
been working very hard to ship parsers that track the evolving spec. As
with any unfinished product, particularly one whose specifications are
clearly marked "work in progress," there are going to be some areas where
the product lags behind the spec or vice versa.
Regarding the short tagging, did anyone actually run the code? If so, you
would have discovered that the parser does not respect short tagging unless
you go out of your way to turn it on via an undocumented method that is not
meant for clients to call. It is not a secret feature (we give away the
source code) but it is not part of parsing normal XML. If we were trying to
trick people into using this facility, we sure went out of our way to fail!
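The behaviour described above (strict end-tag matching by default, short end tags accepted only when explicitly switched on) can be sketched as follows. This is an illustration in present-day Java, not MSXML code: the class EndTagCheck, the method tagsBalance and the allowShortEndTags flag are hypothetical names, the short end tag is assumed to have the SGML form </>, and the input is assumed to contain only elements and text (no comments, declarations, processing instructions, or ">" inside attribute values).

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: end tags must echo the start tag unless short end tags are enabled. */
public class EndTagCheck {
    // Matches <name ...>, <name .../>, </name> and the short form </>.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([A-Za-z_][A-Za-z0-9_.:-]*)?[^<>]*?(/?)>");

    /** Returns true if every element is closed in order by an acceptable end tag. */
    public static boolean tagsBalance(String doc, boolean allowShortEndTags) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(doc);
        while (m.find()) {
            boolean isEnd = !m.group(1).isEmpty();
            String name = m.group(2);               // null when no name is present
            boolean isEmptyElement = !m.group(3).isEmpty();
            if (isEnd) {
                if (open.isEmpty()) return false;   // end tag with nothing open
                String expected = open.pop();
                if (name == null) {
                    // Short end tag: rejected unless the caller opted in.
                    if (!allowShortEndTags) return false;
                } else if (!name.equals(expected)) {
                    return false;                   // name does not match the start tag
                }
            } else {
                if (name == null) return false;     // "<>" is not a tag
                if (!isEmptyElement) open.push(name);
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        String doc = "<a><b>text</></a>";
        System.out.println(tagsBalance(doc, false)); // strict mode
        System.out.println(tagsBalance(doc, true));  // short end tags enabled
    }
}
```

With the flag off, a document that relies on the short form is rejected, which is the outcome other parsers would give it.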
I recommend approaching this with a bit more benevolence and researching
things a little more before assuming a conspiracy.
--Andrew Layman
AndrewL@microsoft.com
From SimonStL at classic.msn.com Tue Nov 25 16:48:07 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
Message-ID:
>I think a little more grace and courtesy is called for here. Microsoft has
>been working very hard to ship parsers that track the evolving spec.
>...
>I recommend approaching this with a bit more benevolence and researching
>things a little more before assuming a conspiracy.
I wasn't promoting a conspiracy (the word appeared nowhere in my post), as you
might know if you remembered my messages from earlier this month, which
included a fairly extensive discussion of Microsoft's former demonstration of
short-tagging in the MSXML site, all of which has been removed. I have
researched this more extensively than I wanted to by a considerable margin. I
do not hold Microsoft to be a villain in this case.
The target of my post, which apparently lacked 'grace and courtesy', was not
Microsoft - it was the SGML folks who clamor for every piece of junk that's
littered the SGML spec to be included in XML. I clamored at one point for
CDATA myself, but I've decided to rest and let the spec take its own course,
as simple as possible. I'll elaborate on this in a more extended post later
this week.
Microsoft has created an excellent parser, and I'm very glad to hear regularly
on this list about your continual willingness to produce XML compliant and
100% Java XML parsing solutions. Keep up the good work, but please try to
read my postings a little more closely before assuming that I'm accusing
Microsoft of fomenting world grief.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From Jon.Bosak at eng.Sun.COM Tue Nov 25 18:51:40 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
In-Reply-To: <199711250329.OAA07558@jawa.chilli.net.au> (ricko@allette.com.au)
Message-ID: <199711251850.KAA16423@boethius.eng.sun.com>
| If MSXML chooses to support some convenient SGML features on top of XML,
| I dont see what there is to complain of. It seems a bonus to me.
This is a license to repeat the browser wars of the last three years
and hold users hostage to particular software packages. If you want
full SGML support, then lobby for *consistent* full SGML support.
Anything less than that will create exactly the kind of vendor
dependence that we are trying to get away from.
Jon
From ak117 at freenet.carleton.ca Tue Nov 25 19:12:50 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
In-Reply-To:
References:
Message-ID: <199711251913.OAA00314@unready.microstar.com>
Simon St.Laurent writes:
> While working on XML: A Primer, I used the Alpha 1.0 MSXML to test
> my code, aware of many of its difficulties. As I discovered when
> 1.6 came out, it had let me wander outside the spec in a number of
> key places (mixed declarations, for one) that took my code outside
> of valid XML. I've fixed it all now, but the experience has left
> me extremely wary of tools that go beyond the standard,
> intentionally or accidentally.
As I remember hearing it a few years back, one of the basic rules of
the Internet was to be conservative in what you produce and liberal in
what you accept. With that in mind, I'd suggest using a very strict,
validating parser on the authoring side, like NSGMLS or NXP (I haven't
tried Lark). On the production side, use whatever works for you.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Wed Nov 26 01:17:31 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:05 2004
Subject: Parser considerations (was: MS XML parser only works with IE...)
In-Reply-To: <199711251913.OAA00314@unready.microstar.com>
References:
Message-ID: <3.0.1.16.19971126020423.2a07fd4a@pop3.demon.co.uk>
At 14:13 25/11/97 -0500, [many people] wrote about MSXML
Some of the things we mustn't forget at this time are:
- there is as yet no frozen XML 'recommendation' (I hope that's the
correct term). Under those circumstances it is unlikely that there are any
completely conforming parsers; the spec is still changing and so any parser
has addressed a moving target.
- for many people helping in the development of XML the question of 'best
parser' is not appropriate at this stage - and I suspect not for at least 3
months. The spec is quite large and is a lot of effort to implement (those
of us who have hacked parsers know). Many of us give up on points we don't
understand (for me it was parameter entities, and that caused others grief
as well :-). So until we see the next spec [is there a later public one
than Aug 7?] we can't be sure whether a parser 'gets PEs right' :-). I
sympathise with anyone who has failed to implement part of the current
spec, and I hope that people trying out parsers and other software will
take a constructive view of such 'failings'.
- I believe that all parser writers at present would like their parsers
validated. Validation *of* a parser seems to me to include checks on
- reporting errors in non-conforming XML documents
- asserting that a conforming XML document is conforming
- carrying out defined transformations on the original input
All of these require a set of test inputs, which I believe we badly need at
present. It is very likely that a parser writer at present will overlook
something in the spec.
Checking the transformations is less easy as there is no defined output.
How, for example, do we check that parser A transforms all the entities
correctly? An important way is to make sure that the outputs of two
independent parsers agree. To this extent, whatever we think about
'steenking ESIS' [a quote from the source code of a well known XML parser],
it is at least checkable :-)
- the really hard bit comes when the semantics of behaviour are unclear.
Does the DOCTYPE declaration require the parser to
*do* anything? Different authors will certainly have different ideas - some
see it as a request by the author that the document must be validated,
others as a statement that, if the reader wishes to validate it, this is the
doctype that should be used.
There are many subtleties of this sort.
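The cross-check described above (making sure that the outputs of two parsers agree) can be sketched roughly as follows. This is only an illustration under stated assumptions: the ESIS-like event notation is simplified, the function names are hypothetical, and the two Python standard-library front ends used here both sit on top of expat, so they stand in for two genuinely independent parsers rather than being independent themselves.

```python
import xml.sax
import xml.etree.ElementTree as ET


class _EsisHandler(xml.sax.ContentHandler):
    """Collects a simplified, ESIS-like event list from SAX callbacks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attribute lines precede the start-tag line, sorted for stable comparison.
        for key in sorted(attrs.getNames()):
            self.events.append(f"A{key} {attrs.getValue(key)}")
        self.events.append("(" + name)

    def characters(self, content):
        # Parsers may deliver text in several chunks; merge adjacent ones.
        if self.events and self.events[-1].startswith("-"):
            self.events[-1] += content
        else:
            self.events.append("-" + content)

    def endElement(self, name):
        self.events.append(")" + name)


def events_via_sax(text):
    """Event list produced by the SAX front end."""
    handler = _EsisHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.events


def events_via_etree(text):
    """The same event list, reconstructed from an ElementTree tree."""
    events = []

    def walk(elem):
        for key in sorted(elem.attrib):
            events.append(f"A{key} {elem.attrib[key]}")
        events.append("(" + elem.tag)
        if elem.text:
            events.append("-" + elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                events.append("-" + child.tail)
        events.append(")" + elem.tag)

    walk(ET.fromstring(text))
    return events


def outputs_agree(text):
    """True when both front ends report identical events for the document."""
    return events_via_sax(text) == events_via_etree(text)
```

A disagreement between the two lists would point at the construct (an entity, an attribute default, a stretch of whitespace) on which the parsers differ, which is the sense in which such an output is "at least checkable".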
I believe that the development of XML has been one of the outstanding
achievements of the WWW. It has been fast, rigorous, fair, open, and
required extraordinary commitment and patience from those involved. Often
the SIG has had 50 emails a day, and many have required a great deal of
careful reading.
I have been very gratified by the level and amount of constructive
contributions to XML-DEV, as this is an important area for ironing out the parts the
spec cannot reach. I remember the agonies of early C++ compilers where
every platform and vendor had messages 'this feature not supported' and so
on. I believe that all contributors on this list want to avoid this and
that 'any valid XML document can be parsed with any XML parser'. Since some
parsers may purport to be XML compliant but not be, it is critical that
this fact can be recognised, and a test suite of documents seems to be a
key instrument. I hope very much that authors of such parsers will be able
to find the energy to mend them :-)
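A test suite of the kind asked for here could start as small as the sketch below. The four documents and their expected verdicts are illustrative assumptions, not an agreed suite, and expat (via Python's standard library) merely stands in for whichever parser is under test; `is_well_formed` and `failures` are hypothetical names.

```python
import xml.parsers.expat

# (document, should the parser accept it?)
CASES = [
    ("<doc><p>text</p></doc>", True),     # conforming document
    ("<doc><p>text</doc>", False),        # end-tag does not match start-tag
    ("<doc><p>text</></doc>", False),     # SGML-style short end tag
    ("<doc>A & B</doc>", False),          # bare ampersand in character data
]


def is_well_formed(text):
    """Adapter for the parser under test: True if it accepts the document."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(text, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


def failures(cases, accepts):
    """Return the cases where the parser's verdict differs from the expected one."""
    return [(doc, expected) for doc, expected in cases if accepts(doc) != expected]
```

Running `failures(CASES, is_well_formed)` against a strict parser should give an empty list; a parser that silently accepts everything would be caught on the three non-conforming documents.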
If - at some future time - I were looking for attractive features in an
XML parser then, after discarding the non-compliant ones, I would want to
consider a wide range, and I doubt that any one parser would 'win' in all
aspects. To this end I am trying to make JUMBO accept a range of parsers by
a simple commandline switch (or button). Thus:
java jumbo.sgml.SGMLTree foo.xml parser=NXP (or Lark)
I can quite envisage where a user wants to use parser A to read in the
initial document (perhaps because it is large, or tree-structured) and
parser B to read the entities.
I am delighted to hear about WORA-MSXML, and shall hope to look at it
shortly. I hope it's easy to bolt into JUMBO.
I am slightly disappointed that Xapi-J seems to have become dormant,
because then work inside JUMBO would be minimal. At present most of the
parsers I have encountered are event-driven (e.g. doStartTag, doError...)
and not all build trees (JUMBO is happy to build trees from streams). If,
indeed, this is the model most people use, then let's get a standard
terminology (Element, PI, ElementType, Attribute, etc.) It would make
things so much simpler. I also expect we could get a very very simple API
defined...
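To make the event-driven model concrete, here is a minimal sketch of building a tree from a stream of start-tag, end-tag and text events. The `Element` class and the handler names are assumptions chosen for illustration (they are not Xapi-J or any agreed API), and expat from Python's standard library supplies the events.

```python
from dataclasses import dataclass, field
import xml.parsers.expat


@dataclass
class Element:
    name: str
    attributes: dict
    children: list = field(default_factory=list)  # Element or str items


class TreeBuilder:
    """Turns start-tag / end-tag / text events into a tree of Elements."""

    def __init__(self):
        self.root = None
        self._stack = []

    def start_tag(self, name, attributes):
        elem = Element(name, dict(attributes))
        if self._stack:
            self._stack[-1].children.append(elem)
        else:
            self.root = elem
        self._stack.append(elem)

    def end_tag(self, name):
        self._stack.pop()

    def text(self, data):
        if not self._stack:
            return  # ignore anything reported outside the root element
        children = self._stack[-1].children
        if children and isinstance(children[-1], str):
            children[-1] += data  # merge text delivered in several chunks
        else:
            children.append(data)


def build_tree(source):
    """Parse the document and return the root Element."""
    builder = TreeBuilder()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = builder.start_tag
    parser.EndElementHandler = builder.end_tag
    parser.CharacterDataHandler = builder.text
    parser.Parse(source, True)
    return builder.root
```

Any parser that can report these three events could drive the same builder, which is the practical argument for agreeing on the event names and the terminology.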
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Nov 26 01:20:17 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:05 2004
Subject: Inheritance
In-Reply-To: <3479C712.985B01D9@mixx.de>
References:
Message-ID: <3.0.1.16.19971126021919.2f7726c0@pop3.demon.co.uk>
At 19:27 24/11/97 +0100, james anderson wrote:
>greetings,
>
>sorry to start in the middle of this thread, but as an xml novice i'm
>wondering why one is at all concerned to extend a language intended to
>mark up "structure" in order to encode "behaviour". (this being the
>distinction made by separating 'class' and 'type').
We all started off as novices, so don't be afraid. I'm assuming that
everyone on this list knows that the mails are hypermailed at:
http://www.lists.ic.ac.uk/hypermail/xml-dev
and that it is possible to search this archive. So you can go back to the
start of this thread if it helps (I don't know whether it does or not).
Also I have attempted to abstract some of the postings that may have some
lasting value in http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html
XML discussions have a cyclic nature - like sunspots - the same topic
reoccurring at intervals of a few months. Since it's often due to 'novices'
joining the club, we're delighted. You will also find that the SGML
community is patient and does not regard ignorance as a crime (some other
things are :-). Precision in language is highly valued.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From shibl at w4.ca Wed Nov 26 02:05:57 1997
From: shibl at w4.ca (Shibl Mourad)
Date: Mon Jun 7 16:59:05 2004
Subject: MS XML parser only works with IE...
References: <199711251850.KAA16423@boethius.eng.sun.com>
Message-ID: <347B8421.E77@w4.ca>
Jon Bosak wrote:
> This is a license to repeat the browser wars of the last three years
> and hold users hostage to particular software packages.
I know that I am going to be hated for saying this, but the browser wars
were a phenomenal success and prompted the development of excellent and
useful technology very rapidly.
Compare this with standards-first technology (e.g. SGML), where the rate of
progress is much slower and the end benefits to the user (not the
developer) much smaller.
XML needs some breathing space where new features could be made to live
if popular and die if irrelevant.
Shibl
From ak117 at freenet.carleton.ca Wed Nov 26 02:23:59 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:06 2004
Subject: MS XML parser only works with IE...
Message-ID: <199711260224.VAA00626@unready.microstar.com>
Shibl Mourad writes:
> Jon Bosak wrote:
>
> > This is a license to repeat the browser wars of the last three years
> > and hold users hostage to particular software packages.
>
> I know that I am going to be hated for saying this, but the browser wars
> was a phenomenal success and prompted the development of excellent and
> useful technology very rapidely.
Both of these statements are, to an extent, correct. The browser wars
introduced or brought into the mainstream many interesting
innovations, but few (if any) of the good ones are a result of the
mess that Netscape and Microsoft have both made of HTML.
Applets, real-time audio and video, virtual-reality, animations, and
other types of interaction have certainly made the web more exciting,
but why is it so difficult to find web pages that display well on my
640x480 notebook screen (and what's going to happen on even
lower-resolution TV screens)? How many web pages could
visually impaired people usefully have their software read aloud to
them? Why is it sometimes hard to write a web page that displays
properly in both Netscape and MSIE?
It is possible to innovate without messing around with the standards
(though, to be fair, there won't be an XML standard as such for a
couple more weeks).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ricko at allette.com.au Wed Nov 26 11:32:40 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:06 2004
Subject: MS XML parser only works with IE...
Message-ID: <199711261130.WAA17193@jawa.chilli.net.au>
> From: Simon St.Laurent
> I think we would do well to remember that XML is NOT SGML and should not be
> allowed to fall prey to the incredible number of 'options' that have made SGML
> worthless to a large number of developers.
The current XML draft says "(XML) is an extremely simple dialect of SGML."
That is the first sentence of the abstract. I was a member of the SIG from
quite early on, and it has always been the official line.
So XML says it is SGML. Furthermore, the recent correction to SGML (WebSGML), which
is in its next-to-final draft before release (it has already been voted)
means that there should be no doubt that the national standards bodies
involved with ISO want SGML to be XML-accepting too. I have attended ISO
meetings on this, and the ISO people certainly do not see XML as something
independent of SGML either.
The optional features of SGML have not made it worthless to developers.
The complexity of unadorned SGML and the generality of its toolkit approach
are what made it difficult. The very thing that makes you rich makes
you poor.
XML (and the companion change to the SGML standard) have reduced this
base level. It is pure blue-sky to think that one syntax can meet everyone's
need. I am not saying there should be options in XML. I am saying if
someone wants more than XML, there are many things in SGML that are useful,
and if Microsoft want to implement them, good for Microsoft.
Of course, these should not be termed "experimental XML" features. They
should be labelled "non-XML SGML" features. I already said words to that
effect.
> Fine. Let's start off suboptimal and get a standard that works instead of a
> standard that can be embraced and extended by any software company that thinks
> it has a new grand idea.
Am I saying anything other? XML was developed as the technology of choice for
delivering SGML on the Web. I support that 100%. But if a company wants
to use something more powerful at their back-end, why shouldn't they use
a more powerful language nearer SGML if that serves their inhouse needs
better? And why shouldn't Microsoft allow this in their parser?
Any tools just need to have a checkbox marked "XML only" to keep things
obvious. And XML has draconian error correcting, so data with more than
XML will not work over the web anyway!
> Keep XML as clean as possible, at least for now. Forget everything you knew
> about SGML's intricacies and focus on what XML, not SGML, can do for the
> world, and with any luck, the world might take XML sersiously.
The spanner is that many of SGML's intricacies are responses to real problems.
For example, XML (and WebSGML) let you pass all whitespace to the application,
which means the application itself must be more complicated since there is
no standard way to cover the problem of what to do if your editor has a fixed
line length and you need to stick in an element that would cause a wrap, but
you do not want to put in a newline in the data.
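The whitespace point can be illustrated with a short sketch. The sample paragraph and the `normalize` policy (collapse every run of whitespace to one space) are assumptions for illustration; a real application would have to choose its own rule, which is exactly the extra burden being described. Python's standard ElementTree stands in for an XML parser here.

```python
import xml.etree.ElementTree as ET

# The author's editor wrapped the line before and after the <em> element.
SOURCE = (
    "<p>A line that the editor wrapped before\n"
    "<em>this element</em>\n"
    "and after it.</p>"
)


def normalize(text):
    """One possible application policy: collapse whitespace runs to one space."""
    return " ".join(text.split())


para = ET.fromstring(SOURCE)

# The parser hands over every character, newlines included.
raw = "".join(para.itertext())

# Only the application can decide that those newlines were incidental.
cleaned = normalize(raw)
```

Here `raw` still contains the two newlines the editor introduced, while `cleaned` reads as a single line; nothing in the markup itself says which of the two the author meant.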
XML development has been an exhaustive analysis of every part of mainstream
SGML. And I think almost everyone on the SIG would agree that there are
good reasons for almost all the non-intuitive parts of SGML. However, the
need to be straightforward (the #1 goal of XML) means that there is
a different cost/benefit trade-off for deciding what should go into the
base language (compared to SGML in the early 1980s).
The English-using world already runs on SGML. Computer chips, air
transport, legal systems, the military, many stock markets,
much print media, diagnostics of office equipment, and (with HTML 4.0)
WWW. Any claim that SGML is not good for what it has tried to do
is wrong, as far as the market has spoken.
> The target of my post, which apparently lacked 'grace and courtesy' was not
> Microsoft - it was the SGML folks who clamor for every piece of junk that's
> littered the SGML spec to be included in XML.
Do you have access to the deliberations of the XML SIG or WG? If you do not,
you have no way of knowing what "SGML people" clamoured for, and if you do
then you are just wrong.
The minimal SGMLs that were proposed (by "SGML people" since there were no others)
at the start were all substantially smaller than what we have now in XML.
In fact, XML has grown largely because we found there was so much of SGML that
was needed. Only this week there are last minute calls (from "SGML people",
who Simon deems himself to be so different from) to make several quite
important simplifications to XML.
And, in any case, the distinction between SGML and XML people is entirely
spurious. If you use XML, you are an SGML person. You have bought into
the idea of using a human readable Language, of adding Markup to character
data, of marking up Generalized elements rather than a fixed low level tagset,
and you think it is good to have a common Standard. The fact that you
find ISO 8879 baffling and horrible does not make you anti-SGML, any more
than the fact that I cannot read my video recorder manual makes me anti-TV.
SGML is not the enemy. The enemy is poorly described data that is no use,
and systems that are inappropriately complicated (or simple) for their
user requirements. SGML is merely a toolkit for constructing markup
languages, which includes a lot of features that are not relevant
to delivering structured data over the Web.
Rick Jelliffe
From SimonStL at classic.msn.com Wed Nov 26 13:35:47 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
Message-ID:
XML is the best opportunity I've yet seen to create a standard which handles
documents (and other data) intelligently yet simply. Whatever XML's roots
(which of course are SGML), XML has the opportunity to reach an extremely
broad audience - an audience the size of the current (and future) HTML
audience, not just the established SGML community.
The terms of the XML discussion have always been framed in SGML, and are
likely to continue to be for a considerable time to come. While that has
advantages, I don't think the concept of using XML as a Trojan Horse to
introduce SGML proper to a larger audience is a good one. I gave a seminar
two weeks ago in Washington DC to the ACM - a place and an organization that I
would tend to think of as friendly to SGML. Of 50 people in the seminar
(which was on Dynamic HTML), 15 had worked with SGML. Every time I brought up
SGML (in connection with XML, CSS, and the DOM), I was greeted with questions
about "is that really necessary?" "Are those SGML people trying to change
_our_ world?" These questions didn't just come from the HTML beginners; many
of them came from the developers who had worked with SGML, some quite
extensively. At lunch the discussion quickly turned to XML, and I had to do a
lot of convincing to get people 'past' SGML.
For public relations reasons, it seems like XML needs to be able to have it
both ways. Companies already using SGML and developing SGML tools need to be
encouraged to accept XML - not as a replacement for SGML, but as something to
take seriously. The larger non-SGML community, however, needs to be given XML
as something new and different. XML should not just carry in SGML's
reputation as a complicated, slow-to-develop, and difficult-to-implement tool
of the Federal Government. XML evangelists need to be able to describe the
problems that XML fixes and how it fixes them, without reference to enormous
systems that SGML has created in the past.
>So XML says it is SGML. Furthermore, the recent correction to SGML (WebSGML), which
>is in its next-to-final draft before release (it has already been voted)
>means that there should be no doubt that the national standards bodies
>involved with ISO want SGML to be XML-accepting too. I have attended ISO
>meetings on this, and the ISO people certainly do not see XML as something
>independent of SGML either.
XML says it is SGML. Fine. But should the future development of XML be aimed
at gradually including SGML features, or should it be aimed at meeting the
needs of the developing XML community? I expect the XML community in six
months to a year to be rather distinct from the SGML community and hopefully
quite a bit larger. This issue will grow; we'll see what the W3C and ISO do.
>The complexity of unadorned SGML and the generality of its toolkit approach
>is the thing that made it dificult. The very thing that makes you rich makes
>you poor.
And conversely, the thing that makes you poor will make you rich. HTML took
off because it was brilliantly simple. (There were plenty of other factors,
of course, but simplicity was key.) SGML has done very well in sectors that
were able to make the investment in learning SGML, developing in SGML, and
creating systems around SGML. XML has the opportunity to take its much
simpler toolkit to a much larger audience. Simplicity is key to reaching that
larger audience; adding SGML features, even with an on/off switch, is likely
to confuse new users of XML while still disappointing the SGML community.
>But if a company wants
>to use something more powerful at their back-end, why shouldn't they use
>a more powerful language nearer SGML if that serves their inhouse needs
>better. And why shouldnt Microsoft allow this in their parser?
If a company wants to use something more powerful, why don't they consider
'real' SGML and get a parser designed for that instead of creating documents
that are called XML but are no longer XML? Using this suggestion effectively
will require a new series of standards to define what features of SGML have
been added to a set of documents so that people don't blindly run them through
XML parsers with the switch set wrong. Data interchange will be a mess, once
again.
>XML development has been an exhaustive analysis of every part of mainstream
>SGML. And I think almost everyone on the SIG would agree that there are
>good reasons for almost all the non-intuitive parts of SGML. However, the
>need to be straightforward (the #1 goal of XML) means that there is
>a different cost/benefit trade-off for deciding what should go into the
>base language (compared to SGML in the early 1980s).
There is a completely different cost-benefit analysis. XML is the grand
opportunity to extend generalized markup to a far larger audience than exists
today. There may be good reasons for almost all the non-intuitive parts of
SGML, but the fact remains that these non-intuitive features have been
barriers to use and development. After reading some of the ISO specs and too
large a chunk of the SGML literature, it became quite clear to me why SGML
never percolated down to small companies and developers. It's too complicated
to be used without considerable upfront investment.
>The English-using world already runs on SGML. Computer chips, air
>transport, legal systems, the military, many stock markets,
>much print media, diagnostics of office equipment, and (with HTML 4.0)
>WWW. Any claim that SGML is not good for what it has tried to do
>are wrong, as far as the market has spoken.
The market has spoken that SGML does a great job for managing enormous amounts
of information. It has also spoken that SGML presents enormous barriers to
entry (steep learning curve, cost of development, etc.) that have kept a lot
of people from using it. SGML does a great job in many systems. The "many"
there, however, is a tiny select few compared to the many that a simpler
syntax (i.e. XML) could reach. The scale of those projects is very different
from those XML makes possible.
>And, in any case, the distinction between SGML and XML people is entirely
>spurious. If you use XML, you are an SGML person.
This distinction will grow as XML is adopted more widely. Visit the high-end
web development mailing lists and you'll find an incredible amount of
hostility to SGML but a simmering interest in XML. If you use XML, you are
using SGML tools. This does not make you an SGML person. As you may have
detected, I do have a certain amount of hostility toward SGML and SGML
culture, while remaining very enthusiastic about XML.
>SGML is not the enemy. The enemy is poorly described data that is no use,
>and systems that are inappropriately complicated (or simple) for their
>user requirements. SGML is merely a toolkit for constructing markup
>languages, which includes a lot of features that are not relevant
>to delivering structured data over the Web.
XML appears to be addressing the problems with SGML that have kept it from
being used by a wider audience. Poorly described data is the real enemy, of
course. Attacking that enemy in a larger sense requires a reconsideration of
the weapons we have used previously and a refinement. XML's simplicity will
encourage a large number of people to describe their data properly, people who
wouldn't have bothered with SGML.
This is an improvement, and the SGML community deserves great credit for the
effort they have poured into building a simple but useful toolkit, which
avoided the byzantine complexity SGML proposals are known for. XML is more
than just SGML, however. XML is going to bring a lot of 'bozos' into the
field of markup, people who care neither about the history nor the theory and
just want to get things done. A different attitude and different needs will
very likely increase the demands for XML to find its own voice.
I could, of course, be dead wrong. We'll know in a couple of years.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From fussellm at alumni.caltech.edu Wed Nov 26 14:41:13 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:06 2004
Subject: MONDO Design Document v0.3
Message-ID:
The first public release of the MONDO Design document is available at:
http://www.chimu.com/projects/mondo/design/mondoDesign.pdf
The home page for the MONDO project is:
http://www.chimu.com/projects/mondo/
The following is the first overview paragraph:
=========================================================
This document describes MONDO, a generalized architecture for encoding,
modeling, and processing information. MONDO is the result of evolving
and integrating the concepts from descriptive markup with the concepts
from object-oriented information modeling. This produces a very flexible
and powerful system for working with both structured documents and
human-readable information models, and removes the boundaries separating
them. The techniques and tools from multiple industries can be focused
on common problems.
=========================================================
The document is not quite where I was hoping it would be, but enough of
the core concepts are there that it should be readable. Another version
of the document will come out in the next week or so to address some of
the deficiencies and the feedback that I receive.
We normally publish documents in HTML as well as PDF but the conversion
program is crashing over some of the diagrams and we have not had time to
track them down and fix them. This will be fixed in the next release.
The release of MONDO-J code will probably be next week and I will send
out notice when it is downloadable.
All feedback is very appreciated.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From ricko at allette.com.au Wed Nov 26 16:21:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
Message-ID: <199711261619.DAA22835@jawa.chilli.net.au>
> From: Simon St.Laurent
> I gave a seminar
> two weeks ago in Washington DC to the ACM - a place and an organization that I
> would tend to think of as friendly to SGML. Of 50 people in the seminar
> (which was on Dynamic HTML), 15 had worked with SGML. Every time I brought up
> SGML (in connection with XML, CSS, and the DOM), I was greeted with questions
> about "is that really necessary?" "Are those SGML people trying to change
> _our_ world?" These questions didn't just come from the HTML beginners; many
> of them came from the developers who had worked with SGML, some quite
> extensively. At lunch the discussion quickly turned to XML, and I had to do a
> lot of convincing to get people 'past' SGML.
> For public relations reasons, it seems like XML needs to be able to have it
> both ways.
> As you may have
> detected, I do have a certain amount of hostility toward SGML and SGML
> culture, while remaining very enthusiastic about XML.
So you are a speaker with hostility to SGML, and your audience picks up on it.
Maybe that just means you are a sympathetic and hypnotic speaker :-)
However, I do think that a lot of the antagonism against SGML is actually
antagonism against the standard ISO 8879 (which is not intended to be remotely
entry-level or novice-friendly) mixed with antagonism against the early HTML
DTDs (which were overly-complicated, IMHO, in structure for their readerships,
as it turned out).
Plus the fact that SGML implementations often involve
converting people's minds from presentation structure to logical structure,
which many people find is a big change in discipline and job description
(XML won't alter that!). Plus many SGML editing environments are not set up
to simulate element structures with different formatting, so an operator cannot
use simple visual cues of presentation to keep track of their progress.
I worked for a company (Allette) that gets most of its jobs from SGML projects
that had failed at other companies. When we looked at what made them fail,
it was very often because the DTD did not describe the structures required,
or because of invalid documents that reflected poor QC, or because the
programming systems used were not smart enough. XML does not address any of these issues,
so I think that the kinds of projects Allette was troubleshooting (which are
presumably ones that will feed out disgruntled programmers to ACM meetings)
would not have been helped.
Which is in no way to deny that SGML the technology has some dross,
and that its wording can be improved.
> This is an improvement, and the SGML community deserves great credit for the
> effort they have poured into building a simple but useful toolkit, which
> avoided the byzantine complexity SGML proposals are known for.
Which proposals?
> XML is more
> than just SGML, however. XML is going to bring a lot of 'bozos' into the
> field of markup, people who care neither about the history nor the theory and
> just want to get things done. A different attitude and different needs will
> very likely increase the demands for XML to find its own voice.
Yes. And all the questions "are declarations good", "should we remove constants
to headers, or allow inline declarations?", "why isn't everything an element,
wouldn't that be simpler?", and "why can't we leave out these strings, since they
are not needed for parsing?" and so on.
The trouble with slagging off at SGML is that because there is no
difference in technology between XML and WebSGML, it all ends
up in personal attacks on people who have been able to use SGML, or
on the people who invented it, or even just us innocent bystanders who
happen to go to committee meetings. I have seen this happen many times
before. (I am not saying you are doing this Simon, merely that I have
seen it many times. In any case, you are writing a book and your antagonism
will teach a new generation of XML people, who may therefore feel
less likely to buy my book :-)
Please say "XML is simpler than SGML '86" and "XML is better for small
systems than SGML '86" and "SGML has many things that are not needed"
but not "SGML people are trying to make everything complicated and make
XML as bad, complex, over-engineered and stinky as SGML". This demonizing
of "SGML people" is a bad way to win people over to XML.
Rick Jelliffe
From papresco at technologist.com Wed Nov 26 18:12:13 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:06 2004
Subject: MS XML parser only works with IE...
References: <199711251850.KAA16423@boethius.eng.sun.com> <347B8421.E77@w4.ca>
Message-ID: <347C6745.DFDE457@technologist.com>
Shibl Mourad wrote:
>
> Jon Bosak wrote:
>
> > This is a license to repeat the browser wars of the last three years
> > and hold users hostage to particular software packages.
>
> I know that I am going to be hated for saying this, but the browser wars
> was a phenomenal success and prompted the development of excellent and
> useful technology very rapidly.
Yes, competition is good. But proprietary extensions like those made to
HTML *retard* competition by raising the bar for new participants.
> Compare this with standards first technology (eg SGML) where the rate of
> progress is much slower and the end benefits to the user (not the
> developer) much smaller.
Please back up this statement. Do we consider a Fortune 500 company
slashing their technical writing budget "a user"? Or would we call the
individual technical writers, who can reduce duplication, find
information faster and reuse it more effectively, "the user"?
In either case, how would you argue that SGML, Java, C++, IPNG, CORBA
and other "standards first" technologies have "few benefits to the
user"?
Paul Prescod
From papresco at technologist.com Wed Nov 26 18:17:29 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:06 2004
Subject: MS XML parser only works with IE...
References:
Message-ID: <347C688D.533E0037@technologist.com>
> Short tagging is NOT an optional
> feature of XML, and should NOT be a feature of MSXML either. If it is allowed
> to be an optional feature, then my XYZ parser is either going to have to
> accept Microsoft's 'extensions' or reject a lot of documents created by people
> who only tested on the Microsoft tools.
If a user enables a non-standard option, they get what they deserve.
It's as simple as that. Every compiler I have ever used has had flags
for non-standard options. When Microsoft serves non-standard documents
over the Web, that's another issue. The web is the place for
interoperability.
But in Microsoft's own source code, they can embed an RTF parser if they
bloody well feel like. They do have a responsibility to make clear the
distinction between the RTF features and the XML features, of course,
but they don't have a responsibility to make software that exclusively
handles W3C XML.
Paul Prescod
From papresco at technologist.com Wed Nov 26 18:19:23 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References:
Message-ID: <347C6514.CB6804E8@technologist.com>
Simon St.Laurent wrote:
> XML says it is SGML. Fine. But should the future development of XML be aimed
> at gradually including SGML features, or should it be aimed at meeting the
> needs of the developing XML community?
This is a completely false dichotomy. XML will grow *both* to gradually
include SGML features and to extend SGML in ways specific to the Web
community. The relevant example is the short-tag syntax. This is *much
more* appropriate on the Web, where everyone is used to editing things
by hand, than in the SGML world, where we often buy expensive editors or
use emacs. It is also much more appropriate in XML, which does not have
tag minimization, than in general SGML, which does. In other words, the
Microsoft people were trying to solve a problem for Web users by
recognizing a good idea in SGML. This is exactly *why* XML was designed
to be a subset of SGML (it didn't have to be).
> If a company wants to use something more powerful, why don't they consider
> 'real' SGML and get a parser designed for that instead of creating documents
> that are called XML but are no longer XML? Using this suggestion effectively
> will require a new series of standards to define what features of SGML have
> been added to a set of documents so that people don't blindly run them through
> XML parsers with the switch set wrong. Data interchange will be a mess, once
> again.
I can't believe that this is your logical extrapolation from an
*undocumented* switch in a parser for a language that doesn't exist yet.
The mere hint of extra features is enough to bring the Web crashing to
its knees.
Paul Prescod
From ddb at criinc.com Wed Nov 26 18:29:43 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
Message-ID: <3.0.32.19971126103137.00a9b6e0@mailhost.criinc.com>
At 01:18 AM 11/27/97 +1100, Rick Jelliffe wrote:
>> From: Simon St.Laurent
>> I gave a seminar
>> two weeks ago in Washington DC to the ACM ...
>> ... Every time I brought up
>> SGML (in connection with XML, CSS, and the DOM), I was greeted with questions
>> about "is that really necessary?" "Are those SGML people trying to change
>> _our_ world?" These questions didn't just come from the HTML beginners; many
>> of them came from the developers who had worked with SGML, some quite
>> extensively. At lunch the discussion quickly turned to XML, and I had to do a
>> lot of convincing to get people 'past' SGML.
>
>However, I do think that a lot of the antagonism against SGML is actually
>antagonism against the standard ISO 8879 (which is not intended to be remotely
>entry-level or novice-friendly) mixed with antagonism against the early HTML
>DTDs (which were overly-complicated, IMHO, in structure for their readerships,
>as it turned out).
I would tend to disagree. I have talked to a number of people who are
antagonistic against SGML because the standard is so complicated. The fact
that it takes a book that large to really give an implementor enough
information to build a parser says something. As does the fact that SP is
roughly 1Mb compiled. There are reasons for all of this, but people tend
to avoid things which take too long to understand, and react adversely when
they are forced to use something which they don't understand. Part of the
problem falls back to the tools, but if the initial standard had been more
directed to a specific audience, then the tools would have been easier.
Generality has its pros and cons. SGML was so general that it was
extremely complicated and only the determined could wade through the
initial waves of confusion. Thus there were very few people who
'understood' this SGML thing, so organizations trying to use SGML had to
get by with people who "didn't get SGML," and as a result had a horrid time
at it. Thus there are a number of people who think SGML is "a bad thing"
because 3/4 of the projects using it crashed and burned... (the preceding figure
is purely random. I personally have watched a number of projects fail, but
I claim no knowledge of a general success/failure rate....)
This is not to say SGML is a bad thing. SGML is based on some extremely
sound ideas, which are real driving requirements in a number of industries.
(otherwise SGML would have been dead a long time ago) XML (hopefully) is
the necessary compromises to get SGML used in more of the cases where it
can really provide benefit.
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From peter at ursus.demon.co.uk Wed Nov 26 18:51:12 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML, parsers, etc.
In-Reply-To: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971126185622.0a2728da@pop3.demon.co.uk>
The initial (and, I hope, the current) idea of this list is that it is for 'XML
developers'. This is very widely interpreted and there has been a very high
percentage of top quality contributions. A few recent ones have tended to
be statements of opinions and, although I certainly don't want to stifle
discussion, they don't contribute to the *development* of XML.
There is still a serious lack of resources in the public arena. Maybe there
are lots of people waiting to announce things as soon as the spec is
'frozen' :-). At present, however, we do not have any/sufficient :
- test documents
- tutorials
- editing tools
- post-parser applications
- class libraries for common functions (e.g. entitySubstitution)
Some posters have felt that XML is too rigid (i.e. we should break the
specs), completely broken, not powerful enough etc. It's not helpful to
elaborate these views here as they don't contribute to the development of XML.
However, as a compromise, if anyone wishes to post such views here, I
think we can allow them in a WF or valid XML document (self-contained,
please). Use as much markup as you can so that we can test parsers to
destruction :-) In that way we can accumulate a body of XML documents...
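
One way to confirm that a posted document really is well-formed before it joins such a collection is to run it through a non-validating parser. The following is only a minimal sketch in Python, assuming the standard library's expat binding; the function name is_well_formed and the sample documents are illustrative assumptions, not something proposed on this list.

```python
import xml.parsers.expat


def is_well_formed(document: str) -> bool:
    """Return True if expat parses the whole document without a syntax error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


# Hypothetical sample postings: the second has a mismatched end-tag.
good = "<opinion><claim>XML is too rigid</claim></opinion>"
bad = "<opinion><claim>XML is too rigid</opinion>"

print(is_well_formed(good))
print(is_well_formed(bad))
```

This checks well-formedness only; validating against a DTD would need a validating parser.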
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From mrc at allette.com.au Wed Nov 26 22:01:29 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References:
Message-ID: <347C9C17.F74678C7@allette.com.au>
Simon St.Laurent wrote:
> The market has spoken that SGML does a great job for managing enormous amounts
> of information. It has also spoken that SGML presents enormous barriers to
> entry (steep learning curve, cost of development, etc.) that have kept a lot of
> people from using it. SGML does a great job in many systems. The "many" there,
> however, is a tiny select few compared to the many that a simpler syntax (i.e.
> XML) could reach. The scale of those projects is very different from those XML
> makes possible.
Our company ramped into SGML by doing conversions from one proprietary format to
another. Even on relatively small data sets, we frequently used SGML in the middle
because tools like OmniMark made it easy to gather semantic information and apply
context-sensitive formatting on the down-translate. This meant that many of our
clients didn't even know that they used SGML. If you looked at this intermediate
data, you would not be able to classify it as SGML or XML - it is both, leaving
the only difference the tools that you use to manipulate the data.
You sound somewhat bitter about SGML, perhaps due to a large and difficult
project, but there are numerous small, simple SGML implementations around as well.
I'm not suggesting this approach is necessarily the norm, but nor do I think
that the delineation between what should be an SGML or XML project is as clear as
you imply - in many cases we plan to call the normalised output from an SGML
parser XML. Why not?
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From donpark at quake.net Wed Nov 26 23:20:22 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:06 2004
Subject: XML Example and DTD Archive?
Message-ID: <01bcfac1$6a9d3110$0100007f@localhost>
Fellow XML Developers,
I have searched for but could not find an extensive archive of XML examples
and DTDs. If there is such an archive, please let me know. If not, I would
like to build one so we can all benefit.
Don "JStud" Park
Consultant
donpark@quake.net
From SimonStL at classic.msn.com Thu Nov 27 02:15:03 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:06 2004
Subject: EMBED and validation
Message-ID:
This may be obvious, but I can't find it in the spec.
In XML-Link, does XML content that is included by EMBED in a valid document
have to go through validation like the other parts of the document? Is
EMBEDded content considered part of the document for styling purposes, grove
manipulation, etc.? This could potentially have an enormous impact on two
DTDs I'm developing. At present, the material I would like to embed will
validate anyway, but it may not always be the case in the future. Information
embedded after the document has loaded appears to create an entirely new set
of parsing and styling problems, but hopefully there's an answer already - the
tool is too good to pass up.
There's always ANY...
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From cbullard at hiwaay.net Thu Nov 27 03:25:20 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:59:06 2004
Subject: SGML and XML
References: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <347CE7DD.23FC@hiwaay.net>
Rick Jelliffe wrote:
>
> Please say "XML is simpler than SGML '86" and "XML is better for small
> systems than SGML '86" and "SGML has many things that are not needed"
> but not "SGML people are trying to make everything complicated and make
> XML as bad, complex, over-engineered and stinky as SGML". This demonizing
> of "SGML people" is a bad way to win people over to XML.
And as we have seen again and again, it will be the same arguments
that the next person will use on XML and Simon's book.
That's the way it's done.
len bullard
From peter at ursus.demon.co.uk Thu Nov 27 07:33:34 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:07 2004
Subject: XML Example and DTD Archive?
In-Reply-To: <01bcfac1$6a9d3110$0100007f@localhost>
Message-ID: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk>
At 15:17 26/11/97 -0800, Don Park wrote:
>Fellow XML Developers,
>
>I have searched for but could not find an extensive archive of XML examples
>and DTDs. If there is such an archive, please let me know. If not, I would
>like to build one so we can all benefit.
Don,
This is a most exciting offer!
You are right that there is no *extensive* archive of XML material and we
are suffering because of that lack. Certain people have contributed things
which may (or may not) be consistent with the latest draft :-) - that's one
of the problems. The places where these are reported are:
- XML-DEV , and I try to extract things like this into XML-JEWELS at
http://www.vsms.notingham.ac.uk/vsms/xml/jewels.html
- http://www.sil.org/sgml/xml.html - Robin Cover keeps an eagle eye for
anything of value.
Jon Bosak's Shakespeare, and religion are pre-eminent and are a good test
for whether a system can cope with 'real documents'. I haven't looked at
religion, but Shakespeare has a clean and natural markup without
attributes. So it's not a torture test. (I don't think there are DTDs - I
think I hacked my own). I don't think there is any mixed content in
Shakespeare.
Michael Sperberg-McQueen wrote a torture-test for XML parsers early this
year. We seriously need this brought up to date - maybe Michael is reading this :-)
I have written a lot of Chemical markup language (CML) at
http://www.vsms.nottingham.ac.uk/vsms/java/jumbo
and it uses attributes heavily. However there is NO mixed content in CML,
and the output is disappointing without a chemical browser :-)
There are snippets of XML in the XSL spec, the RDF spec and the
MathML spec. None of these have (I think) DTDs [MathML has one in principle].
I have now tested 3.5 parsers under JUMBO and have found that there is
sufficient variation between them that we really need some test documents.
(Some of the variation is behavioural - i.e. should a browser fail if it reads
and foo.dtd doesn't exist.)
In my view, collaborative *action* is worth many kilowords of discussion,
and if you can help put together such a resource it would be extremely useful.
Best Wishes
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Thu Nov 27 08:30:52 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:07 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID: <3.0.1.16.19971127090853.3dbf5b44@pop3.demon.co.uk>
At 02:13 27/11/97 UT, Simon St.Laurent wrote:
>This may be obvious, but I can't find it in the spec.
No - it's not obvious and - yes, it isn't in the spec. Deliberately, I think.
>
>In XML-Link, does XML content that is included by EMBED in a valid document
>have to go through validation like the other parts of the document? Is
>EMBEDded content considered part of the document for styling purposes, grove
>manipulation, etc.? This could potentially have an enormous impact on two
>DTDs I'm developing. At present, the material I would like to embed will
>validate anyway, but it may not always be the case in the future. Information
>embedded after the document has loaded appears to create an entirely new set
>of parsing and styling problems, but hopefully there's an answer already - the
>tool is too good to pass up.
When XML-LINK came out I asked (probably to the point of boredom) what the
semantics associated with XML-LINK are. The answer (I hope I'm being fair)
is that it's completely application-dependent. In particular this applies to
the word 'EMBED'. If, as I believe, the spec will stay in its very crisp
and semantic-free form, then I believe it is critical for the XML community
to get at least some communal consensus on XML-LINK semantics or I think we
shall have serious interoperability problems. That's only *my* view - others seem
either more relaxed, or seem to think it's a totally insoluble problem.
That is why I have suggested that we use XDEV as a way of at least
identifying different approaches.
I believe that the motivation for
AUTO + EMBED was to replicate the construct in HTML
(USER+REPLACE corresponds to so long as replace is the whole
'resource' (again I have asked repeatedly for clarification as to what a
'resource' is.) A 'resource' seems to be (according to different
authorities) :
- nodes in trees (Eliot Kimber on XML-DEV)
- the content of the linking element (e.g. the content of ...
- the whole containing element (i.e. as above but including the and
tags
- the whole 'document' in which the link occurs (this emulates in
HTML).
There seems to be no concern or urgency to clarify this further to
webhackers like me, so perhaps I am the only one who sees a problem :-)
*What* embed *does* is even less talked about and defined by the experts.
It is clearly seen as being able to support has any
semantics suggesting that the document linked to should become part of the
current document (I hope you understand what I mean :-). In this way
linked-to 'resources' could become 'included' or 'transcluded' in the
current document.
JUMBO has the capability of doing this, though I haven't switched it on
because I wanted someone other than me to come up with ideas.
Example:
Pat
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
From ak117 at freenet.carleton.ca Thu Nov 27 11:48:59 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
In-Reply-To: <199711270901.KAA22662@chimay.loria.fr>
References: <199711270901.KAA22662@chimay.loria.fr>
Message-ID: <199711271148.GAA00360@unready.microstar.com>
Patrice Bonhomme writes:
> Is it possible to put the sequence ]]> within a CDATA marked section ?
No -- in XML, there is no way at all.
In full SGML, you could use RCDATA instead of CDATA:
(In the DTD)
(In the document instance)
]]>
I don't think this is a big problem, though, since CDATA marked
sections are simply a typing convenience.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From jarle.stabell at dokpro.uio.no Thu Nov 27 12:17:51 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
Message-ID: <01BCFB36.A746EBE0@xyplex34.uio.no>
Pat wrote:
<<<<
Is it possible to put the sequence ]]> within a CDATA marked section ?
Example:
Here is the beginning of the CDATA marked section:
Here is the true end.
]]>
>>>>
[JS] I don't think so. A "workaround" is to close the first CDATA section, write the ]]> (or, for compatibility, it seems you have to use ]]&gt;) and then open up a new CDATA section to continue.
Example:
Here is the beginning of the CDATA marked section:
]]>
Here is the true end.
]]>
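
The workaround described above can be sketched in code. This is only a minimal illustration in Python, assuming the standard library's ElementTree parser; the helper names wrap_in_cdata and round_trip and the sample text are hypothetical. It splits the text at every literal ]]>, wraps each piece in its own CDATA section, and writes ]]&gt; between the sections.

```python
import xml.etree.ElementTree as ET


def wrap_in_cdata(text: str) -> str:
    """Wrap text in CDATA sections, writing each literal "]]>" outside them as "]]&gt;"."""
    pieces = text.split("]]>")
    # Each piece is free of "]]>", so it is safe inside its own CDATA section.
    return "]]&gt;".join(f"<![CDATA[{piece}]]>" for piece in pieces)


def round_trip(text: str) -> str:
    """Parse the wrapped text back and return the character data the parser sees."""
    element = ET.fromstring(f"<doc>{wrap_in_cdata(text)}</doc>")
    return element.text


sample = "if (a[b[0]]>c && d<e) run();"
print(wrap_in_cdata(sample))
print(round_trip(sample) == sample)
```

The pieces on either side still enjoy the CDATA convenience (no escaping of & or <); only the ]]> itself is written as ordinary escaped content.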
BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]&gt; ?
(Or do I misinterpret the draft text:
'and must for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>", when that string is not marking the end of a CDATA section'
Does it mean that the user should better use "&gt;" to be compatible with SGML, or that the XML parser should report this as an error if not escaped using "&gt;"?)
I have some concerns related to & and < when not followed by a char which can start a name (or "nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this.
(At least it seems trivial for parsers to check this situation)
Cheers,
Jarle
From fussellm at alumni.caltech.edu Thu Nov 27 12:21:32 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:07 2004
Subject: MONDO Design Document v0.3.1
Message-ID:
Not to be a notification pest (or turkey), but I decided the v0.3 MONDO
document was missing some sections that were important to explaining the
MONDO ObjectBuilder. So I added them, fixed a number of other sections,
and put up a new version at:
http://www.chimu.com/projects/mondo/design/mondoDesign.pdf
The additions and changes include the following:
v0.3.1 971127 Added sections 4.2 through 4.5 (Building and Recipes),
fixed the conclusion of chapter 5 and added a
comparison table. Cleaned up chapter 10.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From Patrice.Bonhomme at loria.fr Thu Nov 27 13:11:48 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:07 2004
Subject: A Personnal XML release of the TEI Lite DTD
Message-ID: <199711271308.OAA23421@chimay.loria.fr>
Hi,
As I am working with both TEI and XML, I am pleased to announce the
availability of my personal XML release of the TEI Lite DTD. The xteilite
DTD and 2 famous TEI Lite encoded documents are available at the following URL:
http://www.loria.fr/~bonhomme/xml.html
It is not an official release of the TEI Lite. A lot of things remain to be
done, for example the use of XML-LINK (XLL). And some of the problems are
still pending (inclusion / exclusion in content models).
Both of the XML documents have been tested with the MSXML parser (v. 1.6) and
the Lark parser (v. 0.92).
I am also trying to make an XML compatible version of the big TEI P3
DTD(s), but the task is much more difficult as it requires almost a complete
rewriting of the TEI DTD modules.
Of course, all feedback is much appreciated.
Pat.
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
From papresco at technologist.com Thu Nov 27 14:33:33 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:07 2004
Subject: ]]> within a CDATA marked section ?
References: <01BCFB36.A746EBE0@xyplex34.uio.no>
Message-ID: <347D8586.2DB5D07C@technologist.com>
Jarle Stabell wrote:
> BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]&gt; ?
I think that they should. This requirement seems strange at first, but
it stops mistakes like the one you made. You can never accidentally turn
the end of a CDATA marked section into content.
> I assume the reasons for *not* allowing "if x<>nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this.
> (At least it seems trivial for parsers to check this situation)
Parser writers are rebelling at the number of trivial things that they
must manage.
Paul Prescod
From papresco at technologist.com Thu Nov 27 14:37:11 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:07 2004
Subject: EMBED and validation
References:
Message-ID: <347D8643.64CFE9DD@technologist.com>
Simon St.Laurent wrote:
>
> In XML-Link, does XML content that is included by EMBED in a valid document
> have to go through validation like the other parts of the document?
No. Validation is defined for XML documents. XML Link is a completely
different spec and has no bearing on the definition of an XML document.
You seem rather to be thinking of the XML "hyperdocument" (in HyTime
terms).
> Is
> EMBEDded content considered part of the document for styling purposes, grove
> manipulation, etc.?
XML has no style language yet and also has no definition of a grove. So
the answer is "nobody knows yet."
> Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
Do you really have an XML book coming out in January? What spec will it
be based upon?
Paul Prescod
From papresco at technologist.com Thu Nov 27 14:52:30 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:07 2004
Subject: XML and standards (was Re: Integrity in the Hands of the Client)
References:
Message-ID: <347D89F2.362B8E61@technologist.com>
Mark Baker wrote:
>
> On Mon, 24 Nov 1997, Paul Prescod wrote:
> > > What if that troff document contained a link to an implementation of a
> > > troff formatter? What if that implementation described its interface using
> > > XML?
> >
> > What if it didn't? What if it described its interface using CORBA or
> > some proprietary language that is more powerful than CORBA? You don't
> > lose any flexibity or expressive power, you just have to write another
> > parser for CORBA or your proprietary language.
>
> My point is that if it did, then no longer are clients responsible for
> interpreting the semantics of the data - a contained/referenced
> implementation is.
Well at the hardware level, it is still the client. I think you are
distinguishing between clients being hard-wired to accept a fixed number
of notations and being extensible (e.g. through Java). That sounds
reasonable.
> In comp doc frameworks, when a new stream of data is introduced into a
> container, the framework decides the type of the data and then attempts
> to find an editor based on that type. The editor knows what to do with
> that data, and negotiates with the container for the real-estate for its
> presentation.
I think this is trickier than it sounds, especially that bit about
"negotiating for real estate" (unless you are talking about unit
squares). But okay.
> So if a well-formed document comes streaming into our container, the
> framework would start parsing it, come across a tag called 'troff', and
> then proceed to try and discover and install a chunk of code that knows
> how to parse/render troff. Or the document could provide its own ref(s)
> (more likely for scalability purposes). Either way, it's not the
> container (the client) that's responsible for interpreting the semantics
> of the data. It's the document itself that is responsible.
You seem to be arguing in favour of self-labelling data formats, which I
agree could be quite useful. But XML doesn't give you that "for free" in
any sense. There is no standard for having XML documents, entities or
elements link to Java Beans or ActiveX controls that can render them.
You must invent such a standard and it will be only marginally easier to
invent an XML-based one than to use OpenDoc or OLE Structured Storage
which handle this already. XML has the benefit that it has momentum
today and may "take over the universe." It has the serious downside that
it cannot (reasonably) encode binary information so .GIFs and .JPEGs
cannot be self-describing in this way (whereas they could be in OpenDoc
Bento or OLE Structured Storage).
In other words, something like Bento or OLESS is probably still needed.
We could surely find a way to recreate it with XML and (e.g. ZIP), but it
seems to me that that would be more of a political decision than a
technical one. The SGML standards family has something called "SDIF".
There is also mime/multipart, Amiga IFF and probably a hundred other
kicks at this can.
Anyhow, I think that a high priority of the XML WG/Community should be
inventing the XML equivalent of the JAR file. It is way too much of a
hassle to ship multipart documents (whether they be SGML, HTML or XML).
It needn't be much harder than shipping around Word Docs (which are
really multipart documents). These XAR files should be able to label
their contents.
Paul Prescod
From SimonStL at classic.msn.com Thu Nov 27 15:16:24 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:07 2004
Subject: EMBED and validation
Message-ID:
Given the wide variety of possible interpretations Peter has enumerated, it
looks like I'll be taking the most conservative road possible and developing
documents and links in such a way that they will remain valid whether or not
the EMBEDded material is included as part of the document. So far, I think
I'll only need one ANY. The documents I want to link (at this point) all
share the same DTD - I hate to imagine what will happen if I need to open that
up. Still, this is a considerable improvement on the tools I've worked with
before.
This EMBED issue raises even more bizarre questions for styling -
context-dependent styling could well be forced to adjust if EMBEDded material
is considered part of the document tree. Taking this into account will be an
interesting challenge that may force me to use some old-style CLASS
attributes, but we'll see. CSS will have some problems, but they may be
surmountable. XML styling hasn't exactly happened yet, but I hope the
developers are keeping this in mind.
XDEV sounds like a much-needed idea given the latitude of interpretation
allowed to applications. It may also be needed (or need to be extended
further) given some of the switches we may need for turning on and off these
SGML features people seem to want included in their parsers. But maybe we
_can_ make everyone happy.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From rrseibel at att.com Thu Nov 27 17:09:35 1997
From: rrseibel at att.com (Seibel, Robert R)
Date: Mon Jun 7 16:59:07 2004
Subject: Have we settled on XML and related mime types?
Message-ID: <11BF90556669D01195F3080009B3AC813CA9A2@nj8102po01.lz.att.com>
Team:
I've seen bits and pieces of mail regarding XML mime types.
Does anyone know of the official list of mime types for XML and
related support applications like XSL?
The ones I have seen are:
1) text/xml with .xml extension
2) application/xml with .xml extension
3) text/xsl with .xsl extension
Are these correct? Are there more?
Thanks for your help,
Bob Seibel
AT&T WorldNet
From jarle.stabell at dokpro.uio.no Thu Nov 27 18:12:38 1997
From: jarle.stabell at dokpro.uio.no (Jarle Stabell)
Date: Mon Jun 7 16:59:07 2004
Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?)
Message-ID: <01BCFB68.64DAE950@xyplex34.uio.no>
I wrote:
<<<
> I assume the reasons for *not* allowing "if x<>nil then doSomething" as legal content is because it is better for users that & and < are consistently not allowed for anything than markup, but I'm not convinced about this.
> (At least it seems trivial for parsers to check this situation)
>>>
Paul Prescod wrote:
<<<
Parser writers are rebelling at the number of trivial things that they
must manage.
>>>
[JS] I'm actually surprised that I haven't heard much rebelling here. :-)
I think there are lots of *non-trivial* things parser writers must manage in XML, so I don't think they care much about trivial things if they actually are useful to many users.
I'm afraid of making my parser look stupid/stubborn, because that very likely means higher support costs, and also lowers the average user's impression of the quality of the product. Gurus may know why the parser complains, but perhaps not the average support personnel, and certainly not the average user.
My current "favourite XML annoyance" is the rules for entity expansion, which make writing the name AT&T in an entity rocket science for the average XML user, and probably give some implementors gray hairs.
(I understand that these rules give maximum power, but I can hardly see the need for it. (Or is it "often" needed because one has chosen " or ' to mark the end of an entity value?))
I'll try to explain why it will probably give me some gray hairs when I implement it:
After attempting to process a document containing errors, I want to present to the user a list of error messages, and when the user clicks on one of these messages, I want to highlight the exact part of the document where the error occurs.
The problem with entity expansion is that the parser isn't parsing what the user literally wrote into the entity definitions, it is parsing a processed/"virtual" version, which *may* not be a real subpart of the document, so one has to map "virtual" locations/positions to physical (real document) positions, which doesn't seem trivial to me. It is also likely to give slightly confusing error messages, as it may be mentioning expanded stuff ("<xxx>") which the user never wrote, the user may have written "&lt;xxx>" etc.
This single issue is likely to give me many hours of thinking (and programming), while allowing stuff like "x < 5" in content only takes me a single line to handle. I sometimes get the impression that XML contains many hard-to-implement (and hard-to-understand) things (which won't be useful to anyone but the gurus), while disallowing things that are easy to implement and also useful to the average user.
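The position-mapping problem described above can be illustrated with a small Python sketch. It rests on loud assumptions: the function name `expand_with_map` and the entity table are hypothetical, only one level of general-entity references in the form `&name;` is expanded, and none of the draft's actual expansion rules are modelled. The idea is simply to record, for every character of the expanded ("virtual") text, the offset in the text the user really wrote.

```python
import re

# Simplified reference syntax, assumed for this sketch only.
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")

def expand_with_map(source, entities):
    """Expand &name; references once and track where each character came from.

    Returns (expanded, origin), where origin[i] is the offset in `source`
    that produced expanded[i]. Characters from a replacement text all map
    to the offset of the "&" that started the reference.
    """
    out, origin, pos = [], [], 0
    for match in ENTITY_REF.finditer(source):
        # Literal text before the reference maps one-to-one.
        for i in range(pos, match.start()):
            out.append(source[i])
            origin.append(i)
        replacement = entities.get(match.group(1))
        if replacement is None:
            # Unknown entity: keep the reference text unchanged.
            for i in range(match.start(), match.end()):
                out.append(source[i])
                origin.append(i)
        else:
            for ch in replacement:
                out.append(ch)
                origin.append(match.start())
        pos = match.end()
    for i in range(pos, len(source)):
        out.append(source[i])
        origin.append(i)
    return "".join(out), origin

expanded, origin = expand_with_map("a &co; b", {"co": "AT&T"})
# An error found at expanded offset k would be shown at source offset origin[k].
```

With such a table, an error located in the expanded text can be highlighted at the reference the user typed rather than at text they never wrote. Nested expansion would need the same bookkeeping applied per level.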
Ok, enough rebelling for now... :-)
Cheers,
Jarle
From tms at ansa.co.uk Thu Nov 27 18:24:13 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:07 2004
Subject: Have we settled on XML and related mime types?
In-Reply-To: "Seibel, Robert R"'s message of "Thu, 27 Nov 1997 12:07:34 -0500"
References: <11BF90556669D01195F3080009B3AC813CA9A2@nj8102po01.lz.att.com>
Message-ID:
A non-text attachment was scrubbed...
Name: not available
Type: text/plain (pgp signed)
Size: 1416 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971127/87031bad/attachment.bin
From serres-doug at usa.net Thu Nov 27 19:38:44 1997
From: serres-doug at usa.net (Doug Serres)
Date: Mon Jun 7 16:59:07 2004
Subject: Wanted: C/C++ based Validating XML Parser
Message-ID: <347DCBF5.78A31AB6@usa.net>
Hi,
I'm looking for a C/C++ based Validating XML Parser. I see references to a few
Java based ones and a TCL based one on the W3C page but none in C or C++. Any
ideas?
Thanks
--
Doug Serres
From ak117 at freenet.carleton.ca Thu Nov 27 19:47:56 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:08 2004
Subject: Wanted: C/C++ based Validating XML Parser
In-Reply-To: <347DCBF5.78A31AB6@usa.net>
References: <347DCBF5.78A31AB6@usa.net>
Message-ID: <199711271947.OAA05381@unready.microstar.com>
Doug Serres writes:
> I'm looking for a C/C++ based Validating XML Parser. I see
> references to a few Java based ones and a TCL based one on the W3C
> page but none in C or C++. Any ideas?
Get James Clark's SP:
http://www.jclark.com/sp/
To use the command-line version with XML, you need to use the -wxml
flag and prepend the SGML declaration included in the distribution;
i.e.
nsgmls -wxml /usr/lib/sgml/sgmldecl/xml.dcl myfile.xml
For easier use, make up a shell script or batch file.
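As a cross-platform alternative to a shell script or batch file, the same command line can be wrapped in a few lines of Python. This is a sketch only: the function names are hypothetical, the declaration path is the one quoted above and will differ between installations, and `nsgmls` is assumed to be on the search path.

```python
import subprocess

# Path taken from the example command above; adjust for the local install.
XML_DECL = "/usr/lib/sgml/sgmldecl/xml.dcl"

def nsgmls_command(files, decl=XML_DECL):
    """Build the argument list for running nsgmls on XML files."""
    return ["nsgmls", "-wxml", decl, *files]

def run_nsgmls(files, decl=XML_DECL):
    """Run nsgmls and return its exit status (requires nsgmls to be installed)."""
    return subprocess.call(nsgmls_command(files, decl))
```

Calling `run_nsgmls(["myfile.xml"])` would then issue the command shown above.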
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Thu Nov 27 21:39:40 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:08 2004
Subject: A Personal XML release of the TEI Lite DTD
In-Reply-To: <199711271308.OAA23421@chimay.loria.fr>
Message-ID: <3.0.1.16.19971127221239.3a4f6ef6@pop3.demon.co.uk>
At 14:08 27/11/97 +0100, Patrice Bonhomme wrote:
>
>Hi,
>
>As i am working both with TEI and XML, i am pleased to announce the
>availability of my personnal XML release of the TEI Lite DTD. The xteilite
>DTD and 2 famous TEI lite encoded documents are available at the following
URL:
>
This is a wonderful thing to have, thanks. I glanced at the DTD - haven't
had time to download. It's certainly an excellent thing to test all our
stuff on.
[Perhaps the official custodians of the TEI could say how they see TEI
being XMLised?]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From mrc at allette.com.au Thu Nov 27 22:37:28 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:08 2004
Subject: XML Example and DTD Archive?
References: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk>
Message-ID: <347DF5ED.151DD891@allette.com.au>
Peter Murray-Rust wrote:
> At 15:17 26/11/97 -0800, Don Park wrote:
> >I have searched for but could not find an extensive archive of XML examples
> >and DTD. If there is such an archive, please let me know. If not, I would
> >like to build one so we can all benefit.
>
> You are right that there is no *extensive* archive of XML material and we are
> suffering because of that lack.
Has anyone considered using OmniMark's 'The Compleat SGML' CD as a starting place? It
was designed as a conformance suite for SGML parsers, with over 10,000 documents in
various states of validity, size and degrees of complexity. I'm not sure of the legal
issues related to copyright - it might be worth an inquiry to OmniMark - but it was a
marketable product at a cost of about $200. With the number of XML parsers in the
pipeline, a full conformance suite might even turn a few dollars.
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From jjc at jclark.com Thu Nov 27 23:17:17 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:08 2004
Subject: ]]> within a CDATA marked section ?
References: <01BCFB36.A746EBE0@xyplex34.uio.no>
Message-ID: <347D91A2.B0689DB9@jclark.com>
Jarle Stabell wrote:
> BTW: Do people think XML parsers generally will/should complain about a ]]> when it for *compatibility* should be ]]&gt; ?
> (Or do I misinterpret the draft text:
>
> 'and must for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>", when that string is not marking the end of a CDATA section'
>
> Does it mean that the user should better use "&gt;" to be compatible with SGML, or that the XML parser should report this as an error if not escaped using "&gt;"?)
A conforming XML parser *must* report this as an error. "For
compatibility" just gives the rationale for the requirement; it doesn't
lessen the requirement on parsers to report the error. The spec's
definition of "for compatibility" makes this clear:
for compatibility
A feature of XML included solely to ensure that XML remains
compatible with SGML.
Note that "for compatibility" is quite different from "for
interoperability".
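The requirement can be observed with a present-day parser. The sketch below uses Python's standard `xml.etree.ElementTree` module (built on Expat), which is not one of the parsers discussed in this thread; the helper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the parser accepts the document, False if it reports an error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# "]]>" in ordinary content with the ">" escaped: accepted.
escaped_ok = is_well_formed("<doc>a ]]&gt; b</doc>")

# A literal "]]>" in ordinary content: reported as an error.
literal_ok = is_well_formed("<doc>a ]]> b</doc>")

# "]]>" that really does end a CDATA section: accepted.
cdata_ok = is_well_formed("<doc><![CDATA[a]]></doc>")
```

The unescaped case is rejected outright rather than merely warned about, which matches the reading given above.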
James
From tbray at textuality.com Thu Nov 27 23:36:19 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:08 2004
Subject: EMBED and validation
Message-ID: <3.0.32.19971127135240.00b7d260@pop.intergate.bc.ca>
At 02:13 AM 27/11/97 UT, Simon St.Laurent wrote:
>In XML-Link, does XML content that is included by EMBED in a valid document
>have to go through validation like the other parts of the document?
No; it's not part of the document; it's a hyperlink to something
completely different; there's no reason to expect what it points at
to be XML. -Tim
From peter at ursus.demon.co.uk Fri Nov 28 01:33:28 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:08 2004
Subject: Revelling parser writers (was Rebelling)
In-Reply-To: <01BCFB68.64DAE950@xyplex34.uio.no>
Message-ID: <3.0.1.16.19971128022334.36a76ba0@pop3.demon.co.uk>
JUMBO now has an interface to 3.5 parsers including Lark and NXP. This
means that the user can parse the same document with different parsers or
can (in principle) use a different parser for the initial document than for
the XML-LINKed ones (I haven't actually included a 'Change Parsers' button).
It has been 'quite easy'. Authors have generally provided a set of test
routines to be either hacked or subclassed (see Lark for examples.) I think
this is a good model for distribution, as it's a quick way to make minor
changes and get them hooked into your system. It shouldn't take more than
about 2 hours per parser - I can't spare more.
I have not done the MSXML system because I don't know if it has been
WORA'ed yet... have I missed it?
JUMBO may not be a complete test bed as it builds a tree and can then do
things from that. It may lose information (it doesn't store comments at
present). Since it was written before the WG decided on joined-up writing
for XML names, it still uppercases everything and I'm waiting for the white
smoke before I make that change. It *does* store PIs as children of the
immediately preceding non-PCDATA Element. It does not store NOTATIONs as
it has never seen one and doesn't know what to do with one when it gets it.
It is also not very good on things like IMPLIED attribute values since it
may not always have a DTD. If anyone can come up with simple rules for what
a tree should contain, that could be useful. [Not a grove at this stage, as
no one seems to write their parsers to create groves.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Nov 28 01:55:35 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:08 2004
Subject: Editing text
In-Reply-To: <347CE7DD.23FC@hiwaay.net>
References: <199711261619.DAA22835@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>
I am writing an editor for JUMBO where I expect most of the characters like
'"<>& to have been converted into entities (e.g. &apos, etc.). [I do not
expect any raw ;
}
I assume there is no short cut...
I applaud the work of the WG on Internationalisation and I don't want
to detract from it. What I would suggest is that, because of the extreme
likelihood of error if individuals do try to hack their own isNameChar(),
and because software will be invalidated if this list is ever revised,
the WG, or W3C or whoever, maintain an isNameChar() routine in the common
languages (C, C++, Java) so that we know we shall all be working with the
same one.
There may be other similar aspects of the spec where it is worth having a
central curated resource...
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From SimonStL at classic.msn.com Fri Nov 28 05:10:55 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:08 2004
Subject: EMBED and validation
Message-ID:
>No; it's not part of the document; it's a hyperlink to something
>completely different; there's no reason to expect what it points at
>to be XML. -Tim
While there is no reason to expect the target to be XML (which I strongly
approve of), I have to wonder what's supposed to happen if the target _is_
XML. If the target is another complete XML document, including a document
type declaration, then I can see the wisdom of parsing it separately and
keeping it separate. If the target is XML but not a complete document, for
instance a set of elements returned by a reference using XPointers, I'm not
sure about what the application should do.
Is the application supposed to treat this chunk as (hopefully) well-formed XML
in a separate parsing process? Would it be legitimate for an application to
fold EMBEDded chunks into the document containing the link for purposes of
styling in particular but also validation in certain circumstances? Many
situations will arise in which EMBEDded content needs to be styled, but the
chunk of XML referenced by the link contains neither document type declaration
nor styling information.
My instinct is to be as conservative as possible and make sure that all XML
chunks EMBEDded by a link could be folded into the linking document without
making it invalid, but this is a more radical constraint than I expect most
developers would like. Leaving this behavior up to the application is
probably the only course available at present, but I suspect this practice may
lead to considerable chaos.
XML-Link has opened up realms of capability that go far beyond those provided
by entities and notations, and I look forward to using them.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From ricko at allette.com.au Fri Nov 28 05:30:25 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:08 2004
Subject: Editing text
Message-ID: <199711280528.QAA04939@jawa.chilli.net.au>
> From: Peter Murray-Rust
> I assume there is no short cut...
On the contrary, there *IS* a short cut: the most obvious one!
Just treat the name as a token (i.e. terminated by whitespace or >,
or any other delimiter if you want to be careful). Any valid XML will
work with just that!
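A minimal sketch of that short cut in Python, under stated assumptions: the function name `read_name_token` and the exact delimiter set are hypothetical choices for illustration, and the routine makes no attempt to check the official name-character tables.

```python
# Characters assumed to end a name in this sketch: whitespace, ">", "/" and "=".
DELIMITERS = set(" \t\r\n>/=")

def read_name_token(text, start):
    """Read a name starting at `start`, stopping at the first delimiter.

    Returns (name, end), where `end` is the offset just past the name.
    Any character that is not a delimiter is accepted as part of the name,
    so names in any script pass through unchanged.
    """
    end = start
    while end < len(text) and text[end] not in DELIMITERS:
        end += 1
    return text[start:end], end

# Example: the element type name of a start-tag, read from just after "<".
name, end = read_name_token("<troff size='10'>", 1)
```

For input that is already valid XML, this yields the same names a table-driven check would, without needing the character tables at all.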
If you want to completely validate your XML, then the more sophisticated
checks are appropriate. The intent (as I see it) is to let people use
customary words in their language and script, if they want to. It is bad
practice to use crazy symbols and uncommon characters in markup, because
the purpose of markup is to reveal meaning, not hide it. The complexity
of the rules merely encodes that to give guidance in the peripheral cases.
> I applaud the work of the WG on the Internationalisation and I don't want
Yes, they have been exemplary in this, I think. They have taken the issue
very seriously, and kept their eyes on the goal. It is very easy for I18N
to bamboozle people, in that there is always a fuzzy and heaving morass of
quibbling that makes people want to give up. But in the case of XML, we
can have our cake (the fans of strict, codified naming rules can exactly
specify what is allowed) *AND* eat it (bewildered parser-writers can just
use simple tokenizing).
> to detract from it. What I would suggest is that because of the extremely
> likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
> languages
It is possible that isNameChar() will be adequate. The issue of how complex
the naming rules should be is under last-minute finalization. The important
thing is not to be distracted by how detailed the official list is. If
you do not have a validating XML processor (which means you in fact are
assuming that your documents are valid) then a much simpler tokenizing regime
should work fine. That was an explicit goal in the discussions on the
naming system: it must be straightforward to implement a (non-validating)
XML parser.
> (C, C++, Java) so that we know we shall all be working with the same one.
There is a draft ISO technical report on this issue, for future programming
language standards. This technical report has clearly been influenced by
XML and SGML's approaches to the problem. I know that the WG representatives
who are looking after finalizing the naming rules are looking at that
as well.
Rick Jelliffe
From tbray at textuality.com Fri Nov 28 06:54:27 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:08 2004
Subject: Editing text
Message-ID: <3.0.32.19971127225432.00bf150c@pop.intergate.bc.ca>
At 02:45 AM 28/11/97, Peter Murray-Rust wrote:
>Appendix B lists six and a half pages of potential NameChars for which
>JUMBO has to test - is this correct? If so I have code of the form:
Be warned; Appendix B will change again. Anyhow, if you really want an
isNameChar() function, I recommend something along the lines of
boolean isNameChar(char c)
{
  if (c < 128)
    return BooleanArrayOfSize128WithTrueInNameCharPositions[c];
  else
    return DoIckyLookupInBigTableFromAppendixB(c);
}
Actually, I posted some Java code that reads the XML spec and
generates a reasonably efficient Java version of DoIckyLookup...
the Lark distribution currently has a CharClasses.java. I'll
re-test and re-generate and re-distribute after the next cut of the
spec. -Tim
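The two-tier lookup above could be fleshed out roughly as follows. This is only a sketch: the class name is hypothetical, and slowLookup() uses java.lang.Character.isLetterOrDigit() as a stand-in for the Appendix B tables, which is an assumption and has not been checked against the spec.

```java
// Hypothetical sketch of a fast ASCII table with a slow fallback.
final class TwoTierNameChar {

    // One entry per ASCII code point, true where the character may appear in a name.
    private static final boolean[] ASCII = new boolean[128];

    static {
        for (char c = 'a'; c <= 'z'; c++) ASCII[c] = true;
        for (char c = 'A'; c <= 'Z'; c++) ASCII[c] = true;
        for (char c = '0'; c <= '9'; c++) ASCII[c] = true;
        ASCII['.'] = true;
        ASCII['-'] = true;
        ASCII['_'] = true;
        ASCII[':'] = true;
    }

    static boolean isNameChar(char c) {
        if (c < 128) {
            return ASCII[c];      // fast path: a single array access
        }
        return slowLookup(c);     // rare path: the big table
    }

    // Stand-in only: the real check would consult the Appendix B ranges.
    static boolean slowLookup(char c) {
        return Character.isLetterOrDigit(c);
    }
}
```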
From donpark at quake.net Fri Nov 28 07:51:52 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:08 2004
Subject: REQ: XML Example and DTD Catalog Submissions
Message-ID: <01bcfbd2$0d6889b0$0100007f@localhost>
Fellow XML Developers,
I have put up a catalog of XML Examples and DTDs to serve as the place to
get links to samples and definition files. For now, the catalog is just a
web page divided into sections, one for each XML application. It is my hope
to fill the catalog with links to most of the available XML samples and DTDs
out there.
If you have XML example files or DTDs you would like to see in the catalog,
please send their URLs to me. I cannot host the actual files because I cannot
handle the volume on my website.
The catalog is at: http://www.quake.net/~donpark/xmlcat.html
My sincere thanks in advance,
Don "JStud" Park
Java/MFC Consultant
donpark@quake.net
From richard at light.demon.co.uk Fri Nov 28 07:55:34 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:08 2004
Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?)
In-Reply-To: <01BCFB68.64DAE950@xyplex34.uio.no>
Message-ID:
In message <01BCFB68.64DAE950@xyplex34.uio.no>, Jarle Stabell
writes
>After attempting to process a document containing errors, I want to present to
>the user a list of error messages, and when the user clicks on one of these
>messages, I want to highlight the exact part of the document where the error
>occurs.
>The problem with entity expansion is that the parser isn't parsing what the
>user
>literally wrote into the entity definitions, it is parsing a
>processed/"virtual"
>version, which *may* not be a real subpart of the document, so one has to map
>"virtual" locations/positions to physical (real document) positions, which
>doesn't seem trivial to me. It is also likely to give slightly confusing error
>messages, as it may be mentioning expanded stuff ("") which the user never
>wrote, the user may have written "<xxx>" etc.
I don't think this is as much of a problem as you fear. Every entity is
physically declared somewhere in a real source - usually a good ol' file
on disc. Of course, that file may not be the one you started from ...
My RunSP program (http://www.light.demon.co.uk/runsp) does exactly what
you describe (for nsgmls). It runs it under Windows and then allows the
user to navigate from one error message to the next, in a simple editor
environment that lets them sort out the problems they find. All I did
was to parse the error messages, pick out file name, line number and
character offset, and place a bookmark at the relevant point in the file
concerned. This works equally well for errors in the DTD or SGML
Declaration as for those in what we think of as the 'real document'.
(Which is something that never occurred to me when designing RunSP - but
of course the Declaration and DTD are equally part of the document as
far as the parser is concerned.)
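The message-parsing step described above might look something like this in Java. It is a sketch under an assumption: each message is taken to have the shape program:file:line:column:severity: text. The class name is hypothetical, and this is not RunSP's actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder for one parsed error message.
final class ParserMessage {
    final String file;
    final int line;
    final int column;
    final String text;

    // Assumed shape: program:file:line:column:severity: text
    // The greedy file group lets drive letters such as C: stay inside the file name.
    private static final Pattern SHAPE =
        Pattern.compile("^[^:]+:(.+):(\\d+):(\\d+):[A-Z]:\\s*(.*)$");

    private ParserMessage(String file, int line, int column, String text) {
        this.file = file;
        this.line = line;
        this.column = column;
        this.text = text;
    }

    /** Returns the parsed message, or null when the line does not have the assumed shape. */
    static ParserMessage parse(String rawLine) {
        Matcher m = SHAPE.matcher(rawLine);
        if (!m.matches()) {
            return null;
        }
        return new ParserMessage(m.group(1),
                                 Integer.parseInt(m.group(2)),
                                 Integer.parseInt(m.group(3)),
                                 m.group(4));
    }
}
```

An editor front end would then open the named file and place its bookmark at the given line and column; lines that do not match (continuation or context lines) are simply skipped.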
Richard Light.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From peter at ursus.demon.co.uk Fri Nov 28 08:18:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID: <3.0.1.16.19971128091613.34efaa0c@pop3.demon.co.uk>
At 05:09 28/11/97 UT, Simon St.Laurent wrote:
>>No; it's not part of the document; it's a hyperlink to something
>>completely different; there's no reason to expect what it points at
>>to be XML. -Tim
No - and JUMBO can eat about 17 types of non-XML files (e.g. *.txt, *.gif,
and lots of lovely chemistry). If *any of you* want to write a simple
routine for RTF, Word binary, MAC BinHex, it would be marvellous. All you
need to do is decide on the tree structure - JUMBO can then output it in
shining XML.
>
>While there is no reason to expect the target to be XML (which I strongly
>approve of), I have to wonder what's supposed to happen if the target _is_
You approve that it must/needNot be XML. For me the latter is essential.
Some time ago I proposed an extra attribute MIME to describe the MIME type
of the target HREF. (Note that this is NOT always available from
contentType since it may be a local file.) If this doesn't get into the
SPEC, I suggest we need an XDEV attribute and I proposed that 2 days ago...
>XML. If the target is another complete XML document, including a document
>type declaration, then I can see the wisdom of parsing it separately and
>keeping it separate. If the target is XML but not a complete document, for
>instance a set of elements returned by a reference using XPointers, I'm not
This is (I believe) 'application-dependent'. I see the following
possibilities.
(A) Render the tree and paint the referred elements blue. JUMBO does this.
You don't get a choice of colours at present.
(B) Render the event stream and paint the elements red. JUMBO cannot do
joined up writing yet, but is gradually learning how to render event
streams (it can do most of HTML 2.0).
(C) Regard this as a query (remember our discussions here?) and use the
nodes in some other way. That's why I think XLL XPointer syntax is the
appropriate base for a query language.
>sure about what the application should do.
The more I think about this, the more I think we have to delineate the
possible actions and systematise them here. I think some people will want
to treat XML-LINK as simply like HTML, others will want automatic
inclusion. Since I am not a hypermedia expert, I am hoping to get some
guidance.
The question is ACTUATE="AUTO" SHOW="EMBED". There are several options.
A. treat it as a separate object (possibly a BLOB like a gif), work out how
big it is (pixel wise), create a pretty box and render it in there. JUMBO
started to do this, but got lost in flowObjects. Now I think it would do
better. But you need to be able to handle flowObjects in your metaphor.
B. parse it as a tree and replace the XML-LINK node. This would then look
very similar to &foo;. The advantages are that the target can use a
different DTD (although writing out the combined tree could be hairy). One
disadvantage is we need a switch to do this, which is why I proposed
XDEV:INCLUDE. A more serious disadvantage is that recursive following of
EMBED/AUTO could give rise to all sorts of fun things, like cyclic
recursion, getting into hairy areas, actuating buttons on nuclear power
stations and so on.
C. render it as a thumbnail and get the user to click it
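For option B, the cyclic-recursion hazard can be contained by remembering which targets are currently being expanded. The Java sketch below assumes a hypothetical map from each document URI to the URIs it embeds; fetching and parsing are left out, and none of the names come from JUMBO.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: follow EMBED/AUTO links depth-first without looping forever.
final class EmbedExpander {

    /** Returns the documents in the order they would be folded in; cyclic links are skipped. */
    static List<String> expand(String root, Map<String, List<String>> embeds) {
        List<String> order = new ArrayList<>();
        walk(root, embeds, new HashSet<>(), order);
        return order;
    }

    private static void walk(String uri, Map<String, List<String>> embeds,
                             Set<String> active, List<String> order) {
        if (!active.add(uri)) {
            return; // already being expanded further up: following it would never end
        }
        order.add(uri);
        for (String target : embeds.getOrDefault(uri, Collections.emptyList())) {
            walk(target, embeds, active, order);
        }
        active.remove(uri); // finished; the same target may legitimately appear elsewhere
    }
}
```

Only targets on the current expansion path are refused, so a document embedded from two unrelated places is still expanded twice; whether that is desirable is exactly the kind of semantics that would need agreeing.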
In many ways EMBED/AUTO can do everything that &foo; does and (as far as I
can see) everything that NOTATION does. The attraction is that it can be
further customised through attributes. &foo; cannot refer to non-XML
objects, NOTATION seems to have an additional level of indirection and I
don't understand it yet, since I've never seen it used.
>Is the application supposed to treat this chunk as (hopefully) well-formed
>XML
>in a separate parsing process? Would it be legitimate for an application to
As with all tricky questions on XML the answer is 'application-dependent'.
So - if we can agree some semantics here that would be very helpful.
>fold EMBEDded chunks into the document containing the link for purposes of
>styling in particular but also validation in certain circumstances? Many
Yes, if the application has been written to do so :-)
>situations will arise in which EMBEDded content needs to be styled, but the
>chunk of XML referenced by the link contains neither document type
>declaration
>or styling information.
I shall make something like this available in JUMBO. All the guts are
there, it's just agreeing on the public face - i.e. whether there is an
XDEV attribute.
Again it may be possible to request the application to supply styling and
DTD (e.g. through an XDEV attribute or PI). Again I'd like to see public
discussion on this.
>
>My instinct is to be as conservative as possible and make sure that all XML
>chunks EMBEDded by a link could be folded into the linking document without
>making it invalid, but this is a more radical constraint than I expect most
I think it is far better to have the semantics explicit and in the open,
rather than for different application developers to think what is best
here. This is an area where - without XDEV - we have severe problems of
interoperability. I know that a lot of people think that interoperable XML
applications are quixotic, but *I* believe it's possible if we have the
communal will. Otherwise the average user will pick up application A and
find their HREFs folded in and swear and curse when application B doesn't.
Remember, if you don't like what I'm suggesting, you don't even have to
read it :-)
>developers would like. Leaving this behavior up to the application is
>probably the only course available at present, but I suspect this practice
>may
>lead to considerable chaos.
See the idealistic ideas above :-)
>
>XML-Link has opened up realms of capability that go far beyond those
>provided
>by entities and notations, and I look forward to using them.
Yup - it has revolutionised my thinking. It means I can throw away 50% of
my code because there are general solutions. XML-LINK EXTENDED is even more
fun. I shall have some proposals there :-)
P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Nov 28 08:24:36 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:09 2004
Subject: Editing text
In-Reply-To: <199711280528.QAA04939@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971128084309.36a76b24@pop3.demon.co.uk>
At 16:27 28/11/97 +1100, Rick Jelliffe wrote:
>
>
>> From: Peter Murray-Rust
>
>> I assume there is no short cut...
>
>On the contrary, there *IS* a short cut: the most obvious one!
>
>Just treat the name as a token (i.e. terminated by whitespace or >,
>or any other delimiter if you want to be careful). Any valid XML will
>work with just that!
I think I had a brownout over this. I thought that it could be difficult to
find the balancing semicolon without scanning the NameChars. But rereading
the spec (e.g. 2.4) convinces me that there isn't a problem. [Perhaps I
thought that AT&T was now legal in PCDATA. I'm glad it isn't :-)]
But since the text is being *edited* it's probably a good thing to run
isNameChar() over new entities, tagNames, etc. JUMBO can just about think
as quickly as a human typing in.
P.
But I think we need IckyLookup as a communal resource :-)
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From richard at light.demon.co.uk Fri Nov 28 12:29:55 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:09 2004
Subject: New version of RunSP
Message-ID: <7dcYKBAiMrf0Ewm6@light.demon.co.uk>
I have just updated my RunSP program so that you can specify command-
line arguments. This means that it can now be used to run NSGMLS on XML
documents (with the -wno-valid switch introduced in version 1.2).
See http://www.light.demon.co.uk/runsp/ for details. (It may be up to a
day before the new version is made available by my ISP.)
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From ak117 at freenet.carleton.ca Fri Nov 28 12:30:32 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: <3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>
References: <199711261619.DAA22835@jawa.chilli.net.au>
<347CE7DD.23FC@hiwaay.net>
<3.0.1.16.19971128024541.36a7e07e@pop3.demon.co.uk>
Message-ID: <199711281230.HAA00341@unready.microstar.com>
Peter Murray-Rust writes:
> I am writing an editor for JUMBO where I expect most of the characters like
> '"<>& to have been converted into entities (e.g. &apos, etc.). [I do not
> expect any raw transformed by the parser. On the other hand there may be other entities
> which have not been expanded (e.g. &foo;
>
> My understanding of the spec [71] is that an entity is a Name and that Names
> [4], [5] and [6] are constructed from letters, digits and numbers. In
> determining whether something is an entity, I have to look for a string of
> the form: '&'(Letter | '_' | ':') (NameChar)* ';'
> NameChars are Digits, MiscNames and Letters.
>
> Appendix B lists six and a half pages of potential NameChars for which
> JUMBO has to test - is this correct? If so I have code of the form:
>
> public boolean isNameChar(char ch) {
> return ;
> }
>
> I assume there is no short cut...
I have not checked them for alignment, but there is a good chance that
you could use Java's built-in java.lang.Character.isLetterOrDigit()
predicate to eliminate most of it, something like this:
public boolean isNameChar (char ch) {
    return java.lang.Character.isLetterOrDigit(ch) || isMiscChar(ch);
}

public boolean isMiscChar (char ch) {
    switch (ch) {
    case '.':
    case '-':
    case '_':
    case ':':
        return true;
    default:
        return isCombining(ch) || isIgnorable(ch) || isExtender(ch);
    }
}

public boolean isIgnorable (char ch) {
    int c = (int)ch;
    return ((c >= 0x200c && c <= 0x200f) ||
            (c >= 0x202a && c <= 0x202e) ||
            (c >= 0x206a && c <= 0x206f));
}

public boolean isExtender (char ch) {
    int c = (int)ch;
    switch (c) {
    case 0x00b7:
    case 0x02d0:
    case 0x02d1:
    case 0x0387:
    case 0x0640:
    case 0x0e46:
    case 0x0ec6:
    case 0x3005:
        return true;
    default:
        return ((c >= 0x3031 && c <= 0x3035) ||
                (c >= 0x309b && c <= 0x309e) ||
                (c >= 0x30fc && c <= 0x30fe));
    }
}

public boolean isCombining (char ch) {
    // lots of stuff (the combining-character ranges are not filled in)
    return false; // placeholder only, so that the sketch compiles
}
The only long one left is isCombining(), which I haven't bothered to
fill in. Before anyone uses these, please check them against both the
XML spec and the Java Language Spec, to see if isLetterOrDigit()
really aligns properly.
> I applaud the work of the WG on the Internationalisation and I don't want
> to detract from it. What I would suggest is that because of the extremely
> likelihood of error if individuals do try to hack their own isNameChar(),
> and because if ever this list is revised software will be invalidated, that
> the WG, or W3C or whoever, maintain an isNameChar() routine in the common
> languages
> (C, C++, Java) so that we know we shall all be working with the same one.
Not a bad idea, but it is unlikely that everyone would want to use the
same one. The fastest solution would be to maintain a static 65,536
(or at least 32,768) entry array, with bit flags for different
character properties. That would be fine for big programs, but it
would kill Java applets and other size-sensitive applications unless
it were already built-into the Java environment.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From richard at cogsci.ed.ac.uk Fri Nov 28 14:09:00 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: David Megginson's message of Fri, 28 Nov 1997 07:30:19 -0500
Message-ID: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>
> The fastest solution would be to maintain a static 65,536
> (or at least 32,768) entry array, with bit flags for different
> character properties. That would be fine for big programs, but it
> would kill Java applets
Bear in mind that the main problem of size for Java applets is the
time taken for downloading, rather than the memory used at runtime.
So it may well be practical to store the data in a compact-but-slow
form and use that to initialise a large-but-fast lookup table.
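A small Java sketch of that idea: the data ships as a short list of ranges and is expanded once, at class-load time, into a 65,536-entry table. The class name is hypothetical, and the ranges are simply the extender values from the isExtender() listing earlier in this thread; they have not been checked against the spec.

```java
// Hypothetical sketch: compact range list expanded into a fast lookup table.
final class ExpandedCharTable {

    // Compact form: inclusive {first, last} pairs, a few dozen bytes of class data.
    private static final int[][] EXTENDER_RANGES = {
        {0x00b7, 0x00b7}, {0x02d0, 0x02d1}, {0x0387, 0x0387}, {0x0640, 0x0640},
        {0x0e46, 0x0e46}, {0x0ec6, 0x0ec6}, {0x3005, 0x3005},
        {0x3031, 0x3035}, {0x309b, 0x309e}, {0x30fc, 0x30fe}
    };

    // Fast form: one entry per 16-bit character, built in memory and never downloaded.
    private static final boolean[] IS_EXTENDER = new boolean[65536];

    static {
        for (int[] range : EXTENDER_RANGES) {
            for (int c = range[0]; c <= range[1]; c++) {
                IS_EXTENDER[c] = true;
            }
        }
    }

    static boolean isExtender(char ch) {
        return IS_EXTENDER[ch];
    }
}
```

The download cost is that of the range list; the runtime cost is one array access per character, paid for with the memory the table occupies.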
-- Richard
From jjc at jclark.com Fri Nov 28 16:06:22 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:09 2004
Subject: Rebelling parser writers (Was: ]]> within a CDATA marked section ?)
References:
Message-ID: <347ED4F1.C772F211@jclark.com>
Richard Light wrote:
>
> In message <01BCFB68.64DAE950@xyplex34.uio.no>, Jarle Stabell
> writes
>
> >After attempting to process a document containing errors, I want to present to
> >the user a list of error messages, and when the user clicks on one of these
> >messages, I want to highlight the exact part of the document where the error
> >occurs.
> >The problem with entity expansion is that the parser isn't parsing what the
> >user
> >literally wrote into the entity definitions, it is parsing a
> >processed/"virtual"
> >version, which *may* not be a real subpart of the document, so one has to map
> >"virtual" locations/positions to physical (real document) positions, which
> >doesn't seem trivial to me. It is also likely to give slightly confusing error
> >messages, as it may be mentioning expanded stuff ("") which the user never
> >wrote, the user may have written "<xxx>" etc.
>
> I don't think this is as much of a problem as you fear. Every entity is
> physically declared somewhere in a real source - usually a good ol' file
> on disc. Of course, that file may not be the one you started from ...
>
> My RunSP program (http://www.light.demon.co.uk/runsp) does exactly what
> you describe (for nsgmls). It runs it under Windows and then allows the
> user to navigate from one error message to the next, in a simple editor
> environment that lets them sort out the problems they find. All I did
> was to parse the error messages, pick out file name, line number and
> character offset, and place a bookmark at the relevant point in the file
> concerned. This works equally well for errors in the DTD or SGML
> Declaration as for those in what we think of as the 'real document'.
> (Which is something that never occurred to me when designing RunSP - but
> of course the Declaration and DTD are equally part of the document as
> far as the parser is concerned.)
SP does exactly the sort of virtual location to physical location
mapping that Jarle was talking about. For example, given a file
test.xml:
]>
&e2;
nsgmlsu -e will report:
In entity e2 included from test.xml:6:9
nsgmlsu:test.xml:3:16:E: "ELEMENT" declaration not allowed in instance
The position it reports (column 16 in line 3 of test.xml) is the
position of "ELEMENT" in test.xml. It has kept track of the fact that
the 3rd character in the replacement text of e2 came from the 2nd
character in the replacement text of e1 and that the 1st character in
the replacement text of e1 was specified at line 3 column 16 of
test.xml. Implementing this is not trivial.
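One simple (and memory-hungry) way to get this kind of mapping is to carry an origin with every character of replacement text, so that expanding one entity inside another just copies the origins along. The Java sketch below is an illustration under that assumption, with hypothetical names; it is not how SP is implemented, and it only handles literals that sit on a single line.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: replacement text that remembers where each character was written.
final class TrackedText {
    private final StringBuilder chars = new StringBuilder();
    private final List<int[]> origins = new ArrayList<>(); // {line, column} per character

    /** Appends literal text that starts at the given line and column of the source file. */
    void appendLiteral(String s, int line, int firstColumn) {
        for (int i = 0; i < s.length(); i++) {
            chars.append(s.charAt(i));
            origins.add(new int[] {line, firstColumn + i});
        }
    }

    /** Appends another entity's replacement text, keeping the original positions. */
    void appendExpansion(TrackedText entity) {
        chars.append(entity.chars);
        origins.addAll(entity.origins);
    }

    /** Returns {line, column} of the place where the character at index was typed. */
    int[] originOf(int index) {
        return origins.get(index);
    }

    String text() {
        return chars.toString();
    }
}
```

An error found at some offset in the expanded text can then be reported at originOf(offset), however many levels of entity reference lie in between.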
James
From jjc at jclark.com Fri Nov 28 16:06:51 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:09 2004
Subject: Editing text
References: <199711280528.QAA04939@jawa.chilli.net.au>
Message-ID: <347ECD85.49069FD@jclark.com>
Rick Jelliffe wrote:
> But in the case of XML, we
> can have our cake (the fans of strict, codified naming rules can exactly
> specify what is allowed) *AND* eat it (bewildered parser-writers can just
> use simple tokenizing).
Not if they want to be conforming. All conforming XML processors are
required to detect well-formedness errors. If an XML document uses a
character in a name that is not allowed, the document is not well-formed
and every conforming XML parser is required to report it and is required
not to process the document.
I think it would be better if well-formedness allowed simple tokenizing
to be used, and the detailed checking of name characters was needed only
for validity, but that's not how the spec is currently.
James
From ak117 at freenet.carleton.ca Fri Nov 28 16:21:40 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:09 2004
Subject: NameChar (was: Editing text)
In-Reply-To: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>
References: <199711281408.OAA00647@stevenson.cogsci.ed.ac.uk>
Message-ID: <199711281620.LAA00769@unready.microstar.com>
Richard Tobin writes:
> > The fastest solution would be to maintain a static 65,536
> > (or at least 32,768) entry array, with bit flags for different
> > character properties. That would be fine for big programs, but it
> > would kill Java applets
>
> Bear in mind that the main problem of size for Java applets is the
> time taken for downloading, rather than the memory used at runtime.
> So it may well be practical to store the data in a compact-but-slow
> form and use that to initialise a large-but-fast lookup table.
(I hear that memory _is_ a problem right now on Windows systems, since
both Netscape and (especially) MSIE 4 bloat to ridiculous sizes,
sometimes double or triple the typical 32MB of RAM on people's
systems; however, an extra 64k or so would make little difference).
The best optimisation will depend on your expected usage. If, for
example, you expect that 80% of all characters would be <=0x007f, then
Tim's approach of using a bit-array for those characters and jumping
to a hairy lookup method for the rest would make sense; if, however,
you expected that some documents might be almost entirely encoded with
characters >=0x0080 (say, in Han Chinese characters), then a 64K
lookup table would be necessary for acceptable performance. If you
were keeping only one bit for each character, then you could encode a
compact lookup table in only 8K (or 4K for the 32,768-entry case).
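A one-bit-per-character table is easy to sketch in Java. Over the full 16-bit range it needs 65,536 / 8 = 8,192 bytes; a 32,768-entry table would need 4,096. The class name is hypothetical and the table starts out empty; filling it from the spec's character classes is left out.

```java
// Hypothetical sketch: one flag bit per 16-bit character, packed eight to a byte.
final class PackedCharFlags {
    private final byte[] bits = new byte[65536 / 8];

    /** Marks the character as a member of the class. */
    void set(char c) {
        bits[c >> 3] |= (byte) (1 << (c & 7));
    }

    /** Tests membership with one shift, one mask and one array access. */
    boolean get(char c) {
        return (bits[c >> 3] & (1 << (c & 7))) != 0;
    }

    int sizeInBytes() {
        return bits.length;
    }
}
```

Lookup cost is the same for every character, so documents written almost entirely in characters above 0x007f are not penalised.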
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From Jon.Bosak at eng.Sun.COM Fri Nov 28 18:01:29 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:09 2004
Subject: XML Example and DTD Archive?
In-Reply-To: <3.0.1.16.19971127010002.3f5f9ff6@pop3.demon.co.uk> (message from Peter Murray-Rust on Thu, 27 Nov 1997 01:00:02)
Message-ID: <199711281800.KAA17411@boethius.eng.sun.com>
[Peter Murray-Rust:]
| Jon Bosak's Shakespeare, and religion are pre-eminent and are a good
| test for whether a system can cope with 'real documents'. I haven't
| looked at religion, but Shakespeare has a clean and natural markup
| without attributes. So it's not a torture test. (I don't think there
| are DTDs - I think I hacked my own). I don't think there is any mixed
| content in Shakespeare
The current distributions at
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip
include the DTDs. For the curious, I append them below; they are
achingly simple. Frankly, I have lost track of whether they are
conformant with our current case rules; I think so, but I would be
grateful for any corrections from the parser writers.
Jon
========================================================================
========================================================================
From papresco at technologist.com Fri Nov 28 19:54:44 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:09 2004
Subject: Revelling parser writers (was Rebelling)
References: <3.0.1.16.19971128022334.36a76ba0@pop3.demon.co.uk>
Message-ID: <347EE127.623D7A12@technologist.com>
Peter Murray-Rust wrote:
> [Not a grove at this stage, as
> no one seems to write their parsers to create groves.]
I'm not sure what this means. Building a grove is not the job of a
parser. Typically the parser outputs the events and some other process
builds the grove from the information. The only way a parser could be
not written to create groves is if the parser did not output sufficient
information to build a grove conforming to a particular grove plan.
Paul Prescod
From dgd at cs.bu.edu Fri Nov 28 20:31:41 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID:
At 5:09 AM -0000 11/28/97, Simon St.Laurent wrote:
>While there is no reason to expect the target to be XML (which I strongly
>approve of), I have to wonder what's supposed to happen if the target _is_
>XML. If the target is another complete XML document, including a document
>type declaration, then I can see the wisdom of parsing it separately and
>keeping it separate. If the target is XML but not a complete document, for
>instance a set of elements returned by a reference using XPointers, I'm not
>sure about what the application should do.
It's a quotation. One thing you could do is put an embedded scrollable
window in the linking document, so that the reader could read the entire
linked-to document in context.
Or you might want to format it inline as a "long quote" or something. Or
you might want to simply note that a citation was made in the form of a
quote of a particular region of the linked-to document as part of a
citation-gathering process.
The Link records a relationship between a document and a portion of
another document. I think the term EMBED is far from ideal because it
encourages an operational definition that is not always appropriate
(though it is probably the proper definition for simple browsing apps).
As with most generic markup, how it is to be displayed or processed is
something that information providers and users must be free to change as
supporting technology and the use of the document evolve.
>Is the application supposed to treat this chunk as (hopefully) well-formed
>XML in a separate parsing process?
If that makes sense.
>Would it be legitimate for an application to
>fold EMBEDded chunks into the document containing the link for purposes of
>styling in particular but also validation in certain circumstances?
Not for XML validation, ever, because XML validation is only done according
to the rules in the XML standard. Your application and DTDs might require
such an extra kind of validation, although I think that this would be a
very bad decision for a general-purpose XML processor since that
requirement will _not_ be honored by many documents.
Obviously, inline formatting is a reason for the processing hint intended
by the word EMBED in the first place.
>Many situations will arise in which EMBEDded content needs to be styled,
>but the chunk of XML referenced by the link contains neither document type
>declaration or styling information.
To the extent this is done, such documents may be hard to process with some
applications. However, there's nothing to prevent a reasonable formatting
script from being provided as part of the format specification for the
linking document that can properly format the EMBEDded data. In fact, that
would probably be a requirement for providing such documents to browsers.
>My instinct is to be as conservative as possible and make sure that all XML
>chunks EMBEDded by a link could be folded into the linking document without
>making it invalid, but this is a more radical constraint than I expect most
>developers would like. Leaving this behavior up to the application is
>probably the only course available at present, but I suspect this practice
>may lead to considerable chaos.
It will only lead to chaos if people assume that an application is
responsible for figuring out what to do in such cases.
If you are providing such documents as part of a publication process, you
are well-served by providing stylesheets that will format the link _as you
want_. If you are creating some form of repository, you need to document
the intended meaning of such links so that future creators of presentation
and interaction specifications can provide appropriate implementations for
them.
>XML-Link has opened up realms of capability that go far beyond those provided
>by entities and notations, and I look forward to using them.
Definitely. One thing that takes a while to get used to is the
"declarative" way of thinking required to make effective use of XML and
content markup generally. Then, once you've done that, you need to apply
the same abstractions to hypertext structures: a link is not just shorthand
for a particular interaction behavior, but a description of a relationship
between document portions that might be displayed, analyzed, or otherwise
used in many different ways.
This was common wisdom in the hypertext community, but is having to be
rediscovered on the Web. (It's particularly ironic, since Tim Berners-Lee
understood this from the beginning, even though he didn't invent it).
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Fri Nov 28 20:31:54 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:09 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971128091613.34efaa0c@pop3.demon.co.uk>
References:
Message-ID:
At 9:16 AM -0000 11/28/97, Peter Murray-Rust wrote:
>You approve that it must/needNot be XML. For me the latter is essential.
>Sometime ago I proposed an extra attribute MIME to describe the MIME type
>of the target HREF. (Note that this is NOT always available from
>contentType since it may be a local file. If this doesn't get into the
>SPEC, I suggest we need an XDEV attribute and I proposed that 2 days ago...
That's what NOTATION is for. Use an external entity, and make its notation
be the MIME type of the content, and then you're all set.
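A small sketch of that arrangement follows, using Python's standard xml.parsers.expat module to show what a processor reports to the application. The document, the entity name "logo", the notation name "gif", and the choice to carry the MIME type in the notation's system identifier are all assumptions made for illustration; the function collect_declarations is hypothetical.

```python
import xml.parsers.expat

# An unparsed external entity whose notation names the content type.
DOC = """<!DOCTYPE doc [
  <!NOTATION gif SYSTEM "image/gif">
  <!ENTITY logo SYSTEM "logo.gif" NDATA gif>
  <!ELEMENT doc EMPTY>
  <!ATTLIST doc target ENTITY #REQUIRED>
]>
<doc target="logo"/>
"""


def collect_declarations(text):
    """Return the notation and unparsed-entity declarations the parser reports."""
    notations = {}
    entities = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = system_id

    def on_unparsed_entity(name, base, system_id, public_id, notation_name):
        entities[name] = (system_id, notation_name)

    parser.NotationDeclHandler = on_notation
    parser.UnparsedEntityDeclHandler = on_unparsed_entity
    parser.Parse(text, True)
    return notations, entities
```

An application that sees target="logo" can look up the entity, find its notation, and from the notation recover the declared content type without fetching the resource.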
>The more I think about this, the more I think we have to delineate the
>possible actions and systematise them here. I think some people will want
>to treat XML-LINK as simply like HTML, others will want automatic
>inclusion. Since I am not a hypermedia expert, I am hoping to get some
>guidance.
This is all a question for the stylesheet/processing languages, and not for
XML per se.
>>Is the application supposed to treat this chunk as (hopefully) well-formed
>>XML in a separate parsing process? Would it be legitimate for an
>>application to
>
>As with all tricky questions on XML the answer is 'application-dependent'.
>So - if we can agree some semantics here that would be very helpful.
Not strictly correct. The stylesheet processing is application dependent.
The validation is _not allowed_ as part of XML validation. You can of
course require that your application limit the valid XML documents that it
will process, but then you are limiting the documents that it will process,
which may not be a good idea.
>>fold EMBEDded chunks into the document containing the link for purposes of
>>styling in particular but also validation in certain circumstances? Many
>
>Yes, if the application has been written to do so :-)
Styling yes, validation no.
>I shall make something like this available in JUMBO. All the guts are
>there, it's just agreeing on the public face - i.e. whether there is an
>XDEV attribute
>
>Again it may be possible to request the application to supply styling and
>DTD (e.g. through an XDEV attribute or PI). Again I'd like to see public
>discussion on this.
For an XML document, you can refer to the whole document and use extended
pointers to pick out the linked sub-part -- this lets you get DTD, and
content-type (via NOTATION).
>>My instinct is to be as conservative as possible and make sure that all XML
>>chunks EMBEDded by a link could be folded into the linking document without
>>making it invalid, but this is a more radical constraint than I expect most
This is not conservative, but radical, since it imposes an ad hoc constraint
on linking, based on a limited processing model.
>I think it is far better to have the semantics explicit and in the open,
>rather than for different application developers to think what is best
>here. This is an area where - without XDEV - we have severe problems of
>interoperability. I know that a lot of people think that interoperable XML
>applications is Quixotic, but *I* believe it's possible if we have the
>communal will. Otherwise the average user will pick up application A and
>find their HREFs folded in and swear and curse when application B doesn't.
>Remember, if you don't like what I'm suggesting, you don't even have to
>read it :-)
No, interoperable in the sense you mean (interoperable without a
particular stylesheet to interpret against) is in fact a gigantic mistake
that XML is designed to help people avoid. The point of XML is to separate
rendering and other processing from document representation and semantics.
This means that no viewer can process an XML document for display without a
separate specification of what display is desired. It will be possible to
test application compatibility when applications take XSL document + XML
document pairs. An XML document in isolation intentionally does _not_ have
a single correct display. _Any_ display driven by a consistent
transformation from the XML source is in some sense a sensible view _for
some application._
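As a rough illustration of that point, the sketch below renders one source document two ways, with the choice of presentation supplied separately from the markup. Everything here is hypothetical: the "quote" element, its "href" and "show" attributes, the TARGETS table standing in for link resolution, and the two rendering functions standing in for stylesheets. It uses Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# Hypothetical linking document and a stand-in for resolving link targets.
DOC = '<para>See <quote href="other.xml#sec2" show="embed"/> for details.</para>'
TARGETS = {"other.xml#sec2": "Links describe relationships."}


def render_inline(link):
    """One 'stylesheet': fold the linked text into the flow as a quotation."""
    return '"' + TARGETS[link.get("href")] + '"'


def render_citation(link):
    """Another 'stylesheet': only record that a citation was made."""
    return "[cites " + link.get("href") + "]"


def render(doc_text, link_style):
    """Render the paragraph, delegating every link to the chosen style."""
    root = ET.fromstring(doc_text)
    parts = [root.text or ""]
    for child in root:
        parts.append(link_style(child))
        parts.append(child.tail or "")
    return "".join(parts)
```

Both outputs are consistent transformations of the same source; neither is the single correct display, and two applications can only be compared once they are given the same document and the same style.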
>>developers would like. Leaving this behavior up to the application is
>>probably the only course available at present, but I suspect this practice
>>may lead to considerable chaos.
>
>See the idealistic ideas above :-)
Without a formatting specification you can't display an XML document
"interoperably". With such a specification, it's a relatively simple matter
of program correctness to determine whether you have or not.
>Yup - it has revolutionised my thinking. It means I can throw away 50% of
>my code because there are general solutions. XML-LINK EXTENDED is even more
>fun. I shall have some proposals there :-)
I'm glad to hear this. None of the problems you are mentioning are
insignificant, but they are almost all problems for XSL, and _not_ XML
itself.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From ricko at allette.com.au Fri Nov 28 20:56:12 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:10 2004
Subject: Editing text
Message-ID: <199711282054.HAA27168@jawa.chilli.net.au>
> From: James Clark
> I think it would be better if well-formedness allowed simple tokenizing
> to be used, and the detailed checking of name characters was needed only
> for validity, but that's not how the spec is currently.
That sounds sensible: any chance of it, James? It was discussed before,
but in the salad days of case insensitivity.
There have been several proposals for what grain the naming rules should
have: opinions range from "allow nearly everything" to "the grain of Unicode
blocks" to "whatever Unicode says for identifiers" to "whatever the new
ISO report on identifiers says" to "whatever the Java function does" to
"almost nothing: just ASCII" to "let's look at each character individually
and judge".
Having quite a large grain (e.g., divide Unicode into 256 rows and disable
or allow whole rows {but with special treatment for row 0}) also gets
the SGML declaration into a less daunting size. This might be good
enough name checking for XML, in line with the 80% rule.
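A coarse-grained check of this kind is cheap to implement. The sketch below is only an illustration: the set of allowed rows and the row 0 character list are made-up values, not taken from any draft, and the distinction between name-start and other name characters is ignored.

```python
import string

# Hypothetical policy: which 256-character rows are allowed wholesale.
ALLOWED_ROWS = {0x01, 0x03, 0x04, 0x30, 0x4E}

# Row 0 gets special treatment: only these characters may appear in names.
ROW0_NAME_CHARS = set(string.ascii_letters + string.digits + "_-.:")


def is_name_char(ch):
    """Decide by row, except in row 0 where characters are listed one by one."""
    code = ord(ch)
    if code > 0xFFFF:
        return False  # outside the 256 rows considered here
    row = code >> 8
    if row == 0:
        return ch in ROW0_NAME_CHARS
    return row in ALLOWED_ROWS


def is_name(text):
    """A name is a non-empty run of permitted characters."""
    return bool(text) and all(is_name_char(ch) for ch in text)
```

The whole policy fits in one small set plus one short list, which is the appeal of the large grain.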
Rick Jelliffe
From Jon.Bosak at eng.Sun.COM Fri Nov 28 21:30:59 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:10 2004
Subject: XML Example and DTD Archive?
Message-ID: <199711282129.NAA17698@boethius.eng.sun.com>
Thanks to several people, especially Eve Maler, for pointing out XML
errors in my play and tstmt DTDs. In particular:
| >
|
| Illegal; must be .
|
| >
|
| Illegal; must be .
|
| >
]>
...
This is Pythagoras' theorem:
&pythagoras;
and I run it through a parser what will happen? The answer is
parser-dependent. It might:
- always include and validate external entities in which case there will
be a validation error (MathML uses a different DTD from HTML). If the
entity is valid, then it creates a 'single document' which is easy to
search, etc. One disadvantage is that (for Java) the document could get too
big for the JVM.
- offer a commandline switch that allows inclusion of external entities OR
defers their expansion to the application/processor. In that case the
*application* has to be able to run a parser over the 'included'
MathML.
(JUMBO can do this at present - it can even use a different parser from the
initial one, which may be useful if they have different behaviours).
P.
Note, of course, that an application may also want to run a validating
parser over the targets of HREF and JUMBO can do this as well.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Sat Nov 29 16:05:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:10 2004
Subject: XML-DEV membership
In-Reply-To:
Message-ID: <3.0.1.16.19971129170008.2f57ad26@pop3.demon.co.uk>
At 06:14 29/11/97 UT, Simon St.Laurent wrote:
[...]
>nor does it sound like the readership of this list
Henry tells me that the membership is about 500-600.
It is a relatively unrewarding task managing any mailing list, and Henry
has been much encouraged by messages of thanks for this service. We are
both keen to see the scientific publishing process move beyond marks on
paper, which is not yet a universal vision.
We are making slow but steady progress in getting XML/CML known in the
molecular sciences. At least it's now known and regarded as 'yet another
file format'. So we have a little way to go yet. The likelihood that the
rest of the world will adopt it will hopefully have a modest effect as well
:-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From SimonStL at classic.msn.com Sat Nov 29 20:36:47 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:10 2004
Subject: No subject
Message-ID:
>I don't think I've seen it explicitly suggested here, so here goes. If you
>want to ensure that what's pointed to is real XML, and "belongs" in that
>location, how about using a plain old external text entity? With a
>validating XML processor, you can guarantee that (a) the entity will be
>expanded in place before it even gets to the application and that (b) it
>will be validated in context.
Entities are extremely useful, to be sure, but don't offer the flexibility of
XPointers by a long margin. The project I'm working on is really a
feasibility study at this point, and that flexibility is key to this
particular project. It could in fact be implemented with entities, but that
would require creating far more files than we had hoped for, as the data we
want to reference comes from different and overlapping portions of a small set
of documents. Implementing this as entities would require far more
maintenance whenever a change came down the pike.
The processing model we'd like to see for EMBED is very similar to that used
for a text entity, but it doesn't look like we'll be getting there soon.
Entities and NOTATIONs serve their purposes, but XML-Link seems far more
flexible, especially for our needs.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From dgd at cs.bu.edu Sat Nov 29 22:03:09 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:10 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID:
At 6:14 AM -0000 11/29/97, Simon St.Laurent wrote:
>>It's a quotation. One thing you could do is put an embedded scrollable
>>window in the linking document, so that the reader could read the entire
>>linked-to document in context.
>
>Yes, it is effectively a quotation. Long quotations, however, often have
>more structure and require more formatting than a short quotation in a print
>document, and I would like somehow to preserve that structure. Providing an
>embedded scrollable window is a good idea for things you COULD do, but is not
>something that can be counted on. We don't plan to create applications
>specific to this document set at present; though we may do so in the future,
>this document set would still be likely to cross into foreign applications.
I also listed several other things you could do with the link, such as
formatting it inline. My point was that the _presentation_ of the linked
data is a matter for the application and/or stylesheet -- not the XML
document. Do stylesheets need to be able to process such stuff -- yes, that
was the point I was trying to make.
I chose the nested window as a main example because I think it's a good
idea that is _not_ supported by current software, but that could be enabled
by this type of link. And it's a great example of how you could
instantly take your old-fashioned "long quotes" and turn them into
"Browsers, Inc's Web-o-Namic Nested Documents (tm)" without changing your
markup at all.
>>As with most generic markup, how it is to be displayed or processed is
>>something that information providers and users must be free to change as
>>supporting technology and the use of the document evolve.
>
>While freedom to change is certainly valuable, freedom to work consistently
>with a variety of applications is of considerably more importance on this
>project. Communicating more clearly the way in which these documents should
>be treated by applications appears to be a necessity, as XML itself
>appears to provide no such support, nor does it sound like the readership
>of this list
>(with a few exceptions, of course) is particularly interested in providing
>such support at this time.
No, you need to make an appropriate decision about stylesheets if you are
providing documents. Any requirements that you have for XSL would be well
made public, as input to that standardization process, now underway. If you
will be delivering your documents before the stylesheet work is complete
you will have to work with prerelease software or roll your own, or use CSS
or compile to HTML, or something else. The lack of such presentational
details in XML itself is still a good thing. You are free to create content
today that will work with XSL -- even when XSL does not exist yet -- and
can design your own processor if you need to. On the other hand if you
invent a bunch of "conventions" that import presentation details into your
documents you will simply be doing work that you will at best have to throw
away, and that at worst may lead to bad encodings of your document
semantics and send you back to square one to re-markup your documents.
XML is the content part of the equation, and that's what it's for. XSL and
its possible competitors (there _will be competitors_, because formatting
is a place where people will want to compete) will be the way to realize the
presentations you prefer (whatever they are) using the same
technology-independent source files.
>
>>However, there's nothing to prevent a reasonable formatting
>>script from being provided as part of the format specification for the
>>linking document that can properly format the EMBEDded data.
>
>Formatting script? I think we were hoping to use something more in the order
>of CSS or eventually XSL. While XSL will provide scripting capabilities, it
>seems like we're piling on additional complexity and new problems for
>applications. Though I haven't tried it yet, it seems like it will be an odd
>challenge to create a specification for the linking document that will
>contain styles for linked information that isn't included as part of the
>parser tree for the linking document, particularly if the type of
>information to be linked isn't known at the time the styles for the linking
>document are established.
>It can be done; I just don't look forward to it.
I used the term script to emphasize that any programmatic transformation
for viewing is usable -- some people have been confused by my use of the
term stylesheets for link-rendering, so I've tried to avoid using _only_
that term. Sorry for the confusion. CSS or XSL are _exactly_ where this
whole problem belongs.
>>if you are providing such documents as part of a publication process, you
>>are well-served by providing stylesheets that will format the link _as you
>>want_. if you are creating some form of repository, you need to document
>>the intended meaning of such links so that future creators of presentation
>>and interaction specifications can provide appropriate implementations for
>>them.
>
>What I would like to be able to do is provide stylesheets and documentation
>that can be understood by a variety of processing applications that will work
>in a consistent manner with linked material. The paragraph above describes
>quite neatly what I want; the rest of this conversation, however, has
>indicated that you can't get there from here.
No, it's indicated that without stylesheets or some other programmatic
process, you can't get there. This is not a big surprise, as that's the
point of content markup. Yes, stylesheets are going to have to handle
links. In some cases you may have to write scripts to perform interactions
you want, and embed them into stylesheets. That's just part of the work of
document delivery.
>>>My instinct is to be as conservative as possible and make sure that all XML
>>>chunks EMBEDded by a link could be folded into the linking document without
>>>making it invalid, but this is a more radical constraint than I expect most
>
>>This is not conservative, but radical, since it imposes an ad hoc
>>constraint on linking, based on a limited processing model.
>
>Radical? Radically practical, or so I thought. I'm hardly saying that
>developers _should_ obey this, or that application developers should
>implement a limited processing model. Being conservative in this instance
>means accepting a reasonably loose set of rules designed to make certain that
>documents can still be processed in a wide variety of application processing
>contexts. As I'm developing document sets here, and not applications per se,
>I'm not sure this ad-hoc constraint is anything but a simple concession to
>the vagaries of the standard.
If you're just talking about a rule that you want to adopt as part of your
authoring process, there's no problem. You posed a question about software,
and so I answered in terms of how the software should work. If you're a
document provider, you will need to specify how to format linked material
if you want it formatted inline. If you want it to just show up
pre-scrolled in another window, I expect that XSL will have a way to do
that -- then you'll lose some context, but gain by having simpler
stylesheets. This is a tradeoff that is independent of linking strategy. If
you intend to use inline formatting, and are writing the stylesheets as
well as the documents, such a discipline may well make your documents
cleaner, and your stylesheets simpler.
The question is what these presentation details have to do with XML?
>I had hoped the standard would be clearer on these issues, but the wide
>latitude given applications will have a dramatic, though not especially
>painful, impact on this document set and others I may create. Paul Prescod
>pointed out that yes, of course, applications CAN follow several of the
>models I proposed, but that this behavior cannot be counted upon. CAN is
>not good
>enough in many situations, so I'll develop the document set so that it WILL
>work regardless of the processing model applied. Seems simple enough, though
>it requires some extra effort.
Paul is right, but applications that purport to implement XSL
will not have so much latitude when they are processing a document according
to a stylesheet. In fact, they will be constrained by the XSL standard in
exactly what liberties may be taken.
So I think you're worrying about a non-problem, as long as you will be
providing stylesheets for your documents.
>Sooner or later I'll write some applications, and maybe I'll be able to take
>advantage of the freedom given application developers. In the meantime, I'll
>explore the constraints set upon document developers that are imposed by that
>freedom. The discipline will probably produce better DTDs and documents
>anyway.
I don't really know what this last means, but certainly we all look forward
to improved and richer displays once the semantics of documents and the
format for displaying them can be separated.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Sat Nov 29 22:03:15 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk>
References: <3.0.32.19971129100857.00ab9150@village.doctools.com>
Message-ID:
At 4:46 PM -0000 11/29/97, Peter Murray-Rust wrote:
>The only area of fuzziness is what the default and optional behaviours of a
>parser (sic) are. If I write:
I don't think there is any fuzziness at all.
>
>
>]>
>
>...
>
>
This is Pythagoras' theorem:
>&pythagoras;
>
>
>
>
>and I run it through a parser what will happen? The answer is
>parser-dependent. It might:
> - always include and validate external entities in which case there will
>be a validation error (MathML uses a different DTD from HTML). If the
>entity is valid, then it creates a 'single document' which is easy to
>search, etc. One disadvantage is that (for Java) the document could get too
>big for the JVM.
If the MathML elements are not declared in the DTD, _no_ validating parser
can ever accept this as legal.
>
> - offer a commandline switch that allows inclusion of external entities OR
>defers their expansion to the application/processor. In that case the
>*application* has to be able to run a parser over the 'included'
>MathML.
No, external entities are parsed in place. WF-only applications might not
follow the entities (under user choice, whether interactive or
command-line), or they might follow them and present the information.
Parsing relative to a different DTD would be unfortunate behavior, since
validation should be done according to the rules of XML.
Of course, a WF application might just swallow the elements and use its own
stylesheet language to format some math.
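The two behaviours open to a well-formedness-only processor can be sketched with Python's standard xml.parsers.expat module, which is non-validating. The document, the file name "pythagoras.xml", and the EXTERNAL table standing in for retrieval of the entity are assumptions for illustration; element_names is a hypothetical helper.

```python
import xml.parsers.expat

DOC = """<!DOCTYPE doc [
  <!ENTITY pythagoras SYSTEM "pythagoras.xml">
]>
<doc><p>This is Pythagoras' theorem:</p>&pythagoras;</doc>
"""

# Stand-in for fetching the external entity by its system identifier.
EXTERNAL = {"pythagoras.xml": "<math><msup><mi>a</mi><mn>2</mn></msup></math>"}


def element_names(text, follow_entities):
    """List element names in document order, optionally following entities."""
    names = []
    parser = xml.parsers.expat.ParserCreate()

    def on_start(name, attrs):
        names.append(name)

    def on_external_entity(context, base, system_id, public_id):
        # Parse the entity in place, in the context of the reference.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(EXTERNAL[system_id], True)
        return 1

    parser.StartElementHandler = on_start
    if follow_entities:
        parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(text, True)
    return names
```

In both modes there is one parse of one document; when the entity is followed, its elements arrive in the same event stream as the rest, and nothing is checked against a second DTD.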
>(JUMBO can do this at present - it can even use a different parser from the
>initial one, which may be useful if they have different behaviours).
You mean if they have bugs?
>Note, of course, that an application may also want to run a validating
>parser over the targets of HREF and JUMBO can do this as well.
Sure... it could, but that would be odd, since you can't include a _valid_
XML document into either a valid or a well-formed document, since the
Doctype declaration is not legal in the instance.
You would have to refer to the external entity using an ENTITY
attribute, rather than expanding it via an entity reference, if you want to
make valid use of this kind of processing based on entities.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From fussellm at alumni.caltech.edu Sat Nov 29 23:06:40 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:11 2004
Subject: Revelling parser writers (was Rebelling)
In-Reply-To:
Message-ID:
Simon St.Laurent wrote:
> One key piece of the XML puzzle that has consistently driven me crazy
> is the lack of explanation for which part of an application is supposed
> to handle which part of processing.
You may want to look at the MONDO architecture and processing model. Its
components and flows (IMO) subsume all the processing models I have seen
for SGML and XML documents. It is also unusually flexible and general.
The basic "forward" (from text to application functionality) flow is:
1. [Parser] Parse the text (say XML) and turn it into a
recipe (what objects to build and what ingredients to use)
2. [ObjectBuilder] Build the recipe and construct objects within
the ObjectBase
3. [ObjectBase & App] Interact with the resulting objects
Note that the recipe is usually virtual: The interface between (1) and
(2) could be approximated with parse event notifications. The interface
between (2) and (3) is done (usually) with Factories that know how to
build particular types of objects. As an example of an ObjectBuilder, a
GroveBuilder is a particular type of ObjectBuilder that builds a
Grove-based object model (possibly using a GroveObjectFactory).
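The forward flow can be sketched as follows. This is not MONDO code; it is a minimal illustration assuming Python's standard xml.sax module for step 1, with hypothetical names throughout (RecipeParser, ObjectBuilder, the two factories, and a plain list standing in for the ObjectBase). The recipe is made concrete here as a list, although as noted it is usually virtual.

```python
import xml.sax


class RecipeParser(xml.sax.ContentHandler):
    """Step 1: turn parse events into a recipe (a list of build steps)."""

    def __init__(self):
        super().__init__()
        self.recipe = []

    def startElement(self, name, attrs):
        self.recipe.append(("open", name, dict(attrs.items())))

    def endElement(self, name):
        self.recipe.append(("close", name, None))


def generic_factory(name, attrs):
    """Fallback factory: a plain record of the element."""
    return {"kind": name, "attrs": attrs, "parts": []}


def person_factory(name, attrs):
    """Factory that knows how to build one particular type of object."""
    return {"kind": "Person", "name": attrs.get("name", ""), "parts": []}


class ObjectBuilder:
    """Step 2: follow the recipe, asking factories to construct objects."""

    def __init__(self, factories, default_factory):
        self.factories = factories
        self.default_factory = default_factory

    def build(self, recipe, object_base):
        stack = []
        for action, name, attrs in recipe:
            if action == "open":
                factory = self.factories.get(name, self.default_factory)
                obj = factory(name, attrs)
                if stack:
                    stack[-1]["parts"].append(obj)
                else:
                    object_base.append(obj)
                stack.append(obj)
            else:
                stack.pop()


def load(text):
    """Run steps 1 and 2; step 3 is whatever the caller does with the result."""
    parser = RecipeParser()
    xml.sax.parseString(text.encode("utf-8"), parser)
    object_base = []
    builder = ObjectBuilder({"person": person_factory}, generic_factory)
    builder.build(parser.recipe, object_base)
    return object_base
```

Swapping the factory table changes what gets built without touching the parser, which is the separation of responsibilities the flow is meant to show.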
---
At the point of (3) the application can do whatever it wants, but it is
likely to want to:
3.a. Visit the objects [traverse from one to another doing some task]
3.b. Inspect their properties
3.c. Modify the objects or ask for more sophisticated behavior
3.d. Create new objects that transform the old ones
3.e. Produce changes to the world outside of the ObjectBase. For example:
3.e.1 Present the objects to the UI
3.e.2 Write the objects to a database
3.e.3 Convert the objects to an external stream
Although not complete, the above describes common behavior that
applications are likely to want to do with information.
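As a rough illustration of 3.a, 3.b and 3.e.3, here is a small Java sketch that visits a tree of objects, inspects one property of each, and writes the result to an external character stream (a String here). Item, ItemVisitor and Traversal are hypothetical names invented for this example, not part of MONDO or of any parser.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical object in the object base. */
final class Item {
    final String name;
    final List<Item> children = new ArrayList<>();

    Item(String name) { this.name = name; }

    /** Adds a child and returns this item so calls can be chained. */
    Item add(Item child) {
        children.add(child);
        return this;
    }
}

/** Hypothetical visitor: the "task" performed at each object. */
interface ItemVisitor {
    void visit(Item item, int depth);
}

final class Traversal {
    private Traversal() {}

    /** 3.a: depth-first walk from one object to the next. */
    static void walk(Item item, int depth, ItemVisitor visitor) {
        visitor.visit(item, depth);
        for (Item child : item.children) {
            walk(child, depth + 1, visitor);
        }
    }

    /** 3.b and 3.e.3: inspect each name and convert the tree to an indented outline. */
    static String outline(Item root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, (item, depth) -> {
            for (int i = 0; i < depth; i++) {
                out.append("  ");
            }
            out.append(item.name).append('\n');
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Item doc = new Item("doc").add(new Item("title")).add(new Item("body"));
        System.out.print(outline(doc));
    }
}
```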
The high level architecture and component responsibilities can be useful
for organizing an application. There are also well known techniques and
available code for all of these pieces of functionality. MONDO itself is
supplying an architecture, frameworks, and some of the functionality
listed above. But many tools could do the same work.
======
Another good source for architecture and flow models for SGML/XML is:
Developing SGML DTDs: From Text to Model to Markup.
Eve Maler and Jeanne El Andaloussi.
Prentice Hall, Upper Saddle River, NJ, 1996.
ISBN: 0-13-309881-8
This is one of my favorite SGML books. It describes how to think about
and put together SGML processing systems from the for-everyone basics to
large-scale system issues. It is also very readable.
======
For information on MONDO, see:
http://www.chimu.com/projects/mondo/
You may also be interested in:
http://www.chimu.com/publications/oopsla96tutorial23/
The tutorial slides may be a little impenetrable if you cannot see the
connection to an XML/SGML flow. SGML/XML and the MONDO architecture
populate the DomainModel layer of the system. The rest of the
architecture then shows how you can present or store that DomainModel.
The architecture may look a bit like overkill but it actually reduces to
simpler models easily or parts can be fully realized without much pain
(for example, by using a good MVC UI framework).
--Mark
mark.fussell@chimu.com
From fussellm at alumni.caltech.edu Sat Nov 29 23:36:23 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID:
Peter Murray-Rust wrote:
> >Note, of course, that an application may also want to run a validating
> >parser over the targets of HREF and JUMBO can do this as well.
David Durand wrote:
> sure... it could, but that would be odd, since you can't include a _valid_
> XML document into either a valid or a well-formed document, since the
> Doctype declaration is not legal in the instance.
I think Peter Murray-Rust was suggesting that a running application may
want to subsequently read in and process another XML document based on a
reference to it (at a semantic level) in a first document. That is at a
stage after the actual "Parsing", but not very far after (in MONDO it is
called Building) because the application simply wants to see all the
information together when the stage is done.
As people have mentioned, one of the difficulties is in separating what
SGML and XML as "parsers"/technology do from what SGML and XML as
"concepts" (all the possible applications) encompass. I think the terms
are commonly used as concepts and only rarely used to mean the precise
technology. As precise technology, XML is currently just a
semi-configurable parser specification, so whatever back end you want to
place on a parser is up to you. The rest is all nebulous "spirit".
I think this is a bit deficient. Even if flexibility should be allowed,
some precise definition of the goals and meaning of XML markup would be
useful to those building applications. Having a general parser is not
that useful (parsers are pretty easy to create nowadays), but having a
general model for encoding information and interpreting the meaning of
that information (I feel) is extremely useful.
A standardized DTD plus common applications for that DTD provide an
interpretation for a particular domain (either large [TEI] or small
[HTML]). I believe MONDO provides a useful overall picture that provides
structure and meaning even before the applications have been developed.
--Mark
mark.fussell@chimu.com
From peter at ursus.demon.co.uk Sun Nov 30 00:04:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To:
References: <3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk>
<3.0.32.19971129100857.00ab9150@village.doctools.com>
Message-ID: <3.0.1.16.19971130004046.2237dd84@pop3.demon.co.uk>
At 17:02 29/11/97 -0500, David G. Durand wrote:
>At 4:46 PM -0000 11/29/97, Peter Murray-Rust wrote:
>>The only area of fuzziness is what the default and optional behaviours of a
>>parser (sic) are. If I write:
>
>I don't think there is any fuzziness at all.
Well, please pardon my slowness and be patient - it has taken me a long
time to get this far with SGML. The spec repeatedly uses the word 'may',
which I take to mean optional behaviour (e.g. 4.3.3 'may, but need not,
include the entity's replacement text'). I expect that some parsers may
allow the user to decide, some may take unilateral action. Perhaps
'fuzziness' was the wrong word - a 'variety of options with which the user
may be confronted' might be more accurate. Other actions which a parser
'may' take could include:
- whether to read the external DTD subset
- whether to read the internal subset
- whether to validate
- whether to expand the external entities or not
Some of these may be defined clearly in the new spec, some may not. It may
be that most parsers end up with a list of command-line options like sgmls.
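To make one of these options concrete, here is a hedged Java sketch of the "whether to validate" switch. It uses the JAXP DocumentBuilderFactory API from later JDKs, which postdates this thread, so it shows how such an option can look rather than how any 1997 parser exposed it. ParserOptions and countValidityErrors are names invented for the example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class ParserOptions {
    private ParserOptions() {}

    /**
     * Parses the document and returns how many (recoverable) validity errors
     * were reported. With validate == false the DTD is not enforced.
     */
    static int countValidityErrors(String xml, boolean validate) {
        final int[] errors = {0};
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validate);           // the user-selectable option
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) { errors[0]++; }
                public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            // Well-formedness errors and configuration problems end up here.
            throw new IllegalStateException("could not parse document", e);
        }
        return errors[0];
    }

    public static void main(String[] args) {
        // <math> is not declared in the internal subset, so the document is
        // well-formed but not valid.
        String xml = "<!DOCTYPE doc [<!ELEMENT doc (#PCDATA)>]><doc><math/></doc>";
        System.out.println("validating:     " + countValidityErrors(xml, true) + " error(s)");
        System.out.println("non-validating: " + countValidityErrors(xml, false) + " error(s)");
    }
}
```

The same factory also has setExpandEntityReferences(boolean), which corresponds loosely to the "expand the external entities or not" choice in the list above.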
>>
[...]
>>be a validation error (MathML uses a different DTD from HTML). If the
>>entity is valid, then it creates a 'single document' which is easy to
>>search, etc. One disadvantage is that (for Java) the document could get too
>>big for the JVM.
>
>If the MathML elements are not declared in the DTD, _no_ validating parser
>can ever accept this as legal.
Fair enough - what I wrote was incorrect :-) Sorry.
>
>>
>> - offer a commandline switch that allows inclusion of external entities OR
>>defers their expansion to the application/processor. In that case the
>>*application* has to be able to run a parser over the 'included'
>>MathML.
>
>No, external entities are parsed in place. WF-only applications might not
>follow the entities (under user choice, whether interactive or
>command-line), or they might follow them and present the information.
>Parsing relative to a different DTD would be unfortunate behavior, since
>validation should be done according to the rules of XML.
>
>Of course, a WF application might just swallow the elements and use its own
>stylesheet language to format some math.
Understood. Thanks.
>
>>(JUMBO can do this at present - it can even use a different parser from the
>>initial one, which may be useful if they have different behaviours).
>
>You mean if they have bugs?
No. They may deliberately have different behaviours. Some may be very good
at handling large documents, others may be validating and possibly slower.
Some may offer more information as a result of the parse.
P.
Thanks for the help - I keep learning :-)
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From gfrer at luna.nl Sun Nov 30 00:26:37 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun 7 16:59:11 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971130004046.2237dd84@pop3.demon.co.uk>
References:
<3.0.1.16.19971129164634.20effec6@pop3.demon.co.uk>
<3.0.32.19971129100857.00ab9150@village.doctools.com>
Message-ID:
As an outsider, I have been following the discussions on this topic.
Within health care I foresee a need to achieve the following:
- there will be one universal DTD (or whatever)
- based on this one DTD, users will select portions of it to construct messages
- these messages might contain other messages or references to them
- depending on circumstances decided upon by the user, he might or might not
want to view the whole collection of data as one piece (merged) or as data
plus references
- messages will be added to a receiving master patient record and either be
shown as references or merged.
So whichever way you organise it, I don't mind.
And oh yes:
we in medicine are counting on all DTDs and sub-DTDs being
stored in an Internet repository.
Keep up the good work :-)
Gerard Freriks
Gerard Freriks,huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands Telephone: (+31) (0)174-384296/ Fax: -386249
Mobile : (+31) (0)6-54792800
ARS LONGA, VITA BREVIS
From tbray at textuality.com Sun Nov 30 01:19:55 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
Message-ID: <3.0.32.19971129172115.00955a70@pop.intergate.bc.ca>
At 04:25 PM 29/11/97, Peter Murray-Rust wrote:
>C javac/jvc
>
> I have problems compiling Lark V097 under javac. It throws a compiler
>error trying to load Lark.class. (java.io.UTFDataFormatException). This
>suggests that Lark, which may have been compiled with jvc (Tim?), will not
>load with javac. (It does *run* with java)
OK, I've figured out what's going on. Lark.java contains some
compiled data structures, stored, for compactness, as strings. Example:
static final String sOCT =
"\u003c\u0302\u3c03\u0321\u0903\u2f84\u033f\u0405\u3f07\u063f\u0807" +
"\u3e8f\u083e\u8f09\u2d0a\u0944\u0109\u5b0e\u0a2d\u0b0b\u2d0c\u0c2d" +
"\u0d0d\u3e8f\u0e43\u010f\u5b10\u105d\u1111\u5d12\u123e\u8f15\u3e8f" +
"\u1550\u1615\u531b\u155b\u2116\u5501\u1822\u1f18\u2524\u1827\u1e19" +
"\u221f\u1925\u2419\u271e\u1b59\u011d\u221f\u1d25\u241d\u271e\u1e27" +
"\u8f1f\u228f\u203e\u8f20\u5b21\u2125\u2421\u3c27\u215d\u2222\u3e8f" +
"\u225d\u2323\u3e21\u253b\u8f26\u3e8f\u2721\u2827\u3f04\u282d\u2d28" +
"\u4101\u2845\u5c28\u5b29\u2925\u2429\u492a\u2a47\u012a\u4e01\u2b5b" +
"\u212c\u5b21\u2d2d\u2e2e\u2d2f\u2f2d\u3030\u3e21\u3225\u2433\u3e21" +
"\u3425\u2434\u3e21\u3625\u2436\u2837\u3643\u0136\u453d\u3649\u4636" +
... and so on for lots more.
Possibly javac detects that one or more of the characters may not be
legal per Unicode? Or it's just tough to compile... I can repeatably force
jvc to generate bogus code by changing the *indentation* of the stuff
above :)
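For readers unfamiliar with the trick, here is a minimal Java sketch of storing a table compactly in a string literal and unpacking it at start-up. How Lark actually interprets its strings is not shown in this message, so the two-bytes-per-character layout below is purely an assumption, and PackedTable and unpack are hypothetical names.

```java
final class PackedTable {
    private PackedTable() {}

    /** Splits every 16-bit char of the literal into its high and low byte. */
    static byte[] unpack(String packed) {
        byte[] out = new byte[packed.length() * 2];
        for (int i = 0; i < packed.length(); i++) {
            char c = packed.charAt(i);
            out[2 * i] = (byte) (c >>> 8);       // high byte
            out[2 * i + 1] = (byte) (c & 0xFF);  // low byte
        }
        return out;
    }

    public static void main(String[] args) {
        // The first two code units of the sOCT literal quoted above.
        byte[] table = unpack("\u003c\u0302");
        for (byte b : table) {
            System.out.print((b & 0xFF) + " ");
        }
        System.out.println();
    }
}
```

One thing worth knowing when writing such literals: the compiler processes \uXXXX escapes before it tokenizes the source, so an escape for a double quote or a line terminator inside a string literal will not compile as intended.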
I stopped using javac when I got Sun's "fastjavac", which comes with jws
and has never had this problem. I'll give it a try and report. Peter,
is the use of javac strategic to you at this time? -T.
From fussellm at alumni.caltech.edu Sun Nov 30 01:19:59 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
In-Reply-To: <3.0.1.16.19971129162530.1fef159c@pop3.demon.co.uk>
Message-ID:
Peter Murray-Rust :
> I am having a number of problems turning JUMBO into a WORA-compliant animal
> (WriteOnceRunAnywhere)....
You might want to also ask these questions on advanced-java. There are a
lot of good people on that list and your questions are interesting and
applicable [the only ones "questionable" to the charter of the list have
to do with applets, but your code is for both]. I will try to answer
some of them here. All my answers will be 1.1 oriented since this is all
I have been using recently.
> A JDK1.02/JDK1.1.x
> I have refrained from converting to 1.1 since I have been told that not
> all browsers supported it. Is this still true? Or should I convert now?
I think you will be "trapped" supporting 1.02 for a while if you want to
support as many browsers as possible. For example, I still use Netscape
3.x because it is extremely stable. By my web logs, there are a lot of 3.x
(and earlier) browsers still out there.
Note that at the VM level there are very, very few changes between 1.0
and 1.1. The real problem is that the class libraries have migrated and
if you migrate also you will not be backward compatible.
> B.1 Is there a function I can call to tell whether I am in an applet or
> application?
Yes (maybe) but I don't remember what it is :-( It also somewhat depends
what you want to know. You can find out about the overall environment
with System.getProperties() [if you want to find out about the host VM
which may indicate appletness] and you can find out about the ClassLoader
and SecurityManager from their respective sources. I think the only
object that really knows it's an Applet is the Applet itself, so to
propagate this knowledge outward requires either a web of associations to
the Applet or some "static" information. The latter can be very clean if
you simply have a registry where you can put applet information (which
obviously includes their existence). It sounds like you are doing
something like that anyway. Again, I recall there existing another
approach but don't remember. Someone on advanced-java would probably know.
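A minimal sketch of the registry idea, assuming nothing beyond the standard collections: AppletRegistry is a hypothetical class, not part of the JDK or of JUMBO, and the applet is typed as plain Object here so the example does not depend on the java.applet package.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical "static" registry through which applets announce their existence. */
final class AppletRegistry {
    private static final Set<Object> APPLETS =
            Collections.synchronizedSet(new HashSet<Object>());

    private AppletRegistry() {}

    /** Intended to be called by the applet itself, for example from init(). */
    static void register(Object applet) { APPLETS.add(applet); }

    /** Intended to be called by the applet itself, for example from destroy(). */
    static void unregister(Object applet) { APPLETS.remove(applet); }

    /** Any code can ask this without holding a reference to the applet. */
    static boolean runningAsApplet() { return !APPLETS.isEmpty(); }

    public static void main(String[] args) {
        // Started through main(), so nothing has registered.
        System.out.println("applet? " + runningAsApplet());
    }
}
```

The design choice is simply that the one object that knows it is an applet publishes that fact once, and every other class queries the registry instead of being handed the applet.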
> B.2 I use ancillary files located in the *.class directories (e.g
> icon.gif). A nice extension in JUMBO is a per-class schema.xml file, with
> additional class information. Since CLASSPATH may contain many components,
> how can I tell which component was used for the class I am now running, so
> I can locate these files?
Use Class#getResource(...).
If you are in a particular object:
    URL url = this.getClass().getResource(relativePathName+fileName);
The relativePathName is resolved relative to the current Class (in your case it
would probably be empty).
If you are in a static method you need to explicitly specify the class
object:
    URL url = ThisClass.class.getResource...
because static methods are not connected to any object (they are
completely resolved at compile time).
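Put together as a small, self-contained Java sketch: ResourceLocator, locate and classFileName are names invented for this example, and schema.xml is just the per-class file mentioned in the question, which may or may not exist next to the class.

```java
import java.net.URL;

final class ResourceLocator {
    private ResourceLocator() {}

    /**
     * Resolves a resource name relative to the package of the given class.
     * Returns null if the class loader cannot find it.
     */
    static URL locate(Class<?> owner, String relativeName) {
        return owner.getResource(relativeName);
    }

    /** File name of the class file itself, without the package part. */
    static String classFileName(Class<?> owner) {
        String name = owner.getName();
        return name.substring(name.lastIndexOf('.') + 1) + ".class";
    }

    public static void main(String[] args) {
        Class<?> me = ResourceLocator.class;
        // The URL of the class file shows which CLASSPATH component was used.
        System.out.println("class file: " + locate(me, classFileName(me)));
        // Ancillary files are looked up the same way; null means "not found".
        System.out.println("schema.xml: " + locate(me, "schema.xml"));
    }
}
```

Because the lookup goes through the class loader that loaded the class, the same call is meant to work whether the class came from a directory, an archive, or over the network in an applet.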
> D java/jview
> There are significant differences here, especially with filenames/URLs.
No solutions, just a couple comments. The 100% Pure tester warns about
all hardcoded '/' and '\', so these are obviously considered non-WORA.
For most of these you can use either the functionality within a class
(File concatenation) or the System.properties:
* file.separator: File separator ("/" on Unix)
* user.home: User home directory
* user.dir: User's current working directory
* java.home: Java installation directory
URLs should generally work with standard URL notation (it is up to the
implementation to work correctly).
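A short Java sketch of both approaches, File concatenation and the system properties, with no hardcoded '/' or '\'. PortablePaths, join and inUserHome are hypothetical helper names, and the "icons"/"icon.gif" path is only an example.

```java
import java.io.File;

final class PortablePaths {
    private PortablePaths() {}

    /** Lets java.io.File insert the platform's separator between the parts. */
    static String join(String parent, String child) {
        return new File(parent, child).getPath();
    }

    /** Builds a path below the user's home directory from a system property. */
    static String inUserHome(String child) {
        return join(System.getProperty("user.home"), child);
    }

    public static void main(String[] args) {
        System.out.println("separator: " + System.getProperty("file.separator"));
        System.out.println("relative:  " + join("icons", "icon.gif"));
        System.out.println("in home:   " + inUserHome("jumbo.properties"));
        System.out.println("cwd:       " + System.getProperty("user.dir"));
    }
}
```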
--Mark
mark.fussell@chimu.com
From peter at ursus.demon.co.uk Sun Nov 30 08:31:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:11 2004
Subject: WORA help required
In-Reply-To:
References: <3.0.1.16.19971129162530.1fef159c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971130091051.3e6f6f2e@pop3.demon.co.uk>
At 17:19 29/11/97 -0800, Mark L. Fussell wrote:
Many thanks Mark.
>
[...]
>
>No solutions, just a couple comments. The 100% Pure tester warns about
>all hardcoded '/' and '\', so these are obviously considered non-WORA.
>For most of these you can use either the functionality within a class
>(File concatenation) or the System.properties:
> *