From tbray at textuality.com Sun Nov 1 03:21:14 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <3.0.32.19981031192043.00b1b200@pop.intergate.bc.ca>
At 10:36 AM 10/30/98 -0500, david@megginson.com wrote:
>So, Henry's asking whether this is valid:
>
>
>
>
> ]>
>
>
>I'd like to hear Tim Bray's opinion, unless I've missed it already in
>this thread (are you reading this, Tim, or alternatively, do you have
>an e-mail filter that looks for your name?).
Yes and no, respectively. I've been lurking, hoping that someone
would post something definitive.
The more I think about it, the more I think it's valid, because white
space between child elements is OK, and the fact that the white space
is in a CDATA section doesn't mean it's not white space. Chris
Lovett argued that it would be OK if the white space were in
an entity reference, which I think is a strongly linked problem (although
I couldn't follow Chris' reasoning about why MSXML thinks the
CDATA section is invalid). Larval agrees with me, by the way, because
the CDATA recognizer does its work first and the validator only ever sees
white space.
However, the rule that applies is in section 3, validity constraint
"Element Valid", list item 2, which I quote:
2. The declaration matches children and the sequence of child elements
belongs to the language generated by the regular expression in the
content model, with optional white space (characters matching the
nonterminal S) between each pair of child elements.
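To make this reading concrete, here is a minimal Java sketch of the test a validator might apply to the characters between two child elements, using production [3] S of XML 1.0 (#x20, #x9, #xD, #xA). It assumes the parser has already removed any CDATA delimiters, as Larval is described as doing above; the class and method names are hypothetical.

```java
// Hypothetical helper: is a run of character data made up only of
// characters matching the XML 1.0 nonterminal S (#x20 | #x9 | #xD | #xA)?
final class XmlSpace {
    private XmlSpace() {}

    static boolean isS(char c) {
        return c == 0x20 || c == 0x9 || c == 0xD || c == 0xA;
    }

    // True if every character matches S (an empty run is trivially allowed,
    // since the white space between child elements is optional). Where the
    // characters came from, plain text or a CDATA section, is not an input.
    static boolean isAllWhiteSpace(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            if (!isS(text.charAt(i))) {
                return false;
            }
        }
        return true;
    }
}
```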
Of course, the interpolation "(characters matching the nonterminal S)"
could lead a pedant to claim that "
Message-ID: <363BDAB1.26A13201@eng.sun.com>
As another data point -- Sun's validating parser accepts Henry's
original example, no problems. (And it does so very quickly,
but you knew that! ;-)
A pragmatic answer "why": it uses the data model implied by
SAX, which treats characters "quoted" by "<![CDATA[...]]>"
like any other characters (but without using '&' and '<' as
markup delimiters).
I think that's the right model. It's clear from 2.7 that the
text inside a CDATA section is character data, not markup; the
example makes it clear, even if the text could be misunderstood.
Since 2.4 makes clear (sentence 1!) that the _only_ two sorts
of stuff in XML are "character data" and "markup", there's
no way I could justify treating space inside a CDATA section
differently from other characters (in terms of data model).
Hence it's not possible to distinguish whitespace characters
that are the content of a CDATA section from the same text
that's outside of a CDATA section.
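A small Java sketch of the SAX data model described above, using the JAXP SAX parser bundled with the JDK (which postdates this thread) and assuming a plain non-validating parse; the class name is hypothetical. The handler receives CDATA content through the same characters() callback as escaped text, so it cannot tell the two apart.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class SaxText {
    private SaxText() {}

    // Concatenates every characters() callback for the document. Text from
    // a CDATA section arrives exactly like ordinary (escaped) text.
    static String collectText(String xml) {
        StringBuilder sb = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                sb.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return sb.toString();
    }
}
```

SAX2 later added an optional LexicalHandler (org.xml.sax.ext) whose startCDATA and endCDATA callbacks report the section boundaries for applications, such as editors, that care about them.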
- Dave
p.s. Yes, if there's confusion, the spec probably needs to
be clarified. Not a crime in any 1.0 spec.
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From db at Eng.Sun.COM Sun Nov 1 04:15:17 1998
From: db at Eng.Sun.COM (David Brownell)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000001be0498$180e4100$b3e887cb@NT.JELLIFFE.COM.AU>
Message-ID: <363BDF62.3B59998D@eng.sun.com>
Rick Jelliffe wrote:
>
> marked sections actually mark up
> notations: at ISO there has been discussion of whether to allow something
> like (for example)
>
While I applaud the ongoing proliferation of real Java(tm), I admit I
don't like that either ... has
worked just as well, and does no damage to XML. (Not as pretty though!)
> This is not something that I would expect to make its way into XML (and I
> think the ISO people are now more keen to help XML/WebSGML than on tidying
> up SGML) but I think the idea that a marked section
... but XML has only "CDATA" sections. There's no such thing as
a "marked" section, and "CDATA" is specified to be character data
terminated by a "]]>" sequence. No notion of marking/labeling.
> not only alters
> delimiter recognition but also labels the data can be seen (in embryo or
residually) in the DOM's elevation of CDATASection to node-worthiness, which has
> so perplexed Henry.
Keep in mind that DOM implementations are not required to import
an XML document using CDATASection nodes ... Sun's just uses it
to determine _how to write out_ the text, if someone adds such a
node to a DOM tree they've constructed. The "<" and "&" markup
delimiters don't get quoted like they must be for normal text,
and "]]>" gets funkified differently.
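How the Sun implementation actually handles "]]>" is not stated here. One common technique, sketched below in Java under that assumption, is to close the section just before the '>' and open a new one; the class and method names are hypothetical.

```java
final class CdataWriter {
    private CdataWriter() {}

    // Serializes text as one or more CDATA sections. '<' and '&' need no
    // escaping inside a CDATA section; the only forbidden sequence is the
    // terminator "]]>", so it is split across two adjacent sections.
    static String writeCdata(String data) {
        return "<![CDATA[" + data.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }
}
```

For example, writeCdata("a]]>b") produces two adjacent sections whose contents, "a]]" and ">b", concatenate back to the original text.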
> I think the answer is clear from the spec:
> [43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
> so a CDSect is not CharData. Therefore a CDSect is only valid in mixed
> content, even though it is well-formed to have it in element content.
I can't buy that conclusion. Among other things, that production has
no constraints relating to "mixed content". Is this an argument that
cosmetic whitespace, comments, and PIs likewise must exist only inside
mixed content?
- Dave
From ricko at allette.com.au Sun Nov 1 06:15:14 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <363BDAB1.26A13201@eng.sun.com>
Message-ID: <000001be055f$2a480eb0$abe887cb@NT.JELLIFFE.COM.AU>
> From: David Brownell
> A pragmatic answer "why": it uses the data model implied by
> SAX, which treats characters "quoted" by "<![CDATA[...]]>"
> like any other characters (but without using '&' and '<' as
> markup delimiters).
Aha! I think this is the big difference in approach. The Davids are saying
that CDATAsects are tags which switch in and out an effect, while I am
saying that the CDATAsect is markup which delimits a range and labels it.
Personally, I hope CDATAsects are removed from the mooted XML profile (does
it have a code name? EZX?); I have never thought they were a particularly
good idea. But I guess they won't be, since that would make it possible to
generate EZX documents which were not WF XML documents. [I'd guess EZX would
remove DTDs (no entities!), make UTF-8 the only charset, allow but deprecate
PIs (in particular before and after fragments or root-elements), allow but
deprecate CDATAsects, build-in the ISO public entity sets with HTMLsymbols,
and build-in namespaces. I suppose this would then become the syntax used
for HTML 5, or whatever HTML+XML is called. That would be a nice little
language.]
Rick Jelliffe
From lauren at sqwest.bc.ca Sun Nov 1 16:58:56 1998
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <000001be0498$180e4100$b3e887cb@NT.JELLIFFE.COM.AU>
References:
Message-ID: <199811011654.IAA02758@sqwest.bc.ca>
On 31 Oct 98, at 17:31, Rick Jelliffe wrote:
> Henry Thompson wrote:
>
> > The DOM made a serious mistake here in my opinion: it's
> > stranded in no-person's-land between raw and cooked, without being
> > either. It's not cooked, because it gives you EntityReference and
> > CDATA nodes. It's not raw, because it DOESN'T give you character
> > entity references.
>
> CHARACTER REFERENCES
> I think Henry means "numeric character reference", and this is the heart
> of the matter. A numeric character is not an entity, any more than a
> directly-entered character is. It is just an alternative encoding of the
> character, and should be of no more interest to a general API than the
> charset encoding of the document was. (I am putting words into his mouth:
> or does Henry mean the [XMLs4.6] predefined entities?)
This is the reason that the DOM doesn't give you access to
numeric character references. It's perfectly acceptable for the
application to give access if it's necessary for that application, but
the DOM WG, after a *lot* of discussion, decided that the
alternative encodings of a document were not up to the DOM to
decide.
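A minimal Java sketch of what this means in practice, assuming the JDK's JAXP DOM parser: once a numeric character reference has been parsed, the tree holds only the character, exactly as if it had been typed directly. getTextContent() is a later DOM Level 3 method, and the class name is hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class CharRefs {
    private CharRefs() {}

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Text content of the document element. A reference such as &#65; is
    // resolved by the parser, so the DOM only ever sees the character 'A'.
    static String rootText(String xml) {
        return parse(xml).getDocumentElement().getTextContent();
    }
}
```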
As for CDATA sections and the DOM - we decided that the DOM
could not, in and of itself, decide whether the CDATA section was
purely an escaping mechanism that the application (such as an
editor) could use or not as it chose or whether the CDATA section
had deeper significance. Making CDATA sections nodes means
that the application can choose which is true. If the CDATA section
is simply an escaping mechanism, then the data can be
transformed before being passed to the DOM, in which case the
DOM will never see a CDATA section. Should the CDATA section
have some other significance, the parser can leave it as a CDATA
section and pass it to the DOM, which will respect it.
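For comparison, JAXP (which postdates this discussion) exposes one such parser-side choice as DocumentBuilderFactory.setCoalescing(boolean): when true, CDATA sections are converted to Text and merged with adjacent text before the DOM is built. A sketch with a hypothetical helper class:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class CdataNodes {
    private CdataNodes() {}

    // Counts the CDATASection children of the document element, parsing
    // with coalescing switched on or off.
    static int countCdata(String xml, boolean coalesce) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setCoalescing(coalesce);
            NodeList kids = f.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement().getChildNodes();
            int n = 0;
            for (int i = 0; i < kids.getLength(); i++) {
                if (kids.item(i).getNodeType() == Node.CDATA_SECTION_NODE) {
                    n++;
                }
            }
            return n;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```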
Lauren
From papresco at technologist.com Mon Nov 2 00:10:58 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <199811011654.IAA02758@sqwest.bc.ca>
Message-ID: <363CF738.C389AB26@technologist.com>
Lauren Wood wrote:
>
> If the CDATA section
> is simply an escaping mechanism, then the data can be
> transformed before being passed to the DOM, in which case the
> DOM will never see a CDATA section. Should the CDATA section
> have some other significance, the parser can leave it as a CDATA
> section and pass it to the DOM, which will respect it.
Is there a standard way for the DOM client software to say whether it
wants access to CDATA sections or not?
--
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"I don't want you to describe to me -- not ever -- what you were doing
to that poor boy to make him sound like that; but if you ever do it
again, please cover his mouth with your hand," Grandmother said.
-- John Irving, "A Prayer for Owen Meany"
From david at megginson.com Mon Nov 2 00:48:58 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:08 2004
Subject: Optional Nodes in the DOM
In-Reply-To: <363CF738.C389AB26@technologist.com>
References:
<199811011654.IAA02758@sqwest.bc.ca>
<363CF738.C389AB26@technologist.com>
Message-ID: <13884.65089.320319.709774@localhost.localdomain>
Paul Prescod writes:
> Is there a standard way for the DOM client software to say whether it
> wants access to CDATA sections or not?
I don't think so -- I'd imagine that that would have to be an option
to the DOM builder, which is left unspecified. I could imagine
something like this:
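    import java.io.IOException;
    import org.w3c.dom.Document;

    // Hypothetical exception type for builder failures.
    class DOMFactoryException extends Exception {}

    public abstract class DOMFactory
    {
        public final static int NONE = 0;
        // Optional node types (one bit each, so they can be OR-ed together)
        public final static int COMMENTS = 1;
        public final static int ENTITYREFS = 2;
        public final static int CDATA = 4;
        public final static int ALL = COMMENTS | ENTITYREFS | CDATA;

        public abstract Document createDocument (int flags)
            throws DOMFactoryException, IOException; // etc.
    }

and then

    Document doc = factory.createDocument(DOMFactory.NONE);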
I can also imagine many DOM builders that always leave these node
types out, since that information is not needed for most XML
applications (authoring and repository tools excepted).
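Flags like these map fairly directly onto switches that JAXP's DocumentBuilderFactory later provided. A hypothetical Java sketch, assuming a set bit means "keep this optional node type" and using one distinct bit per flag so they can be combined:

```java
import javax.xml.parsers.DocumentBuilderFactory;

final class DomFlags {
    private DomFlags() {}

    // Hypothetical flags: a set bit means "keep this node type in the tree".
    static final int NONE = 0;
    static final int COMMENTS = 1;
    static final int ENTITYREFS = 2;
    static final int CDATA = 4;
    static final int ALL = COMMENTS | ENTITYREFS | CDATA;

    // Translates the flags into the closest JAXP factory settings.
    static DocumentBuilderFactory configure(int flags) {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setIgnoringComments((flags & COMMENTS) == 0);
        f.setExpandEntityReferences((flags & ENTITYREFS) == 0);
        f.setCoalescing((flags & CDATA) == 0);
        return f;
    }
}
```

With this mapping, configure(NONE) drops comments, expands entity references and coalesces CDATA, which gives the plain tree most applications want.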
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From tbray at textuality.com Mon Nov 2 01:07:27 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:08 2004
Subject: Optional Nodes in the DOM
Message-ID: <3.0.32.19981101170427.00b28950@pop.intergate.bc.ca>
At 07:47 PM 11/1/98 -0500, david@megginson.com wrote:
>I can also imagine many DOM builders that always leave these nodes
>types out, since that information is not needed for most XML
>applications (authoring and repository tools excepted).
Well, authoring anyhow. -Tim
From donpark at quake.net Mon Nov 2 02:17:01 1998
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 17:06:08 2004
Subject: ANN: XML-APP Mailing List
Message-ID: <007c01be0606$b0af9520$2ee044c6@arcot-main>
I would like to announce the opening of the XML-APP mailing list.
WHAT:
XML-APP is a mailing list specifically for those interested in applying the
XML technology to real world applications. It is not a place to discuss
general XML issues nor is it a place for the naive.
WHY:
I have felt that the quality of messages on the XML-DEV mailing list was too
high and too esoteric to encourage sharing of information among those who
are experienced enough to see the value of XML at first glance yet don't
give a hoot about things like architectural forms. In other words, it is
difficult for carpenters to talk shop while architects are about.
HOW:
You can subscribe by sending a blank message to:
mailto:xml-app-subscribe@sunsite.auc.dk
The mailing list itself is at:
mailto:xml-app@sunsite.auc.dk
BUT:
It is my opinion that XML-DEV is the center of all XML activities. I have
been and will continue to be an active member of the XML-DEV community.
XML-APP should be considered a subgroup of the XML-DEV community, never a
competing mailing list.
FYI:
XML-APP is being hosted by SunSITE Denmark. List owners are Don Park of
Docuverse and Jørgen Nielsen of SunSITE Denmark.
Best,
Don Park
Docuverse
From lauren at sqwest.bc.ca Mon Nov 2 02:17:38 1998
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 17:06:08 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <363CF738.C389AB26@technologist.com>
Message-ID: <199811020212.SAA03758@sqwest.bc.ca>
On 1 Nov 98, at 18:05, Paul Prescod wrote:
> Lauren Wood wrote:
> >
> > If the CDATA section
> > is simply an escaping mechanism, then the data can be
> > transformed before being passed to the DOM, in which case the
> > DOM will never see a CDATA section. Should the CDATA section
> > have some other significance, the parser can leave it as a CDATA
> > section and pass it to the DOM, which will respect it.
>
> Is there a standard way for the DOM client software to say whether it
> wants access to CDATA sections or not?
No. If people think it would be useful, we could potentially add
some sort of "turn CDATA section into Text" method (and/or vice
versa) in Level 2. Then the DOM client could run that before
accessing the data. I'm not sure how else we could determine
access without losing information (I assume you don't mean that
information in the CDATA sections should be invisible to the client
application).
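A sketch of what such a "turn CDATA section into Text" operation could look like, written in Java against the standard org.w3c.dom interfaces; flatten is a hypothetical name, not an actual DOM method.

```java
import org.w3c.dom.Document;
import org.w3c.dom.Node;

final class CdataToText {
    private CdataToText() {}

    // Replaces every CDATASection under 'root' with a Text node holding the
    // same data, then merges adjacent Text nodes. No characters are lost;
    // only the "this was a CDATA section" label goes away.
    static void flatten(Node root) {
        replaceCdata(root);
        root.normalize(); // merges the now-adjacent Text nodes in the subtree
    }

    private static void replaceCdata(Node node) {
        Document doc = node.getNodeType() == Node.DOCUMENT_NODE
            ? (Document) node
            : node.getOwnerDocument();
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // saved before replacing
            if (child.getNodeType() == Node.CDATA_SECTION_NODE) {
                node.replaceChild(doc.createTextNode(child.getNodeValue()), child);
            } else {
                replaceCdata(child);
            }
            child = next;
        }
    }
}
```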
Lauren
From jborden at mediaone.net Mon Nov 2 02:41:48 1998
From: jborden at mediaone.net (Borden, Jonathan)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <199811020212.SAA03758@sqwest.bc.ca>
Message-ID: <001c01be060a$35c316d0$d3228018@jabr.ne.mediaone.net>
Lauren Wood wrote:
>
> No. If people think it would be useful, we could potentially add
> some sort of "turn CDATA section into Text" method (and/or vice
> versa) in Level 2. Then the DOM client could run that before
> accessing the data. I'm not sure how else we could determine
> access without losing information (I assume you don't mean that
> information in the CDATA sections should be invisible to the client
> application).
>
Alternatively, you can expose CDATA both as an element of type CDATA and as
text within the TEXT element. This would preserve the intended behavior
w.r.t. text as well as allowing the option of iterating over CDATA elements
for interested parties.
Jonathan Borden
JABR Technology
http://jabr.ne.mediaone.net
From lauren at sqwest.bc.ca Mon Nov 2 04:36:02 1998
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <001c01be060a$35c316d0$d3228018@jabr.ne.mediaone.net>
References: <199811020212.SAA03758@sqwest.bc.ca>
Message-ID: <199811020431.UAA03914@sqwest.bc.ca>
On 1 Nov 98, at 21:40, Borden, Jonathan wrote:
> Alternatively, you can expose both CDATA as an element of type CDATA and as
> text within the TEXT element. This would preserve the intended behavior
> w.r.t. text as well as allowing the option of iterating over CDATA
> elements for interested parties.
Both CDATASection and Text inherit from CharacterData. Is this
what you mean? Alternatively, you could use the flattening
properties from Node and just read nodeValue on each. This
returns the content of each node without the need for casting.
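The nodeValue approach, sketched in Java with a hypothetical helper: since both node types return their characters from Node.getNodeValue(), a reader can concatenate them without casting to Text or CDATASection.

```java
import org.w3c.dom.Node;

final class TextCollector {
    private TextCollector() {}

    // Concatenates the data of the Text and CDATASection children of
    // 'parent'; other node types (elements, comments, PIs) are skipped.
    static String childText(Node parent) {
        StringBuilder sb = new StringBuilder();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            short t = n.getNodeType();
            if (t == Node.TEXT_NODE || t == Node.CDATA_SECTION_NODE) {
                sb.append(n.getNodeValue());
            }
        }
        return sb.toString();
    }
}
```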
Lauren
From rbourret at ito.tu-darmstadt.de Mon Nov 2 09:09:28 1998
From: rbourret at ito.tu-darmstadt.de (Ronald Bourret)
Date: Mon Jun 7 17:06:09 2004
Subject: XSchema 1.0 released
Message-ID: <01BE0648.3348E5E0@grappa.ito.tu-darmstadt.de>
XSchema 1.0 is now final. Thanks to everyone on XML-Dev who helped make it a reality, and especially to Simon St. Laurent for getting the ball rolling. You can find the general XSchema page at:
http://purl.oclc.org/NET/xschema
and the final spec at:
http://www.simonstl.com/xschema/spec/xscspecv5.htm
-- Ron Bourret
From aheitor at ef.pt Mon Nov 2 14:47:31 1998
From: aheitor at ef.pt (Ana Heitor)
Date: Mon Jun 7 17:06:09 2004
Subject: xsl... tables... colspan...
Message-ID:
Hi,
Can someone tell me how to write XSL rules to:
1) have two tables;
2) have one table with a variable number of columns.
For example:
I want to evaluate the colspan dynamically as a function of the number of columns.
Thanks
AH
From simpson at polaris.net Mon Nov 2 15:04:57 1998
From: simpson at polaris.net (John E. Simpson)
Date: Mon Jun 7 17:06:09 2004
Subject: xsl... tables... colspan...
Message-ID: <3.0.32.19981102100504.006a6a2c@polaris.net>
Ana --
At 02:43 PM 11/2/98 +0000, Ana Heitor wrote:
> [Question about XSL rules]
You might find a quicker answer to your question on the XSL mailing list.
Information on joining it, and the archives, are at:
http://www.mulberrytech.com/xsl/xsl-list
Good luck!
=============================================================
John E. Simpson | It's no disgrace t'be poor,
simpson@polaris.net | but it might as well be.
| -- "Kin" Hubbard
From cowan at locke.ccil.org Mon Nov 2 15:28:47 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <199811011654.IAA02758@sqwest.bc.ca> <363CF738.C389AB26@technologist.com>
Message-ID: <363DCFB4.36A59E21@locke.ccil.org>
Paul Prescod wrote:
> Is there a standard way for the DOM client software to say whether it
> wants access to CDATA sections or not?
No. In fact, DOM Level 1 does not define any API for the
creator, only for the accessor. You can add elements and other
things, but you can't create or populate a DOM using only the
standard API.
DOM Level 1 is an ugly mess, and the only justification for it
is to keep Netscape and Microsoft from implementing even uglier
incompatible DOMs.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Mon Nov 2 15:32:00 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <001c01be060a$35c316d0$d3228018@jabr.ne.mediaone.net>
Message-ID: <363DD08C.1A11CAE8@locke.ccil.org>
Borden, Jonathan wrote:
> Alternatively, you can expose both CDATA as an element of type CDATA and as
> text within the TEXT element. This would preserve the intended behavior
> w.r.t. text as well as allowing the option of iterating over CDATA elements
> for interested parties.
Actually, the only thing you can do with a Text node that you can't
do with a CDATA node is merge it with an adjacent Text node, a very
minor capability which can easily be simulated.
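One way to simulate that merge with DOM Level 1 calls only, as a hypothetical Java helper: append the next sibling's data with CharacterData.appendData and remove the sibling. The merged node keeps the type of the first node.

```java
import org.w3c.dom.CharacterData;
import org.w3c.dom.Node;

final class MergeText {
    private MergeText() {}

    // Merges 'node' with its following sibling when both are Text or
    // CDATASection nodes. Returns true if a merge happened.
    static boolean mergeWithNext(Node node) {
        Node next = node.getNextSibling();
        if (isCharData(node) && isCharData(next)) {
            ((CharacterData) node).appendData(((CharacterData) next).getData());
            node.getParentNode().removeChild(next);
            return true;
        }
        return false;
    }

    private static boolean isCharData(Node n) {
        return n != null && (n.getNodeType() == Node.TEXT_NODE
            || n.getNodeType() == Node.CDATA_SECTION_NODE);
    }
}
```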
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From bckman at ix.netcom.com Mon Nov 2 16:10:58 1998
From: bckman at ix.netcom.com (Frank Boumphrey)
Date: Mon Jun 7 17:06:09 2004
Subject: DTD,s
Message-ID: <008401be067b$4db05b60$3bacdccf@ix.netcom.com>
Does anyone know of a site where XML DTDs are available for general use?
If not:
1. Would there be a need for such a site?
2. Would anyone be prepared to donate some DTDs to such a site?
I have several XML DTDs, including XML DTDs for HTML strict and
transitional, that I could make available.
regards,
Frank
Frank Boumphrey
XML and style sheet info at Http://www.hypermedic.com/style/index.htm
Author: - Professional Style Sheets for HTML and XML http://www.wrox.com
CoAuthor: Professional XML applications from Wrox Press, www.wrox.com
From richard at cogsci.ed.ac.uk Mon Nov 2 16:28:51 1998
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <199811021628.QAA09805@cogsci.ed.ac.uk>
One reason to regard a CDATA section as equivalent to the characters
in it is that it is perfectly reasonable for a processor to transform
a CDATA section into plain character data with character entities
where required. A processor that outputs canonical XML will do this.
Suppose we regard a whitespace-only CDATA section as invalid in element-only content.
If the processor is non-validating, it will not check this, and
will produce valid output from invalid input.
You might for example run a document through such a processor merely
to change its character encoding. It would be unfortunate if this
process changed the document's validity.
Slightly less plausibly, a processor might decide to output all character
data as CDATA sections to avoid using character entities. This process
could make a valid document invalid.
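The first transformation Richard describes amounts to re-escaping the characters. A minimal Java sketch, assuming only the characters that are special in content need escaping (this is not a full canonical-XML writer); the class name is hypothetical. White space passes through unchanged, so a whitespace-only CDATA section comes out as plain white space.

```java
final class TextEscaper {
    private TextEscaper() {}

    // Re-serializes character data (for example, the contents of a CDATA
    // section) as ordinary text. '&' and '<' must be escaped; '>' is also
    // escaped so that a literal "]]>" cannot appear in the output.
    static String escapeText(String data) {
        StringBuilder sb = new StringBuilder(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }
}
```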
-- Richard
From jtauber at jtauber.com Mon Nov 2 16:46:47 1998
From: jtauber at jtauber.com (James Tauber)
Date: Mon Jun 7 17:06:09 2004
Subject: DTD,s
Message-ID: <006001be067f$f4ec6960$0300000a@othniel.cygnus.uwa.edu.au>
-----Original Message-----
From: Frank Boumphrey
>Does any one know of a site where XML dtd's are available for general use?
That is what schema.net is for. At present it is a catalogue but will soon
house DTDs.
(Actually it already has an increasing number of entities, thanks to Rick
Jelliffe)
It also has an SGML Open catalog that uses my delegate idea to allow
resolution of formal public identifiers. The XBEL DTD developed by the
Python XML-SIG has already made use of this.
>I have several xml dtd's including xml dtd's for html strict and
>transitional that I could make available.
Send them my way and I'll add them to schema.net.
James
--
James Tauber / jtauber@jtauber.com / www.jtauber.com
Associate Researcher, Electronic Commerce Network
Curtin University of Technology, Perth, Western Australia
Maintainer of : www.xmlinfo.com, www.xmlsoftware.com and www.schema.net
From db at Eng.Sun.COM Mon Nov 2 17:25:53 1998
From: db at Eng.Sun.COM (David Brownell)
Date: Mon Jun 7 17:06:09 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <199811011654.IAA02758@sqwest.bc.ca> <363CF738.C389AB26@technologist.com> <363DCFB4.36A59E21@locke.ccil.org>
Message-ID: <363DE9F3.2EDF9979@eng.sun.com>
John Cowan wrote:
>
> DOM Level 1 is an ugly mess, and the only justification for it
> is to keep Netscape and Microsoft from implementing even uglier
> incompatible DOMs.
I think everyone recognizes some of the compromises that went
into DOM, and has a list of some mistakes they'd fix. But I
don't think there's a good consensus on which things are mistakes
rather than features ...
To put it differently: is there really room for another API
to represent XML structure?
I tend to think that DOM, warts and all, is "good enough" for
most purposes. And for those other purposes, I suspect that
no standard API could suit.
- Dave
From elharo at sunsite.unc.edu Mon Nov 2 17:43:17 1998
From: elharo at sunsite.unc.edu (Elliotte Rusty Harold)
Date: Mon Jun 7 17:06:09 2004
Subject: Web pages in non-Roman scripts
In-Reply-To: >
Message-ID:
For my next book about XML I am seeking examples of Web pages in non-Roman
scripts: Cyrillic, Greek, Chinese, Japanese, etc. The purpose is to
include before and after screen shots showing them with and without the
proper fonts and encodings. If you maintain such a web site, and you're
willing to sign a permissions form dreamed up by IDG's lawyers, please
email me the URL, your snail mail address and FAX number and I'll send you
the permissions letter to sign. To thank you for your trouble, I'll also
send you a copy of my current book--XML: Extensible Markup Language--when I
get the signed permission agreement back. Finally, if your site is included
in the finished book (sites will also have to be approved by my editors)
I'll also send you a copy of the next book when it's published.
Pretty much any site in a non-Roman script will do. However, I do have a
preference for interesting pages like one discussing China's human rights
record in Chinese or the text of War and Peace in Russian, as opposed to
corporate home pages. But I'll take whatever I can get. If you're
interested, please send private email to elharo@sunsite.unc.edu. Thanks.
+-----------------------+------------------------+-------------------+
| Elliotte Rusty Harold | elharo@sunsite.unc.edu | Writer/Programmer |
+-----------------------+------------------------+-------------------+
| XML: Extensible Markup Language (IDG Books 1998) |
| http://www.amazon.com/exec/obidos/ISBN=0764531999/cafeaulaitA/ |
+----------------------------------+---------------------------------+
| Read Cafe au Lait for Java News: http://sunsite.unc.edu/javafaq/ |
| Read Cafe con Leche for XML News: http://sunsite.unc.edu/xml/ |
+----------------------------------+---------------------------------+
From eddie.sheffield at enterworks.com Mon Nov 2 19:39:47 1998
From: eddie.sheffield at enterworks.com (Eddie Sheffield)
Date: Mon Jun 7 17:06:09 2004
Subject: DTD,s
References: <006001be067f$f4ec6960$0300000a@othniel.cygnus.uwa.edu.au>
Message-ID: <363E08D0.C0390CDD@enterworks.com>
James Tauber wrote:
> -----Original Message-----
> From: Frank Boumphrey
>
> >Does any one know of a site where XML dtd's are available for general use?
>
> That is what schema.net is for. At present it is a catalogue but will soon
> house DTDs.
> (Actually it already has an increasing number of entities, thanks to Rick
> Jelliffe)
There is also the CommerceNet XML Exchange at http://www.xmlx.com that claims to
have the same purpose, but I've been checking on them for several months now and
there is absolutely no action there. Lack of promotion, I guess. But it is
organized around forums (such as Automotive, History, Genealogy, Workflow, etc.),
apparently with the notion that people would come together, post DTDs, and
actively develop them online. But except for September archives which only
contain the forum welcome messages, there is nothing.
Does anyone know of anywhere else where one can go to discuss the development of
DTDs? XML Exchange sounded perfect, but there are not really any appropriate
categories for my ideas, and there was no response to my request to create a new
"Household" or "Consumer" forum. I have several home user ideas I'd like to see
fleshed out - recipes, grocery lists, collections (stamps, comics, etc.), TV
listings, etc. but I don't really have the experience in DTD design to tackle it.
Maybe I need to tackle David's (Megginson) book again. ;-)
BTW, did anyone ever give any references for "Z" from the recent CDATA thread? My
curiosity is piqued, but I have a feeling searching for "Z" on Hotbot or Yahoo or
wherever would prove frustrating!
Thanks.
Eddie Sheffield
From hussain at granularity.com Mon Nov 2 20:02:08 1998
From: hussain at granularity.com (G. Hussain Chinoy)
Date: Mon Jun 7 17:06:10 2004
Subject: DTD,s
In-Reply-To: <008401be067b$4db05b60$3bacdccf@ix.netcom.com>
Message-ID:
Are people aware of this site?
A repository of public sgml/xml texts
http://www.ucc.ie/cgi-bin/PUBLIC
which is referenced on...
The SGML/XML Web Page (Robin Cover): XML/SGML Name Registration
http://www.oasis-open.org/cover/xml.html#xmlNameRegistry
and related to..
The GCA's public identifier registration process
http://www.gca.org/publicid/
-----------------------------------------
G. Hussain Chinoy
hussain@granularity.com
Chief Information Architect, CEO
Granularity Information Architecture, Inc.
http://www.granularity.com/
On Mon, 2 Nov 1998, Frank Boumphrey wrote:
> Does any one know of a site where XML dtd's are available for general use?
>
> If not
> 1.Would there be a need for such a site.
> 2.Would anyone be prepared to donate some dtd's to such a site.
>
> I have several xml dtd's including xml dtd's for html strict and
> transitional that I could make available.
>
> regards,
> Frank
> Frank Boumphrey
>
> XML and style sheet info at Http://www.hypermedic.com/style/index.htm
> Author: - Professional Style Sheets for HTML and XML http://www.wrox.com
> CoAuthor: Professional XML applications from Wrox Press, www.wrox.com
From DKACKMAN at agchem.com Mon Nov 2 21:11:08 1998
From: DKACKMAN at agchem.com (Don Kackman)
Date: Mon Jun 7 17:06:10 2004
Subject: Retrieving attributes from an internal entity
Message-ID: <8424EFA3C1F7D1118B7300A0C9C57C192084D4@mpl_nt9.agchem.com>
Hello,
I'm using Microsoft's XML parser that comes as part of IE 5 beta 1 as a
component of an application that will use XML as its document format.
Since IE5 is still a beta I'm having some trouble determining if certain
behaviors are bugs in the current version of their parser or correctly
reflect the W3C specification.
Namely I'm using an internal entity declaration as follows:
OM">
as part of the internal part of the DTD.
I can load the document into MSXML (their parser) and traverse the node
tree. When I get to the node where I am referring to the &om; entity I
get OM back as the value of that node but I cannot retrieve the
targetset attribute.
It is my understanding that internal entities should be parsed in place
when they are referred to, which should mean that I can treat that node
as I would any other. This does not seem to be the case with the MS
parser.
Is this a limitation of the MS beta parser or am I misunderstanding how
entities are used in XML?
Thank you,
Don Kackman
dkackman@agchem.com
From tbray at textuality.com Mon Nov 2 21:17:52 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:10 2004
Subject: Retrieving attributes from an internal entity
Message-ID: <3.0.32.19981102131628.00aef4a0@pop.intergate.bc.ca>
At 03:10 PM 11/2/98 -0600, Don Kackman wrote:
>OM">
>
>I can load the document into MSXML (their parser) and traverse the node
>tree. When I get to the node where I am referring to the &om; entity I
>get OM back as the value of that node but I cannot retrieve the
>targetset attribute.
>
>Is this a limitation of the MS beta parser or am I misunderstanding how
>entities are used in XML?
You're fine, you should be able to retrieve that attribute. You
should report this back to Microsoft ASAP, I'm sure they'll fix it. -T.
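Tim's answer follows from how parsed internal entities work. The parser substitutes the replacement text in place, markup included, so the application sees ordinary elements and attributes rather than an opaque entity node. The sketch below uses Python's standard xml.etree.ElementTree (which relies on expat) instead of MSXML. Don's original declaration did not survive in the archive, so the element name "term" and its content are hypothetical, modelled on the fragment visible in his later error message.

```python
import xml.etree.ElementTree as ET

# Hypothetical document modelled on Don's example: the internal entity "om"
# expands to an element that carries a targetset attribute.
DOC = """<!DOCTYPE doc [
  <!ENTITY om "<term targetset='om'>OM</term>">
]>
<doc>See &om; for details.</doc>"""

root = ET.fromstring(DOC)

# The entity reference does not survive parsing: its replacement text is
# parsed in place, so "term" is an ordinary child element of "doc".
term = root.find("term")
print(term.get("targetset"))  # om
print(term.text)              # OM
```
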
From DKACKMAN at agchem.com Mon Nov 2 21:37:28 1998
From: DKACKMAN at agchem.com (Don Kackman)
Date: Mon Jun 7 17:06:10 2004
Subject: Retrieving attributes from an internal entity
Message-ID: <8424EFA3C1F7D1118B7300A0C9C57C192084D6@mpl_nt9.agchem.com>
Thanks for the quick reply Tim.
How about this one...
I'm declaring the following entity:
">
When I try to load this with MSXML I get this error:
A name was started with an invalid character.
Line 0000003: ...getset='om'>OM">
Pos 0000070: ...------------------------------------------^
If I change % to any other character it loads fine. Can I use the %
symbol in an entity declaration? I suspect it thinks I'm trying to
insert a parameter entity.
Thanks again,
Don
-----Original Message-----
From: Tim Bray [mailto:tbray@textuality.com]
Sent: Monday, November 02, 1998 3:18 PM
To: Don Kackman; 'XML Dev'
Subject: Re: Retrieving attributes from an internal entity
At 03:10 PM 11/2/98 -0600, Don Kackman wrote:
>OM">
>
>I can load the document into MSXML (their parser) and traverse the node
>tree. When I get to the node where I am referring to the &om; entity I
>get OM back as the value of that node but I cannot retrieve the
>targetset attribute.
>
>Is this a limitation of the MS beta parser or am I misunderstanding how
>entities are used in XML?
You're fine, you should be able to retrieve that attribute. You
should report this back to Microsoft ASAP, I'm sure they'll fix it. -T.
From Philippe.Le_Hegaret at sophia.inria.fr Tue Nov 3 01:23:28 1998
From: Philippe.Le_Hegaret at sophia.inria.fr (Philippe Le Hégaret)
Date: Mon Jun 7 17:06:10 2004
Subject: ANN: KOML 1.1 released
Message-ID: <363E5B04.FA5AF290@sophia.inria.fr>
KOML is an XML application to serialize Java Objects
in an XML document. This application is called KOML
for Koala Object Markup Language.
This new version includes bug fixes and a minor
change in the language. It is backward compatible
with version 1.0.
values (except transient) have a name attribute.
(thanks to Robert Nielsen for his feedback)
Bug fix with Class objects.
Bug fix in close() methods. (thanks to Raj)
Remove File constructors. Now you have:
KOMLSerializer(Writer out, boolean buffered)
KOMLDeserializer(Reader out, boolean buffered)
A new KOML document :
Regards,
Philippe.
---------
Philippe Le Hegaret
Philippe.Le_Hegaret@sophia.inria.fr -- http://www.inria.fr/koala/plh/
KOALA/DYADE/BULL @ INRIA - Sophia Antipolis
From jtauber at jtauber.com Tue Nov 3 01:34:23 1998
From: jtauber at jtauber.com (James Tauber)
Date: Mon Jun 7 17:06:10 2004
Subject: Z references (was Re: DTD,s)
Message-ID: <00fb01be06c9$b0228cc0$0300000a@othniel.cygnus.uwa.edu.au>
-----Original Message-----
From: Eddie Sheffield
>BTW, did anyone ever give any references for "Z" from the recent CDATA
>thread? My curiosity is piqued, but I have a feeling searching for "Z" on
>Hotbot or Yahoo or wherever would prove frustrating!
Yahoo was easy. Try:
http://dir.yahoo.com/Computers_and_Internet/Programming_Languages/Z/
which leads to an excellent site:
http://www.comlab.ox.ac.uk/archive/z.html
James
--
James Tauber / jtauber@jtauber.com / www.jtauber.com
Associate Researcher, Electronic Commerce Network
Curtin University of Technology, Perth, Western Australia
Maintainer of : www.xmlinfo.com, www.xmlsoftware.com and www.schema.net
From jieli at cs.umbc.edu Tue Nov 3 02:23:26 1998
From: jieli at cs.umbc.edu (Li Jiefeng)
Date: Mon Jun 7 17:06:10 2004
Subject: How to call JS function in .xsl file?
Message-ID:
Hello,
I am wondering how to call a JavaScript function in an .xsl file.
For instance,
...
I tried
abc(x);
abc(x)
"=abc(x)"
but all failed.
Thx for your help.
Jiefeng
---------------------------------------------------------------------
Jiefeng Li, CSEE, UMBC
(410)455-2837(L), (410)455-3094(O), (410)242-9610(H)
http://www.cs.umbc.edu/~jieli
From ricko at allette.com.au Tue Nov 3 08:02:23 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:10 2004
Subject: Web pages in non-Roman scripts
In-Reply-To:
Message-ID: <000b01be0700$7d00c730$11e887cb@NT.JELLIFFE.COM.AU>
> From: Elliotte Rusty Harold
> Pretty much any site in a non-Roman script will do. However, I do have a
> preference for interesting pages like one discussing China's human rights
> record in Chinese or the text of War and Peace in Russian, as opposed to
> corporate home pages.
Not much Chinese XML here in Taiwan yet, because of technology lag. There
are some interesting projects in the pipeline, though. I don't know about
other Chinese-speaking countries.
Rick Jelliffe
From M.H.Kay at eng.icl.co.uk Tue Nov 3 10:39:46 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:06:10 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <002301be0715$a205f680$7008e391@bra01wmhkay.bra01.icl.co.uk>
>I do *not* agree that XML won't come into its own until we bypass
>all the syntax and think only in terms of abstract data structures.
>Having watched this profession for 20 years, I have come to
>believe that a truly interoperable API is very nearly an oxymoron;
>but syntax is something we know how to interoperate with. Also I
>just don't believe that there is One True data model for XML.
I agree that defining what is and is not well-formed and valid XML ought to
be a readily achievable goal, and it is a little surprising to find an area
where the spec is ambiguous on the matter. Hence my suggestion for a formal
analysis to discover whether there are other unsuspected problems.
I also agree that defining what a conformant XML processor should do with
that XML (not to mention what it should do with erroneous XML) is
considerably harder, though I think the problem becomes tractable if the
behaviour is defined in terms of a concrete API such as SAX or DOM.
I agree with those who have pointed out that formalisms like Z are not a
good vehicle for communicating a standard to a wide audience. In my own
experience, however, the kind of thinking required to produce a formal
specification in Z is invaluable when trying to produce an unambiguous one
in clear English. I don't believe that precision and readability are
incompatible goals.
There is information about Z, by the way, on
http://www.non.com/news.answers/z-faq.html
Mike Kay
From M.H.Kay at eng.icl.co.uk Tue Nov 3 10:47:23 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:06:10 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
>> marked sections actually mark up
>> notations: at ISO there has been discussion of whether to allow something
>> like (for example)
>>
>
>While I applaud the ongoing proliferation of real Java(tm), I admit I
>don't like that either ... has
>worked just as well, and does no damage to XML. (Not as pretty though!)
Neither really works well, because "]]>" can legitimately occur in a Java
program. For example, it is quite likely to occur in a Java program that
generates XML.
Mike Kay
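Mike's objection applies to any text wrapped in a CDATA section, because the first "]]>" always ends the section. A common workaround, which nobody in this thread proposed, is to close the section between "]]" and ">" and then open a new one. This only helps when the CDATA section is produced by a program; it does nothing for source code that a person wants to paste in unchanged. The following is a minimal Python sketch, and the function name to_cdata is hypothetical.

```python
import xml.etree.ElementTree as ET

def to_cdata(text: str) -> str:
    """Wrap text in a CDATA section, splitting every "]]>" across two sections.

    "]]>" becomes "]]]]><![CDATA[>": the first section ends after "]]" and a
    new one starts with ">", so no section ever contains the closing delimiter.
    """
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

# A line of (hypothetical) Java that itself generates a CDATA section.
java_src = 'out.print("<![CDATA[" + body + "]]>");'
wrapped = "<code>" + to_cdata(java_src) + "</code>"

# Adjacent CDATA sections are reported as one run of character data,
# so the original text comes back unchanged.
print(ET.fromstring(wrapped).text == java_src)  # True
```
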
From ricko at allette.com.au Tue Nov 3 11:52:09 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:10 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
Message-ID: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
A CDATA marked section is not only a way to prevent delimiter recognition.
It is also a way to declare that the characters in that section are limited
to ones available in the direct document encoding of the originating system.
(SGML has a CDATA keyword you can use instead of content models: XML was
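felt not to need it because you could use [...] mind of the XML WG at that
time, in that they were down-playing the need for schemas.) It declares "this
section does not use character references or entities or subelements". So,
conceptually, it could sometimes be markup, not merely delimiter recognition.
> From: Michael Kay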
> Sent: Tuesday, 3 November 1998 21:43
> To: xml-dev@ic.ac.uk
> Subject: Re: CDATA by any other name... (was The raw and the cooked)
>
>
> >> marked sections actually mark up
> >> notations: at ISO there has been discussion of whether to allow something
> >> like (for example)
> >>
> >
> >While I applaud the ongoing proliferation of real Java(tm), I admit I
> >don't like that either ... has
> >worked just as well, and does no damage to XML. (Not as pretty though!)
>
>
> Neither really works well, because "]]>" can legitimately occur in a Java
> program. For example, it is quite likely to occur in a Java program that
> generates XML.
The idea was not that JAVA would be a "CDATA marked section",
but an "RCDATA marked section", which means that special character
references and entity references would be allowed. XML does not have RCDATA
marked sections, in the interests of simplicity. So escaping "]]>" with a
reference inside the section might have been possible in SGML, but it is not
in XML.
Why have anything like this? The primary reason (apart from orthogonality)
to me is the contention that if you make element structure do too much, you
make the structure difficult to model with simple schema notations.
For example, think of a "wrapper" element type. (This is a pattern, by the
way.) The RDF elements are one case. Using a foreign wrapper element in a
document means that
* you will have to rewrite the content models in order to validate the
document. Or,
* you have to create a more complicated schema convention (e.g.,
** call the existing DTD an architecture and make it external, then use
the RDF DTD as the DTD of the current document and make dummy declarations
with ANY content models for all the old document or
** make up schema definition languages that rely on more than one level of
context)
But if, instead of a wrapper element, you used PIs for the wrappers, then
the content model is undisturbed, and the element structure keeps its
previous simplicity and the goals of its original authors. It would be nice
if W3C allowed this, but the less that a PI can be treated (by XLL or DOM or
SAX or whatever) as a kind of element, the less that this kind of simplicity
is possible. I have little sympathy for some of the people who say content
models are inexpressive, when they deliberately choose to ignore the other
markup options available.
Rick Jelliffe
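Rick's point is that XML keeps only CDATA marked sections and has no RCDATA. This is easy to observe: inside a CDATA section, character and entity references are not expanded, so "]]>" cannot be escaped there. The following is a minimal sketch using Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# In ordinary content, character and entity references are recognized.
plain = ET.fromstring("<a>x &lt; y &#62; z</a>").text
# In a CDATA section nothing is recognized except the closing "]]>",
# so the same characters come through exactly as written.
cdata = ET.fromstring("<a><![CDATA[x &lt; y &#62; z]]></a>").text

print(plain)  # x < y > z
print(cdata)  # x &lt; y &#62; z
```
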
From david at megginson.com Tue Nov 3 12:01:31 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:10 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
References: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
<000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
Message-ID: <13886.61181.97569.898100@localhost.localdomain>
Rick Jelliffe writes:
> A CDATA marked section is not only a way to prevent delimiter
> recognition. It is also a way to declare that the characters in
> that section are limited to ones available in the direct document
> encoding of the originating system. (SGML has a CDATA keyword you
> can use instead of content models: XML was felt not to need it
> because you could use [...] mind of the XML WG at that time, in that they were down-playing the
> need for schemas.) It declares "this section does not use character
> references or entities or subelements". So, conceptually, it could
> sometimes be markup, not merely delimiter recognition.
While I agree that there are always interesting new uses for markup
constructions, I think that we're straining here. My basic rule in
system design is to keep things as simple and obvious as possible; if
I wanted to signal to my application that an element contained only a
certain type of information (such as a limited character repertoire), I
would use an attribute that made that point clear, either a NOTATION
attribute or a simple CDATA attribute named something like
"character-encoding".
That said, I don't see the usefulness of limiting content to a
specific character repertoire arbitrarily; I *do* see the usefulness in
combination with an "xml:lang" or "mime-type" attribute, though. An
intelligent editor could already act on xml:lang to limit character
selection, if such a thing were desirable.
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From north at Synopsys.COM Tue Nov 3 13:00:16 1998
From: north at Synopsys.COM (Simon North)
Date: Mon Jun 7 17:06:10 2004
Subject: XML in IE5 beta PR2
In-Reply-To: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
References: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
Message-ID: <199811031257.NAA09118@goofy.gr05.synopsys.com>
Hi Gurus,
I'm now experimenting with XML in IE5 beta preview release 2. It's
nice to be able to parse the XML on load and actually to be able
to navigate the structure tree. It does, however, complain about not
being able to load the XSL code (though it does seem to do a good job
of supporting CSS). Has anyone got any further than this?
Thanks,
Simon North
From papresco at technologist.com Tue Nov 3 13:49:22 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <199811011654.IAA02758@sqwest.bc.ca> <363CF738.C389AB26@technologist.com> <363DCFB4.36A59E21@locke.ccil.org> <363DE9F3.2EDF9979@eng.sun.com>
Message-ID: <363F060D.47FB0E2C@technologist.com>
David Brownell wrote:
>
> To put it differently: is there really room for another API
> to represent XML structure?
>
> I tend to think that DOM, warts and all, is "good enough" for
> most purposes. And for those other purposes, I suspect that
> no standard API could suit.
I find it odd that we can have "standard APIs" for the full complexity of
relational data, and probably eventually for object database data, but it
is perceived to be impossible to do the same for the parse tree of XML
data. I mean it is just annotated tree structures: it shouldn't be rocket
science (but neither is it trivial).
No, we don't have such a thing yet, because it is not easy to develop and
nobody is willing to stop and think things through. Over time,
organizations like TechnoTeacher and ISOGEN *are* thinking it through. I
don't claim we've got the problem solved, but our direction is already
much more scalable, generalized and rigorous than what we are seeing in
the DOM realm.
Our approach is, we think, the same as the one taken by the relational
database people: first think of a model that supports the range of
applications that we want to support (including editing applications,
repositories, simple read-only processors) and data types that we want to
support (documents, DTDs, schemas, "link maps", vector and bitmap
graphics,... all media). Having defined the model, we need a way to
customize it for a particular application: a schema, just as they have
schemas in the relational and object database worlds. Our schemas are
property sets (the schema language needs to be stronger, if it is to
support read-write applications...we know that part needs work).
Then we develop an API to encapsulate the model. We are working on that
API right now.
Anyone who wants to follow our thinking can start with the tutorial on
groves (http://www.prescod.net/groves/shorttut). As you can see from the
tutorial, the model is simpler than the relational model and yet seems
more or less complete (I know of one suggestion for enhancement). As I
said before, the schema language and the APIs are the parts that must
change now.
If there is a reason that this generalized approach *must* fail and cannot
be the basis of a variety of applications, then I would like to hear about
it sooner than later, so I invite comments from skeptics.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"I always wanted to be somebody, but I should have been more
specific." --Lily Tomlin
From richard at cogsci.ed.ac.uk Tue Nov 3 14:16:01 1998
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 17:06:11 2004
Subject: Retrieving attributes from an internal entity
In-Reply-To: Don Kackman's message of Mon, 2 Nov 1998 15:37:17 -0600
Message-ID: <199811031414.OAA12936@cogsci.ed.ac.uk>
> ">
This is indeed being (correctly) interpreted as a malformed parameter
entity. Use a character entity to refer to the percent character:
">
-- Richard
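Richard's diagnosis can be checked with any conforming parser. Inside an entity value in the DTD, "%" starts a parameter-entity reference, and the XML 1.0 spec forbids such references inside markup declarations in the internal subset. A literal percent sign therefore has to be written as a character reference such as &#37;. The following is a minimal sketch using Python's standard xml.etree.ElementTree. The entity name "pct" and its text are made up for illustration, because the archived declaration was truncated.

```python
import xml.etree.ElementTree as ET

def expand(value: str) -> str:
    """Declare a hypothetical entity "pct" with the given value; return the expanded text."""
    doc = f'<!DOCTYPE doc [<!ENTITY pct "{value}">]><doc>&pct;</doc>'
    return ET.fromstring(doc).text

# A bare "%" inside the entity value is read as the start of a
# parameter-entity reference, so the declaration is rejected.
try:
    expand("50% off")
except ET.ParseError as err:
    print("rejected:", err)

# "&#37;" is a character reference for "%"; it is replaced when the entity
# is declared, so the replacement text contains a literal percent sign.
print(expand("50&#37; off"))  # 50% off
```
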
From david at megginson.com Tue Nov 3 14:31:00 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:11 2004
Subject: Standard XML APIs (was Re: CDATA by any other name...)
In-Reply-To: <363F060D.47FB0E2C@technologist.com>
References:
<199811011654.IAA02758@sqwest.bc.ca>
<363CF738.C389AB26@technologist.com>
<363DCFB4.36A59E21@locke.ccil.org>
<363DE9F3.2EDF9979@eng.sun.com>
<363F060D.47FB0E2C@technologist.com>
Message-ID: <13887.4266.208216.446955@localhost.localdomain>
Paul Prescod writes:
> I find it odd that we can have "standard APIs" for the full
> complexity of relational data, and probably eventually for object
> database data, but it is perceived to be impossible to do the same
> for the parse tree of XML data. I mean it is just annotated tree
> structures: it shouldn't be rocket science (but neither is it
> trivial).
Let's divide the use of XML into two fairly arbitrary groups
(acknowledging that there's considerable overlap):
1. Documents
2. Data
Group #1 (documents) is characterised by long sequences of mixed
content inside block-level containers (often paragraphs, but possibly
subtasks or steps in technical documentation); group #2 (data) is
characterised by fairly rigid hierarchies with plain character data
inside named fields (often, but not always, short) appearing in
predictable orders.
A standard XML-oriented API like the DOM is entirely suitable for
group #1, but the DOM is probably overkill for group #2, which
requires a domain-specific API (of course, for small or
non-speed-critical applications, the domain-specific API could be
implemented as an adapter on top of the DOM).
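David's parenthetical describes a domain-specific API implemented as an adapter on top of the DOM. With Python's xml.dom.minidom, it might look roughly like the sketch below. The order, customer and quantity element names and the Order class are hypothetical. They were chosen to stand for a small "data" document with a rigid, predictable structure.

```python
from xml.dom.minidom import parseString


class Order:
    """Hypothetical domain object that wraps a DOM element; callers never see DOM nodes."""

    def __init__(self, element):
        self._element = element

    def _field(self, tag: str) -> str:
        # Concatenate the text children of the first descendant element named `tag`.
        node = self._element.getElementsByTagName(tag)[0]
        return "".join(child.data for child in node.childNodes
                       if child.nodeType == child.TEXT_NODE)

    @property
    def customer(self) -> str:
        return self._field("customer")

    @property
    def quantity(self) -> int:
        return int(self._field("quantity"))


doc = parseString("<order><customer>ACME</customer><quantity>3</quantity></order>")
order = Order(doc.documentElement)
print(order.customer, order.quantity)  # ACME 3
```

The adapter keeps the whole DOM in memory, which is the overhead David has in mind for busy servers.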
That said, you still need some kind of API to get at the XML to
populate the domain-specific model. Sometimes, the DOM will be
appropriate, but given that many models in group #2 tend to be simple
and need to be processed on busy servers, a light-weight, event-based
API like XML::Parser or SAX usually makes the most sense for that
group.
Ideally, most programmers using XML for group #2 will never see an
XML-specific API -- we should hide it.
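For the lighter-weight route David recommends for "data" documents, an event-based parser can populate the domain model directly, and the XML API can stay hidden behind one function. The following is a minimal sketch using Python's standard xml.sax, not Perl's XML::Parser. The person, name and email element names, the Person class and load_people are all hypothetical.

```python
import xml.sax
from dataclasses import dataclass


@dataclass
class Person:
    """Hypothetical domain object; nothing XML-specific leaks into it."""
    name: str = ""
    email: str = ""


class _PersonHandler(xml.sax.ContentHandler):
    """Builds Person objects from <person> records as SAX events arrive."""

    FIELDS = {"name", "email"}

    def __init__(self):
        super().__init__()
        self.people = []
        self._current = None  # Person under construction while inside <person>
        self._field = None    # field being read while inside <name> or <email>
        self._buffer = []     # SAX may deliver one text node in several chunks

    def startElement(self, tag, attrs):
        if tag == "person":
            self._current = Person()
        elif tag in self.FIELDS and self._current is not None:
            self._field, self._buffer = tag, []

    def characters(self, content):
        if self._field is not None:
            self._buffer.append(content)

    def endElement(self, tag):
        if self._field is not None and tag == self._field:
            setattr(self._current, tag, "".join(self._buffer).strip())
            self._field = None
        elif tag == "person" and self._current is not None:
            self.people.append(self._current)
            self._current = None


def load_people(data: bytes) -> list:
    """Public entry point: callers get Person objects and never see the XML API."""
    handler = _PersonHandler()
    xml.sax.parseString(data, handler)
    return handler.people


people = load_people(
    b"<people><person><name>Ada</name><email>ada@example.com</email></person></people>"
)
print(people)  # [Person(name='Ada', email='ada@example.com')]
```

Callers of load_people deal only with Person objects. Replacing the SAX handler with a DOM-based loader, or with something that is not XML at all, would leave that interface unchanged.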
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From tbray at textuality.com Tue Nov 3 15:23:50 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:11 2004
Subject: Retrieving attributes from an internal entity
Message-ID: <3.0.32.19981103071922.00ae7820@pop.intergate.bc.ca>
At 02:14 PM 11/3/98 GMT, Richard Tobin wrote:
>> ">
>
>This is indeed being (correctly) interpreted as a malformed parameter
>entity. Use a character entity to refer to the percent character:
Oops. Oh dear, Richard is right. -Tim
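The entity declaration in the quoted message has been lost from the archive. As a minimal sketch of the workaround Richard describes, with a hypothetical entity named "pct": "&#37;" is the character reference for "%", and character references in an entity value are expanded when the entity is declared, so the replacement text is simply "50%". A bare "%" there would instead start a (malformed) parameter-entity reference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PercentInEntity {
    // "&#37;" stands for '%' inside the entity value; a literal '%' here
    // would be read as the start of a parameter-entity reference.
    static final String DOC =
        "<!DOCTYPE rate [\n" +
        "  <!ENTITY pct \"50&#37;\">\n" +
        "]>\n" +
        "<rate>&pct;</rate>";

    // Parses the document and returns the text of the root element.
    static String rootText(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return doc.getDocumentElement().getTextContent();
    }
}
```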
From cowan at locke.ccil.org Tue Nov 3 16:18:02 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
Message-ID: <363F2CDA.3BF7BB3D@locke.ccil.org>
Michael Kay wrote:
> Neither really works well, because "]]>" can legitimately occur in a Java
> program. For example, it is quite likely to occur in a Java program that
> generates XML.
If "]]>" is needed as a string literal or part of one, it can easily be
replaced by "]]\76" or "]]\u003E".
If it appears in program text, then "]] >" will be a sufficient
replacement.
Similar workarounds are used to avoid ETAGOs ("</") in the SCRIPT and
STYLE elements of HTML 4.0; they are necessarily language-specific.
For example, JavaScript uses "<\/".
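A small sketch of the Java workarounds John gives, so they can be checked: each string constant evaluates to the CDATA end delimiter, yet the source text never contains it, so the program can sit inside a CDATA section unharmed. The class and method names are illustrative only.

```java
class CdEndEscapes {
    // Inside string literals: the octal escape \76 is '>'.
    static final String OCTAL = "]]\76";
    // The Unicode escape for '>' works too, because the file itself
    // contains the six-character escape, not the '>' character.
    static final String UNICODE = "]]\u003E";

    // Outside literals, a space is enough: an index nested inside another
    // index is followed here by " >" rather than directly by ">".
    static boolean greater(int[] a, int[] idx, int i, int k) {
        return a[idx[i]] > k;
    }
}
```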
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Tue Nov 3 16:38:38 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
Message-ID: <363F31B6.59F3FFC7@locke.ccil.org>
Rick Jelliffe wrote:
> A CDATA marked section is not only a way to prevent delimiter recognition.
> It is also a way to declare that the characters in that section are limited
> to ones available in the direct document encoding of the originating system.
True. However, since the standard encodings of XML include all the
characters there are (and if they don't include yours, just you
wait, 'Enry 'Iggins), that isn't as much of an issue.
> (SGML has a CDATA keyword you can use instead of content models: XML was
> felt not to need it because you could use shows the mind of the XML WG at that time, in that they were down-playing
> the need for schemas.)
CDATA elements are eeeeeevil. They terminate at any ETAGO followed by
a name-start character, and they make it impossible to change your
mind later, if you decide you need an entity or two. See the excellent
articles at. They were rightly discarded from XML.
> For example, I cannot see why a smart editor could not use the CDATA section
> to cofine editing to whatever the repertoire of the character set of the
> encoding attribute of the XML header says.
IMHO, a *smart* editor would realize that a CDATA section cannot cope,
and would terminate it around the problem character. For example,
an attempt to insert a dagger (U+2020) into a CDATA section within
an 8859-1 document would produce this:
... ]]>&#x2020;<![CDATA[ ...
> In the case of editing the XML
> specification, for example, when there is a CDATA marked section being
> edited, and the editor types "<", a smart section should know not to replace
> it with "&lt;" or expect it to be a STAGO.
XED indeed has this property, although it just feeps if you attempt
to type a character that would cause "]]>" to appear in a CDATA
section, rather than splitting the section (which admittedly would
be painful to undo after a Backspace character).
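A minimal sketch of the splitting behaviour John describes (not XED's, which beeps instead): text is written as CDATA, and the section is closed around any character the target charset cannot encode, which is emitted as a hexadecimal character reference, and around any CDEnd in the text. CdataWriter is a hypothetical name; the encodability test is java.nio's CharsetEncoder.canEncode.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class CdataWriter {
    // Wraps text in CDATA sections that are safe for the given charset.
    static String toCdata(String text, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        StringBuilder out = new StringBuilder("<![CDATA[");
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (!enc.canEncode(ch)) {
                // Close the section, write a character reference, reopen.
                out.append("]]>&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(";<![CDATA[");
            } else if (text.startsWith("]]>", i)) {
                // Split the delimiter: "]]" ends one section, ">" starts the next.
                out.append("]]]]><![CDATA[>");
                i += 3;
                continue;
            } else {
                out.append(ch);
            }
            i += Character.charCount(cp);
        }
        return out.append("]]>").toString();
    }
}
```

Empty sections such as "<![CDATA[]]>" can appear when the problem character is first or last; they are legal XML, merely untidy.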
> It would be nice
> if W3C allowed this, but the less that a PI can be treated (by XLL or DOM or
> SAX or whatever) as a kind of element,
The current XPointer draft allows PIs to be referred to on equal terms with
elements (except for not having a GI or attributes or sub-elements).
The DOM has a ProcessingInstruction node, though pseudo-attribute parsing
is not performed.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From lauren at sqwest.bc.ca Tue Nov 3 16:50:11 1998
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <363F31B6.59F3FFC7@locke.ccil.org>
Message-ID: <199811031644.IAA14775@sqwest.bc.ca>
On 3 Nov 98, at 11:39, John Cowan wrote:
> The current XPointer draft allows PIs to be referred to on equal terms
> with elements (except for not having a GI or attributes or sub-elements).
>
> The DOM has a ProcessingInstruction node, though pseudo-attribute parsing
> is not performed.
This would be a possibility, but this reading of the content of a PI
isn't in the XML spec, so the DOM WG didn't want to add
semantics that weren't in the spec. So we stuck to the simple
target+data approach, at least for Level 1.
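A minimal sketch of the target+data model, using the DOM interfaces in current JDKs: getTarget() and getData() are the whole story, and any name="value" structure in the data comes back as one unparsed string. The class name and sample PI are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.ProcessingInstruction;
import org.xml.sax.InputSource;

class PiTargetData {
    // Returns "target|data" for the first PI among the document's children,
    // or null if there is none.
    static String firstPi(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        for (Node n = doc.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.PROCESSING_INSTRUCTION_NODE) {
                ProcessingInstruction pi = (ProcessingInstruction) n;
                // The data is a single string; the DOM does not tokenize it.
                return pi.getTarget() + "|" + pi.getData();
            }
        }
        return null;
    }
}
```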
Lauren
From richard at cogsci.ed.ac.uk Tue Nov 3 17:07:51 1998
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: Rick Jelliffe's message of Tue, 3 Nov 1998 22:53:27 +1100
Message-ID: <199811031707.RAA19890@cogsci.ed.ac.uk>
> (SGML has a CDATA keyword you can use instead of content models: XML was
> felt not to need it because you could use shows the mind of the XML WG at that time, in that they were down-playing
> the need for schemas.)
Surely the unanswerable argument against CDATA elements in XML was
they prevent you from parsing a document without the DTD. Just like
optional start/end tags, and unmarked empty elements.
-- Richard
From cowan at locke.ccil.org Tue Nov 3 17:10:47 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU> <363F31B6.59F3FFC7@locke.ccil.org>
Message-ID: <363F38FE.5893FDE9@locke.ccil.org>
Blunderingly I wrote:
> See the excellent
> articles at.
That should have been:
"at http://www.oasis-open.org/cover/topics.html#CDATA ."
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Tue Nov 3 17:12:40 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <199811031644.IAA14775@sqwest.bc.ca>
Message-ID: <363F3939.84DD643@locke.ccil.org>
Lauren Wood replied to me:
> > The DOM has a ProcessingInstruction node, though pseudo-attribute parsing
> > is not performed.
>
> This would be a possibility, but this reading of the content of a PI
> isn't in the XML spec, so the DOM WG didn't want to add
> semantics that weren't in the spec. So we stuck to the simple
> target+data approach, at least for Level 1.
In the words of Hyman Kaplan: "I described. I did not condemn."
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From ricko at allette.com.au Tue Nov 3 17:24:03 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <199811031707.RAA19890@cogsci.ed.ac.uk>
Message-ID: <000001be074e$e2ec19c0$dae887cb@NT.JELLIFFE.COM.AU>
> From: Richard Tobin [mailto:richard@cogsci.ed.ac.uk]
> Surely the unanswerable argument against CDATA elements in XML was
> they prevent you from parsing a document without the DTD. Just like
> optional start/end tags, and unmarked empty elements.
A good reason, but you could always say that "every CDATA element must have
an attribute xml:content-mode='CDATA'". So not unanswerable (though neither
nor the attribute commend themselves). And not unthinkable, as
xml:lang and xml:space prove.
Rick Jelliffe
From greynolds at datalogics.com Tue Nov 3 17:32:13 1998
From: greynolds at datalogics.com (Reynolds, Gregg)
Date: Mon Jun 7 17:06:11 2004
Subject: Z references (was Re: DTD,s)
Message-ID: <51ED3F5356D8D011A0B1006097C30734014E5B3D@martinique>
Since it's kind of hard to find at the site mentioned below, here is a
link to some drafts of the ISO Z standard:
http://www.cs.york.ac.uk/~ian/zstan/
The Z Reference Manual, by J.M. Spivey, is frequently referred to as the
de facto standard; lucky for us, it has gone out of print and Mr. (Ms?)
Spivey has been kind enough to make it available on the net at:
http://spivey.oriel.ox.ac.uk/~mike/zrm/
I've found "The Way of Z" very useful:
http://www.radonc.washington.edu/prostaff/jon/z-book/
-----Original Message-----
From: James Tauber [mailto:jtauber@jtauber.com]
Sent: Monday, November 02, 1998 7:31 PM
To: xml mailing list
Subject: Z references (was Re: DTD,s)
-----Original Message-----
From: Eddie Sheffield
>BTW, did anyone ever give any references for "Z" from the recent CDATA
>thread? My curiosity is perked, but I have a feeling searching for "Z"
>on Hotbot or Yahoo or wherever would prove frustrating!
Yahoo was easy. Try:
http://dir.yahoo.com/Computers_and_Internet/Programming_Languages/Z/
which leads to an excellent site:
http://www.comlab.ox.ac.uk/archive/z.html
James
--
James Tauber / jtauber@jtauber.com / www.jtauber.com
Associate Researcher, Electronic Commerce Network
Curtin University of Technology, Perth, Western Australia
Maintainer of : www.xmlinfo.com, www.xmlsoftware.com and www.schema.net
From ricko at allette.com.au Tue Nov 3 17:40:38 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:11 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <363F31B6.59F3FFC7@locke.ccil.org>
Message-ID: <000101be0751$1ee370c0$dae887cb@NT.JELLIFFE.COM.AU>
> From: John Cowan
> True. However, since the standard encodings of XML include all the
> characters there are (and if they don't include yours, just you
> wait, 'Enry 'Iggins), that isn't as much of an issue.
(An optimistic view of ISO10646: there are dozens of new Han ideographs
created every day, apart from other scripts.)
The situation I am thinking of is, for example, where I am creating an XML
document which will be used, after processing, by a non-XML Macintosh
application that only understands MacRoman. The CDATA marked section is the
only constraining/signalling mechanism in XML which could be applied, and it
goes without saying that it is a pretty poor one, but I don't want to say it
is useless.
If the consensus of developers is that they don't want to allow marked
sections to be used in this way, I hope that the schema people will look at
a solution for constraining strings to use certain repertoires of
characters. I believe the Balise parser and SGML/XML processing system has a
"sanity checking" option for names in markup for this kind of
repertoire-limitation purpose.
> The current XPointer draft allows PIs to be referred to on equal
> terms with elements (except for not having a GI or attributes or
> sub-elements).
> The DOM has a ProcessingInstruction node, though pseudo-attribute
> parsing is not performed.
Which is my point. By not even providing some minimal kind of token-locating
within a processing instruction for people to use if they need it, PIs are
barely useful as far as I can see. People will always try to use what is
provided, rather than extend an API, so it is almost the kiss of death to
PIs except for the more sophisticated applications. When schemas come along
with better lexing of attribute values and PCDATA, I wonder if they will
also bother to allow scanning of the PI into tokens too.
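Neither DOM Level 1 nor SAX splits PI data into tokens. A hypothetical helper of the kind Rick is asking for might treat the data as name="value" pairs, the convention used by the xml-stylesheet PI. The name pattern below is a simplified ASCII approximation of XML's Name production, and the helper returns null when the data does not follow the convention.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PseudoAttributes {
    // One name="value" or name='value' pair, with optional leading whitespace.
    private static final Pattern PAIR = Pattern.compile(
        "\\s*([A-Za-z_:][-A-Za-z0-9_.:]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    // Splits PI data into ordered pairs; returns null if anything other than
    // whitespace is left over, i.e. the data does not use this convention.
    static Map<String, String> parse(String data) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(data);
        int pos = 0;
        while (true) {
            m.region(pos, data.length());
            if (!m.lookingAt()) {
                break;
            }
            // Group 2 holds a double-quoted value, group 3 a single-quoted one.
            result.put(m.group(1), m.group(2) != null ? m.group(2) : m.group(3));
            pos = m.end();
        }
        return data.substring(pos).trim().isEmpty() ? result : null;
    }
}
```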
Rick Jelliffe
From cowan at locke.ccil.org Tue Nov 3 17:47:46 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: XML APIs
References: <3.0.5.32.19981102121408.00c96c30@pophost.arbortext.com> <363E08B6.2EB84756@locke.ccil.org> <363E2F22.4EE275D9@locke.ccil.org>
Message-ID: <363F41DE.A57015FE@locke.ccil.org>
Stephen R. Savitzky wrote:
> I do not understand your point. They certainly hold as far as I can tell;
Provided the tree doesn't get modified while traversing it.
> Piling additional complication into the specification in order to ensure
> that every node in the tree will continue to be visited no matter what gets
> done between calls to "toNext", which I believe is what the last spec that
> included iterators attempted to do, is WRONG, because it makes the simple
> implementation impossible and because it becomes too complicated for a
> programmer looking at the spec to guess how it's going to behave.
Well, I disagree with you. If iterators are to be useful, they must
be robust against changes to the structure being iterated over, or
at the very least they must warn that the iterator is no longer valid,
like the new fail-fast iterators in Java 1.2.
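A small sketch of the fail-fast behaviour, written with the generic collections of later Java versions: when the list is structurally modified other than through the iterator, the next call to next() throws ConcurrentModificationException instead of silently skipping or repeating elements. The JDK documents this detection as best-effort, so it catches the common case rather than guaranteeing it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;

class FailFastDemo {
    // Returns true if the iterator noticed that its list changed behind its back.
    static boolean detectsModification() {
        List<String> names = new ArrayList<>(Arrays.asList("a", "b", "c"));
        Iterator<String> it = names.iterator();
        it.next();                 // iterator is now positioned after "a"
        names.remove("c");         // structural change not made through it.remove()
        try {
            it.next();             // fail-fast: throws rather than guessing
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }
}
```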
> The API is designed to have an obvious model that looks
> like a parse tree. Any programmer, looking at that API, will ``see'' the
> parse tree in her mind's eye and be able to make intuitive and accurate
> predictions about how it will behave.
Indeed. But soon after learning about live node lists, this model
will have to be changed or her programs will be dreadfully erroneous.
> They will then discover that, in
> the details of the specification, the intuitive view of the DOM as the API
> for tree-structured documents is WRONG, and that a great deal of non-obvious
> machinery has to be added in order to make it work.
You betcha.
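A sketch of the trap, using the JDK's DOM: NodeList objects in the DOM are live, so a loop that looks correct under the snapshot mental model removes only every other child, because each removal shifts the later children down one index. The class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LiveNodeListDemo {
    static Element parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }

    // Looks right, but kids shrinks as we go, so every other child is skipped.
    static int naiveRemoveAll(Element parent) {
        NodeList kids = parent.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            parent.removeChild(kids.item(i));
        }
        return parent.getChildNodes().getLength();   // children left behind
    }

    // Works with the live structure instead of against it.
    static int removeAll(Element parent) {
        while (parent.getFirstChild() != null) {
            parent.removeChild(parent.getFirstChild());
        }
        return parent.getChildNodes().getLength();
    }
}
```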
> I'm going to go a little further, and define ``natural model.'' The natural
> model of an interface is a class in which all attributes are represented by
> instance variables, and no other instance variables are present.
The trouble with such a "natural model" is that it's dead. It works
perfectly for values (which have no state, i.e. are immutable),
and for "dart boards" that react to whatever's posted to (thrown at?)
them, but not for anything with any liveness. A robot modeled by such a
"natural model" would be more like a Barbie doll: poseable, but unable
to move by itself.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Tue Nov 3 17:50:27 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:11 2004
Subject: Walking the DOM (was: XML APIs)
References: <3.0.5.32.19981102121408.00c96c30@pophost.arbortext.com> <363E08B6.2EB84756@locke.ccil.org> <363E2F22.4EE275D9@locke.ccil.org>
Message-ID: <363F4252.4F27F6EB@locke.ccil.org>
Stephen R. Savitzky wrote:
> [T]he classic algorithm for traversing a tree is:
>
> traverse(node) {
> visit(node);
> if (node.firstChild != null) traverse(node.firstChild);
> if (node.nextSibling != null) traverse(node.nextSibling);
> }
The trouble with that algorithm is that it is recursive. It will
blow up if the tree is sufficiently deep. Indeed, in
languages that cannot be relied on to do tail recursion, like
Java, it will blow up if the tree is merely sufficiently wide.
Furthermore, if there is any end-of-node processing to do, such as
emitting an end tag indication, then the algorithm is no longer
even partly tail recursive and will blow up on both depth and
width even in safe-tail-recursion languages.
The algorithm I use in DOMParser, therefore, is non-recursive:
    void traverse(Node node) {
        Node currentNode = node;
        while (currentNode != null) {
            visit(currentNode);
            // Move down to first child
            Node nextNode = currentNode.getFirstChild();
            if (nextNode != null) {
                currentNode = nextNode;
                continue;
            }
            // No child nodes, so walk tree
            while (currentNode != null) {
                revisit(currentNode);  // do end-of-node processing, if any
                // Stop once the starting node itself is finished, so the
                // walk never strays into its siblings or ancestors.
                if (currentNode == node) {
                    currentNode = null;
                    break;
                }
                // Move to sibling if possible.
                nextNode = currentNode.getNextSibling();
                if (nextNode != null) {
                    currentNode = nextNode;
                    break;
                }
                // Move up
                currentNode = currentNode.getParentNode();
            }
        }
    }
Because of the reliability of this algorithm vis-a-vis the recursive
one, I believe it should be the standard way of walking DOM trees,
and therefore it is essential that DOM implementations make the
structural access methods fast.
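A runnable sketch of the non-recursive walk described above, with hypothetical visit/revisit methods that echo element start and end tags (no attributes or text) so the output is easy to check. It tests for the starting node before looking at siblings, so walking a subtree stops at the end of that subtree.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TagWalker {
    // Iterative walk: visit() on the way down, revisit() on the way back up.
    static String tags(Node root) {
        StringBuilder out = new StringBuilder();
        Node current = root;
        while (current != null) {
            visit(current, out);
            Node next = current.getFirstChild();
            if (next != null) {                  // descend
                current = next;
                continue;
            }
            while (current != null) {
                revisit(current, out);
                if (current == root) {           // finished the subtree we were given
                    current = null;
                } else if ((next = current.getNextSibling()) != null) {
                    current = next;              // move across
                    break;
                } else {
                    current = current.getParentNode();   // move up
                }
            }
        }
        return out.toString();
    }

    private static void visit(Node n, StringBuilder out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) out.append('<').append(n.getNodeName()).append('>');
    }

    private static void revisit(Node n, StringBuilder out) {
        if (n.getNodeType() == Node.ELEMENT_NODE) out.append("</").append(n.getNodeName()).append('>');
    }

    static Node parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
    }
}
```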
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From ricko at allette.com.au Tue Nov 3 17:54:58 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:11 2004
Subject: Specifying virtual fonts in XML for handling variant characters
In-Reply-To: <000001be074e$e2ec19c0$dae887cb@NT.JELLIFFE.COM.AU>
Message-ID: <000201be0752$f62b2d10$dae887cb@NT.JELLIFFE.COM.AU>
Has anyone come up with a solution for specifying virtual ("synthetic")
fonts in XML?
I need more than just saying "Latin block uses font x, Greek block uses font
y"; I need to be able to say "This character should use font x, that
character should use font y".
Has anyone come up with a standard way to mark up which characters in the
private-use block are being used? If the Maths people are using parts of the
block, it probably would be a good idea to have some system whereby when our
documents are merged your private-use area does not overlay my private-use
area.
Does anyone know what the current status of webfonts is, and how they
relate to Netscape's "Dynamic Fonts"?
Rick Jelliffe
From cowan at locke.ccil.org Tue Nov 3 18:07:28 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <002901be0716$b5121a50$7008e391@bra01wmhkay.bra01.icl.co.uk>
<000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU> <13886.61181.97569.898100@localhost.localdomain>
Message-ID: <363F467F.3D832CF0@locke.ccil.org>
David Megginson wrote:
> I *do* see the usefulness in
> combination with an "xml:lang" or "mime-type" attribute, though. An
> intelligent editor could already act on xml:lang to limit character
> selection, if such a thing were desirable.
Such an editor would have to be a durn sight more intelligent than
anything now available, because the repertoire of a language is
a sticky wicket. In the domain of "xml:lang='en-US'", am I to be
forbidden to write "naïve" or "coöperate"? How about "résumé" or
"Québec"?
Harald Alvestrand worked for some years trying to nail down the
repertoires (répertoires?) of various European languages. His
latest (1995) draft at http://www.alvestrand.no/ietf/lang-chars.txt
warns how incomplete the results still are.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From jtauber at jtauber.com Tue Nov 3 18:10:31 1998
From: jtauber at jtauber.com (James Tauber)
Date: Mon Jun 7 17:06:12 2004
Subject: Specifying virtual fonts in XML for handling variant characters
Message-ID: <005601be0754$94abffe0$0300000a@othniel.cygnus.uwa.edu.au>
-----Original Message-----
From: Rick Jelliffe
>Has anyone come up with a solution for specifying virtual ("synthetic")
>fonts in XML?
>
>I need more than just saying "Latin block uses font x, greek block uses
font
>y", I need to be able to say "This character should use font x, that
>character should use font y".
I actually need this for FOP[1]. FOP is taking in Unicode but outputting PDF
using Type 1 fonts with AdobeStandardEncoding.
James
[1] http://www.jtauber.com/fop/
From papresco at technologist.com Tue Nov 3 18:16:56 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:12 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU>
Message-ID: <363F0962.C0F19885@technologist.com>
Rick Jelliffe wrote:
>
> For example, I cannot see why a smart editor could not use the CDATA section
> to cofine editing to whatever the repertoire of the character set of the
> encoding attribute of the XML header says.
Because it would be redundant. If the XML header says what characters are
available then the editor can directly enforce that constraint.
Overloading markup in this way is, in my opinion, a bad idea. It can do
nothing but bring harm in the long term because no two applications will
agree on the overloaded semantics and thus no two applications will treat
the data in the same way.
> The idea was not that JAVA would be a "CDATA marked section",
> but an "RCDATA marked section", which means that special character
> references and entity references would be allowed.
This would eliminate the most interesting thing about CDATA sections:
character suppression. Making [...] direction. Rather it should be a
CDATA-on-steroids that even ignores CDEnd ("]]>") and only looks for
something like ]JAVA]> to end the section.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"I always wanted to be somebody, but I should have been more
specific." --Lily Tomlin
From cowan at locke.ccil.org Tue Nov 3 18:20:44 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000101be0751$1ee370c0$dae887cb@NT.JELLIFFE.COM.AU>
Message-ID: <363F4951.E05D1C5D@locke.ccil.org>
Rick Jelliffe wrote:
> (An optimistic view of ISO10646: there are dozens of new Han ideographs
> created every day, apart from other scripts.)
True but irrelevant, since no specifiable character set can hold these.
> I hope that the schema people will look at
> a solution for constraining strings to use certain repertoires of
> characters.
And I hope that they allow no such thing, except perhaps as a fall-out
from some regex or other local syntax mechanism.
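A sketch of the kind of regex fall-out John has in mind, assuming a hypothetical field restricted to printable ASCII plus the upper half of Latin-1: the repertoire limit is just an ordinary character class, with no separate repertoire mechanism. The pattern and class name are illustrative only.

```java
import java.util.regex.Pattern;

class RepertoireByRegex {
    // Printable ASCII plus U+00A0..U+00FF, expressed as a plain character class.
    private static final Pattern LATIN1_TEXT =
        Pattern.compile("[\\x20-\\x7E\\xA0-\\xFF]*");

    // True if every character of the value falls inside the allowed range.
    static boolean allowed(String value) {
        return LATIN1_TEXT.matcher(value).matches();
    }
}
```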
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Tue Nov 3 18:24:39 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: Apologies for misdirected mail
Message-ID: <363F49FD.9285BD6D@locke.ccil.org>
The two messages "Re: XML APIs" and "Walking the DOM" from me
should have gone to the DOM mailing list (where they now have been
sent) rather than to XML-DEV.
My apologies to those who do not care about the DOM, and especially
to those who will now see the messages twice.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From tbray at textuality.com Tue Nov 3 18:29:15 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:12 2004
Subject: Walking the DOM (was: XML APIs)
Message-ID: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca>
At 12:50 PM 11/3/98 -0500, John Cowan wrote:
>Stephen R. Savitzky wrote:
>
>> [T]he classic algorithm for traversing a tree is:
>> traverse(node) {
...
>> }
>
>The trouble with that algorithm is that it is recursive. It will
>blow up if the tree is sufficiently deep. Indeed, in
>languages that cannot be relied on to do tail recursion, like
>Java, it will blow up if the tree is merely sufficiently wide.
Wouldn't the effects of recursion be lost in the static,
compared to the effects of loading the doc into memory to facilitate
tree processing? Even if you are doing some persistent-ancillary-
info trick to do a virtual tree, in my experience for very large
docs you really have to wrangle memory carefully. It seems
really counter-intuitive that the stack & local variables overhead
caused by recursion is going to get you before one of these
other things. Unless of course you recurse in some huge
sloppy badly-written routine with lots of local junk.
BTW, what languages can be relied on to do tail recursion?
Also, shorter algorithms are better. -Tim
From cowan at locke.ccil.org Tue Nov 3 18:32:43 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <000001be0720$918764f0$aee887cb@NT.JELLIFFE.COM.AU> <363F0962.C0F19885@technologist.com>
Message-ID: <363F4BD1.16B2DC78@locke.ccil.org>
Paul Prescod replied to Rick Jelliffe:
> > The idea was not that JAVA would be a "CDATA marked section",
> > but an "RCDATA marked section", which means that special character
> > references and entity references would be allowed.
I missed this before. Java doesn't need character references, for
which it has its own syntax, and entity references would IMHO cause
more confusion than they are worth, since & and < have well-known
Java semantics utterly distinct from their SGML (reference) semantics.
> This would eliminate the most interesting thing about CDATA sections:
> character suppression. Rather it should be a CDATA-on-steroids that even ignores CDEnd
> ("]]>") and only looks for something like ]JAVA]> to end the section.
Doesn't help, because the same problem simply recurs when Java
programs generate SGML. Keep CDEnd and use "]]\76" inside Java
strings, "]] >" in ordinary Java source.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From drabin at Adobe.COM Tue Nov 3 19:07:48 1998
From: drabin at Adobe.COM (Dan Rabin)
Date: Mon Jun 7 17:06:12 2004
Subject: Walking the DOM (was: XML APIs)
In-Reply-To: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca>
Message-ID: <3.0.5.32.19981103110605.00f35670@mail-345>
At 10:27 AM 11/3/98 -0800, Tim Bray wrote:
>BTW, what languages can be relied on to do tail recursion?
Scheme can be so relied on (for sure), and Standard ML too (I think).
-- Dan Rabin
From ricko at allette.com.au Tue Nov 3 19:10:27 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:12 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <363F4951.E05D1C5D@locke.ccil.org>
Message-ID: <000301be075d$a1376e30$dae887cb@NT.JELLIFFE.COM.AU>
> From: John Cowan
> Rick Jelliffe wrote:
>
> > (An optimistic view of ISO10646: there are dozens of new Han ideographs
> > created every day, apart from other scripts.)
>
> True but irrelevant, since no specifiable character set can hold these.
Not so. The additions are usually composed of standard radicals and
combinations. There are various projects around (such as C.C.Hsieh in
Taiwan) to figure out encodings to "spell" Han ideographs by component
radicals. This would allow any number of characters and even variant forms.
But this is not in ISO 10646 yet.
I guess the point is that John thinks that if an XML system can produce
characters which a recipient system cannot process, because it does not use
ISO 10646, that is not something that CDATA sections should be used to
address. I think his reasons are that he cannot see it in the spec. Dave M
thinks that xml:lang is appropriate. My point about CDATA elements was that
there is no standard mechanism to lock CDATA marked sections. I think a lot
of people now think that any non-ISO10646 system is for losers anyway
(except for whatever character set they use, probably).
> .. the repertoire of a language is
> a sticky wicket. In the domain of "xml:lang='en-US'", am I to be
> forbidden to write "na?ve" or "co?perate"? How about "r?sum?" or
> "Qu?b?c"?
The primary purpose of xml:lang, as far as I am concerned, should be to
convey the information lost by ISO 10646 unification: where the Japanese and
Chinese glyphs (or Polish and Russian) for a unified character differ, then
I think transcoding and unifying the characters into ISO 10646 can lose
information unless the xml:lang attribute is set. After that, xml:lang can
be used to label text for the purposes of variant character selection, and
after that for marking up the natural language.
But I am not trying to fix the repertoire of a language (TEI WSD can declare
it, though). I am just thinking about how to constrain XML documents so that
they will not contain characters which will break non-ISO10646 target
systems.
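Short of a schema mechanism, one way to enforce this kind of constraint is to check character data against the target system's character set before delivery. The sketch below uses java.nio.charset.CharsetEncoder.canEncode; the class name is hypothetical, and treating ISO-8859-1 as the legacy target in the usage note is an assumption for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class RepertoireCheck {
    /**
     * Returns the char index of the first code point that the target character
     * set cannot represent, or -1 if all of the text is representable.
     */
    static int firstUnencodable(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);              // handles characters outside the BMP
            if (!encoder.canEncode(new String(Character.toChars(cp)))) return i;
            i += Character.charCount(cp);
        }
        return -1;
    }
}
```

With ISO-8859-1 as the target, "Québec" passes (-1), while "Québec 北京" is rejected at index 7, the first Han ideograph.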
Rick Jelliffe
From cowan at locke.ccil.org Tue Nov 3 19:13:37 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: Walking the DOM (was: XML APIs)
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca>
Message-ID: <363F55AB.AEA4EB19@locke.ccil.org>
Tim Bray wrote:
> Wouldn't the effects of recursion be lost in the static,
> compared to the effects of loading the doc into memory to facilitate
> tree processing?
That produces slow processing, not a hard failure (unless indeed there
is simply too much document for even virtual memory). Java, and
all other HLLs I know of, provide no way to recover from
stack overflow, short of starting the app all over again with
a command-line switch for a bigger stack.
A general-purpose routine ought not to generate a preventable
hard failure no matter what the document looks like, IMHO.
> BTW, what languages can be relied on to do tail recursion?
Scheme and ML and their descendants. The Scheme version of
Stephen's algorithm will detect the tail recursion, and will
be recursive down the tree and iterative across it.
Indeed, Scheme *has* no (primitive) way to do iteration except
with tail recursion (there are macros that syntactically sugar
this, if you want). As a result, Scheme compilers can concentrate
on making the very few constructs they have to understand
(function call, function closure, assignment, IF) very very
efficient.
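What a properly tail-recursive implementation does with Stephen's algorithm can be written out by hand in Java: the tail call on the next sibling becomes a loop, leaving the call on the first child as the only real recursion. Stack use then grows with the depth of the tree, not its width. This is a hand-made illustration of the idea, not a claim about any particular compiler's output; Elem is a hypothetical minimal node type, not the DOM interface.

```java
import java.util.List;

/** Hypothetical minimal node with first-child / next-sibling links. */
final class Elem {
    final String name;
    Elem firstChild, nextSibling;
    Elem(String name) { this.name = name; }
}

final class LoopAcrossWalk {
    /**
     * Same visiting order as the classic algorithm, but the tail call on nextSibling
     * has been replaced by a loop: recursion only happens when going down a level.
     */
    static void traverse(Elem node, List<String> visited) {
        for (Elem n = node; n != null; n = n.nextSibling) {
            visited.add(n.name);                                       // visit(n)
            if (n.firstChild != null) traverse(n.firstChild, visited); // the only real call
        }
    }

    /** Builds a root with n leaf children: one level deep but n wide. */
    static Elem flat(int n) {
        Elem root = new Elem("root");
        Elem prev = null;
        for (int i = 0; i < n; i++) {
            Elem child = new Elem("item" + i);
            if (prev == null) root.firstChild = child; else prev.nextSibling = child;
            prev = child;
        }
        return root;
    }
}
```

Walking LoopAcrossWalk.flat(200000) this way needs a recursion depth of only two, where the unmodified algorithm would need about 200,001 frames.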
> Also, shorter algorithms are better. -Tim
But constant-space algorithms are better too.
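Cowan's own DOMParser loop is elided earlier in the thread, so the following is not his code. It is one common way to walk a tree with constant extra space, assuming each node carries a parent link as DOM nodes do (parentNode). It emits start and end events, so it also covers the end-of-node processing case, and it keeps no stack at all, only a single node reference. PNode and the event strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical node with the three DOM-style structural links. */
final class PNode {
    final String name;
    PNode parent, firstChild, nextSibling;
    private PNode lastChild;
    PNode(String name) { this.name = name; }

    /** Appends a child, maintaining parent/firstChild/nextSibling links; returns this. */
    PNode add(PNode child) {
        child.parent = this;
        if (firstChild == null) firstChild = child; else lastChild.nextSibling = child;
        lastChild = child;
        return this;
    }
}

final class ConstantSpaceWalk {
    /** Emits "<name>" on entry and "</name>" on exit for root and everything below it. */
    static List<String> walk(PNode root) {
        List<String> events = new ArrayList<>();
        PNode n = root;
        while (n != null) {
            events.add("<" + n.name + ">");
            if (n.firstChild != null) {              // descend
                n = n.firstChild;
                continue;
            }
            // Leaf: close it, then climb until some node has an unvisited next sibling.
            while (n != null) {
                events.add("</" + n.name + ">");
                if (n == root) { n = null; break; }   // never wander past the starting node
                if (n.nextSibling != null) { n = n.nextSibling; break; }
                n = n.parent;
            }
        }
        return events;
    }
}
```

The design trade-off is that every node must store its parent, which a DOM implementation normally does anyway; in exchange, neither depth nor width can overflow a stack.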
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Tue Nov 3 20:40:43 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:12 2004
Subject: Unicode, xml:lang, and variant glyphs
References: <000301be075d$a1376e30$dae887cb@NT.JELLIFFE.COM.AU>
Message-ID: <363F657F.5D2E1B43@locke.ccil.org>
Rick Jelliffe wrote:
> Not so. The additions are usually composed of standard radicals and
> combinations. There are various projects around (such as C.C.Hsieh in
> Taiwan) to figure out encodings to "spell" Han ideographs by component
> radicals.
I'm glad to hear about this; I find the IRG archives utterly
impenetrable.
> I guess the point is that John thinks that if an XML system can produce
> characters which a recipient system cannot process, because it does not use
> ISO 10646, that is not something that CDATA sections should be used to
> address. I think his reasons are that he cannot see it in the spec. [...]
> I think a lot
> of people now think that any non-ISO10646 system is for losers anyway
> (except for whatever character set they use, probably).
Well, actually I would say the latter rationale has more effect on me
than the former, if I must choose either. It just seemed to me that
using CDATA sections to constrain the behavior of editors was not
particularly user-friendly; if the user wants a character, let her
have it, using a character reference if possible.
In general, transcoding XML documents involves inserting NCRs as needed,
unless the target is UTF-8 or UTF-16.
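The transcoding rule can be sketched in a few lines: keep each code point the target encoding can represent and replace the rest with a hexadecimal numeric character reference. This only applies to character data and attribute values; character references are not recognized inside CDATA sections or in element and attribute names, so those cases need separate handling. The class and method names are illustrative.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Locale;

final class NcrTranscoder {
    /** Rewrites character data so it is representable in target, using &#x...; for the rest. */
    static String withNcrs(String text, Charset target) {
        CharsetEncoder encoder = target.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {                  // whole code points, not UTF-16 units
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#x").append(Integer.toHexString(cp).toUpperCase(Locale.ROOT)).append(';');
            }
        });
        return out.toString();
    }
}
```

For a US-ASCII target, "naïve" becomes "na&#xEF;ve"; for a UTF-8 target the text passes through unchanged, matching the exception for UTF-8 and UTF-16.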
> The primary purpose of xml:lang, as far as I am concerned, should be to
> convey the information lost by ISO 10646 unification: where the Japanese and
> Chinese glyphs
Actually, the problem isn't that clear-cut. As John Jenkins posted
to the Unicode list last year:
# FACT. It is true that some Unihan characters are typically written
# differently within the Japanese, Taiwanese, Korean, and Mainland Chinese
# typographic traditions.
#
# FACT. These differences of writing style are within the general range of
# allowable differences within each typographic tradition.
#
# E.g., the official "Taiwanese" glyph for U+8349 ("grass") per ISO/IEC
# 10646 uses four strokes for the "grass" radical, whereas the PRC,
# Japanese, and Korean glyphs use three. As it happens, Apple's LiSung
# Light font for Big Five (which follows the "Taiwanese" typographic
# tradition) uses three strokes.
#
# (This is easily confirmed by accessing
# http://www.unicode.org/unihan/unihan.acgi$8349.)
#
# FACT. Japanese users prefer to see Japanese text written with "Japanese"
# glyphs.
#
# FACT. It is also acceptable to Japanese users to see Chinese text
# written with "Japanese" glyphs.
#
# E.g., I just borrowed from Lee Collins a standard Japanese dictionary
# which quotes Chinese authors (e.g., Mencius) to show how a character is
# used. When doing so, they use "Japanese" glyphs, not Chinese ones.
#
# In particular, it is acceptable within Japanese typography for a small
# stretch of Chinese quoted in a predominantly Japanese text to be written
# with "Japanese" glyphs.
#
# FACT. Han unification allows for the possibility that a Japanese user
# might be required to use a Chinese font to display some Japanese text
# (e.g., if it uses a rare kanji).
#
# FACT. Ditto for JIS or an ISO 2022-based solution.
#
# FACT. Unicode doesn't include all the characters in actual use in Japan
# today, particularly for personal names.
#
# FACT. Neither does JIS or an ISO 2022-based solution. There are vendor
# sets which include many of these characters, and Unicode is working with
# the IRG and East Asian national bodies to add them.
> (or Polish and Russian)
How's that again?
Polish uses Latin, Russian uses Cyrillic! What could possibly
count as a unification between these two?? *Nobody* thinks that
LATIN LETTER A and CYRILLIC LETTER A should be unified....
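The two letters are indeed separate code points in Unicode, as Java's character database reports. This sketch uses Character.getName and Character.UnicodeScript, both of which require Java 7 or later; the class name is hypothetical.

```java
final class LookAlikes {
    /** Formats a code point as "U+XXXX NAME (SCRIPT)". */
    static String describe(int codePoint) {
        return String.format("U+%04X %s (%s)",
                codePoint, Character.getName(codePoint), Character.UnicodeScript.of(codePoint));
    }
}
```

describe(0x0041) yields "U+0041 LATIN CAPITAL LETTER A (LATIN)" and describe(0x0410) yields "U+0410 CYRILLIC CAPITAL LETTER A (CYRILLIC)": identical glyphs, distinct characters.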
> for a unified character differ, then
> I think transcoding and unifying the characters into ISO 10646 can lose
> information unless the xml:lang attribute is set.
It doesn't lose information about meaning. It may make characters
harder to read, but the distinction is one of typographic tradition,
not language, and can cross languages.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From jcw at equi4.com Tue Nov 3 21:40:30 1998
From: jcw at equi4.com (Jean-Claude Wippler)
Date: Mon Jun 7 17:06:12 2004
Subject: [Fwd: Walking the DOM (was: XML APIs)]
References: <363F54B8.6268F15@totten.com>
Message-ID: <363F7816.DF478026@equi4.com>
John Cowan wrote:
> Stephen R. Savitzky wrote:
>
> > [T]he classic algorithm for traversing a tree is:
> >
> > traverse(node) {
> > visit(node);
> > if (node.firstChild != null) traverse(node.firstChild);
> > if (node.nextSibling != null) traverse(node.nextSibling);
> > }
>
> The trouble with that algorithm is that it is recursive. It will
> blow up if the tree is sufficiently deep. Indeed, in
> languages that cannot be relied on to do tail recursion, like
> Java, it will blow up if the tree is merely sufficiently wide.
>
> Furthermore, if there is any end-of-node processing to do, such as
> emitting an end tag indication, then the algorithm is no longer
> even partly tail recursive and will blow up on both depth and
> width even in safe-tail-recursion languages.
>
> The algorithm I use in DOMParser, therefore, is non-recursive:
[...]
The way I load an XML document into MetaKit, it uses an explicit stack
with exactly one "int" per level. I think you'll agree that this amount
of "stack" use makes the approach suitable for any document (once I add
some tests - this is just an experiment for now). Source code is at:
http://www.equi4.com/metakit/xml/mk4xml.cpp
After that, you end up with an on-demand loaded document, which is
indexable so there is no scanning at all when accessing this data.
Every child node is in an indexable "subview". And when you *do* need
traversal, you can again use the same one-int-per-level stack approach.
This works equally well in the case of end-node processing, BTW.
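The "one int per level" idea can be sketched as follows, assuming (as with MetaKit's indexable subviews) that each node's children are held in an indexable list. This is not the code from mk4xml.cpp; INode and the event strings are hypothetical. The only stack is an int array holding, for each open level, the index of the child currently being visited; the node at any level is re-derived from the root by following those indexes, which is cheap when children are indexable. End-of-node events fall out naturally when a level is popped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical node whose children are stored in an indexable list. */
final class INode {
    final String name;
    final List<INode> children = new ArrayList<>();
    INode(String name, INode... kids) {
        this.name = name;
        children.addAll(Arrays.asList(kids));
    }
}

final class IndexStackWalk {
    /** Follows the first `depth` child indexes in path, starting at root. */
    private static INode resolve(INode root, int[] path, int depth) {
        INode n = root;
        for (int k = 0; k < depth; k++) n = n.children.get(path[k]);
        return n;
    }

    /** Depth-first walk with start/end events; explicit stack of one int per open level. */
    static List<String> walk(INode root) {
        List<String> events = new ArrayList<>();
        int[] path = new int[8];
        int depth = 0;
        while (true) {
            INode node = resolve(root, path, depth);
            events.add("<" + node.name + ">");
            if (!node.children.isEmpty()) {               // descend to the first child
                if (depth == path.length) path = Arrays.copyOf(path, depth * 2);
                path[depth++] = 0;
                continue;
            }
            // Leaf: close nodes and climb until some ancestor has a further child.
            while (true) {
                events.add("</" + node.name + ">");       // end-of-node processing
                if (depth == 0) return events;            // closed the root: done
                int i = path[--depth];
                INode parent = resolve(root, path, depth);
                if (i + 1 < parent.children.size()) {     // move on to the next sibling
                    path[depth++] = i + 1;
                    break;
                }
                node = parent;
            }
        }
    }
}
```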
> Because of the reliability of this algorithm vis-a-vis the recursive
> one, I believe it should be the standard way of walking DOM trees,
> and therefore it is essential that DOM implementations make the
> structural access methods fast.
By reliability, do you mean "not blowing up its stack"?
As you can see, there are more ways than one to skin this cat. It seems
to me that standardizing in the way you propose will prevent the use of
other techniques - such as storing XML as a MetaKit datafile and using
explicit recursion.
-- Jean-Claude
________________________________________________________________________
Jean-Claude Wippler MetaKit home page - http://www.equi4.com/metakit/
Equi4 Software "Portable database software for a changing world"
From Jon.Bosak at eng.Sun.COM Tue Nov 3 22:46:59 1998
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 17:06:12 2004
Subject: DTD,s
Message-ID: <199811032243.OAA23029@boethius.eng.sun.com>
[Frank Boumphrey:]
| Does any one know of a site where XML dtd's are available for general
| use?
OASIS (http://www.oasis-open.org) is slowly gearing up to provide a
DTD registry. This will no doubt be discussed at the OASIS meeting in
Chicago November 15.
Jon
From tbray at textuality.com Tue Nov 3 23:34:00 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:12 2004
Subject: Last Call issued on initial stylesheet linking draft
Message-ID: <3.0.32.19981103153315.00af8b30@pop.intergate.bc.ca>
Posting this on behalf of the Syntax WG; xml-dev is obviously
highly qualified to provide feedback.
Cover letter is from Syntax WG co-chair Joel Nava
===================================================================
The XML Syntax Working Group of the W3C is issuing a "Last Call"
for comments on the specification "Associating stylesheets with
XML documents - version 1.0"
http://www.w3.org/TR/WD-xml-stylesheet
Please review the document and send any comments you have
to jjc@jclark.com, tbray@textuality.com, jnava@adobe.com.
Comments are due by Friday Nov. 20th.
To save bandwidth, I am including some rationale for the
specific syntax we are using in this specification.
As you will notice, the Working Group has chosen to use a special
processing instruction, or PI, to link an XML document to stylesheets.
Some wonder whether an element or attribute based solution somewhere
along the lines of XLink or the XML Namespace mechanism would be more
appropriate.
The reasons for our choice are 3-fold:
First, for the most part the working group feels that this syntax
is the best for the problem. Many argue that this is the proper use
for a PI, because it keeps this information out of the document tree.
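Because the link is a processing instruction, an application sees it through SAX's processingInstruction(target, data) callback rather than as an element in the document tree. A minimal sketch using the JAXP SAX parser from later Java releases; the class name and sample document are illustrative.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class StylesheetPiFinder {
    /** Returns the raw data of every xml-stylesheet PI in the document, in order. */
    static List<String> find(String xml) {
        List<String> found = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void processingInstruction(String target, String data) {
                if (target.equals("xml-stylesheet")) found.add(data); // never reaches startElement
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return found;
    }
}
```

The data string is everything after the target name, e.g. href="s.css" type="text/css"; splitting it into pseudo-attributes is left to the application.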
Second, timing is an issue. The mechanism that we have produced
is very similar to the HTML link element, and was agreed upon
many months prior to the formation of this WG. The XML Style
Sheet Linking Specification was a partially completed work item
from the old XML WG. In the intervening time between the end of
that group and the beginning of this group the time to make an
impact on the next release from the browser vendors was slipping
away. We have tried to move quickly to complete this specification
in order to have an impact on what gets implemented. Both Microsoft
and Netscape have agreed on this syntax, and as far as we can guess
will be shipping products based on it in the near future.
The third part of this is the fact that time and resources have
already been put aside to produce a Version 2 of this specification.
"Associating stylesheets with XML documents - version 2.0" will add
other mechanisms for linking style to XML documents; see
http://www.w3.org/XML/Activity.html#future
--
Joel A. Nava (408)536-6209
Adobe Systems, Inc. jnava@adobe.com
From tbray at textuality.com Tue Nov 3 23:42:49 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:12 2004
Subject: Last call - correction
Message-ID: <3.0.32.19981103154228.009fc660@pop.intergate.bc.ca>
[oops - date error in previous posting]
>> Comments are due by Friday Nov. 20th.
should read
Comments are due by Tuesday Nov. 17th
-Tim
From papresco at technologist.com Wed Nov 4 00:42:40 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:12 2004
Subject: Walking the DOM (was: XML APIs)
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca>
Message-ID: <363F9E39.D978D55F@technologist.com>
Tim Bray wrote:
>
> >The trouble with that algorithm is that it is recursive. It will
> >blow up if the tree is sufficiently deep. Indeed, in
> >languages that cannot be relied on to do tail recursion, like
> >Java, it will blow up if the tree is merely sufficiently wide.
>
> Wouldn't the effects of recursion be lost in the static,
> compared to the effects of loading the doc into memory to facilitate
> tree processing? Even if you are doing some persistent-ancillary-
> info trick to do a virtual tree, in my experience for very large
> docs you really have to wrangle memory carefully.
But the persistent ancillary-info trick (i.e. "object database") keeps
only the data it needs to in memory. If it requires lots of swapping, that
slows things down, but the algorithm works nevertheless. If you blow your
stack, you blow your stack, and there is no database in the world that
will help you.
Depending on the algorithm, walking an object database tree for a really
huge file may be faster than parsing it and event-processing it. It
depends on how many nodes you are actually processing, and how much
ancillary info you must keep around to solve the problem you need to
solve.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"I always wanted to be somebody, but I should have been more
specific." --Lily Tomlin
From phani at www.hsc.wvu.edu Wed Nov 4 02:42:01 1998
From: phani at www.hsc.wvu.edu (Phani Adabala)
Date: Mon Jun 7 17:06:12 2004
Subject: xml parser
Message-ID:
1.To develop a search engine for xml documents, can we use the xml parser
already developed by microsoft and others or do we need to build our own
parser?
From rbourret at ito.tu-darmstadt.de Wed Nov 4 09:18:38 1998
From: rbourret at ito.tu-darmstadt.de (Ronald Bourret)
Date: Mon Jun 7 17:06:13 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de>
John Cowan wrote:
> Rick Jelliffe wrote:
>
> > I hope that the schema people will look at
> > a solution for constraining strings to use certain repertoires of
> > characters.
>
> And I hope that they allow no such thing, except perhaps as a fall-out
> from some regex or other local syntax mechanism.
Why not? This would be very useful for constraining what can be put into a database, many (most?) of which do not support Unicode.
-- Ron Bourret
From rbourret at ito.tu-darmstadt.de Wed Nov 4 09:43:40 1998
From: rbourret at ito.tu-darmstadt.de (Ronald Bourret)
Date: Mon Jun 7 17:06:13 2004
Subject: Last Call issued on initial stylesheet linking draft
Message-ID: <01BE07DF.52B752C0@grappa.ito.tu-darmstadt.de>
The second S in StylesheetPI and both S's in PseudoAtt should be optional. Even the examples don't include them.
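To illustrate what making those S productions optional means for an implementation, here is a lenient pseudo-attribute reader that allows, but does not require, white space around "=" and before "?>". It is a sketch only: the regular expression approximates XML's Name production with ASCII letters, Java's \s is slightly broader than XML's S, and character or entity references in values are not expanded.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PseudoAttributes {
    // name, optional white space, '=', optional white space, then a "double" or 'single' quoted value
    private static final Pattern PSEUDO_ATT = Pattern.compile(
            "([A-Za-z_:][-A-Za-z0-9._:]*)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Parses the data part of an xml-stylesheet PI into name/value pairs, in document order. */
    static Map<String, String> parse(String piData) {
        Map<String, String> atts = new LinkedHashMap<>();
        Matcher m = PSEUDO_ATT.matcher(piData);
        while (m.find()) {                     // trailing white space before ?> is simply skipped
            String value = m.group(2) != null ? m.group(2) : m.group(3);
            atts.put(m.group(1), value);
        }
        return atts;
    }
}
```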
-- Ron Bourret
From digitome at iol.ie Wed Nov 4 09:45:35 1998
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 17:06:13 2004
Subject: Walking the DOM (was: XML APIs)
In-Reply-To: <363F9E39.D978D55F@technologist.com>
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca>
Message-ID: <3.0.6.32.19981104093649.0095e800@gpo.iol.ie>
[Paul Prescod]
>But the persistent ancillary-info trick (i.e. "object database") keeps
>only the data it needs to in memory. If it requires lots of swapping, that
>slows things down, but the algorithm works nevertheless. If you blow your
>stack, you blow your stack, and there is no database in the world that
>will help you.
>
>Depending on the algorithm, walking an object database tree for a really
>huge file may be faster than parsing it and event-processing it. It
>depends on how many nodes you are actually processing, and how much
>ancillary info you must keep around to solve the problem you need to
>solve.
>
In my experience, there is a strong "principle of locality" in
XML/SGML processing. I find I can get by quite happily with
mini-tree structures harvested at suitable points from
a larger document processed event-style. In my
Python toolkit for SGML/XML processing I added support
for sparse tree building some time ago and I find myself
using it more and more.
This is certainly far easier to do than implementing virtual
tree access with swapping to disk etc. You don't need
no object database either:-)
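The "mini-tree" approach can be sketched on top of SAX: events stream past without being stored, except inside elements of one chosen type, for which a small tree is built, handed to a callback and then dropped. This is not the API of Sean's Python toolkit; MiniTree, the harvesting rule (match by element name) and the callback are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Consumer;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical tiny tree node: element name, text content, children. */
final class MiniTree {
    final String name;
    final StringBuilder text = new StringBuilder();
    final List<MiniTree> children = new ArrayList<>();
    MiniTree(String name) { this.name = name; }
}

final class SparseTreeBuilder extends DefaultHandler {
    private final String harvest;                            // element name to build trees for
    private final Consumer<MiniTree> sink;                   // receives each finished mini-tree
    private final Deque<MiniTree> open = new ArrayDeque<>(); // non-empty only inside a harvested element

    SparseTreeBuilder(String harvest, Consumer<MiniTree> sink) {
        this.harvest = harvest;
        this.sink = sink;
    }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        if (open.isEmpty() && !qName.equals(harvest)) return; // outside: just stream past
        MiniTree node = new MiniTree(qName);
        if (!open.isEmpty()) open.peek().children.add(node);
        open.push(node);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (!open.isEmpty()) open.peek().text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        if (open.isEmpty()) return;
        MiniTree done = open.pop();
        if (open.isEmpty()) sink.accept(done);               // a complete mini-tree: hand it over
    }

    static void run(String xml, String harvest, Consumer<MiniTree> sink) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new SparseTreeBuilder(harvest, sink));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
    }
}
```

Memory use is bounded by the largest harvested subtree rather than by the document.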
http://www.python.org
The "Swiss Army Laser Beam" of programming languages
From ht at cogsci.ed.ac.uk Wed Nov 4 09:58:02 1998
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 17:06:13 2004
Subject: Hybrid event/tree interfaces (was: Walking the DOM (was: XML APIs))
In-Reply-To: Sean Mc Grath's message of "Wed, 04 Nov 1998 09:36:49 +0000"
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca> <3.0.6.32.19981104093649.0095e800@gpo.iol.ie>
Message-ID:
Sean Mc Grath writes:
> In my experience, there is a strong "principle of locality" in
> XML/SGML processing. I find I can get by quite happily with
> mini-tree structures harvested at suitable points from
> a larger document processed event-style. In my
> Python toolkit for SGML/XML processing I added support
> for sparse tree building some time ago and I find myself
> using it more and more.
>
> This is certainly far easier to do than implementing virtual
> tree access with swapping to disk etc. You don't need
> no object database either:-)
Our experience is very much in agreement with this. We have been
using an API [1] which allows you to switch from event to
tree(-fragment) view for the last few years, and it is a very
productive way to go. You can think of it as allowing you to loop
over nodes in a document which match a query in a restricted query
language, restricted in that you can only query properties
(e.g. tag name, attribute values) of candidate nodes and their
ancestors: no descendants or siblings. If you like what you see,
THEN you can ask for the whole subtree rooted in that node, and do
whatever you like with it.
It should be clear that a simple implementation of this is possible,
which only needs to keep a stack of current ancestor start-tags. The
result is that there is no upper bound on document size: we regularly push 2GB of
XML through a chain of filters implemented in this way.
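The restricted query described here needs only a stack of the current node's ancestor start-tags. A hypothetical sketch: a SAX handler keeps that stack and counts elements whose innermost ancestors match a slash-separated sequence of names such as section/title. This is not the LT XML API; the path syntax is invented for illustration, and attribute tests are omitted.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class AncestorQuery extends DefaultHandler {
    private final String[] pattern;                           // e.g. {"section", "title"}
    private final List<String> ancestors = new ArrayList<>(); // stack of currently open element names
    private int matches;

    AncestorQuery(String path) { this.pattern = path.split("/"); }

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        ancestors.add(qName);
        if (endsWithPattern()) matches++;   // decided from the node and its ancestors only
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        ancestors.remove(ancestors.size() - 1);
    }

    /** True if the innermost open elements are exactly the pattern names, outermost first. */
    private boolean endsWithPattern() {
        int offset = ancestors.size() - pattern.length;
        if (offset < 0) return false;
        for (int i = 0; i < pattern.length; i++) {
            if (!ancestors.get(offset + i).equals(pattern[i])) return false;
        }
        return true;
    }

    static int count(String xml, String path) {
        AncestorQuery q = new AncestorQuery(path);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), q);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("parse failed", e);
        }
        return q.matches;
    }
}
```

Only the ancestor stack is held in memory, so document size is unbounded; a fuller version would switch to building the subtree of each matching node.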
ht
[1] http://www.ltg.ed.ac.uk/software/xml/
--
Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.ltg.ed.ac.uk/~ht/
From M.H.Kay at eng.icl.co.uk Wed Nov 4 11:00:01 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:06:13 2004
Subject: xml parser
Message-ID: <004701be07e1$9290ed00$7008e391@bra01wmhkay.bra01.icl.co.uk>
>
>1.To develop a search engine for xml documents, can we use the xml parser
>already developed by microsoft and others or do we need to build our own
>parser?
My immediate answer to this is yes, all the information you need for a
search engine is available via the SAX or DOM interface offered by many
parsers.
This is certainly true for the indexing phase; for displaying hit documents
I can think of some requirements that a standard parser might not meet, such
as displaying the text around a search term without parsing the whole
document. So it depends on your detailed design. But in any case many XML
parsers are available with source code so you shouldn't need to write a new
one from scratch.
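To illustrate the indexing-phase claim, a minimal sketch using Python's xml.sax: it records, for each word, the element paths in which the word appears as direct text. The class name and the path format are assumptions; word positions and byte offsets are deliberately not tracked.

```python
import xml.sax
from collections import defaultdict


class ContextIndexer(xml.sax.ContentHandler):
    """Hypothetical indexer: maps each lowercased word to the set of element
    paths (e.g. "book/chapter/p") whose direct text contains it."""

    def __init__(self):
        super().__init__()
        self.path = []                 # names of currently open elements
        self.buffers = []              # direct text collected per open element
        self.index = defaultdict(set)

    def startElement(self, name, attrs):
        if self.buffers:
            self.buffers[-1].append(" ")  # keep words on either side of a child apart
        self.path.append(name)
        self.buffers.append([])

    def characters(self, content):
        # SAX may deliver text in several chunks, so buffer it until the end tag.
        if self.buffers:
            self.buffers[-1].append(content)

    def endElement(self, name):
        where = "/".join(self.path)
        for word in "".join(self.buffers.pop()).lower().split():
            self.index[word].add(where)
        self.path.pop()


indexer = ContextIndexer()
xml.sax.parseString(
    b"<book><title>XML Search</title>"
    b"<chapter><title>Indexing</title><p>Search with SAX</p></chapter></book>",
    indexer,
)
print(sorted(indexer.index["search"]))  # ['book/chapter/p', 'book/title']
```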
Of course you don't need to build your own search engine either, all you
need to do is write an XML filter for an existing search engine. I'm
surprised no-one seems to have done this yet.
Mike Kay
From rbourret at ito.tu-darmstadt.de Wed Nov 4 11:14:03 1998
From: rbourret at ito.tu-darmstadt.de (Ronald Bourret)
Date: Mon Jun 7 17:06:13 2004
Subject: xml parser
Message-ID: <01BE07EB.ED7FB070@GRAPPA>
Phani Adabala wrote:
> 1.To develop a search engine for xml documents, can we use the xml parser
> already developed by microsoft and others or do we need to build our own
> parser?
You definitely don't need to write your own parser -- there are plenty available, including Microsoft's. See, for example:
http://www.xmlsoftware.com/parsers/
http://www.oasis-open.org/cover/xml.html#xmlSoftware
It would also be a good idea to write your software in a parser-independent way, using SAX (http://www.megginson.com/SAX/) or DOM (http://www.w3.org/TR/REC-DOM-Level-1/). The former is an event-driven interface, the latter is a tree interface.
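As a small illustration of the two styles, the sketch below counts elements once through an event-driven SAX handler and once through a DOM tree. It uses Python's standard xml.sax and xml.dom.minidom modules, assumed here as stand-ins for whichever SAX or DOM implementation you choose; the function and class names are hypothetical.

```python
import xml.sax
from collections import Counter
from xml.dom import minidom


class ElementCounter(xml.sax.ContentHandler):
    """Event-driven: the parser calls startElement as it reads; no tree is kept."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def startElement(self, name, attrs):
        self.counts[name] += 1


def count_with_sax(data: bytes) -> Counter:
    handler = ElementCounter()
    xml.sax.parseString(data, handler)
    return handler.counts


def count_with_dom(data: bytes) -> Counter:
    """Tree-based: the whole document is built in memory first, then queried."""
    document = minidom.parseString(data)
    return Counter(el.tagName for el in document.getElementsByTagName("*"))


sample = b"<list><item/><item/><note/></list>"
print(count_with_sax(sample) == count_with_dom(sample))  # True
```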
-- Ron Bourret
From rja at dip.co.uk Wed Nov 4 11:35:55 1998
From: rja at dip.co.uk (Richard James Anderson)
Date: Mon Jun 7 17:06:13 2004
Subject: xml parser
Message-ID: <000101be07e7$5c3e2910$c5010180@p197>
Hi,
For those who are interested, I've posted an early version of my ActiveX SAX
control up on my website ( URL below ).
The control still has a long way to go, but it can parse most files that do
not contain references to external entities.
The download includes a sample VB6 app for reading and processing XML files.
It just loads the XML file into a tree control, and shows the SAX events in
a list control. Of course, the control can be used from anything that can
act as a COM automation controller.
Enjoy,
Richard.
RJA@DIP.CO.UK
http://www.arpsolutions.demon.co.uk
*** The text contained within this message is of a personal nature that does
not reflect the development of opinions of data interchange plc unless
specifically stated ***
From david at megginson.com Wed Nov 4 11:49:30 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:13 2004
Subject: CDATA by any other name... (was The raw and the cooked)
In-Reply-To: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de>
References: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de>
Message-ID: <13888.15941.518550.194555@localhost.localdomain>
Ronald Bourret writes:
> Why not? This would be very useful for constraining what can be
> put into a database, many (most?) of which do not support Unicode.
There are three much better choices for specific problems like this:
1. Have the application throw an error if an out-of-range character
appears.
2. Convert the text to UTF-8 before storing it in the database (UTF-8
and ASCII are identical up to 0x7f).
3. Escape non-ASCII characters with character references before
storing the text in the database.
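The three options can each be sketched in a line or two of Python. The function names are hypothetical, and option 3 also escapes "&", "<" and ">" on the assumption that the stored text will later be read back as XML, which is the only place character references mean anything.

```python
from xml.sax.saxutils import escape


def reject_non_ascii(text: str) -> str:
    """Option 1: raise an error if any character is outside 7-bit ASCII."""
    for ch in text:
        if ord(ch) > 0x7F:
            raise ValueError(f"out-of-range character U+{ord(ch):04X}")
    return text


def to_utf8(text: str) -> bytes:
    """Option 2: store UTF-8; pure-ASCII text yields exactly the same bytes."""
    return text.encode("utf-8")


def to_char_refs(text: str) -> str:
    """Option 3: escape markup, then replace non-ASCII with decimal character references."""
    return escape(text).encode("ascii", "xmlcharrefreplace").decode("ascii")


print(to_utf8("abc") == b"abc")    # True
print(to_char_refs("café & co"))   # caf&#233; &amp; co
```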
As I mentioned before, it's always better to be explicit about this
kind of thing -- syntactic subtlety is a bad thing.
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From papresco at technologist.com Wed Nov 4 14:03:02 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:13 2004
Subject: Walking the DOM (was: XML APIs)
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca> <3.0.6.32.19981104093649.0095e800@gpo.iol.ie>
Message-ID: <364059E7.4E2067AB@technologist.com>
Sean Mc Grath wrote:
>
> In my experience, there is a strong "principle of locality" in
> XML/SGML processing. I find I can get by quite happily with
> mini-tree structures harvested at suitable points from
> a larger document processed event-style. In my
> Python toolkit for SGML/XML processing I added support
> for sparse tree building some time ago and I find myself
> using it more and more.
As long as you "get by", more power to you. But what happens when you hit
a document where the first paragraph makes a cross reference to the last
paragraph and the last paragraph makes a reference to somewhere in the
middle? You can hack around it (after all, some people get away with using
Omnimark!), but you will be hacking. You could also hack around local tree
access. Given the choice of hacking around one or the other, I would
rather hack around local references, because those are more predictable.
> This is certainly far easier to do that implement virtual
> tree access with swapping to disk etc. You don't need
> no object database either:-)
That's true, but it doesn't scale to the full generality of problems. As
long as you can get away with it, do so, but I know that I have problems
that require the Full Monty.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
"I always wanted to be somebody, but I should have been more
specific." --Lily Tomlin
From cowan at locke.ccil.org Wed Nov 4 14:54:52 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:13 2004
Subject: Walking the DOM (was: XML APIs)
References: <3.0.32.19981103102617.00afddb0@pop.intergate.bc.ca> <363F9E39.D978D55F@technologist.com>
Message-ID: <36406AF7.4C8A6CD6@locke.ccil.org>
Paul Prescod wrote:
> If it requires lots of swapping, that
> slows things down, but the algorithm works nevertheless. If you blow your
> stack, you blow your stack, and there is no database in the world that
> will help you.
What he said.
Hence the desirability of a non-stack-based algorithm.
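One non-stack-based approach, sketched here with Python's xml.dom.minidom as an assumed stand-in for any DOM that exposes parent and sibling links: a pre-order walk that follows firstChild, nextSibling and parentNode, so it needs neither recursion nor an explicit stack, only the current node. The function name walk is hypothetical.

```python
from xml.dom import minidom


def walk(root):
    """Pre-order traversal of root's subtree using only parent/child/sibling links."""
    node = root
    while node is not None:
        yield node
        if node.firstChild is not None:
            node = node.firstChild        # descend first
            continue
        # No children: climb until some node below root has a next sibling.
        while node is not root and node.nextSibling is None:
            node = node.parentNode
        if node is root:
            return                        # the whole subtree has been visited
        node = node.nextSibling


doc = minidom.parseString("<a><b><c/></b><d/></a>")
print([n.nodeName for n in walk(doc.documentElement)])  # ['a', 'b', 'c', 'd']
```

The trade-off is that each step may climb several levels, but memory use stays constant however deep the document is.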
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From msabin at cromwellmedia.co.uk Wed Nov 4 15:17:37 1998
From: msabin at cromwellmedia.co.uk (Miles Sabin)
Date: Mon Jun 7 17:06:13 2004
Subject: Interface name quandry ...
Message-ID:
Apologies in advance if this is a bit off topic,
and apologies to those who get multiple copies.
I'm working on a number of Java APIs which operate
on documents and their DOM representations relying
on only the intersection of the properties of XML
and HTML, and I've been racking my brains for a
good name that covers both HTML and XML, but isn't
as general as SGML.
Has anybody got any suggestions?
Cheers,
Miles
--
Miles Sabin Cromwell Media
Internet Systems Architect 5/6 Glenthorne Mews
+44 (0)181 410 2230 London, W6 0LJ
msabin@cromwellmedia.co.uk England
From cowan at locke.ccil.org Wed Nov 4 15:26:38 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:13 2004
Subject: Last Call issued on initial stylesheet linking draft
References: <3.0.32.19981103153315.00af8b30@pop.intergate.bc.ca>
Message-ID: <36407254.FF6816DD@locke.ccil.org>
Joel Nava wrote:
> To save bandwidth, I am including some rationale for the
> specific syntax we are using in this specification.
I believe that a lightly edited version of this rationale should
be included as an (informative) appendix to the recommendation.
Otherwise I believe people will see the rec as unmotivated
and will ignore it.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From sroth at radsys.com Wed Nov 4 15:28:53 1998
From: sroth at radsys.com (Roth, Scott)
Date: Mon Jun 7 17:06:13 2004
Subject: Text file to XML??
Message-ID: <5FAFB2A5D7B2D111ACEA0060972027CE186366@RADSYS_EXCH>
Help....???
I am working on a way to take a delimited text file that has data in it and
break that up so that I can make files that hold xml data within it. The
text file holds metadata already that points to files and holds certain key
information. What I want to do is take that data and have it put the proper
xml tags in where the fields are and then take that data and put it also
into the file. Then I want to add the proper HTML. Does this make sense???
Please help me out: I want to know if anybody out there has done this already
so I don't have to reinvent the wheel. And if anyone has any helpful hints
please let me know.
Thanks,
Scott Roth
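For a request like this, a minimal sketch in Python using the standard csv and xml.etree.ElementTree modules. The "|" delimiter, the field names (file, title, date) and the element names are hypothetical. ElementTree takes care of escaping characters such as "&", which is easy to get wrong when building XML by string concatenation. Adding HTML would be a separate step and is not shown.

```python
import csv
import io
import xml.etree.ElementTree as ET


def delimited_to_xml(text, field_names, delimiter="|", root_tag="records", row_tag="record"):
    """Wrap each delimited row in <record> and each field in an element named after it."""
    root = ET.Element(root_tag)
    for row in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not row:
            continue                                  # skip blank lines
        record = ET.SubElement(root, row_tag)
        for name, value in zip(field_names, row):
            ET.SubElement(record, name).text = value  # ElementTree escapes &, < and >
    return ET.tostring(root, encoding="unicode")


sample = "report.doc|Q3 results|1998-10-30\nmemo.txt|Budget & plan|1998-11-02\n"
print(delimited_to_xml(sample, ["file", "title", "date"]))
```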
From cowan at locke.ccil.org Wed Nov 4 15:49:19 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:13 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de>
Message-ID: <364077B9.AB1AA3A6@locke.ccil.org>
Ronald Bourret wrote:
> Why not? This would be very useful for constraining what can be put
> into a database, many (most?) of which do not support Unicode.
Because I thought XML (and Unicode) were in the business of enabling,
not constraining. Lack of support for anything but Western Europe
is an unfortunate misfeature to be worked around. Perhaps the
routines that will later read from the database (not in XML, I assume)
can be taught to understand HCRs.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Wed Nov 4 16:00:55 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:13 2004
Subject: Interface name quandry ...
References:
Message-ID: <36407A63.DF462A3F@locke.ccil.org>
Miles Sabin wrote:
> I've been racking my brains for a
> good name that covers both HTML and XML, but isn't
> as general as SGML.
>
> Has anybody got any suggestions?
XHTML has been used on this list, and I think is well understood.
Some people have also used the term "HTML 5.0" :-)
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From tug at wilson.co.uk Wed Nov 4 16:15:29 1998
From: tug at wilson.co.uk (John Wilson)
Date: Mon Jun 7 17:06:14 2004
Subject: Parsing XML for direct use by programs
Message-ID: <020d01be080e$3ebfe3c0$010a0a0a@bach.wilson.co.uk>
Programs that consume XML generally have to use a two-stage process:
1/ Parse the XML
2/ Create new objects to represent the data in the document
(Here I'm thinking of things like EDI applications rather than of XML
browsers. In these cases I have elements which represent dates, amounts of
money, part numbers, etc., and I need to turn them into the appropriate
internal data structures before my program can process them.)
I have lots of support for step 1 but little or no support for step 2. (I'll
address Bill la Forge's Coins system later.)
What I think I would find helpful is a system which would let me describe
how an XML document which corresponds to a given DTD should be converted into
instances of particular objects in my particular programming language. Of
course, I'd like to describe this in XML!
To take a concrete example:
There are several ways of expressing a date in various DTDs in use now.
In my Java program I want to deal with instances of java.util.Date.
I don't want to encumber my program with all the tedious, hand-crafted detail
of turning the XML element into an instance of java.util.Date.
I do want a standard package that reads a DTD, a DTD->Java object mapping
description, and an XML document, and spits out the object tree that is
understood by my program, not a DOM tree.
Now, as I understand it, Coins can sort of do this but the designer of the
DTD has really to take Coins into account at the beginning. This isn't what
I want to do at all. I want the same DTD to be combined with different
mapping descriptions to produce different object trees and I want different
DTDs to be combined with different mapping descriptions to provide the same
object tree.
Is anybody working on this?
Is it feasible?
Is it useful?
John Wilson
The Wilson Partnership
5 Market Hill, Whitchurch, Aylesbury, Bucks HP22 4JB, UK
+44 1296 641072, +44 976 611010(mobile), +44 1296 641874(fax)
Mailto: tug@wilson.co.uk
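A minimal sketch of the idea in Python (standing in for Java), with the mapping description written as a plain table of converter functions rather than in XML. The element names, attribute names and both mapping tables are hypothetical, and this is not how Coins works. It shows two differently structured date markups, each with its own mapping, producing the same datetime.date object.

```python
import datetime
import xml.etree.ElementTree as ET
from decimal import Decimal

# Hypothetical mapping for one DTD: element name -> function building an object.
MAPPING_A = {
    "date": lambda e: datetime.date(int(e.get("year")), int(e.get("month")), int(e.get("day"))),
    "amount": lambda e: (e.get("currency"), Decimal(e.text.strip())),
}

# Hypothetical mapping for a different DTD that writes dates as ISO text.
MAPPING_B = {
    "when": lambda e: datetime.datetime.strptime(e.text.strip(), "%Y-%m-%d").date(),
}


def to_objects(xml_text, mapping):
    """Convert every element named in mapping, in document order; ignore the rest."""
    root = ET.fromstring(xml_text)
    return [mapping[el.tag](el) for el in root.iter() if el.tag in mapping]


a = to_objects('<order><date year="1998" month="11" day="4"/>'
               '<amount currency="GBP">120.50</amount></order>', MAPPING_A)
b = to_objects("<invoice><when>1998-11-04</when></invoice>", MAPPING_B)
print(a[0] == b[0])  # True: both documents yield datetime.date(1998, 11, 4)
```

Because the mapping is data rather than code in the DTD, the same DTD can be paired with different tables, and different DTDs can be mapped onto the same object types.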
From rhanson at blast.net Wed Nov 4 16:20:05 1998
From: rhanson at blast.net (Robert Hanson)
Date: Mon Jun 7 17:06:14 2004
Subject: xml parser
Message-ID: <000e01be080e$d8728220$12b919ce@Bertha>
I downloaded your control, and have some questions...
1. Why can't I get it to work?
Ok, that was only one question, but a serious one.
I am trying to add Perl to your list of "Tested with", but ran into some
problems. Below is the code I used:
1. use Win32::OLE;
2.
3. $parser = Win32::OLE->new('SAX.SAXParser') or die $!;
4. $parser->parseFile('c:\winn95\desktop\test.xml') or die $!;
5. undef $parser;
6.
7. sub characters
8. {
9. my ($sCharacter, $iLength) = @_;
10. print "$sCharacter\n\n";
11. }
It seems to be able to create the SAX object in line 3, but dies on line 4
with the parseFile method. Is there any way to
get an error from the parser to see what the problem is... maybe a
getLastError method? If I get a chance, I may also try it out with
PerlScript (or VBScript) in ASP later this week.
...If you (or anyone else) have any other ideas on getting this to work,
please let me know.
Many thanks,
Robert
-----Original Message-----
From: Richard James Anderson
To: XMLDEV
Date: Wednesday, November 04, 1998 6:36 AM
Subject: RE: xml parser
>Hi,
>
>For those who are interested, I've posted an early version of my ActiveX SAX
>control up on my website ( URL below ).
>
>The control still has a long way to go, but it can parse most files that do
>not contain references to external entities.
>
>The download includes a sample VB6 app for reading and processing XML files.
>It just loads the XML file into a tree control, and shows the SAX events in
>a list control. Of course, the control can be used to anything that
>supports COM automation controllers.
>
>Enjoy,
>
>Richard.
>
>RJA@DIP.CO.UK
>http://www.arpsolutions.demon.co.uk
>
>*** The text contained within this message is of a personal nature that does
>not reflect the development of opinions of data interchange plc unless
>specifically stated ***
From tbray at textuality.com Wed Nov 4 16:23:10 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:14 2004
Subject: xml parser
Message-ID: <3.0.32.19981104081951.00b61b10@pop.intergate.bc.ca>
At 10:55 AM 11/4/98 -0000, Michael Kay wrote:
>My immediate answer to this is yes, all the information you need for a
>search engine is available via the SAX or DOM interface offered by many
>parsers.
I disagree. Few parsers track byte offsets or other locational info in
the file, and I think you need that to do basic things like proximity
and phrase search.
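For comparison, some parser bindings do expose byte positions today. A minimal sketch using the CurrentByteIndex attribute of Python's xml.parsers.expat (a later Expat binding, not one of the 1998 parsers under discussion) to record where each start-tag begins; the function name is hypothetical.

```python
import xml.parsers.expat


def element_offsets(data: bytes):
    """Return (element name, byte offset of its start-tag) in document order."""
    parser = xml.parsers.expat.ParserCreate()
    offsets = []

    def start(name, attrs):
        # During a callback, CurrentByteIndex is the offset of the event's first byte.
        offsets.append((name, parser.CurrentByteIndex))

    parser.StartElementHandler = start
    parser.Parse(data, True)
    return offsets


print(element_offsets(b"<doc><p>one</p><p>two</p></doc>"))
# [('doc', 0), ('p', 5), ('p', 15)]
```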
>Of course you don't need to build your own search engine either, all you
>need to do is write an XML filter for an existing search engine. I'm
>surprised no-one seems to have done this yet.
I think you do need to build your own engine. Reason is, most existing
search engines have an atomic-document view of the world, and break
down completely when asked to model a general recursive hierarchical
structure like XML. -Tim
From rja at dip.co.uk Wed Nov 4 16:39:59 1998
From: rja at dip.co.uk (RJA)
Date: Mon Jun 7 17:06:14 2004
Subject: xml parser
Message-ID: <000101be0811$dbde7c90$c5010180@p197>
>1. Why can't I get it to work?
Let's try to find out.
>Is there anyway to
>get an error from the parser to see what the problem is... maybe a
>getLastError method? If I get a chance, I may also try it out with
>PerlScript (or VBScript) in ASP later this week.
The error interface has not been exposed in the control yet. I'll be doing
that as soon as I get some spare time (ASAP).
I'll download Perl and try your sample.
The SAX events are currently being fired using standard COM connection
points, but the event interface is IUnknown-based, not IDispatch-based. Maybe
that's a problem with Perl? I'll let you know.
Regards,
Richard.
mailto://RJA@DIP.CO.UK
http://www.arpsolutions.demon.co.uk
*** The text contained within this message is of a personal nature that does
not reflect the development of opinions of data interchange plc unless
specifically stated ***
From papresco at technologist.com Wed Nov 4 16:49:41 1998
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 17:06:14 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de> <364077B9.AB1AA3A6@locke.ccil.org>
Message-ID: <36407FC0.D8919AA7@technologist.com>
John Cowan wrote:
>
> Because I thought XML (and Unicode) were in the business of enabling,
> not constraining.
XML DTDs are in the business of constraining people to the data models and
data that the software is expecting/can deal with. I don't see any big
difference between saying: "This content must be restricted to this set of
characters" and "this content must be a NMTOKEN or base-64 encoded."
Nevertheless, this is clearly a schema problem and CDATA sections seem to
me to be a really bad tool for enforcing this distinction. No editor
vendor is going to support that use for them, so it is a moot point.
Paul Prescod - http://itrc.uwaterloo.ca/~papresco
The United Nations Declaration of Human Rights will be 50 years old on
December 10, 1998. These are your fundamental rights:
http://www.udhr.org/history/default.htm
From cowan at locke.ccil.org Wed Nov 4 17:00:00 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:14 2004
Subject: CDATA by any other name... (was The raw and the cooked)
References: <01BE07DB.D4272500@grappa.ito.tu-darmstadt.de> <364077B9.AB1AA3A6@locke.ccil.org> <36407FC0.D8919AA7@technologist.com>
Message-ID: <36408848.12498760@locke.ccil.org>
Paul Prescod wrote:
> XML DTDs are in the business of constraining people to the data models and
> data that the software is expecting/can deal with. I don't see any big
> difference between saying: "This content must be restricted to this set of
> characters" and "this content must be a NMTOKEN or base-64 encoded."
Put that way, I suppose you are right. As I said before, this could and
should be handled as a special case of "The character data of this
element must conform to the following regular expression."
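Read that way, a character-repertoire restriction is just one regular-expression constraint among others. A minimal sketch in Python that checks element text against a hypothetical table of patterns; this is not an XML 1.0 DTD feature, and the table, the function name and the ASCII-only NMTOKEN approximation are assumptions.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical constraint table: element name -> pattern its text must fully match.
CONSTRAINTS = {
    "code": re.compile(r"[\x00-\x7F]*"),        # restricted character repertoire (ASCII)
    "token": re.compile(r"[A-Za-z0-9._:-]+"),   # rough ASCII-only approximation of NMTOKEN
}


def violations(xml_text):
    """Return (element, text) pairs whose direct text breaks its constraint."""
    found = []
    for el in ET.fromstring(xml_text).iter():
        pattern = CONSTRAINTS.get(el.tag)
        text = el.text or ""
        if pattern is not None and not pattern.fullmatch(text):
            found.append((el.tag, text))
    return found


print(violations("<r><code>abc</code><code>café</code><token>a-1</token><token>a b</token></r>"))
# [('code', 'café'), ('token', 'a b')]
```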
> Nevertheless, this is clearly a schema problem and CDATA sections seem to
> me to be a really bad tool for enforcing this distinction.
Particularly because it would mean that the charset of an XML document
would become part of its schema: a document in US-ASCII can have
only ASCII in its CDATA sections, but if it were transcoded to
Shift_JIS, then it could have any JIS X 0208 character in the
CDATA section.
So this means that transcoding arbitrary XML documents *requires*
parsing them, because if you are reducing the repertoire, you may need
to break up CDATA sections, and you cannot (?) recognize a
CDATA section reliably without parsing. (In particular, what
looks like a CDATA section start/end could appear as an attribute
value, PI data, or comment.) An interesting side effect!
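A small demonstration of the point, using Python's xml.parsers.expat: a raw text search for the CDATA start marker also matches occurrences inside a comment and inside processing-instruction data, while the parser reports only the genuine CDATA section. The sample document and function names are illustrative. (In an attribute value only the end marker "]]>" can occur literally, since "<" is not allowed there, so the sample uses a comment and a PI.)

```python
import re
import xml.parsers.expat


def naive_cdata_count(text):
    """Count CDATA start markers by raw string matching, without parsing."""
    return len(re.findall(r"<!\[CDATA\[", text))


def parsed_cdata_sections(text):
    """Collect the contents of real CDATA sections as reported by expat."""
    sections = []
    buf = None
    parser = xml.parsers.expat.ParserCreate()

    def start():
        nonlocal buf
        buf = []

    def end():
        nonlocal buf
        sections.append("".join(buf))
        buf = None

    def chars(data):
        if buf is not None:            # only keep text inside a CDATA section
            buf.append(data)

    parser.StartCdataSectionHandler = start
    parser.EndCdataSectionHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(text, True)
    return sections


doc = ("<doc>"
       "<!-- <![CDATA[ inside a comment ]]> -->"
       "<?pi <![CDATA[ inside PI data ?>"
       "<![CDATA[real <content>]]>"
       "</doc>")
print(naive_cdata_count(doc))      # 3
print(parsed_cdata_sections(doc))  # ['real <content>']
```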
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From clovett at microsoft.com Wed Nov 4 17:40:52 1998
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 17:06:14 2004
Subject: CDATA by any other name... (was The raw and the cooked)
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C08743F15@RED-MSG-56>
I like Rick's idea of xml:content-mode="CDATA". This definitely
disambiguates this whitespace case for the validating parser.
So
]>
would become:
]>
It's true that a non-validating parser will have difficulty with the latter
example, but that is solved by putting the xml:content-mode attribute on the
instance.
From avirr at LanMinds.Com Wed Nov 4 18:30:48 1998
From: avirr at LanMinds.Com (Avi Rappoport)
Date: Mon Jun 7 17:06:14 2004
Subject: xml parser
In-Reply-To: <3.0.32.19981104081951.00b61b10@pop.intergate.bc.ca>
Message-ID:
At 8:22 AM -0800 11/4/98, Tim Bray wrote:
> At 10:55 AM 11/4/98 -0000, Michael Kay wrote:
> >My immediate answer to this is yes, all the information you need for a
> >search engine is available via the SAX or DOM interface offered by many
> >parsers.
>
> I disagree. Few parsers track byte offsets or other locational info in
> the file, and I think you need that to do basic things like proximity
> and phrase search.
What Tim said. Most search engines do not have database storage, they have
a fairly simple inverted index. Trying to put all the XML info in there
would overload them. The point of having an XML search is to have metadata
and context, so you probably need to use some of the more sophisticated
text retrieval and library systems.
BTW, I'm trying to collect information on XML and search, so please keep me
posted if you are working on something. I post everything I hear about at
Avi
________________________________________________________________
Avi Rappoport, Web Site Search Tools Maven:
Guide to Site Indexing and Local Search Engines:
From bckman at ix.netcom.com Wed Nov 4 20:18:13 1998
From: bckman at ix.netcom.com (Frank Boumphrey)
Date: Mon Jun 7 17:06:14 2004
Subject: W3 DOM tutorial.
Message-ID: <000701be0830$133a68c0$16afdccf@ix.netcom.com>
(cross posted to xml-dev)
For those who may be interested, now that Microsoft have released the new
version of their IE5 beta, I have posted a new tutorial on the DOM at
www.hypermedic.com.
Follow the DOM links.
Regards
Frank
Frank Boumphrey
XML and style sheet info at http://www.hypermedic.com/style/index.htm
Author: - Professional Style Sheets for HTML and XML http://www.wrox.com
CoAuthor: Professional XML applications from Wrox Press, www.wrox.com
From Suli.Ding at geis.ge.com Wed Nov 4 22:23:17 1998
From: Suli.Ding at geis.ge.com (Ding, Suli (GEIS))
Date: Mon Jun 7 17:06:14 2004
Subject: Text file to XML??
Message-ID:
Scott,
Have you checked out this URL?
http://www.geocities.com/SiliconValley/Platform/4871/
Regards,
Suli
> ----------
> From: Roth, Scott[SMTP:sroth@radsys.com]
> Reply To: Roth, Scott
> Sent: Wednesday, November 04, 1998 10:28 AM
> To: XML Dev Mailing (E-mail)
> Subject: Text file to XML??
>
> Help....???
>
> I am working on a way to take delimited text file that has data in it and
> break that up so that I can make files that hold xml data within it. The
> text file holds metadata already that points to files and holds certain key
> information. What I want to do is take that data and have it put the proper
> xml tags in where the fields are and then take that data and put it also
> into the file. Then I want to add the proper HTML. Does this make sense???
>
> Please help me out I want to know if anybody out there has done this already
> so I don't have to reinvent the wheel. And if anyone has any helpful hints
> please let me know.
>
> Thanks,
>
> Scott Roth
From david at megginson.com Wed Nov 4 22:27:45 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:14 2004
Subject: Interface name quandry ...
In-Reply-To:
References:
Message-ID: <13888.54350.64691.396015@localhost.localdomain>
[cross-postings removed]
Miles Sabin writes:
> I'm working on a number of Java APIs which operate
> on documents and their DOM representations relying
> on only the intersection of the properties of XML
> and HTML, and I've been racking my brains for a
> good name that covers both HTML and XML, but isn't
> as general as SGML.
>
> Has anybody got any suggestions?
I think that the DOM calls these the "fundamental" node types, but I
don't have the REC in front of me right now to check.
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From david at megginson.com Wed Nov 4 22:33:58 1998
From: david at megginson.com (david@megginson.com)
Date: Mon Jun 7 17:06:14 2004
Subject: SAX, DOM, and Search Engines (was Re: xml parser)
In-Reply-To: <3.0.32.19981104081951.00b61b10@pop.intergate.bc.ca>
References: <3.0.32.19981104081951.00b61b10@pop.intergate.bc.ca>
Message-ID: <13888.54487.434062.193573@localhost.localdomain>
Tim Bray writes:
> At 10:55 AM 11/4/98 -0000, Michael Kay wrote:
> >My immediate answer to this is yes, all the information you need for a
> >search engine is available via the SAX or DOM interface offered by many
> >parsers.
>
> I disagree. Few parsers track byte offsets or other locational info in
> the file, and I think you need that to do basic things like proximity
> and phrase search.
I disagree. While byte offsets might be useful for other purposes,
they would be inappropriate for proximity and phrase searches -- for
those, you need to track the relative positions of words, not their
absolute positions. Consider the following example:
WORD1 &x; WORD2
Is WORD1 close to WORD2? It's only five bytes away (assuming an 8-bit
encoding), but might be separated by 20,000 words, depending on what
&x; expands to. SAX and the DOM do give you enough information to
determine the relative positions of words.
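A minimal Python sketch of David's point. It assumes the standard library's expat-based xml.sax parser, which expands internal entities declared in the internal subset before reporting characters. The handler counts positions in the expanded ("cooked") text, so &x; contributes its words to the distance. Treating element boundaries as word breaks is an assumption of this sketch, not something SAX prescribes.

```python
import xml.sax


class WordCollector(xml.sax.ContentHandler):
    """Collects the words of the parsed (entity-expanded) text in document order."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._pending = []  # text seen since the last element boundary

    def _flush(self):
        # characters() may deliver one word in several pieces, so join first.
        self.words.extend("".join(self._pending).split())
        self._pending = []

    def characters(self, content):
        self._pending.append(content)

    def startElement(self, name, attrs):
        self._flush()  # assumption: element boundaries also separate words

    def endElement(self, name):
        self._flush()


def word_distance(doc, first, second):
    """Distance in word positions between the first occurrences of two words."""
    handler = WordCollector()
    xml.sax.parseString(doc, handler)
    return abs(handler.words.index(second) - handler.words.index(first))


DOC = b"""<?xml version="1.0"?>
<!DOCTYPE p [<!ENTITY x "alpha beta gamma delta">]>
<p>WORD1 &x; WORD2</p>"""

print(word_distance(DOC, "WORD1", "WORD2"))
```

In the raw markup the two words are five bytes apart, but in the cooked text four words from the entity sit between them.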
Byte offsets would be helpful for displaying context around a match,
but there would be no 100% reliable way to format that context without
starting from the top of the document, in which case an XPOINTER (also
derivable from SAX or DOM) might be more helpful unless you want the
search engine to display raw XML markup for the context.
All the best,
David
--
David Megginson david@megginson.com
http://www.megginson.com/
From tbray at textuality.com Wed Nov 4 23:08:59 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:15 2004
Subject: SAX, DOM, and Search Engines (was Re: xml parser)
Message-ID: <3.0.32.19981104150834.00b4f100@pop.intergate.bc.ca>
At 05:32 PM 11/4/98 -0500, david@megginson.com wrote:
>Tim Bray writes:
> > I disagree. Few parsers track byte offsets or other locational info in
> > the file, and I think you need that to do basic things like proximity
> > and phrase search.
>
>I disagree. While byte offsets might be useful for other purposes,
>they would be inappropriate for proximity and phrase searches -- for
>those, you need to track the relative positions of words, not their
>absolute positions. Consider the following example:
>
> WORD1 &x; WORD2
>
>Is WORD1 close to WORD2?
Clearly, the proximity tests have to work in terms of proximity in the
cooked, not raw, text. Lark carefully tracks offsets in terms of the
entity stack so you can do this. But that's so obvious I don't think
it's your point.
Secondly, for proximity, you're worried about counting characters, not
bytes, but for addressing back into the entity, you're worried about byte,
not character, offsets. So it's even harder than it looks. Unless
of course you're using UTF16 and staying in the BMP - which might be
a REAL good idea in an IR-oriented system anyhow.
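A small Python sketch that makes the character/byte distinction concrete. The sample string is made up. The function encodes the prefix before a character to find that character's byte offset. In UTF-8 the result depends on which characters come before it. In UTF-16 it is exactly twice the character offset as long as everything stays in the BMP, and a character outside the BMP, which takes four bytes, breaks that rule.

```python
def byte_offset(text, char_index, encoding):
    """Byte offset of text[char_index] once text is encoded with `encoding`."""
    return len(text[:char_index].encode(encoding))


# "naïve 日本語 text", written with escapes so the source stays ASCII
sample = "na\u00efve \u65e5\u672c\u8a9e text"
i = sample.index("text")
# UTF-8 uses 1 to 4 bytes per character; UTF-16-LE (no BOM) uses 2 per BMP character
print(i, byte_offset(sample, i, "utf-8"), byte_offset(sample, i, "utf-16-le"))
```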
> It's only five bytes away (assuming an 8-bit
>encoding), but might be separated by 20,000 words, depending on what
>&x; expands to. SAX and the DOM do give you enough information to
>determine the relative positions of words.
[warning: simple argument with long embedded digression]
I don't think so. How about languages, such as those spoken by the
majority of the world's inhabitants, that do not separate words with
spaces? (Identifying word breaks in running Japanese or Chinese
text is essentially a strong-AI problem. You can get decent results
by running a dictionary and searching at each character break for
a match, with morphological heuristics, but it turns out that in those
languages there is sufficient encoding redundancy that you get pretty
good results (at a cost of some space wastage) just treating most
characters as words - and lurking in that fact there's a PhD in
linguistics for someone - but I digress, I spent a long time
in those particular mines).
But spotting "words" may not matter. In fact, I am not aware of
any research that shows word proximity to be a better information
retrieval heuristic than character proximity. And it's much easier
to nail down what you mean by "character" than "word", and thus get
deterministic cross-language behavior.
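A Python sketch of character-proximity matching, included to show why this approach needs no word segmentation. The function names, and the choice to measure distance between match start positions, are assumptions made for illustration. They are not taken from any particular engine.

```python
def occurrences(text, s):
    """Start offsets of every (possibly overlapping) occurrence of s in text."""
    found, i = [], text.find(s)
    while i != -1:
        found.append(i)
        i = text.find(s, i + 1)
    return found


def char_proximity(text, a, b, window):
    """True if an occurrence of a starts within `window` characters of one of b.

    Works on the raw character sequence, so no word segmentation is needed."""
    return any(abs(i - j) <= window
               for i in occurrences(text, a)
               for j in occurrences(text, b))


jp = "東京都の天気は晴れです"
print(char_proximity(jp, "東京", "天気", 5), char_proximity(jp, "東京", "晴れ", 5))
```

In the sample sentence, 東京 and 天気 start four characters apart, while 晴れ starts seven characters after 東京. The same function works unchanged on space-separated English.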
>Byte offsets would be helpful for displaying context around a match,
>but there would be no 100% reliable way to format that context without
>starting from the top of the document
unless you used the whizzy new soon-to-arrive W3C fragment packager,
right? Actually, if you have an index that can understand the
structure well enough to support xpointer-flavor querying, the engine
is going to know all the context info, so this should actually work
pretty well (but only if you know the byte/character offsets).
And the right way to display results in context depends on whether
you're sampling, or visiting a match.
OK, you've been warned... if you get me going on the problems of
searching in tagged internationalized text, bring a windbreaker -
you'll need it. -Tim
From salur at csee.wvu.edu Wed Nov 4 23:23:38 1998
From: salur at csee.wvu.edu (Salur Prashanth)
Date: Mon Jun 7 17:06:15 2004
Subject: XML Search Engine
Message-ID:
Hi all,
Can anyone tell me where the difference lies between implementing a search
engine for HTML and a search engine for XML?
Thanks
Salur.
@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#
Address:
--------
Prashanth Kumar Salur, Apt# 910-1,
Graduate Student, CSEE, 445 Oakland Street,
West Virginia University Morgantown,WV-26505
Off. Ph: 304 293 6371 Ext 577 Res Ph: 304 598 8025
@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#@#
From ralph at fsc.fujitsu.com Thu Nov 5 00:02:32 1998
From: ralph at fsc.fujitsu.com (Ralph Ferris)
Date: Mon Jun 7 17:06:15 2004
Subject: HyBrick V0.8 with XLink/XPointer is now Available
Message-ID: <3.0.5.32.19981105065829.00956a80@pophost.fsc.fujitsu.com>
All,
The latest version of Fujitsu's "HyBrick" browser, V0.8, with support for
XLink/XPointer, is now available from Fujitsu's Web site:
http://www.fujitsu.co.jp/hypertext/free/HyBrick/download2.html
The browser and supporting documentation can be downloaded by clicking on
hb.08.exe.
This is a Japanese-language site, so much of the supporting documentation
won't be accessible to non-Japanese readers. A brief summary:
Features:
- HyBrick includes a DSSSL renderer and XLink/XPointer engine running on
top of SP and Jade
- XLink/XPointer are supported on the local file system
- XPointer is implemented as a subset of the HyTime property set
- Link traversal can use either "New" or "Replace" to display a new page
Using HyBrick:
- HyBrick is supplied as a self-extracting file.
- Once the files are installed, start HyBrick from the bin directory.
- Use the "Browse" button to open the file sample\docs\readme.xml.
- Click on blue-highlighted areas with the left mouse button to see a list
of locations linked to the highlighted location. If only one location is
available, traversal to that location is immediate.
- Click on blue-highlighted areas with the right mouse button to see the
location of that area expressed as an XPointer.
Contact info:
Please address questions and comments to:
hb-staff@ml.flab.fujitsu.co.jp
Best regards,
Ralph E. Ferris
Fujitsu Software Corporation
ralph@fsc.fujitsu.com
From Sung_Nguyen at datacard.com Thu Nov 5 00:16:56 1998
From: Sung_Nguyen at datacard.com (Sung Nguyen)
Date: Mon Jun 7 17:06:15 2004
Subject: C++ XML Parser
Message-ID: <00153772.3096@datacard.com>
Hi:
Please point me to the API of any C++ XML Parser -
I am using IE5.0 - I cannot find any example to
follow - Someone please help.
Thanks,
SeanN
From Sung_Nguyen at datacard.com Thu Nov 5 00:24:52 1998
From: Sung_Nguyen at datacard.com (Sung Nguyen)
Date: Mon Jun 7 17:06:15 2004
Subject: WHERE: mshtml.h msxml.h
Message-ID: <0015378C.3096@datacard.com>
Hi:
I installed IE5.0 and looked for the two header files mshtml.h and msxml.h,
but I couldn't find them. Do I need anything else to use the C++ XML parser
in IE5.0?
Please enlighten me,
SeanN
From v-jmurr at microsoft.com Thu Nov 5 04:20:14 1998
From: v-jmurr at microsoft.com (John Murray (Murray Info Serv. inc.))
Date: Mon Jun 7 17:06:15 2004
Subject: C++ samples, XML DOM doc
Message-ID:
C++ examples:
http://www.microsoft.com/gallery/samples/xml/c++_samples/default.asp
XML DOM reference:
http://www.microsoft.com/workshop/xml/xmldom/reference/start.asp
Thanks
John
From: Sung_Nguyen@datacard.com (Sung Nguyen)
Date: Wed, 4 Nov 1998 18:21:37 -0600
Subject: WHERE: mshtml.h msxml.h
Hi:
Please point me to the API of any C++ XML Parser -
I am using IE5.0 - I cannot find any example to
follow - Someone please help.
Thanks,
SeanN
From ricko at allette.com.au Thu Nov 5 05:45:17 1998
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 17:06:15 2004
Subject: Unicode, xml:lang, and variant glyphs
In-Reply-To: <363F657F.5D2E1B43@locke.ccil.org>
Message-ID: <002101be087f$a6d02440$d9e887cb@NT.JELLIFFE.COM.AU>
> From: John Cowan
> Rick Jelliffe wrote:
> > The primary purpose of xml:lang, as far as I am concerned, should be to
> > convey the information lost by ISO 10646 unification: where the
> > Japanese and Chinese glyphs
>
> Actually, the problem isn't that clearcut. As John Jenkins posted
> to the Unicode list last year:
> (..Lots of facts..)
FACT: Many times when someone says two characters are variants and should be
unified, someone else has used them not as variants. Hence the Unicode
compatibility area.
> > (or Polish and Russian)
>
> How's that again?
Oops, I meant Russian and Byelorussian (or Kazakh or Ukrainian), where some of
the national characters have a different form.
> It doesn't lose information about meaning. It may make characters
> harder to read, but the distinction is one of typographic tradition,
> not language, and can cross languages.
Are you saying that characters carry information, and never glyphs (or
character + locale + markup)? You cannot say this without knowing the domain
and purpose of the text: if it is mathematics, then the font definitely
carries information that the unified character does not. If you have a
multi-language dictionary or a list of names which requires exactness, the
font (or markup which selects the font) again is important.
"Harder to read" is no criterion at all. If it is harder to read, it is
because it has lost information.
Rick Jelliffe
Independent XML/SGML Consultant: FM+SGML a speciality
Research Assistant:Computing Center, Academia Sinica, Taipei
Author: The XML & SGML Cookbook, Recipes for Structured Information, 1998
From M.H.Kay at eng.icl.co.uk Thu Nov 5 12:31:28 1998
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 17:06:15 2004
Subject: XML Search Engine
Message-ID: <002001be08b7$9dbdeda0$7008e391@bra01wmhkay.bra01.icl.co.uk>
>Hi all,
>Can anyone tell me where the difference lies in implementing a search
>engine for HTML and a search engine for XML.
The main difference is that in HTML the tagging is almost useless in
localising the query, whereas in XML it is potentially very valuable. Many
search engines support field-oriented query, e.g. find "Ireland" as a
surname; with the right input filter for XML it becomes possible to map XML
elements to the fields understood by the search engine, making such queries
a feasible proposition, which is not the case for HTML.
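A rough Python sketch of the input filter Mike describes. Element names are mapped onto search-engine fields, so a query can ask for "Ireland" as a surname rather than anywhere in the text. The mapping and element names are hypothetical, and a real engine would build an index instead of scanning each document.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML element names to the fields a search engine indexes.
FIELD_MAP = {"surname": "surname", "given": "forename", "city": "place"}


def extract_fields(xml_text, field_map=FIELD_MAP):
    """Return {field: [values]} for every mapped element in the document."""
    fields = {}
    for elem in ET.fromstring(xml_text).iter():
        field = field_map.get(elem.tag)
        if field is not None and elem.text:
            fields.setdefault(field, []).append(elem.text.strip())
    return fields


def field_query(fields, field, value):
    """Field-oriented match: value must appear in that field, not just anywhere."""
    return value in fields.get(field, [])


doc = "<person><given>Mary</given><surname>Ireland</surname><city>Dublin</city></person>"
f = extract_fields(doc)
print(field_query(f, "surname", "Ireland"), field_query(f, "place", "Ireland"))
```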
Switching threads, I am a little surprised by Tim's remarks on word
proximity versus character proximity. Confining our attention to European
languages (as most search engines do), word proximity searching is a common
feature of the high-end search engines, whereas character proximity is
hardly found outside basic desktop tools like grep. Apart from anything
else, once you've done the word normalisation (normalising different
linguistic forms or spellings of the same word), character proximity is
meaningless. In the older boolean engines word proximity is used rather
mechanistically, in the newer engines it is used more subtly as part of a
statistical or linguistic approach to relevance ranking, but either way it
is an established feature of the scene, and it is not there on whim: the
search algorithms used are based on extensive research and benchmarking of
relevance and recall scores.
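For comparison, a minimal Python sketch of word proximity after normalisation. The normaliser is deliberately crude and hypothetical: it lowercases and strips a plural "s". It is only there to show that once "parsers" and "parser" count as the same word, distances are naturally measured in word positions rather than characters.

```python
import re


def normalise(word):
    # Hypothetical, crude normaliser: lowercase and strip a plural "s".
    # Real engines use proper stemming or lemmatisation.
    w = word.lower()
    return w[:-1] if len(w) > 3 and w.endswith("s") else w


def word_proximity(text, a, b, window):
    """True if normalised a and b occur within `window` word positions."""
    words = [normalise(w) for w in re.findall(r"\w+", text)]
    a, b = normalise(a), normalise(b)
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= window for i in pos_a for j in pos_b)


text = "Parsers build trees; a parser reads documents."
print(word_proximity(text, "parser", "documents", 2), word_proximity(text, "tree", "document", 2))
```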
An interesting comparison of web search engines is at
http://www.netstrider.com/search/features.html ; this asserts that all the
well-known web search engines other than Lycos use word proximity matching.
(A good survey in spite of the fact that it fails to distinguish the
effectiveness of the query matcher from the effectiveness of the web
crawler)
Mike Kay
From kurt at simberg.com Thu Nov 5 13:03:18 1998
From: kurt at simberg.com (Kurt Helenelund)
Date: Mon Jun 7 17:06:15 2004
Subject: Creation of XML documents
Message-ID: <3641A21B.E5E4513D@simberg.com>
I am working on a project where we will use XML to exchange information
between applications in different government agencies. We want to
implement both on-line access between applications and asynchronous
store-and-forward mechanisms.
I understand that there are 'lots' of good XML parsers out there (we have
tried some) and that SAX and DOM are the preferred ways for applications
to 'read' XML structures. I would like to ask if anyone has the opposite
problem, i.e. getting applications to create XML documents on the fly. Of
course the developer could 'hand code' the XML structures, but that is
error-prone and boring. I am looking for something (an API or library) so
that we could avoid this.
I would like to have a 'library' to which the application developer
could say 'using this DTD, please instantiate an XML document and help me
fill it in'.
Any solutions?
--
_______________________________________________________________________
Kurt Helenelund Mobile: +358 50 555 0192
Simberg & Partners Home: +358 9 294 0313
Mielikintie 7B Fax: +358 9 294 0314
FIN-04230 KERAVA, Finland Email: kurt@simberg.com
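One library-based answer, sketched with Python's xml.etree.ElementTree. The program builds elements and text, and the serializer takes care of escaping and well-formedness, so nothing is hand-coded. The element names are hypothetical. This sketch also does not read or enforce a DTD, which is what Kurt ultimately asks for.

```python
import xml.etree.ElementTree as ET


def build_message(sender, recipient, body):
    """Build a small XML message programmatically instead of by string pasting."""
    # Hypothetical element names; a real format would follow the agencies' DTD.
    root = ET.Element("message", {"version": "1.0"})
    ET.SubElement(root, "from").text = sender
    ET.SubElement(root, "to").text = recipient
    ET.SubElement(root, "body").text = body  # & and < are escaped on output
    return ET.tostring(root, encoding="unicode")


print(build_message("Tax Office", "Customs", "Fees < 100 & rising"))
```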
From richard at cogsci.ed.ac.uk Thu Nov 5 13:40:02 1998
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 17:06:15 2004
Subject: Character and byte offsets
In-Reply-To: Tim Bray's message of Wed, 04 Nov 1998 15:08:39 -0800
Message-ID: <199811051339.NAA00077@cogsci.ed.ac.uk>
> Secondly, for proximity, you're worried about counting characters, not
> bytes, but for addressing back into the entity, you're worried about byte,
> not character, offsets. So it's even harder than it looks.
This reminds me - are there good techniques for maintaining a byte
offset in conjunction with character-set translations? Ideally you
want the translation done in big blocks at a low level, but then how
do you access the byte offsets? In RXP/LTXML I keep the offset of the
start of the block (which is actually a line), and then (in the case
of UTF-8) effectively reverse-translate to calculate how much to add
(this relies on UTF-8 being invertible). Surely there must be a better
way...
-- Richard
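A Python sketch of the reverse calculation Richard describes. It assumes the decoder hands back whole lines and that the byte offset of each line's start is known. Instead of re-encoding, it derives each character's UTF-8 length (1 to 4 bytes) from its code point. Lone surrogates are not considered.

```python
def utf8_length(ch):
    """Number of bytes UTF-8 uses for one code point, derived from its value."""
    cp = ord(ch)
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def byte_offset_in_line(line_start_byte, decoded_line, char_index):
    """Byte offset of decoded_line[char_index], given where the line starts in bytes."""
    return line_start_byte + sum(utf8_length(c) for c in decoded_line[:char_index])


# Example: the second line of this UTF-8 data starts at byte 3
raw = "ab\nx\u00e9\u4e2dy\n".encode("utf-8")
line = raw.split(b"\n")[1].decode("utf-8")
print(byte_offset_in_line(3, line, line.index("y")))
```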
From north at Synopsys.COM Thu Nov 5 13:52:50 1998
From: north at Synopsys.COM (Simon North)
Date: Mon Jun 7 17:06:15 2004
Subject: XML and IE5 beta PR2
Message-ID: <199811051350.OAA03527@goofy.gr05.synopsys.com>
For anyone who hasn't noticed, the preview 2 release of IE5 was put
on the public servers yesterday (it had been placed there last Friday
but was pulled shortly afterwards). Today, Microsoft appear to have
put updated documentation on the SBN web pages.
The new release doesn't seem to support the style part of XSL, only
the transformation part (but it does seem to be nearly 100%
compliant, or at least as far as I've had time to check). It will,
for example, choke on process-children (unless someone else has
got it to work).
I have managed to get it to work with simple files such as
these:
The XML file:
TITLE                                 AUTHOR            PRICE
Pierre: The Ambiguities               Herman Melville    9.99
Heart of Darkness                     Joseph Conrad     12.99
Arrowsmith                            Sinclair Lewis     8.99
Oedipus Rex                           Sophocles          8.99
The Secret Sharer and Other Stories   Joseph Conrad     13.99
The Republic                          Plato             12.99
The Republic                          Plato             15.99
Pragmatism                            William James     15.99
and the XSL file:
TITLE
AUTHOR
PRICE
(I quickly hacked these files from the sources available on the SBN
site).
Without a style sheet, it shows a 'raw' XML tree that you can expand
and contract. The DSO and data island mechanisms appear to be intact
(bar a few minor changes).
It also appears to correctly parse and validate XML code against a
DTD; the DTD display is suppressed.
I'm now going to tackle XLink and XPointer, although I suspect I know
what the results will be.
Simon.
From ramesh.kasetty at trane.com Thu Nov 5 14:30:31 1998
From: ramesh.kasetty at trane.com (Kasetty, Ramesh)
Date: Mon Jun 7 17:06:15 2004
Subject: html, xml
Message-ID: <199811051429.IAA05529@nacg.trane.com>
Hi,
I have knowledge of HTML and am trying to learn XML. Can anyone tell me the
difference between HTML and XML, and where XML can be used?
Thanks in advance,
Ramesh
ramesh.kasetty@trane.com
From jborden at mediaone.net Thu Nov 5 14:58:46 1998
From: jborden at mediaone.net (Borden, Jonathan)
Date: Mon Jun 7 17:06:15 2004
Subject: FW: Creation of XML documents
Message-ID: <003d01be08cc$9e9e07e0$d3228018@jabr.ne.mediaone.net>
There are several answers to this problem, including XML generation classes.
A standard way to do this is to persist your data into a DOM object and then
ask it to save itself. You might look at Jade (http://www.jclark.com) or
IBM's xml4j as a start.
Jonathan Borden
JABR Technology
>
>
> I am working on a project where we will use XML to exchange information
> between applications in different government agencies. We want to
> implement both on-line access between applications and asynchronous
> store-and-forward mechanisms.
>
> I understand that there are 'lots' of good XML parsers out there (we have
> tried some) and that SAX and DOM are the preferred ways for applications
> to 'read' XML structures. I would like to ask if anyone has the opposite
> problem, i.e. getting applications to create XML documents on the fly. Of
> course the developer could 'hand code' the XML structures, but that is
> error-prone and boring. I am looking for something (an API or library) so
> that we could avoid this.
>
> I would like to have a 'library' to which the application developer
> could say 'using this DTD, please instantiate an XML document and help me
> fill it in'.
>
> Any solutions?
From jhb at software-ag.de Thu Nov 5 15:10:32 1998
From: jhb at software-ag.de (Juliane Harbarth)
Date: Mon Jun 7 17:06:15 2004
Subject: html, xml
Message-ID: <008201be08d6$b368dc90$4ba2bd9d@pcjhb.software-ag.de>
-----Original Message-----
From: Kasetty, Ramesh
To: 'xml-dev@ic.ac.uk'
Date: Thursday, November 05, 1998 2:43 PM
Subject: html, xml
Kasetty, Ramesh >I have knowledge of HTML and am trying to learn XML. Can anyone
Kasetty, Ramesh >tell me the difference between HTML and XML and where XML can be used?
HTML uses a fixed set of tags to specify display properties for the things
enclosed in them. XML allows the definition of tags that enable the
specification of semantic properties.
In my opinion XML offers the great benefit of being more processable by
machines than HTML. That especially holds for retrieval. Who wants to know
whether a certain document contains '1234' within
-Tags ? But a
question like 'which document contains 1234 as an Employee-Number'
makes sense.
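Juliane's Employee-Number query, sketched in Python with xml.etree.ElementTree. The document names and contents are invented for illustration. The function returns only the document in which 1234 is tagged as an Employee-Number, even though the other document contains the same string.

```python
import xml.etree.ElementTree as ET


def docs_with_value(docs, tag, value):
    """Names of documents that contain a `tag` element whose text equals `value`."""
    hits = []
    for name, text in docs.items():
        root = ET.fromstring(text)
        if any((e.text or "").strip() == value for e in root.iter(tag)):
            hits.append(name)
    return hits


docs = {
    "staff.xml": "<staff><Employee-Number>1234</Employee-Number></staff>",
    "catalog.xml": "<catalog><price>1234</price></catalog>",
}
print(docs_with_value(docs, "Employee-Number", "1234"))
```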
Juliane Harbarth
Technical Consultant
Software AG Germany
mailto:jhb@software-ag.de
Tel +49 (0)6151 92 1147
From tbray at textuality.com Thu Nov 5 15:55:28 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:15 2004
Subject: Unicode, xml:lang, and variant glyphs
Message-ID: <3.0.32.19981105075248.00b18610@pop.intergate.bc.ca>
At 04:46 PM 11/5/98 +1100, Rick Jelliffe wrote:
>> > The primary purpose of xml:lang, as far as I am concerned, should be to
>> > convey the information lost by ISO 10646 unification: where the
>> > Japanese and Chinese glyphs
The *only* purpose of xml:lang is to say what language it's in.
You need to know this for a *lot* more than picking glyphs. -Tim
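Because xml:lang is inherited, finding the language of a piece of text means walking up to the nearest element that declares it. A minimal Python sketch using xml.etree.ElementTree, which reports xml:lang under its expanded name in the http://www.w3.org/XML/1998/namespace namespace. ElementTree keeps no parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# How ElementTree names the xml:lang attribute after namespace processing
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root, target):
    """The xml:lang in scope for target: its own value or the nearest ancestor's."""
    parent = {child: p for p in root.iter() for child in p}
    node = target
    while node is not None:
        lang = node.get(XML_LANG)
        if lang is not None:
            return lang
        node = parent.get(node)
    return None


root = ET.fromstring('<doc xml:lang="en"><p>Hello</p><p xml:lang="ja"><q>text</q></p></doc>')
first_p, second_p = root.findall("p")
print(effective_lang(root, first_p), effective_lang(root, second_p.find("q")))
```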
From cowan at locke.ccil.org Thu Nov 5 17:16:29 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:15 2004
Subject: Unicode, xml:lang, and variant glyphs
References: <002101be087f$a6d02440$d9e887cb@NT.JELLIFFE.COM.AU>
Message-ID: <3641DD79.2DD20660@locke.ccil.org>
Rick Jelliffe wrote:
> FACT: Many times when someone says two characters are variants and should be
> unified, someone else has used them not as variants. Hence the Unicode
> compatibility area.
Unicode had to be round-trip compatible with many character sets formed
on different principles. The KSC character sets, for example, encode some
hanja (Chinese character) more than once if they have more than
one meaning, for the sake of making hanja-hangeul conversions easy.
Nobody denies that these are the same *characters*; even their glyphs
are bit for bit the same.
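Unicode records such duplicates as compatibility ideographs that are canonically equivalent to the unified character. A small Python check using unicodedata. U+F900 was chosen as an example from the compatibility ideographs included for round-trip compatibility with the Korean standard. Normalization to NFC replaces it with U+8C48, so the duplication disappears once text is normalized.

```python
import unicodedata

dup = "\uF900"      # CJK COMPATIBILITY IDEOGRAPH-F900
unified = "\u8C48"  # the unified ideograph it duplicates

print(unicodedata.name(dup))                      # character name
print(unicodedata.decomposition(dup))             # canonical mapping to U+8C48
print(unicodedata.normalize("NFC", dup) == unified)
```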
> Oops, I meant Russian and Byelorussian (or Kazakh or Ukrainian), where some of
> the national characters have a different form.
I don't know about this. Are there really glyphic differences?
I know about the character-level differences, like Ukrainian using
GHE WITH UPTURN except for a period from Stalin until a few years
ago, when they were forced to use GHE indiscriminately for GHE and
GHE WITH UPTURN.
I also know about Polish accents, which are properly placed lower
over the character than similar-looking Western accents. That
certainly is a glyph difference that fine Polish typography should
take into account, but getting it wrong does not interfere with
*meaning*: it is not a plaintext distinction. (See below.)
A borderline case is 8859-2's use of S WITH CEDILLA and T WITH
CEDILLA to represent Romanian's S and T WITH COMMA BELOW. This is
finally being undone, so that Turkish can keep S WITH CEDILLA and
Romanian will get a proper S WITH COMMA BELOW. (Nobody actually
needs T WITH CEDILLA.) My *National Geographic* world map uses
S WITH CEDILLA in Romanian place names, but you have to look closely
and compare with Turkish place names to be sure.
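The distinction John describes is visible in the character database. A short Python check with unicodedata, assuming a Python whose Unicode data includes U+0218 to U+021B (any current release does). The cedilla letters decompose to a base letter plus U+0327 COMBINING CEDILLA, and the comma-below letters to a base letter plus U+0326 COMBINING COMMA BELOW.

```python
import unicodedata

# S/T WITH CEDILLA (Turkish, and 8859-2's stand-in for Romanian)
# versus S/T WITH COMMA BELOW (proper Romanian)
for ch in ("\u015F", "\u0219", "\u0163", "\u021B"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch))
```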
> Are you saying that characters carry information, and never glyphs (or
> character + locale + markup)?
No, I am talking about the CJK case specifically. A unified font
may look ugly, and certainly shouldn't be used for fine typography,
but a language indicator is neither necessary nor sufficient to
solve this problem.
This is not to say that in documents to be finely rendered, an
attribute called "cjkv-typographic-tradition" might not be
useful.
> if it is mathematics, then the font definitely
> carries information that the unified character does not.
Which is why there are a whole bunch of "letterlike symbols" for
math purposes.
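These letterlike symbols carry a compatibility decomposition tagged as a font variant. As a result, compatibility normalization (NFKC) folds them back to plain letters, while canonical normalization (NFC) keeps them distinct. A short Python illustration with unicodedata:

```python
import unicodedata

for ch in ("\u2102", "\u211D", "\u2115"):  # double-struck C, R, N
    # NFC leaves the symbol alone; NFKC drops the font distinction
    print(unicodedata.name(ch), unicodedata.normalize("NFC", ch) == ch,
          unicodedata.normalize("NFKC", ch))
```

This is one reason to be careful about applying NFKC to mathematical text.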
> If you have a
> multi-language dictionary or a list of names which requires exactness, the
> font (or markup which selects the font) again is important.
Sure, font is important when it's important. My claim is confined
to this: that for plain-text purposes, Han unification does not
obscure anything essential.
> "Harder to read" is no criterion at all. If it is harder to read, it is
> because it has lost information.
Au contraire. The Unicode definition of a "plain text distinction"
is one which is necessary for mere legibility.
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From cowan at locke.ccil.org Thu Nov 5 17:40:39 1998
From: cowan at locke.ccil.org (John Cowan)
Date: Mon Jun 7 17:06:16 2004
Subject: FW: Creation of XML documents
References: <003d01be08cc$9e9e07e0$d3228018@jabr.ne.mediaone.net>
Message-ID: <3641E353.4256F5FF@locke.ccil.org>
Borden, Jonathan wrote:
> A standard way to do this is to persist your data into a DOM object and then
> ask it to save itself. You might look at Jade (http://www.jclark.com) or
> IBM's xml4j as a start.
Alas, the DOM does not provide a standardized way for objects to
"save themselves".
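For example, Python's xml.dom.minidom builds a tree through DOM-style calls but serializes it with toxml(). That method is specific to minidom and is not part of the W3C DOM interfaces, so code that relies on it is tied to that implementation. The element and attribute names in this sketch are made up.

```python
from xml.dom.minidom import getDOMImplementation

impl = getDOMImplementation()
doc = impl.createDocument(None, "order", None)  # DOM Level 2 createDocument
item = doc.createElement("item")
item.setAttribute("sku", "A-17")
item.appendChild(doc.createTextNode("Widgets & gadgets"))
doc.documentElement.appendChild(item)

# toxml() is a minidom extension, not a standard DOM method.
print(doc.toxml())
```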
--
John Cowan http://www.ccil.org/~cowan cowan@ccil.org
You tollerday donsk? N. You tolkatiff scowegian? Nn.
You spigotty anglease? Nnn. You phonio saxo? Nnnn.
Clear all so! 'Tis a Jute.... (Finnegans Wake 16.5)
From tbray at textuality.com Thu Nov 5 17:52:41 1998
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 17:06:16 2004
Subject: XML Search Engine
Message-ID: <3.0.32.19981105094725.009209b0@pop.intergate.bc.ca>
At 12:27 PM 11/5/98 -0000, Michael Kay wrote:
>Switching threads, I am a little surprised by Tim's remarks on word
>proximity versus character proximity. Confining our attention to European
>languages (as most search engines do), word proximity searching is a common
>feature of the high-end search engines, whereas character proximity is
>hardly found outside basic desktop tools like grep.
What I said was:
1. I have not seen any research which demonstrates that word proximity
achieves better results than character proximity based on any
well-known IR metric.
2. Doing word proximity at all is a *very* hard problem in the languages
used by a large majority of the world's population.
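To make point 2 concrete, here is a minimal sketch in plain Python string scanning; it is not how any real retrieval engine is implemented, and the function names are hypothetical. Character proximity needs only character offsets, while word proximity must first find word boundaries. The naive tokenizer below assumes words are separated by whitespace, which is exactly the assumption that fails for languages, and compounds, written without spaces.

```python
import re
from typing import List, Optional

def _min_gap(pos_a: List[int], pos_b: List[int]) -> Optional[int]:
    """Smallest distance between any position in pos_a and any in pos_b."""
    if not pos_a or not pos_b:
        return None
    return min(abs(i - j) for i in pos_a for j in pos_b)

def char_proximity(text: str, a: str, b: str) -> Optional[int]:
    """Gap in characters between the start offsets of substrings a and b."""
    pos_a = [m.start() for m in re.finditer(re.escape(a), text)]
    pos_b = [m.start() for m in re.finditer(re.escape(b), text)]
    return _min_gap(pos_a, pos_b)

def word_proximity(text: str, a: str, b: str) -> Optional[int]:
    """Gap in words, ASSUMING words are delimited by whitespace."""
    words = text.split()
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return _min_gap(pos_a, pos_b)
```

For "search engines rank documents", "search" and "rank" are 15 characters or 2 words apart. For the German compound "Suchmaschinenoptimierung", `char_proximity` finds "maschinen" and "optimierung" 9 characters apart, while `word_proximity` returns `None` because the whitespace split sees a single word.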
>Apart from anything
>else, once you've done the word normalisation (normalising different
>linguistic forms or spellings of the same word), character proximity is
>meaningless. In the older boolean engines word proximity is used rather
>mechanistically, in the newer engines it is used more subtly as part of a
>statistical or linguistic approach to relevance ranking
If you go poking around either in the SIGIR world (that would be the
Association for Computing Machinery's Special Interest Group on
Information Retrieval) or in the actual commercial retrieval engine
world, you find a distressing lack of technology progress. Yes, with
modern engines, precision & recall are measurably better than they
were in 1978. But 10 times as good? Hah! Twice as good? Maybe,
for certain restricted application domains. Given all this, I'm
less than impressed by the subtle techniques of modern engines.
On top of which, most of the techniques used in the "advanced" engines
are basically Anglocentric and fall apart once you get outside the
English-speaking world.
> but either way it
>is an established feature of the scene, and it is not there on whim: the
>search algorithms used are based on extensive research and benchmarking of
>relevance and recall scores.
Yeah, well, it's *not* an established feature of the scene in Asia. Maybe
it's just an irrational prejudice, but I'm not all that interested in
computing techniques that are not usable by a large majority of the
world's population. And once again, I challenge the assertion that,
for all these clever heuristics, real-world retrieval software is
really much better than it was 20 years ago.
>An interesting comparison of web search engines is at
>http://www.netstrider.com/search/features.html ; this asserts that all the
>well-known web search engines other than Lycos use word proximity matching.
And we know what wonderful results they produce (that's in English; for
real joy, go try a tricky query in German - even European languages sometimes
leave out the spaces between the words - and see what happens). -Tim
PS: Given my grouchy tone, I should say that I'm dazzled at the
inventiveness, deep thought, and creativity that have been invested
in the IR field in recent decades. The fact that the results are so
underwhelming is evidence of how hard the problems are... the real
lesson is that we should marvel at the language-processing apparatus
we carry around between our ears. -T
From avirr at LanMinds.Com Thu Nov 5 18:04:03 1998
From: avirr at LanMinds.Com (Avi Rappoport)
Date: Mon Jun 7 17:06:16 2004
Subject: html, xml
In-Reply-To: <008201be08d6$b368dc90$4ba2bd9d@pcjhb.software-ag.de>
Message-ID:
> Kasetty, Ramesh >I have knowledge of HTML and trying to learn XML. Can
> Kasetty, Ramesh >anyone tell me the difference between HTML and XML and
> Kasetty, Ramesh >where XML can be used.
>
> HTML uses a fixed set of tags to specify display properties for the
> things enclosed in the tags. XML allows the definition of tags that
> enable the specification of semantic properties.
> In my opinion XML offers the great benefit of being more processable by
> machines than HTML. That especially holds for retrieval. Who wants to know
> whether a certain document contains '1234' within -Tags? But a
> question like 'which document contains 1234 as an Employee-Number'
> makes sense.
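The Employee-Number example can be sketched as follows, assuming Python's standard `xml.etree.ElementTree`; the two documents and the helper names are hypothetical. A search over the character content alone matches both documents, while a query that asks for `1234` as the content of an `<Employee-Number>` element matches only one.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents: '1234' occurs in both, but only one
# marks it up as an employee number.
DOCS = {
    "memo.xml": "<memo><p>Room 1234 is booked.</p></memo>",
    "staff.xml": "<staff><employee><Employee-Number>1234</Employee-Number>"
                 "<name>A. Smith</name></employee></staff>",
}

def plain_text_hits(term):
    """Tag-blind search: does the term occur anywhere in the text content?"""
    return sorted(name for name, xml in DOCS.items()
                  if term in "".join(ET.fromstring(xml).itertext()))

def element_hits(tag, value):
    """Element-aware search: is `value` the content of some <tag> element?"""
    hits = []
    for name, xml in DOCS.items():
        root = ET.fromstring(xml)
        if any((el.text or "").strip() == value for el in root.iter(tag)):
            hits.append(name)
    return sorted(hits)
```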
While I'm obsessed with search and XML, I think that's not going to be the
short-term gain with XML.
XML is not a set of tags like HTML, it's a set of simple rules for defining
tags for your own content. This lets you use XML files for data
interchange. Eventually, you'll be able to post them to the Web with
associated style sheets and people will view them (but not until 5.0
browsers come out).
There are major advantages for using XML files for data storage and
especially for data interchange. XML formats are basically self-documenting
and are meant to be both human and machine-readable. That means that you
will be able to read the file in 10 or 20 years, unlike most other data
structures. You can read and write a valid XML file from any
XML-generating application, so you aren't locked into a single program with
a proprietary file format. It's based on Unicode, so it's not limited to
Western languages. While XML is not very efficient for database access,
database programs can read and write XML files very easily.
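As a minimal sketch of the data-interchange point, assuming Python's standard `xml.etree.ElementTree` and hypothetical table and column names: each row is written out with every value wrapped in a tag named after its column, so the file describes itself, and the same file reads back into rows. Column names must be legal XML names, and non-empty values come back as strings. The non-ASCII name in the usage check reflects the Unicode basis mentioned above.

```python
import xml.etree.ElementTree as ET

def rows_to_xml(table, rows):
    """Serialize a list of dicts (one per row) as self-describing XML."""
    root = ET.Element(table)
    for row in rows:
        rec = ET.SubElement(root, "row")
        for column, value in row.items():
            # Each column name becomes a tag; it must be a legal XML name.
            ET.SubElement(rec, column).text = str(value)
    return ET.tostring(root, encoding="unicode")

def xml_to_rows(text):
    """Read the XML back into dicts; non-empty values come back as strings."""
    return [{field.tag: field.text for field in rec}
            for rec in ET.fromstring(text).iter("row")]

xml = rows_to_xml("employees", [{"id": 1, "name": "Åsa"}])
print(xml)
print(xml_to_rows(xml))
```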
For more information, see , the FAQs at
, and the
news page at .
Hope that helps!
Avi
________________________________________________________________
Avi Rappoport, Web Site Search Tools Maven:
Guide to Site Indexing and Local Search Engines:
From jborden at mediaone.net Thu Nov 5 18:21:03 1998
From: jborden at mediaone.net (Borden, Jonathan)
Date: Mon Jun 7 17:06:16 2004
Subject: XML Search Engine
In-Reply-To: <3.0.32.19981105094725.009209b0@pop.intergate.bc.ca>
Message-ID: <005901be08e8$e1c63210$d3228018@jabr.ne.mediaone.net>
As you say, word/character proximity searching is not that interesting, and
if this is desired, XML doesn't have much to add to the current equation.
On the other hand, grove-based proximity search techniques have also been
used since the 1970s, when this was called a "semantic network". The
advantage is that it is language independent. To date, this hasn't been
terribly useful with HTML as not many people care about indexing