And with this one, it works (just because P contains PCDATA in its content !):
[Wodrich, Markus] That doesn't work, it is mixed content, I think you mean .
[Wodrich, Markus] I don't have the answer for your question, but when you run MSXML 1.6 with java msxml -d1 example.xml
[Wodrich, Markus] you get this:
[Wodrich, Markus] DOCUMENT
|---XMLDECL VERSION="1.0"
|---WHITESPACE 0xd 0xa
|---DOCTYPE NAME="EXAMPLE"
| |---WHITESPACE 0xd 0xa
| |---ELEMENTDECL EXAMPLE (P)+
| |---WHITESPACE 0xd 0xa
| |---ELEMENTDECL P (#PCDATA|S)*
| |---COMMENT --
| | +---CDATA " <<= here "
| |---WHITESPACE 0xd 0xa
| |---ELEMENTDECL S (#PCDATA)*
| |---WHITESPACE 0xd 0xa
| |---COMMENT --
| | +---CDATA " ENTITY incs SYSTEM "inc-s.xml" "
| |---WHITESPACE 0xd 0xa
| +---INTENTITYDCL incs
| +---ELEMENT S
| |---PCDATA "A third in a new paragraph."
| +---PCDATA "A third in a new paragraph." <------------Here!!
|---WHITESPACE 0xd 0xa
|---ELEMENT EXAMPLE
| |---WHITESPACE 0xd 0xa
| |---ELEMENT P
| | |---ELEMENT S
| | | +---PCDATA "A sentence."
| | +---ELEMENT S
| | +---PCDATA "An another."
| |---WHITESPACE 0xd 0xa
| |---ELEMENT P
| | +---ENTITYREF incs "A third in a new paragraph. A third in a new paragraph." <----- and here too!!
| +---WHITESPACE 0xd 0xa
+---WHITESPACE 0xd 0xa
The ENTITY is doubled.
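For comparison, here is a minimal sketch of what a parser that expands the entity only once should report. The document text is an assumption: it is reconstructed from the declarations visible in the dump above, not copied from the original example.xml. Python's standard xml.etree.ElementTree module serves only as a convenient reference parser here and says nothing about how MSXML works internally.

```python
import xml.etree.ElementTree as ET

# Assumed reconstruction of example.xml, based on the declarations
# shown in the MSXML dump (EXAMPLE, P, S and the internal entity incs).
DOC = """<?xml version="1.0"?>
<!DOCTYPE EXAMPLE [
<!ELEMENT EXAMPLE (P)+>
<!ELEMENT P (#PCDATA|S)*>
<!ELEMENT S (#PCDATA)*>
<!ENTITY incs "<S>A third in a new paragraph.</S>">
]>
<EXAMPLE>
<P><S>A sentence.</S><S>An another.</S></P>
<P>&incs;</P>
</EXAMPLE>"""


def sentences_per_paragraph(text):
    """Return, for each P element, the list of its S element texts."""
    root = ET.fromstring(text)
    return [[s.text for s in p.findall("S")] for p in root.findall("P")]


if __name__ == "__main__":
    for number, sentences in enumerate(sentences_per_paragraph(DOC), start=1):
        print(number, sentences)
```

If the entity is expanded once, the second P holds exactly one S element, not two.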
Markus
A third in a new paragraph.">
]>
A sentence. An another.
&incs;
Is there something broken in the msxml kingdom ?
Pat.
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From jjc at jclark.com Mon Dec 1 01:01:07 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:12 2004
Subject: Well-formedness checker available
Message-ID: <3482061D.ECF5ECD5@jclark.com>
I've enhanced my XML tokenizer to support multiple encodings and to
provide enough functionality that it can be used as the basis of high
performance full XML processors. As a proof of this, I've written a
well-formedness checker (xmlwf) on top of the tokenizer.
The main design goal was performance. On my portable (a 133Mhz Pentium
running Windows NT), it can check Jon's 3.7Mb ot.xml file in about
0.5sec (this compares to about 8sec for nsgmlsu and about 2sec for RXP
on the same system). It seems to be about 15% slower than the original
tokenizer. On the other hand, the size of the source and object code has
increased a lot. The source has also got a lot hairier.
The source code (in ANSI C) and Win32 binaries are available at:
ftp://ftp.jclark.com/pub/test/xmltok.zip
This is an alpha release. The only documentation is what you're reading
now.
To use the well-formedness checker, just give xmlwf one or more
filenames, and it will check that each one is a well-formed XML document
entity. There's a -g option which tells it to check instead that each
file is a well-formed XML external general text entity.
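As an illustration of what such a check amounts to, the sketch below runs a non-validating parser over each file and reports the first fatal error. It is not xmlwf and makes no claim about how xmlwf is written. It uses Python's standard xml.parsers.expat binding, and the function name check_well_formed is made up for this example. It covers only the default document-entity check, not the -g mode.

```python
import sys
import xml.parsers.expat


def check_well_formed(data):
    """Return None if data parses as a well-formed document entity,
    otherwise a short description of the first fatal error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True: this is the final chunk of input
    except xml.parsers.expat.ExpatError as err:
        return "line %d, column %d: %s" % (
            err.lineno,
            err.offset,
            xml.parsers.expat.ErrorString(err.code),
        )
    return None


if __name__ == "__main__":
    # Usage: python check.py file1.xml file2.xml ...
    for name in sys.argv[1:]:
        with open(name, "rb") as handle:
            problem = check_well_formed(handle.read())
        print("%s: %s" % (name, problem or "well-formed"))
```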
James
From jjc at jclark.com Mon Dec 1 01:01:39 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:12 2004
Subject: Test cases available
Message-ID: <34820B05.173C152F@jclark.com>
I've made available a collection of XML test cases at
ftp://ftp.jclark.com/pub/test/xmltest.zip
This contains 141 small files that (in my view) fail to be well-formed
XML documents, and should therefore cause any conforming XML processor
to report a fatal error.
James
From papresco at technologist.com Mon Dec 1 03:55:58 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:12 2004
Subject: Revelling parser writers (was Rebelling)
Message-ID: <34822CE5.C30933B9@technologist.com>
> Some people seem to use 'processor' to mean an XML parser. Others
> seem to use 'processor' as a piece of software 'after' the parser.
I do not think that the latter people have a basis in the XML
standard.
> I think some
> people use 'parser' to mean a piece of software that reads in an XML
> document (and associated components and transforms them into some
> other information structure or sets of actions. the 'Parsers' at
> present appear to be able to emit event Streams and/or build trees.
I think that most software developers would build trees *from* the
event stream. This separation allows you to plug in another parser
(reader/event generator) without changing your tree-building software.
Maybe I'm just extrapolating incorrectly from SP's design and my
design of my own systems.
In Jade, there is a parser (SP) that outputs events that are read by a
grovebuilder (GroveBuilder.cxx) that serves as the source grove for a
DSSSL process. My PyGrove uses the same system.
> >Building a grove is not the job of a
> ^^^^^^^^^^^^^^^^^
> >parser. Typically the parser outputs the events and some other process
> >builds the grove from the information. The only way a parser could be
> >not written to create groves is if the parser did not output sufficient
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is there a difference between 'build' and 'create'? I don't understand how
> a parser can 'not build a grove' and 'be not written to create groves'.
That tortuous prose is my attempt to integrate your text about parsers
being "not written to create a grove." The only way I could imagine a
parser being unfit to create a grove is if it did not output enough
information for the grovebuilder to do so.
> Earlier on XML-DEV we discussed at length what the API to a 'parser' (or
> was it a 'processor') was. I thought that this could have included building
> a grove.
I think that the grovebuilder would be a *client* of the parser API. Then
it could build groves from (e.g.) XML or full SGML or even something else,
as long as the various parsers exported the same API.
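The separation can be sketched as follows. This is a simplified illustration, not SP's or Jade's actual interface: the class and function names are invented, and the result is a plain element tree, not a real grove. The point is only that the builder sees start/data/end events and never the parser itself, so two different event sources can drive it.

```python
import xml.parsers.expat


class Node:
    """One element in the tree: a name, attributes, and mixed children."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs)
        self.children = []  # Node instances and text strings, in order


class SimpleTreeBuilder:
    """Builds a tree purely from start/data/end events.

    It knows nothing about whatever produces the events, so any source
    that calls these three methods can drive it.
    """

    def __init__(self):
        self.root = None
        self._stack = []

    def start(self, name, attrs):
        node = Node(name, attrs)
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def data(self, text):
        if self._stack:
            self._stack[-1].children.append(text)

    def end(self, name):
        self._stack.pop()


def build_with_expat(text):
    """One possible event source: the expat parser."""
    builder = SimpleTreeBuilder()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = builder.start
    parser.CharacterDataHandler = builder.data
    parser.EndElementHandler = builder.end
    parser.Parse(text, True)
    return builder.root


def build_from_events(events):
    """Another event source: a plain list of (kind, payload...) tuples."""
    builder = SimpleTreeBuilder()
    for kind, *payload in events:
        getattr(builder, kind)(*payload)
    return builder.root
```

Swapping the parser means writing another function like build_with_expat; the builder class stays untouched.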
> If I rephrase my statement as 'no-one has written any XML-based software
> which interfaces with the current crop of (mainly java-based) parsers to
> generate groves'.
This statement makes more sense to me than your previous one.
Paul Prescod
From ricko at allette.com.au Mon Dec 1 06:02:01 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:12 2004
Subject: Response to Simon St.L. on Entities v. XLL
Message-ID: <199712010601.RAA09705@jawa.chilli.net.au>
> From: Peter Murray-Rust
> XML(SGML) entities (NOTATION) have traditionally used PUBLIC and FPIs
> (Formal Public Identifier) for adding type information. This works if there
> is a registry of FPIs for this purpose. Without it is not much use.
(Peter Flynn had such a registry of FPIs this year.)
> My
> impression - and I'm happy to be corrected - is that there are few useful
> FPIs for Typing objects.
...
> As yet, MIME is not part of the XLL mechanism. I wish it was, and keep
> squeaking for it. If it isn't I suggest we use XDEV:MIME as a FUA
> 'frequently used attribute' in XML-LINKs.
You can make up your own FPIs *now* for all MIME types using the following
pattern.
The important thing about an FPI is that it does not have to be syntactically
correct to work, unlike a system identifier. Of course, getting an agreed-upon
form will be best.
It is interesting to note that I found it very difficult to find the
official site for the RFC. There is nothing I could find at IETF, IANA, or the
Internet Society, and searching on websites did not help. People
in this situation can use FPIs with, for example:
FPIs are a great idea because they do not have to correspond to
anything in a fixed location: they can be descriptive.
Rick Jelliffe
From fussellm at alumni.caltech.edu Mon Dec 1 07:51:31 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:12 2004
Subject: MONDO: Partitioned Design document and New Addition
Message-ID:
I fixed the MONDO Design document to have a central web page that allows
you to look at the TOC and download subsets of the document. The page is at:
http://www.chimu.com/projects/mondo/design/index.html
All the links are now directed here but the direct "mondoDesign.pdf" file
is in the same location and still works.
The PDF file has also been broken into smaller subsets of approximately a
chapter or two. Each of the subsets is ~70K instead of the full 400K
document. My apologies for not doing this the first time, especially to
anyone who had problems with downloading a single large file.
The Design Document web page also contains an "additions" section which
will have new document sections that have not yet been integrated into
the main version. This is to keep the main document from having Chapters
changing every few days and to make new additions more visible.
The newest addition and its first paragraph is:
Modeling and Implementing, Objects and Recipes
----------------------------------------------
Recipes describe how to build knowledge through creating objects. An
important aspect to working with knowledge is to be able to model it. So
far, we have assumed the model of our knowledge preexists and only exists
in the ObjectBase's DomainModel. Our other choice is to explicitly
describe the Model outside of the ObjectBase and then configure the
ObjectBase based on that model. With this approach, information will
describe its own model (or models) and we will be provided with a lot
more capabilities to automatically and universally understand that
information.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Mon Dec 1 10:37:53 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:12 2004
Subject: The MONDO Approach: Introduction
Message-ID:
I realize the MONDO Design document is a bit difficult to digest in toto,
so I thought I might try to produce short examples of MONDO's approach to
particular problems that have been brought up on XML-Dev, c.t.sgml, JXML,
java forums, or other related areas. This may help people to see where
MONDO is different, useful, or flawed compared to other approaches -- and
to have a particular topic to comment on instead of a whole (approaching
100 page) document.
The most important word in the above paragraph was "short". I will try
to be very brief: 1/2 - 3 pages. I have difficulty with this type of
brevity (i.e. I hate leaving out details), but I will try very hard and I
do have another outlet for more details: the Design Document and its
additions. This brevity means that the approach statements will not
really explain anything in detail, especially not the whys.
This does not mean I think the problems are trivial or the solutions
easy to understand on their own (MONDO is simple at its core but complex
in its implications). The fuller description of the problem will come
from previous or subsequent discussions, and the MONDO solution is (or
will be) more fully explained in the Design Document, the interfaces, or
the code.
The brevity and the "emailness" of these approach statements also
ensures I will not include any diagrams. I love diagrams and I think I
produce pretty informative ones. Please look at the relevant (usually
referenced) portion of the Design document to check for diagrams that may
help explain how MONDO is thinking.
Most of the approach statements will be pattern-ish. A Title, A
Problem, An Approach, and Tradeoffs/Comments. Because the statements are
so short they will not really be patterns (and certainly not good ones),
but I thought I would mention the structure.
I was planning on posting all of these to XML-Dev & JXML, and some of
them to advanced-java. I am currently undecided about c.t.sgml. If
anyone has suggestions about this ("not here" or "maybe there") let me know.
==================
MONDO is a general architecture for encoding, modeling, and processing
information. MONDO is especially designed for building information from
human-readable text files and then doing sophisticated interactions with
that information. Its first reference implementation is in Java, which
will be released shortly. More information about MONDO can be found at
the main WWW site:
http://www.chimu.com/projects/mondo/
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From digitome at iol.ie Mon Dec 1 11:15:20 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
Message-ID: <199712011115.LAA25449@mail.iol.ie>
I have read a number of articles about Data Warehousing and I *think* I know
what it is, but I have yet to come across any technical info about how to
implement it. On the face of it, though, it looks like an interesting
potential app. for XML.
As I (mis)understand it, you shovel all your corporate data from a variety
of sources (sales, purchasing, production, memos, R&D etc.) into one
humongous repository of data with a view to asking the seething mass of data
questions that benefit from the totality of information in the repository.
Prior to putting the stuff there it is "cleaned up". Presumably harmonised
into a homogenous format of some format. Sure sounds like XML + related
standards (specifically SDQL) to me.
Sean Mc Grath
sean at digitome dot com
From M.H.Kay at eng.icl.co.uk Mon Dec 1 12:21:41 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
Message-ID: <01bcfe53$7a24bd20$1e09e391@mhklaptop.bra01.icl.co.uk>
-----Original Message-----
From: Sean Mc Grath
To: xml-dev@ic.ac.uk
Date: 01 December 1997 11:16
Subject: Data warehousing and XML
>I have read a number of articles about Data Warehousing...
>it looks like an interesting potential app. for XML.
>
>As I (mis)understand it, you shovel all your corporate data from a variety
>of sources (sales, purchasing,
>production, memos, R&D etc.) into one humongous repository of data ...
Firstly, I think XML has some work to do if it is to acquire acceptance in
the database community: in particular, someone needs to show how its
underlying data model relates to models like UML used in the database
world; one would also like to see how a DTD can be translated to/from an
ODL schema. The fact that XML uses terms like "entity" and "attribute" with
completely different meanings from UML or ODMG doesn't help.
Secondly, I think the "humongous repository" concept in data warehousing
(sometimes ridiculed as the "data whorehouse") is going out of fashion.
The modern approach is usually much more focused. In fact, the data
warehouse concept has never really embraced documentary information
like memos or research reports: it's all about old-fashioned "data".
I do agree that in principle XML provides a good representation of data that
is in transit between heterogeneous databases. One drawback is that it
provides far more features than are required for this purpose, so people may
go for simpler encodings.
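A rough sketch of the "data in transit" case, to show how small a subset of XML it needs: only elements and text, with no attributes, entities or DTD. The table and column names are made up, and the sketch assumes column names are valid XML names and values are non-empty strings.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(table, rows):
    """Serialise a list of column->value dicts as one XML document."""
    root = ET.Element(table)
    for row in rows:
        record = ET.SubElement(root, "row")
        for column, value in row.items():
            ET.SubElement(record, column).text = str(value)
    return ET.tostring(root, encoding="unicode")


def xml_to_rows(text):
    """Read the same shape back into a list of dicts."""
    return [
        {field.tag: field.text for field in record}
        for record in ET.fromstring(text)
    ]


if __name__ == "__main__":
    # Hypothetical rows exported from one database for import into another.
    sales = [
        {"id": "1", "customer": "ACME", "amount": "120.50"},
        {"id": "2", "customer": "Initech", "amount": "75.00"},
    ]
    print(rows_to_xml("sales", sales))
```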
Mike Kay, ICL
M.H.Kay@eng.icl.co.uk
From papresco at technologist.com Mon Dec 1 13:56:48 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
References: <199712011115.LAA25449@mail.iol.ie>
Message-ID: <3482C2EE.3F606F14@technologist.com>
This is probably more
Sean Mc Grath wrote:
> Prior to putting the stuff there it is "cleaned up". Presumably
> harmonised into a homogenous format of some format.
Database people typically do not store their information in any explicit
format. The database handles the representation. Data warehouses are the
same. I don't think that data warehouses are any more or less amenable
to XML than any other relational database.
Paul Prescod
From fussellm at alumni.caltech.edu Mon Dec 1 14:17:13 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Describing the Model of Information
Message-ID:
The MONDO Approach to: Describing the Model of Information
Problem
=======
How do we describe the model we are using for our information?
Models always exist. In MONDO, the information model is represented by
the DomainModel and must exist (no matter how simple) for any
ObjectBase. There is always an implicit model. We can also provide a
way to explicitly describe models and have MONDO use those models to
understand information better and to validate whether it is being
constructed (i.e. by recipes) correctly. Explicit models can also
provide a common human-oriented exchange format that is known to be MONDO
understandable and verifiable (i.e. it at least makes sense to MONDO).
Forces
======
Relying on implicit models provides flexibility in describing and
implementing the model, but directly affords no common description and
reuse. Providing explicit models makes sharing models easier and allows
the information to describe itself, but could be limiting in how
information is used.
If we do use explicit models we have the choice between using the same
form as all other information (i.e. recipes) or a different form that is
designed especially for models.
MONDO Approach
==============
MONDO allows both implicit and explicit modeling of information depending
on what the producer of the information wants to describe and what the
consumer of the information would like to use. Explicit models are
marked up in the same format (i.e. recipes) as all other types of
information and the resulting model is simply an organized set of objects
that describe another set of objects (the instances of that model).
An example model for:
end =
>
might look like:
>
>
>
>
)
constructors = (
)
>
>
>
)
constructors = (
)
>
)>
Note that the model does not describe implementation in any way, just the
expected Types, properties, associations (none above), behavior (e.g.
constructors), and other externally visible features of an object.
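The idea of a model as ordinary objects that describe other objects can be sketched in Python. This is not MONDO's API or OML syntax: the classes Property and Type, the problems method, and the "start" property are all assumptions made for this illustration.

```python
class Property:
    """Describes one externally visible property of a Type."""

    def __init__(self, name, type_name):
        self.name = name
        self.type_name = type_name


class Type:
    """A model object: it describes other objects, not their implementation."""

    def __init__(self, name, properties):
        self.name = name
        self.properties = list(properties)

    def problems(self, instance):
        """List the ways a dict-like instance fails to match this Type."""
        found = []
        for prop in self.properties:
            if prop.name not in instance:
                found.append("missing property: " + prop.name)
        known = {prop.name for prop in self.properties}
        for key in instance:
            if key not in known:
                found.append("unknown property: " + key)
        return found


# Hypothetical model and instance; the names are illustrative only.
period_type = Type("Period", [Property("start", "Date"), Property("end", "Date")])
period = {"start": "1997-12-01", "end": "1997-12-31"}

if __name__ == "__main__":
    print(period_type.problems(period))
    print(period_type.problems({"end": "1997-12-31"}))
```

The model is itself a set of objects, so it could in turn be described by another Type in the same way.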
To associate a model with an instance we are just relating objects. We
can do it explicitly (and singularly) in the same recipe file:
//...
Model> UseModel>
end =
>
In two files but still with a single default interpretation:
end =
>
Or in three+ files which allows multiple interpretations of the model to
use with the recipe:
Because models are just objects we can also retrieve them by reference
instead of direct recipe construction:
>
>
MONDO supports Models as simply the same as any other type of
information: objects. The only difference is their role toward other
objects.
Benefits & Penalties
====================
Allowing both implicit and explicit models provides flexibility. The
only tradeoff that can occur is that people assume an implicit model is
OK when it would be better to make the model explicit. Other forces than
technology should drive this choice.
There are very few drawbacks and a great number of benefits by having the
model in the same format as all other information. It allows the two
core concepts (recipes and objects) to be leveraged in understanding new
facilities. The new facilities can benefit from all the functionality of
objects and recipes (e.g. references, encoding formats, type vs. class
separation, properties and all other normal object abilities). And
because we have complete closure we can then implement and model the
model itself in the exact same terms (recipes and objects).
Because the models are objects, the models can be arbitrarily
sophisticated and take advantage of subtyping. New modeling refinements
can be extensions of existing techniques. This avoids closed-end modeling
limitations (e.g. DTDs) while still having backward compatibility. Also,
the model is for the resulting DomainObjects not the recipe itself (or
the parser) so it does not need to worry about, and will not constrain,
irrelevant details like the actual names of recipes (e.g. ""). The
model only cares about the Types of the resulting DomainObjects that are
built by the recipe.
Finally, the encoding can be the same for the model as for the objects.
This is important on a conceptual level (models are really, really the
same things but just have a special role) and on a lower level: users
only have to understand a single encoding (if they chose) and parsers can
be very simple.
The only drawback might be difficulty in encoding the model in the
standard MONDO recipe encoding formats. Generally this is probably not a
drawback. Recipes allow flexibility that can be very useful for modeling
and the encoding formats can be quite concise (plus they are inherently
self-describing which is helpful for learning them).
See
===
The MONDO Design addition on "Modeling and Implementing, Objects and
Recipes".
http://www.chimu.com/projects/mondo/design/index.html#additions
Classes-as-objects is part of Smalltalk, CLOS, and (in some form) many
other interesting languages (e.g. SELF). Generally this meta-object
capability provides a great deal of power and relative simplicity. For
some references, see the OO sections of:
http://www.chimu.com/projects/mondo/links.html
SGML DTDs are somewhat at the other extreme from MONDO models in encoding (very
limited), but they can still be treated as creating document-oriented
Model objects through a different encoding format. [How SGML treats DTDs
is quite different (they constrain the parser, recipe, and the model
stage) but that is a different topic.]
The XML-Data model has similarities to how MONDO works with models. And
the approach of XML-Data and representing models as instances was
discussed in:
http://www.lists.ic.ac.uk/hypermail/xml-dev/
(Search for XML-Data and/or DTD)
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Mon Dec 1 14:20:39 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Extending Model Functionality
Message-ID:
The MONDO Approach to: Extending Model Functionality
Problem
=======
How can we extend the functionality of our information and information
model without becoming language specific?
Although information can be interpreted and implemented in many ways,
frequently we will want to provide possible implementations so
applications can automatically extend their capabilities in interesting
new ways.
Forces
======
If we put the implementation into the information we will make the
information less general. If we provide no implementation (when we have
one) we make the information less knowledgeable and capable than we could
have.
MONDO Approach
==============
Describe implementation details in the same knowledge form as all our
other information and loosely associate/link Classes to Types through
possible Implementations.
We can represent a Java class as (the MONDO recipe in OML):
>
This is readable to both Java and non-Java systems. A non-Java system
may not understand the bytecodes, but it can understand everything else
and work with the information usefully.
Next we can associate this class with a particular Type in our model.
language = "Java"
class =
>
The loose association is (relatively) complete, and a particular program
can decide whether it can use and wants to use a Java implementation of
the Type Period. It can also check whether the VM level is acceptable.
We can similarly provide a Smalltalk or ".dll" implementation (assuming
we can move the ".dll" around).
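A minimal sketch of the loose association, written in Python with no claim to match MONDO's interfaces: the Implementation record, the choose function, the class names and the version tuples are all hypothetical. It shows only that the link from Type to class is data a program can inspect and accept or decline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Implementation:
    """Loose link from a model Type to a language-specific class."""

    type_name: str   # the Type in the model, e.g. "Period"
    language: str    # e.g. "Java" or "Smalltalk"
    class_name: str  # hypothetical class identifier
    vm_level: tuple  # minimum runtime version required, e.g. (1, 1)


def choose(implementations, type_name, language, available_vm):
    """Return the first implementation this program can use, or None."""
    for impl in implementations:
        if (
            impl.type_name == type_name
            and impl.language == language
            and impl.vm_level <= available_vm
        ):
            return impl
    return None


# Hypothetical registry; none of these class names come from MONDO.
REGISTRY = [
    Implementation("Period", "Java", "com.example.PeriodClass", (1, 1)),
    Implementation("Period", "Smalltalk", "PeriodClass", (1, 0)),
]

if __name__ == "__main__":
    print(choose(REGISTRY, "Period", "Java", (1, 1)))
```

A program that finds no acceptable implementation can still work with the model and instance as plain information.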
None of this had any effect on our original instance and model:
end =
>
>
>
)
constructors = (
)
>
//...
)>
So they still describe "pure" general knowledge and we can still use them
independently of all the language-localized implementations.
Tradeoffs
=========
Generally, it is a win-win situation. Implementation can be reasoned
about and chosen without directly coupling it into the information
itself. The architecture itself is also no more complicated, but only
has new objects and classes to represent implementation information. A
negative might be the added complexity of the:
instance--[interpretation]--model--[implementation]--classes
associations, but the complexity can be selected as needed. Another
negative might be the required base functionality of an ObjectBuilder,
but generally ObjectBuilders must be able to simply model (e.g. as an
object with properties) anything they can not understand in more detail.
See
===
The MONDO Design addition on "Modeling and Implementing, Objects and
Recipes".
http://www.chimu.com/projects/mondo/design/index.html#additions
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Mon Dec 1 14:23:17 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Language Independent Object Serialization
Message-ID:
The MONDO Approach to: Language Independent Object Serialization
Problem
=======
How do we serialize Objects so we can later read them back into a program
of either the same language or a different language? Also, how do we
allow humans to easily create, read, and modify these objects (i.e.
include human languages)?
It is common to need a simple way to save a web of objects and later read
them back in to the same program or a different one. Sometimes the
programs will be of the same language (e.g. both Java) and sometimes they
will be different. The latter is much more complicated, especially in its
most general form (any object to any language). A variation on the
inter-language movement is from human writable language (through text) to
a computer language. Usually this is restricted to very simple
information (e.g. String properties) or document-oriented information
(e.g. HTML/SGML).
Tradeoffs
=========
If we do not have a single interchange format we will have multiple ones
and the complexity will be higher. If we can not describe objects in a
cross-language format then languages will be unable to interoperate with
this mechanism. If we can not describe object information in
human-readable formats then humans will be less likely to understand the
process and will be unable to participate in the general capabilities
(e.g. they can only work with simple property files).
A single, general, object interchange approach would allow all movements
of objects to be easier for both computers and people. On the other hand,
if we try to design a general approach that becomes too cumbersome it
will not be useful to the many common needs of applications (i.e. same
language serialization and simple property files).
MONDO Approach
==============
Encode information as "recipes" to build objects. Describe the most
general information first: what to build and what "ingredients" (recipes
for other objects) it needs. Next describe the language independent
model of that information. Finally describe the possible implementations
for that model in different languages. Any of these steps (other than
the first) can be left out, but doing so reduces the ability to move
between languages.
Also, enable "recipes" to be easily convertible to a human readable and
writeable form: usually as marked-up text files in any one of
XML/SGML/OML (the last being oriented to objects and MONDO, an Object
Markup Language similar to XML).
Some simple examples of recipes (in OML and no models yet) are:
----------------------
end =
>
----------------------
)>
----------------------
>
----------------------
>
) !Recipe>
>
>
>
//...
>
----------------------
,
the company's President. The following is a summary
written by Luke on what he views as the key elements
to VTL's success. } P>
=======================
All of the above, when by themselves, rely on the reading and "building"
application to interpret and implement the information model. This might
be suitable for language specific encoding of information or when the
model is very standard.
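As a rough illustration of that division of labour, here is a minimal
Python sketch in which the recipe only says what to build and from which
ingredients, and the reading application supplies the implementations.
The Recipe class, the build function, and the Point and Line types are
hypothetical names chosen for this sketch; they are not part of MONDO or
OML.

```python
from dataclasses import dataclass, field


@dataclass
class Recipe:
    """What to build, plus named ingredients (plain values or other recipes)."""
    kind: str
    ingredients: dict = field(default_factory=dict)


def build(recipe, implementations):
    """Recursively build an object, using the reader's own implementations."""
    if not isinstance(recipe, Recipe):
        return recipe  # a plain value needs no building
    factory = implementations[recipe.kind]
    arguments = {name: build(value, implementations)
                 for name, value in recipe.ingredients.items()}
    return factory(**arguments)


# Hypothetical implementations that one particular application provides.
@dataclass
class Point:
    x: int
    y: int


@dataclass
class Line:
    start: Point
    end: Point


IMPLEMENTATIONS = {"Point": Point, "Line": Line}

line_recipe = Recipe("Line", {
    "start": Recipe("Point", {"x": 0, "y": 0}),
    "end": Recipe("Point", {"x": 3, "y": 4}),
})
line = build(line_recipe, IMPLEMENTATIONS)
```

A program in another language could read the same recipe and supply its
own table of implementations, which is the interpretation burden the
paragraph above describes.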
But we can also explicitly add model information, for example:
>
>
)
constructors = (
)
>
//...
)>
And loosely[1] link it to the actual information:
end =
>
Alternatively we can use well-known models, which allow wider interchange
without moving recipes:
>
>
>
>
================
The next step is to be able to connect specific implementations to the
models, but this is covered in a different MONDO approach statement:
"Extending Model Functionality"
Benefits
========
We have a very general way to encode information so multiple
applications, programming environments, and people can understand it. It
is simple to parse and process for a computer and for a person. We have
also cleanly separated the information from the specifics required to
instantiate that information in any given programming environment. But
we can encode general models for the information in the same format
(complete reflectivity and closure) and take advantage of them if the
application desires.
We can take advantage of sharing models and information through public
references to objects/recipes or by "shipping" recipes along with the
information. This supports both a push and a pull model of moving
information, models, and implementation between applications.
Inter-language movement is inherently supported, as well as inter-version
(e.g. JDK version) movement. The choice is up to the application how
well the information is described vs. how much the receiving application
will be responsible for interpretation.
General document markup, objects, and information modeling are aligned
and we can take advantage of the concepts, patterns, designs, and
abilities of all of them.
Penalties
=========
MONDO is a newer approach that differs from industry-specific and
language-specific approaches. It has more overhead than language-specific
binary approaches [this could be lessened or removed by using a binary
recipe encoding format]. It possibly has slightly more overhead than simple
text property files, but it is much more general.
See
===
The MONDO Design document, especially Chapters 1-5.
http://www.chimu.com/projects/mondo/design/index.html
For related technology see the Smalltalk file-in format, LISP, Java
serialization, CORBA and the ODMG OIF specification. Some resources for
these can be found at:
http://www.chimu.com/projects/mondo/links.html
Notes
=====
[1] Actually, we can be even looser than that using a separate
"interpretation" file.
[2] I ran over 3 pages by a couple paragraphs :-( I guess the DSSSL
example pushed me over.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From ak117 at freenet.carleton.ca Mon Dec 1 14:43:25 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:13 2004
Subject: Notation System Identifiers (was Re: Response to Simon St.L. on Entities v. XLL)
In-Reply-To: <199712010601.RAA09705@jawa.chilli.net.au>
References: <199712010601.RAA09705@jawa.chilli.net.au>
Message-ID: <199712011444.JAA00575@unready.microstar.com>
Rick Jelliffe writes:
> You can make up your own FPIs *now* for all MIME types using the following
> pattern.
>
> PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> Multipurpose Internet Mail Extensions::video/mpeg//EN">
Yes, though this is not a well-formed XML notation declaration without
a system identifier.
On that point, I am still troubled about what to do with system
identifiers for notations. WD-xml-971117 states that a system
identifier is "a URL, which may be used to retrieve the entity"
(sect.4.3.2), but we are not dealing with an entity here. Later on,
when describing notations, the draft states that
XML processors ... may additionally resolve the external identifier
into the system identifier, file name, or other information needed
to allow the application to call a processor for data in the
notation described.
(sect.4.7)
It would seem to me that a MIME type would make more sense than a URL
for the system identifier of notations, but that would introduce an
inconsistency into the external-identifier scheme. I imagine that
this topic has already been beaten to death in the WG.
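For comparison, here is a small sketch of how a present-day
non-validating parser (expat, through Python's xml.parsers.expat module)
reports the two forms of notation declaration. XML 1.0 as eventually
published accepts a notation with only a PUBLIC identifier. The notation
names, the second public identifier, and the use of a MIME type as the
system identifier are assumptions made for this illustration only.

```python
import xml.parsers.expat

# One notation with only a PUBLIC identifier, one with PUBLIC and SYSTEM.
DOCUMENT = """<!DOCTYPE doc [
  <!NOTATION mpeg PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION Multipurpose Internet Mail Extensions::video/mpeg//EN">
  <!NOTATION gif PUBLIC "-//Example//NOTATION GIF//EN" "image/gif">
]>
<doc/>
"""


def notations(document):
    """Return {notation name: (public identifier, system identifier)}."""
    found = {}

    def on_notation(name, base, system_id, public_id):
        # system_id is None when the declaration gives no system identifier.
        found[name] = (public_id, system_id)

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.Parse(document, True)
    return found


result = notations(DOCUMENT)
```

The parser hands both identifiers to the application untouched; what to
do with them, including how to find a processor for the notation, is
left to the application.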
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Mon Dec 1 15:06:33 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:13 2004
Subject: Notation System Identifiers (was Re: Response to Simon
St.L. on Entities v. XLL)
Message-ID: <3.0.32.19971201070724.00955930@pop.intergate.bc.ca>
At 09:44 AM 01/12/97 -0500, David Megginson wrote:
>Yes, though this is not a well-formed XML notation declaration without
>a system identifier.
No longer; the WG just voted to allow PUBLIC without SYSTEM, specifically
and only for
>From a DOM perspective, EMBEDded material will almost certainly not be
>considered part of the document tree containing the EMBED element.
I very much look forward to seeing what the DOM does (or doesn't do) with the
EMBEDded material. But is this an issue for the DOM in particular, or should
the XML-Link spec give clearer direction about the nature of EMBEDded
material? Especially as some of the replies so far have said that an
application _could_ include the EMBEDded material in the document tree _if_
the developer so chose - which opens the door to multiple interpretations in a
large way.
And, of course, I can think of a considerable number of applications where it
might be useful to be able to apply the DOM to EMBEDded content without
having to cope with a separate document tree.
Sounds like fun. For the applications I'm proposing, I'd like them in the
document tree, but of course that isn't appropriate for many situations. I'd
really rather not see this prohibited, either - it would chop off an entire
branch of XML development I'm working on. Could be the price of progress.
We'll see.
I guess what I'd love to see is another XML-Link attribute specifying whether
to include an EMBED in the document tree or not - it seems to be the central
issue around which this discussion has focused. Failing that, I'll look into
Peter's proposals for XDEV, since they seem to address the challenges of
multiple application behaviors directly - if they get implemented by
application developers, of course.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From SimonStL at classic.msn.com Mon Dec 1 16:03:21 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:13 2004
Subject: Data warehousing and XML
Message-ID:
>Database people typically do not store their information in any explicit
>format. The database handles the representation. Data warehouses are the
>same. I don't think that data warehouses are any more or less amenable
>to XML than any other relational database.
This may be writing it off too quickly; I think the great advantage of XML for
a data warehouse would be its ability to ease the inclusion of non-relational
data. A data warehouse that was capable of dealing with information in
multiple formats might well take advantage of XML for storing data that wasn't
necessarily in a table. Data warehousing meets document management: love
match or endless feud?
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From tallen at sonic.net Mon Dec 1 16:26:39 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun 7 16:59:13 2004
Subject: Proper use of FPI name spaces
Message-ID: <199712011626.IAA07954@bolt.sonic.net>
Rick Jelliffe wrote:
|
| You can make up your own FPIs *now* for all MIME types using the following
| pattern.
|
|
Really, only the owners of the name space denoted by
"IDN ds.internic.net" should be assigning such FPIs. It will not do
for just anybody to be assigning names in someone else's name space.
In your own name space you could name something belonging to someone
else (unless legal issues prevent), but that's different.
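The ownership question is easier to see with the identifier taken apart.
Below is a minimal sketch, assuming the common four-field layout of a
formal public identifier (prefix, owner, text class with description,
language). The FPI class and parse_fpi function are hypothetical names,
and the sketch handles only the "+//" and "-//" forms. It shows which
part names the owner; it says nothing about who is entitled to assign
identifiers under that owner.

```python
from typing import NamedTuple


class FPI(NamedTuple):
    registered: bool   # True for a "+//" prefix, False for "-//"
    owner: str
    text_class: str
    description: str
    language: str


def parse_fpi(fpi: str) -> FPI:
    """Split an FPI of the form PREFIX//OWNER//CLASS DESCRIPTION//LANGUAGE."""
    prefix, owner, text, language = fpi.split("//")
    if prefix not in ("+", "-"):
        raise ValueError("this sketch handles only '+//' and '-//' identifiers")
    text_class, _, description = text.partition(" ")
    return FPI(prefix == "+", owner, text_class, description, language)


# The identifier proposed earlier in this thread.
PROPOSED = ("+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION "
            "Multipurpose Internet Mail Extensions::video/mpeg//EN")
fpi = parse_fpi(PROPOSED)
print(fpi.owner)  # prints "IDN ds.internic.net/rfc/rfc2046.txt"
```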
Regards,
Terry Allen Electronic Publishing Consultant tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
From RMcDouga at JetForm.com Mon Dec 1 16:27:14 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun 7 16:59:13 2004
Subject: EMBED and validation
Message-ID:
I'm new to XML but this doesn't seem to accomplish what I would be
looking for as an "include" capability.
Let's say I have a markup language (let's call it RML, "Rob's Markup
Language"). I create a DTD for it and post it to my public web site.
All users of RML put the URL for the DTD in the declaration.
So far so good?
Now, if one particular user of RML notices that there's a section that's
common across every one of their RML documents, they might wish to
separate it out into a distinct file and insert a link to it. This
common piece is not a complete document unto itself so it cannot be
validated, yet the user may wish to have the documents that include it make
sure that it is valid within the context in which it was embedded. Since
this particular file is unique to this user and not all RML users, it
does not belong in the common DTD. This would seem to make an external
text entity undesirable for this case.
Is this correct, or am I missing something? Is there any other way to
accomplish this using the current XML/XLL specs?
Rob
=======================================================
Rob McDougall Phone: (613)751-4800 ext.5232
JetForm Corporation Fax: (613)751-4864
http://www.jetform.com mailto:rmcdouga@jetform.com
=======================================================
>-----Original Message-----
>From: Eve L. Maler [SMTP:elm@arbortext.com]
>Sent: November 29, 1997 10:09 AM
>To: Peter Murray-Rust
>Cc: xml-dev@ic.ac.uk
>Subject: RE: EMBED and validation
>
>
>I don't think I've seen it explicitly suggested here, so here goes. If you
>want to ensure that what's pointed to is real XML, and "belongs" in that
>location, how about using a plain old external text entity? With a
>validating XML processor, you can guarantee that (a) the entity will be
>expanded in place before it even gets to the application and that (b) it
>will be validated in context.
>
> Eve
>
From elm at arbortext.com Mon Dec 1 17:11:26 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:13 2004
Subject: EMBED and validation
Message-ID: <3.0.32.19971201120455.009ca100@village.doctools.com>
There is a way to handle this using external text entities.
The DTD for any one document is really made up of two parts (if they both
exist): the external subset and the internal subset. Most people tend to
think of the external subset as "the DTD" and think of the internal subset as
"the place where I supply my own common text, graphics, etc." However, if
you want to create your own set of text entities and put them in the
internal subsets of only the documents that you own, you've effectively
made a local modification to the DTD.
(There hasn't been a formal way to distinguish between "harmless" and
"harmful" DTD modifications, and of course different people might draw the
line in different places. In interchange of SGML today, typically it's
acceptable to provide general entity declarations but not element/attribute
declarations; put another way, the "markup model" isn't supposed to be
changed by means of the internal subset.)
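To make the mechanism concrete, here is a sketch in Python using the
expat bindings. The internal subset of one document declares a
user-specific external text entity, and the parser expands it at the
point of reference. The rml, title, and section element names, the
common.xml system identifier, and the in-memory dictionary standing in
for the file system are all assumptions for this illustration. expat does
not validate, so the sketch shows only the expansion in place, not
validation in context.

```python
import xml.parsers.expat

# The shared DTD is omitted; only the document's own internal subset is shown.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE rml [
  <!ENTITY common SYSTEM "common.xml">
]>
<rml><title>Report</title>&common;</rml>
"""

# Stand-in for the file system: system identifier -> entity text.
ENTITY_TEXT = {"common.xml": "<section>Shared legal notice.</section>"}


def expand(document, entity_text):
    """Return parse events with external text entities expanded in place."""
    events = []

    def install(parser):
        parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
        parser.EndElementHandler = lambda name: events.append(("end", name))
        parser.CharacterDataHandler = lambda data: events.append(("text", data))

    top = xml.parsers.expat.ParserCreate()

    def resolve(context, base, system_id, public_id):
        # Parse the entity text with a child parser at the point of reference.
        # Only one level of nesting is handled in this sketch.
        child = top.ExternalEntityParserCreate(context)
        install(child)
        child.Parse(entity_text[system_id], True)
        return 1

    install(top)
    top.ExternalEntityRefHandler = resolve
    top.Parse(document, True)
    return events


events = expand(DOCUMENT, ENTITY_TEXT)
```

The section element arrives between the end of title and the end of rml,
exactly as if its text had been typed into the document.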
Eve
At 11:22 AM 12/1/97 -0500, Rob McDougall wrote:
>I'm new to XML but this doesn't seem to accomplish what I would be
>looking for as an "include" capability.
>
>Let's say I have a markup language (let's call it RML, "Rob's Markup
>Language"). I create a DTD for it and post it to my public web site.
>All users of RML put the URL for the DTD in the declaration.
>So far so good?
>
>Now, if one particular user of RML notices that there's a section that's
>common across every one of their RML documents, they might wish to
>separate it out into a distinct file and insert a link to it. This
>common piece is not a complete document unto itself so it cannot be
>validated, yet the user may wish to have the documents that include it make
>sure that it is valid within the context in which it was embedded. Since
>this particular file is unique to this user and not all RML users, it
>does not belong in the common DTD. This would seem to make an external
>text entity undesirable for this case.
>
>Is this correct, or am I missing something? Is there any other way to
>accomplish this using the current XML/XLL specs?
>
>Rob
>=======================================================
>Rob McDougall Phone: (613)751-4800 ext.5232
>JetForm Corporation Fax: (613)751-4864
>http://www.jetform.com mailto:rmcdouga@jetform.com
>=======================================================
>
>>-----Original Message-----
>>From: Eve L. Maler [SMTP:elm@arbortext.com]
>>Sent: November 29, 1997 10:09 AM
>>To: Peter Murray-Rust
>>Cc: xml-dev@ic.ac.uk
>>Subject: RE: EMBED and validation
>>
>>
>>I don't think I've seen it explicitly suggested here, so here goes. If you
>>want to ensure that what's pointed to is real XML, and "belongs" in that
>>location, how about using a plain old external text entity? With a
>>validating XML processor, you can guarantee that (a) the entity will be
>>expanded in place before it even gets to the application and that (b) it
>>will be validated in context.
>>
>> Eve
>>
From eliot at isogen.com Mon Dec 1 17:31:21 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:59:13 2004
Subject: EMBED and validation
Message-ID: <3.0.32.19971201112637.00c9a84c@swbell.net>
At 03:36 PM 12/1/97 UT, Simon St.Laurent wrote:
>>From a DOM perspective, EMBEDded material will almost certainly not be
>>considered part of the document tree containing the EMBED element.
>
>I very much look forward to seeing what the DOM does (or doesn't do) with
the
>EMBEDded material. But is this an issue for the DOM in particular, or
should
>the XML-Link spec give clearer direction about the nature of EMBEDded
>material? Especially as some of the replies so far have said that an
>application _could_ include the EMBEDded material in the document tree _if_
>the developer so chose - which opens the door to multiple interpretations
in a
>large way.
XML (or SGML) data can be used in one of two ways:
1. Use by value (you get the data syntactically). This is what text
entities are for. A text entity is, by definition, part of the *character
string* of the document that references it. That means that the parser
parses it at the point of reference and it must be valid or well formed
(if the entire document is well formed). A document with a text entity
reference is identical, for parsing purposes, to a document with the
reference replaced by the entity's replacement text (note that in base
SGML ESIS, text entity references are not communicated by the parser).
2. Use by reference (you point to the data but don't get it syntactically).
This is what XML Link means by "EMBED" and what HyTime means by
"value reference". The referenced data is a separate, self-contained
object and the parser does not parse it at the point of reference (if
at all, as it may not be XML data). For use-by-reference, it is up to the
processing application to make sense of the reference, for example,
presenting a referenced image according to the active style settings or
presenting a referenced document as though it had occurred in line, or
providing an icon you can select to see the referenced thing.
As for "document trees" (groves), the initial result is *never* a single
tree containing the results of parsing two documents (if the thing used by
reference is another document). However, a processing application might
choose to construct a *new* tree that combines the two documents in some
way that makes sense *to the application*. For example, I've written
several instances of a program that takes a tree of subdocuments and
creates a single instance from them.
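A minimal sketch of such an application-level combination, written in
Python with xml.etree.ElementTree. The embed element with an href
attribute is a simplified, hypothetical stand-in for an XML Link EMBED
element, and the combine function and sample documents are invented for
this illustration. The two parsed trees stay separate; the function
builds a *new* tree and leaves the originals untouched.

```python
import copy
import xml.etree.ElementTree as ET


def combine(root, documents):
    """Return a new tree in which each <embed href="..."/> is replaced by a
    copy of the referenced document's root element."""
    new_root = copy.deepcopy(root)
    # Snapshot the elements first so that replacing children is safe.
    for parent in list(new_root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "embed":
                target = copy.deepcopy(documents[child.get("href")])
                target.tail = child.tail
                parent[index] = target
    return new_root


main = ET.fromstring('<book><title>T</title><embed href="ch1.xml"/></book>')
documents = {"ch1.xml": ET.fromstring("<chapter><p>Hello</p></chapter>")}

combined = combine(main, documents)
print(ET.tostring(combined, encoding="unicode"))
```

Because the combination is a choice made by the application, a different
application is free to leave the reference alone or to render it some
other way.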
Note that making the distinction between use by value and use by reference
keeps separate the storage and logical organization of the data, so that
data can be organized into storage objects independently of how it might be
used logically by reference. For example, I might put all my chapters in a
single storage object (document entity) but use individual chapters by
reference (using element-level addressing). It's also important to keep in
mind that, for XML and SGML, a reference to a document entity is usually
taken as shorthand for reference to that document's root element (that is
the HyTime default, and I assume, the TEI default).
In HyTime's abstract processing model, use by reference is, by default,
transparent to processing applications because the HyTime engine redirects
the processing application to the data used by reference, making it look to
the processor as though there is but a single grove. However, under the
covers the groves are distinct and processors can ask to view them that
way. This is probably more sophistication than most XML processors (e.g.,
browsers) need provide, although more sophisticated browsers and hypertext
systems need this flexibility.
Cheers,
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From crism at ora.com Mon Dec 1 17:51:04 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:13 2004
Subject: EMBED and validation
In-Reply-To:
(SimonStL@classic.msn.com)
Message-ID: <199712011755.MAA16132@geode.ora.com>
[Simon St. Laurent]
> This EMBED issue raises even more bizarre questions for styling -
> context-dependent styling could well be forced to adjust if EMBEDded
> material is considered part of the document tree. Taking this into
> account will be an interesting challenge that may force me to use
> some old-style CLASS attributes, but we'll see. CSS will have some
> problems, but they may be surmountable. XML styling hasn't exactly
> happened yet, but I hope the developers are keeping this in mind.
See "Problems with Dynamically Assembled Document Portions, and Some
Solutions", to be delivered at 8:30 am Wednesday in the Expert Track
at SGML/XML '97. This is *precisely* what Steve and I will be
discussing.
The paper will be published on the Web after the conference.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From papresco at technologist.com Mon Dec 1 17:55:35 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:13 2004
Subject: Data warehousing and XML
References:
Message-ID: <3482FAC7.9DD9D6E4@technologist.com>
Simon St.Laurent wrote:
>
> This may be writing it off too quickly; I think the great advantage of XML for
> a data warehouse would be its ability to ease the inclusion of non-relational
> data.
I don't claim to be an expert, but my understanding was that the goal of
a data warehouse was to make a repository of information that could be
plumbed through essentially relational queries -- demographic
information, correlations between dates and so forth. I don't see the
benefit of including document data in this sort of repository. As
someone else mentioned, XML may well be a good transfer format for the
information to move it FROM the regular databases into the data
warehouse.
Paul Prescod
From SimonStL at classic.msn.com Mon Dec 1 18:31:50 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:13 2004
Subject: EMBED and validation
Message-ID:
>As for "document trees" (groves), the initial result is *never* a single
>tree containing the results of parsing two documents (if the thing used by
>reference is another document). However, a processing application might
>choose to construct a *new* tree that combines the two documents in some
>way that makes sense *to the application*. For example, I've written
>several instances of a program that takes a tree of subdocuments and
>creates a single instance from them.
Precisely. I'd like to be able to tell that application to create a single
instance from the trees (in whatever shape they arrive) under certain
circumstances. Having to guess whether an application will do so makes the
tools far less useful.
The use by reference/use by value distinction is important, but made painful
in practice by the fact that XML has an infinitely richer vocabulary for use
by reference than it does for use by value. Entities are extraordinarily
limited when compared to the rich possibilities XPointers open up, and I hope
that for many, though of course not all, uses entities (and notations as well)
will be effectively obsoleted.
Maybe Ted Nelson's right, and all this markup stuff gets in the way of proper
transclusion.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From ricko at allette.com.au Mon Dec 1 19:04:18 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:13 2004
Subject: Proper use of FPI name spaces
Message-ID: <199712011903.GAA29097@jawa.chilli.net.au>
> From: Terry Allen
> Rick Jelliffe wrote:
> |
> | You can make up your own FPIs *now* for all MIME types using the following
> | pattern.
> |
> | | PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> | Multipurpose Internet Mail Extensions::video/mpeg//EN">
>
> Really, only the owners of the name space denoted by
> "IDN ds.internic.net" should be assigning such FPIs. It will not do
> for just anybody to be assigning names in someone else's name space.
>
> In your own name space you could name something belonging to someone
> else (unless legal issues prevent), but that's different.
In the general case, yes.
But in this case, where internic have placed the information in a fixed
place specifically to provide a long-term repository for the public,
and where SGML (i.e. WebSGML) provides a clear way to express the owner,
I don't have any qualms about intruding.
The FPI mechanism is supposed to stop identifier collisions, where two FPIs
mean the same thing. In this case, if they use the same FPI what
else can they be meaning except the same thing? If they decide to make
their own FPI, then everyone can switch to that (if it ever happens).
Rick Jelliffe
From RMcDouga at JetForm.com Mon Dec 1 19:32:41 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun 7 16:59:13 2004
Subject: Entity substition in non-validating parsers (was RE: EMBED and validation)
Message-ID:
Now I understand about external entities (or at least I think I do :) ).
This raises another question. In using the "C" version of the MSXML
processor, I've found that it ignores the DTD entirely (including any
entities that I've defined in either subset of the DTD). This is mildly
annoying because I'd like the speed of a non-validating processor, but
would like the convenience of doing entity substitution. To my mind the
two issues are not necessarily linked.
The XML spec only distinguishes between validating and non-validating
processors in terms of "report[ing] violations of the constraints
expressed ... in the DTD". It says nothing about ignoring/utilising any
non-"constraint" information contained in the DTD.
Is the way the MSXML "C" processor works correct? Can I expect that
sort of behaviour out of all non-validating processors? If so, can we
designate a third class of processors that process entity substitution,
but do not perform validation? It seems unfair that this is such an
"all or nothing" affair. It also seems to mean that we cannot do file
inclusion using the external entity method Eve outlined unless a
document is not only well-formed, but is also valid.
At the very least, can we add some words about entity substitution to
the "Conformance" section of the XML spec?
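For illustration, the behaviour asked for here (entity substitution
without validation) can be seen in a sketch using Python's
xml.etree.ElementTree, which is built on the non-validating expat
parser. The memo element and the company entity are hypothetical. The
internal subset declares memo as EMPTY, which the document then
violates; the parser does not report the violation, yet it still
substitutes the internal entity.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE memo [
  <!ELEMENT memo EMPTY>
  <!ENTITY company "JetForm Corporation">
]>
<memo>From &company;</memo>
"""

# Parsing succeeds even though <memo> has content: no validation is done.
memo = ET.fromstring(DOCUMENT)
print(memo.text)  # prints "From JetForm Corporation"
```

This covers only entities declared in the internal subset; whether a
given non-validating processor also fetches external entities is a
separate question.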
Rob
=======================================================
Rob McDougall Phone: (613)751-4800 ext.5232
JetForm Corporation Fax: (613)751-4864
http://www.jetform.com mailto:rmcdouga@jetform.com
=======================================================
From tallen at sonic.net Mon Dec 1 19:34:03 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun 7 16:59:14 2004
Subject: Re FPIs for RFCs
Message-ID: <199712011933.LAA18465@bolt.sonic.net>
Rick Jelliffe wrote:
| > |
| > | You can make up your own FPIs *now* for all MIME types using the following
| > | pattern.
| > |
| > | | PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
| > | Multipurpose Internet Mail Extensions::video/mpeg//EN">
| >
| > Really, only the owners of the name space denoted by
| > "IDN ds.internic.net" should be assigning such FPIs. It will not do
| > for just anybody to be assigning names in someone else's name space.
| >
| > In your own name space you could name something belonging to someone
| > else (unless legal issues prevent), but that's different.
|
| In the general case, yes.
|
| But in this case, where internic have placed the information in a fixed
| place specifically to provide a long-term repository for the public,
| and where SGML (i.e. WebSGML) provides a clear way to express the owner,
| I don't have any qualms about intruding.
It's still not your name space, and you shouldn't be assigning names
within it. Try that trick with some commercial company's DNS name
and you'll be hearing from their lawyers - properly. Qualms or not,
don't intrude.
You assume that there is only one way to construct a name space for
RFC 2046 using the IDN mechanism. Doesn't IDN cover only the "ds.internic.net"
part? You also assume that the IETF (not Internic) has resolved on
using that URL indefinitely. Have you asked them? Don't you think
you should be referring to the MPEG spec instead anyway? What happens
when RFC 2046 is made obsolete? All those are policy questions for
the IETF to resolve without having to deal with decisions *you* made.
You should also be aware of draft-ietf-urn-ietf-02.txt, "A URN Namespace
for IETF Documents", which while work in progress may result in the
formation of URNs for RFCs.
Terry Allen Electronic Publishing Consultant tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
From tbray at textuality.com Mon Dec 1 20:00:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:14 2004
Subject: Entity substition in non-validating parsers (was RE: EMBED
and validation)
Message-ID: <3.0.32.19971201120212.009709e0@pop.intergate.bc.ca>
At 02:29 PM 01/12/97 -0500, Rob McDougall wrote:
>Now I understand about external entities(or at least I think I do :) ).
>This raises another question. In using the "C" version of the MSXML
>processor, I've found
First of all, at this point in history we should be charitable to the
various XML processors, which have had to track the successive versions
of the spec.
Having said that, as recently reported, we changed the rules to do
two things:
(a) simplify the internal subset (only simple PE's, which can be
ignored by a non-validating processor)
(b) require all conforming processors to use
> From: Terry Allen
> It's still not your name space, and you shouldn't be assigning names
> within it. Try that trick with some commercial company's DNS name
> and you'll be hearing from their lawyers - properly. Qualms or not,
> don't intrude.
On what grounds? "owner"ship in the ISO 9070 sense is not a property
right. Otherwise people could not use ISBN numbers in FPIs, for the same
reason. It is merely because there is no convenient noun for "person/
thing belonged to".
And in any case, I would not do it for private data, because it would be
rude. Constructing an FPI which reflects a public archive is not
rude, nor does it violate any ownership rights. (Do you have any legal
cases or laws that suggest otherwise? I would be interested to find
out more, since presumably the same thing would affect URLs and URNs.)
Rick Jelliffe
From tallen at sonic.net Mon Dec 1 20:53:52 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun 7 16:59:14 2004
Subject: Re FPIs for RFCs
Message-ID: <199712012053.MAA21601@bolt.sonic.net>
Rick Jelliffe wrote:
| No. IDN means Internet Domain Name. Internic, by establishing the archive
| there for public use, have de facto made it available for use in FPIs.
| I do not need their permission. They are the "owner" of the
| text ds.internic.net/rfc/rfc2046.txt. It is like using an ISBN number.
No, it's making up a new identifier in someone else's name space without
their permission. I can't make it any clearer than that: don't appropriate
the property of others. Neither Network Solutions, which owns
"ds.internic.net", nor the IETF (or IESG), which owns the document
known not as "ds.internic.net/rfc/rfc2046.txt" but as "RFC 2046", nor
ISO has given you permission to use this name space.
Network Solutions, Inc. (INTERNIC-DOM)
505 Huntmar Park Drive
Herndon, VA 20170
Domain Name: INTERNIC.NET
Administrative Contact:
Network Solutions, Inc. (HOSTMASTER) hostmaster@INTERNIC.NET
(703) 742-4777 (FAX) (703) 742-9552
| If it was private or non-archival material I would have used me as the
| "owner".
As you should in this case, too. It's *your* name for it, not one
they have decided on and agreed to use. You should have consideration
for the resolution burden you are placing on people you haven't
consulted.
| Of course it is better if there is one canonical correct version for all
| FPIs, but people often have to make up FPIs. For example, almost every
| use of ISBN in an FPI would not have been made by the author of the text
| of the book. This means that there can indeed be multiple similar versions
| of an FPI. The benefit of being able to describe vaguely rather than locate is
| a great thing for people putting systems together. (Of course, for XML,
| this is not such a good thing, if we are treating XML as a closed system.)
The decision to use an ISBN as the owner identifier within an FPI must
rest with the owner of the ISBN. That's who owns that name space.
| > It's still not your name space, and you shouldn't be assigning names
| > within it. Try that trick with some commercial company's DNS name
| > and you'll be hearing from their lawyers - properly. Qualms or not,
| > don't intrude.
|
| On what grounds? "owner"ship in the ISO 9070 sense is not a property
| right. Otherwise people could not use ISBN numbers in FPIs, for the same
| reason. It is merely because there is no convenient noun for "person/
| thing belonged to".
See above.
| And in any case, I would not do it for private data, because it would be
| rude. Constructing an FPI which reflects a public archive is not
| rude,
Sure it is. People are going to try to resolve that FPI by going to
ds.internic.net, and when it eventually fails they'll complain to
ds.internic.net, not to you. It's up to the IESG to decide what uses
of their name spaces they are willing to commit to for the long run.
| nor does it violate any ownership rights. (Do you have any legal
| cases or laws that suggest otherwise? I would be interested to find
| out more, since presumably the same thing would affect URLs and URNs.)
Read the URN drafts re name space ownership. No applicable case law
on URIs yet that I know of, but you've seen the outcome of suits over
DNS names.
You are asserting an ownership right you cannot back up. That's dangerous
for one's legal health. Referring to something by using its URL is one
thing, but using that URL to create a name that lies in someone else's
name space is another matter entirely.
Terry Allen Electronic Publishing Consultant tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
From fussellm at alumni.caltech.edu Mon Dec 1 22:33:14 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:14 2004
Subject: XML and MONDO
In-Reply-To:
Message-ID:
I have had a couple negative reactions to posting the MONDO notes on
XML-Dev, so I will stop until I have a code release. I thought I would
explain the one issue: What does MONDO have to do with XML?
MONDO uses XML as an encoding format for recipes (recipes are similar to
an event stream but reduced to an even simpler core [e.g. like LISP]).
Internally, MONDO does not care about what the textual encoding is but
XML is the most likely to gain widespread acceptance for use with MONDO.
Now if MONDO uses XML, why were all my examples in a different markup
(which MONDO calls OML)? Simply for brevity and ability to directly show
what is happening (or so I thought). The transformation between OML and
XML is trivial (they both encode exactly the same
event-stream/data-structure as far as MONDO is concerned). This is
described in detail in Chapter 4 & 5 & §11 (an appendix) of the MONDO
Design document. For example, two common examples are (in XML):
iso
What MONDO's Builder (can be considered a generalization of a
GroveBuilder) sees would be identical whether in the above or OML. This
is part of the benefits of its architecture.
Aspects of that overall architecture are what I was "promoting" and
soliciting feedback on, not OML itself. Again, I probably made the wrong
choice to put the examples in OML when posting to an XML group, so I
apologize for that.
--Mark
mark.fussell@chimu.com
From cbullard at hiwaay.net Tue Dec 2 00:16:29 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:59:14 2004
Subject: Response to Simon St.L. on Entities v. XLL
References: <3.0.1.16.19971130122745.2ec75a98@pop3.demon.co.uk>
Message-ID: <34835315.1EBC@hiwaay.net>
Peter Murray-Rust wrote:
>
> At 21:29 30/11/97 +1100, Rick Jelliffe wrote:
> >
> >
> >* The SGML entity mechanism is based on having type information as part
> >of the declaration of the entity, not in the entity reference and not in the
> >entity itself.
>
> I am very interested in automatic Typing of information components and
> think that this will be a very active area for the XML community.
It is an active area for any community that must associate semantics
in interoperable frameworks. In SGML, this is the level
of interoperability that typically is not specified. The
issue of interoperable semantics is system implementation.
That is obscure in XML to me.
> XML(SGML) entities (NOTATION) have traditionally used PUBLIC and FPIs
> (Formal Public Identifier) for adding type information. This works if there
> is a registry of FPIs for this purpose. Without it is not much use.
Any framework that depends on registration to associate semantics
with markup based on FPI, URN, MS registry, etc. has the requirement
for maintenance. No discussion I've read of this subject assumes
otherwise. The discussions typically break down over the
mechanism. In all the variants of SGML systems (eg, XML, HyTime, DSSSL, etc)
different mechanisms are proposed and all have camps of adherents.
All of the systems have been shown to work, so, in essence,
picking one to implement has become an issue of economy and polity.
> My impression - and I'm happy to be corrected - is that there are few useful
> FPIs for Typing objects.
Hmm, please clarify? The usefulness of the FPI concept (any registry, I guess)
is to ensure persistence. Is an FPI needed for a typing object (what is a
typing object)?
> Using a SYSTEM Id is subject to the problem of permanence and uniqueness of
> URLs.
Is there a form of unique identifier that does not? Registry
systems are a level of indirection under a regime of authority
(e.g, who gets to declare, modify, delete, copy(distribute) unique
identifiers).
> >* The XLL mechanism (well, I should say the MIME mechanism really) is
> >based on the entity being self-identifying as to type (aided by
> >any additional attributes you like on the linking element).
>
> Unfortunately, not all targets of XLL HREFs will be self-identifying. This
> is true of local files and not-very-smart-servers.
Right. A question that arises is one of where maintenance
should occur. This is a systemic requirement of scope. What
is the scope of the system which uses the files? Local is
not a satisfying answer in a linktoAnythingAnywhere system.
If not LTAA, eg, a local Intranet is creating XML, XML
DTDs, other to be determined schema mechanisms, why should
they be restricted to the use of MIME typing? If not, where
should they maintain a registry that is both local to
the Intranet and usable by a larger Intranet?
SGML practice reveals the need to maintain communities
of DTDs that specify different document types being
created and destroyed within the same overall framework
of *processes* (eg, a business). What do you think
is the best way for the data creators and maintainers
in the process/production environment to do this?
> It is therefore useful
> for the author to be able to add MIME types to the target.
> As yet, MIME is not part of the XLL mechanism. I wish it was, and keep
> squeaking for it. If it isn't I suggest we use XDEV:MIME as a FUA
> 'frequently used attribute' in XML-LINKs.
MIME. Ok. The problem is that the registry type *is assumed* to
specify mechanism and authority of the registrar. If the
mechanism is adequate(depends on requirements; anyone have
functional requirements for XML?) then the only issue is the polity.
IOW, should an XLL link be constrained to one mechanism? Requirements?
The polity is an issue for XML because of its origins in the W3C. For
SGML, that is ISO. Divergence on the decision of formal public
registries could have a chilling effect on the common community.
Len Bullard
From cbullard at hiwaay.net Tue Dec 2 01:15:30 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:59:14 2004
Subject: Data warehousing and XML
References: <3482FAC7.9DD9D6E4@technologist.com>
Message-ID: <348360F2.407B@hiwaay.net>
Paul Prescod wrote:
>
> Simon St.Laurent wrote:
> >
> > This may be writing it off too quickly; I think the great advantage of XML for
> > a data warehouse would be its ability to ease the inclusion of non-relational
> > data.
>
> I don't claim to be an expert, but my understanding was that the goal of
> a data warehouse was to make a repository of information that could be
> plumbed through essentially relational queries -- demographic
> information, correlations between dates and so forth. I don't see the
> benefit of including document data in this sort of repository. As
> someone else mentioned, XML may well be a good transfer format for the
> information to move it FROM the regular databases into the data
> warehouse.
Of course XML can be used to create non-ambiguous
transfer formats (data schlepping). But Paul,
a lot of the information that needs to be mined
is not in relational formats. Depending on the query
language and implementation, there is no reason one
cannot build an industrial strength data repository
over generalized markup. Some IETM applications
(eg, 87269, MID, etc.) are designed to do that.
Even with IADS some years ago, we had some primitive
capabilities for this although immature then. Probably
much improved now. In those designs it was always
assumed that the client language (eg, MID) was
essentially a navigation system over a set of
notations whose processors are known. It is also
assumed that the client language included a query
language or could call one. So, data warehouse may
be in need of further clarification. Applications
I work with have to have both document frameworks
and relational systems as well as all of the
ad hoc-inTransit data used to interface the
live data (sensor-derived) to the database that
is collecting and warehousing.
However, let me ask a technical
question that you can probably answer with a deeper
technical perspective than mine? How well can one query
data (or convert it for that matter) for which one
has no rigorous schema (of some kind)? (Note,
I consider a self-identifying type (eg, magic number)
to be a pre-validated file of the notation.)
len bullard
From donpark at quake.net Tue Dec 2 02:34:26 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:14 2004
Subject: REQ: "spec.dtd" referenced in XML version of the XML Spec
Message-ID: <01bcfeca$5a1c1510$0100007f@localhost>
I am looking for the "spec.dtd" file referenced in the XML version of the
XML Spec. W3C site does not have a search engine (that I could find) and
nothing turned up in XML-DEV archive as well as by AltaVista.
I would appreciate its URL so I can add it to my "JStud's XML Example and
DTD Catalog" at:
http://www.quake.net/~donpark/xmlcat.html -- Sorry, I couldn't pass up the
opportunity ;-).
BTW, the catalog is coming along. There are lots of empty sections but I am
filling them up as I find them by search engines and by this mailing list.
My expectation is that it will be very useful in about one month and less
than useful by end of next year.
Have fun while it lasts,
Don "JStud" Park
Java/MFC Consultant
donpark@quake.net
From dgd at cs.bu.edu Tue Dec 2 03:06:01 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:14 2004
Subject: Response to Simon St.L. on Entities v. XLL
Message-ID:
> From: Peter Murray-Rust
>XML(SGML) entities (NOTATION) have traditionally used PUBLIC and FPIs
>(Formal Public Identifier) for adding type information. This works if there
>is a registry of FPIs for this purpose. Without it is not much use. My
>impression - and I'm happy to be corrected - is that there are few
>useful FPIs for Typing objects.
This is a real problem. Steve DeRose and I provided a list of PUBLIC
IDs for a variety of data formats in our book Making Hypermedia Work.
We did this because of the lack of FORMAL public identifiers. We based
our IDs on the ISBN of the book, so that we met the letter as well as
the spirit of the ISO rules. Some have actually used these IDs.
>Using a SYSTEM Id is subject to the problem of permanence and
>uniqueness of URLs.
The proposal to use MIME types for NOTATION system IDs failed, but you
could easily make a URL that contained the MIME type, and would thus
be easy to resolve for the knowledgeable, without actually following
the URL. The URL itself could even be a CGI script returning a page
describing the convention (and possibly the mime type), for those who
did resolve it.
for example:
http://ursus.demon.co.uk/~peter/cgi-bin/mime-script?text/application
could return a document saying "use the 'text/application' mime type
for the referenced entity".
>>* The XLL mechanism (well, I should say the MIME mechanism really) is
>>based on the entity being self-identifying as to type (aided by
>>any additional attributes you like on the linking element).
>
>Unfortunately, not all targets of XLL HREFs will be self-identifying. This
>is true of local files and not-very-smart-servers. It is therefore useful
>for the author to be able to add MIME types to the target.
>As yet, MIME is not part of the XLL mechanism. I wish it was, and keep
>squeaking for it. If it isn't I suggest we use XDEV:MIME as a FUA
>'frequently used attribute' in XML-LINKs.
I don't understand why (if you are putting the information in the
source document) you don't simply use NOTATION, which works very well
with XLL without the need to invent your own private attribute convention.
SGML entity declarations allow the association of a type with a
destination in the source document. Untyped XLL links should only be
used in cases (and they exist) where the HTTP MIME type information is
dependably available and thus preferable to static in-document declarations.
-- David
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From dgd at cs.bu.edu Tue Dec 2 03:34:13 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:14 2004
Subject: EMBED and validation
Message-ID:
On Dec 1, 3:36pm, Simon St.Laurent wrote:
> Subject: RE: EMBED and validation
> >From a DOM perspective, EMBEDded material will almost certainly not be
> >considered part of the document tree containing the EMBED element.
>
> I very much look forward to seeing what the DOM does (or doesn't do) with the
> EMBEDded material. But is this an issue for the DOM in particular, or should
> the XML-Link spec give clearer direction about the nature of EMBEDded
> material? Especially as some of the replies so far have said that an
> application _could_ include the EMBEDded material in the document tree _if_
> the developer so chose - which opens the door to multiple interpretations in a
> large way.
You're getting closer. The document, itself, contains no embedded material: it
contains an "EMBED" (quotation) link to other material. It will not be a
requirement for any XML application to do anything other than include the XLL
attributes in its output for applications that want them. This attribute
information is part of the proper domain of the DOM, as I understand it.
XLL applications will be required to interpret the link as a _connection_
between two points, with "default semantics" of "include as quotation". Whether
that is most conveniently implemented by combining document data structures, as
you and Peter assume, or by some other method that preserves two structures
and renders them in a particular style, is an application implementation
decision.
Regardless of that implementation strategy, XSL stylesheets for XLL constructs
will have to specify how to choose display options for such links. Those
constructs _will_ have to deal with the fact that XLL links _need not_ be to
"well-formed subtrees" of the linked-to document. This is of critical
importance for implementing external markup and arbitrary quotation. So, any
implementation strategy that depends on WFST (Well Formed SubTrees) will fail
for some legal input documents. That's fine, depending on the goals and
limitations of the application.
It is conceivable that XSL will not give any method for formatting non-WFST
EMBED links. That would also be OK, as people who require the more complex
linking can still create their own (more complex) applications.
However, purely in the form that you've asked the question (i.e. XML parsing
rules), there is no inherent relation between EMBED linking and document
validation -- and this is not an oversight, but a planned strategy to enhance
the reusability of hyperdocuments in the same way that XML enhances the
reusability of single documents -- by late-binding all formatting and display
issues via a stylesheet or other form of processing specification.
The confusion over "application flexibility" is occurring because people are
used to early-binding models like HTML, where the format of documents is
explicitly encoded in the document. To judge application compatibility you
require not only the knowledge of the XML input, but the stylesheet language
(processing model) being applied to the document.
I've seen nothing so far in the XDEV proposals that is not more properly an
issue for XSL.
One way to see that this flexibility is required is to imagine in what sense
there can be interoperability between a web-mapping, or web-indexing
application and a browser display application. They might have very different
strategies for when to do many things (such as expand entities) and attach very
different semantics to those operations (a map might represent entities as a
special type of link, a browser would silently expand them, or perhaps use a
stretchtext view where the entities were buttons that would trigger textual
expansion when clicked). Formatters would select such options based on their
stylesheets. Analysis applications might more often do the same thing via
hard-wired code or configuration files.
> And, of course, I can think of a considerable number of applications where it
> might be useful to be able to apply the DOM to EMBEDded content without
> having to cope with a separate document tree.
That is fine, but that is a decision on processing model that, if taken, will
not handle certain legal XLL-linked documents... You can pick your processing
model, but then you have to live with the consequences.
> Sounds like fun. For the applications I'm proposing, I'd like them in the
> document tree, but of course that isn't appropriate for many situations. I'd
> really rather not see this prohibited, either - it would chop off an entire
> branch of XML development I'm working on. Could be the price of progress.
> We'll see.
I think the price will be some options in XSL (entity expansion rules) that may
seem mysterious at first and second glance, but will enable much more
sophisticated (and controllable) hypertext interaction.
> I guess what I'd love to see is another XML-Link attribute specifying whether
> to include an EMBED in the document tree or not - it seems to be the central
> issue around which this discussion has focused. Failing that, I'll look into
> Peter's proposals for XDEV, since they seem to address the challenges of
> multiple application behaviors directly - if they get implemented by
> application developers, of course.
This is:
1. not XLL's job, as explained above. Whether a processing model includes
the tree is relevant only to that processing model, not the document itself.
2. That attribute would only be legal for the subset of XLL links that select
WFSTs of the destination document -- this is an unreasonable limitation that
removes some useful applications of such links. For an interesting example of
the scholarly use of such markup, the MULTEXT project may be of interest (
http://www.cogsci.ed.ac.uk/~ht/nsldoc/nsldoc.html ).
3. If 2 is deemed a minority view, XSL will support _only_ what you are
requesting, but other processing languages will be able to process such
linking structures.
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com
From dgd at cs.bu.edu Tue Dec 2 03:34:19 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:14 2004
Subject: EMBED and validation
Message-ID:
On Dec 1, 11:22am, Rob McDougall wrote:
> Subject: RE: EMBED and validation
> I'm new to XML but this doesn't seem to accomplish what I would be
> looking for as an "include" capability.
no, EMBED is _not_ an include facility. External text entities are.
> Let's say I have a markup language (let's call it RML, "Rob's Markup
> Language"). I create a DTD for it and post it to my public web site.
> All users of RML put the URL for the DTD in the declaration.
> So far so good?
yep.
> Now, if one particular user of RML notices that there's a section that's
> common across every one of their RML documents, they might wish to
> separate it out into a distinct file and insert a link to it. This
> common piece is not a complete document unto itself so it cannot be
> validated, yet the user may wish to have the documents that include make
> sure that it is valid within the context that it was embedded. Since
> this particular file is unique to this user and not all RML users, it
> does not belong in the common DTD. This would seem to make an external
> text entity undesirable for this case.
Right. That's why XML has the "internal subset". You put any _per-document_
declarations there (inside the square brackets of the doctype), and they
augment the DTD without removing it.
> Is this correct, or am I missing something? Is there any other way to
> accomplish this using the current XML/XLL specs?
No; Yes; Yes.
for example, you might have:
]>
.... later on in your document ...
&boilerplate;
The entity reference to "boilerplate" will include the whole disclaimer without
having to change the DTD, or fill it with weird private information. Is this
good enough?
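A minimal Python sketch of the same idea, under stated assumptions: the
element names (rml, p) and the entity text are made up for illustration, and
the entity is declared with a literal value in the internal subset rather
than pointing at a separate file. It relies on the standard-library
ElementTree parser expanding internal entities declared in the internal
subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical RML document. The per-document entity lives inside the
# square brackets of the doctype (the internal subset), so the shared
# DTD does not need to change.
DOCUMENT = """<!DOCTYPE rml [
  <!ENTITY boilerplate "This text is shared by every document from this user.">
]>
<rml>
  <p>&boilerplate;</p>
</rml>"""


def expanded_paragraph(xml_text: str) -> str:
    """Parse the document and return the text of its first <p> element."""
    root = ET.fromstring(xml_text)
    return root.find("p").text


if __name__ == "__main__":
    print(expanded_paragraph(DOCUMENT))
```

The application sees only the expanded text; the entity reference itself is
resolved by the parser before the tree is built.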
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From ricko at allette.com.au Tue Dec 2 04:12:39 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:14 2004
Subject: Re FPIs for RFCs
Message-ID: <199712020411.PAA26436@jawa.chilli.net.au>
> From: Terry Allen
> You are asserting an ownership right you cannot back up. That's dangerous
> for one's legal health. Referring to something by using its URL is one
> thing, but using that URL to create a name that lies in someone else's
> name space is another matter entirely.
I have emailed Internic to find out their views. However, I do not
believe that an FPI is property. I believe it is common and accepted
practise to create FPIs for published material using ISBN, and that
the IDN can be used in exactly the same way.
ISO 8879 says 4.223 owner identifier "The portion of a public identifier
that identifies the owner or originator of public text".
I read that to mean that it would actually be wrong for me to use myself
in the owner field. The owner means the owner (or originator) of the
public text, not the originator of the FPI.
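To make the fields under discussion concrete, here is a small Python sketch
that splits an FPI into its parts. The sample string is the FPI quoted
earlier in this thread; the function name parse_fpi and the dictionary keys
are hypothetical. It assumes the usual ISO 8879 layout, where a leading
"+//" marks a registered owner identifier and "-//" an unregistered one.

```python
def parse_fpi(fpi: str) -> dict:
    """Split a formal public identifier into owner, text class,
    description and language. A sketch only: it does not check the
    minimum-literal character rules and ignores any display version."""
    parts = [part.strip() for part in fpi.split("//")]
    if parts[0] in ("+", "-"):
        owner_kind = "registered" if parts[0] == "+" else "unregistered"
        parts = parts[1:]
    else:
        # No prefix: the owner identifier is an ISO publication.
        owner_kind = "iso"
    if len(parts) != 3:
        raise ValueError("expected owner//class description//language")
    owner, text, language = parts
    text_class, _, description = text.partition(" ")
    return {
        "owner_kind": owner_kind,
        "owner": owner,
        "text_class": text_class,
        "description": description,
        "language": language,
    }


if __name__ == "__main__":
    sample = ("+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION "
              "Multipurpose Internet Mail Extensions::video/mpeg//EN")
    for key, value in parse_fpi(sample).items():
        print(key, "=", value)
```

The "owner" value is the field whose meaning is in dispute here: it names
the owner or originator of the public text, whoever composed the FPI.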
Rick Jelliffe
From papresco at technologist.com Tue Dec 2 14:55:12 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:14 2004
Subject: Data warehousing and XML
References: <3482FAC7.9DD9D6E4@technologist.com> <348360F2.407B@hiwaay.net>
Message-ID: <34842224.18C52B52@technologist.com>
len bullard wrote:
> Of course XML can be used to create non-ambiguous
> transfer formats (data schlepping). But Paul,
> a lot of the information that needs to be mined
> is not in relational formats.
I don't doubt that there are some people in the world who want to "mine"
documents, but I think that they are in the minority, and will be for a
long time. But more important, it makes little sense to me to "mine" XML
data. Even if you wanted to mine your structured document data it will
almost always make sense to load that into the mining tool's internal
data structures.
Once again, XML is great as the transfer format, but when you get down
to doing your queries, your data mining software should not be parsing
the XML syntax.
> However, let me ask a technical
> question that you can probably answer with a deeper
> technical perspective than mine? How well can one query
> data (or convert it for that matter) for which one
> has no rigorous schema (of some kind)?
In some cases you can do sophisticated queries on data without a schema,
but you would have to jump through AI hoops. It's not a job I would
apply for, but neural net experts may be able to detect structure in the
chaos. But building the schema first is definitely cheaper than trying
to divine the structure later.
Paul Prescod
From papresco at technologist.com Tue Dec 2 15:17:23 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:15 2004
Subject: Infoworld XML Article
Message-ID: <3484275E.CB18C5F2@technologist.com>
Vendors to push XML as all-purpose Web middleware format
By Lynda Radosevich
InfoWorld Electric
http://www.infoworld.com/cgi-bin/displayStory.pl?sc?97121.exml.htm
From gray at interlog.com Tue Dec 2 15:27:11 1997
From: gray at interlog.com (Graydon Hoare)
Date: Mon Jun 7 16:59:15 2004
Subject: Data warehousing and XML
In-Reply-To: <34842224.18C52B52@technologist.com>
Message-ID:
On Tue, 2 Dec 1997, Paul Prescod wrote:
> neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.
er, you're missing something. The whole point of data mining is admitting
that all the schemas you will ever establish are in some way flawed, no
matter what you do. There would be no need for such tools if we were
simply able to see the future, and know that it's terribly important to
maintain a count of how many sticks of gum get shipped to guam on tuesdays
in december.
This is precisely why text retrieval is so hard -- the "schema" that all
documents are written in is a human written language, and nobody knows how
to machine-process that. You can chunk it up all you like into logical
blocks, but you're always going to be missing certain substantive
information relating to the text. In fact, if you want to get really
finicky about it, plain vanilla transcribed text loses useful information
conveyed in spoken language, and requires an expert "document engineer"
to produce (compare a literate adult's writing to that of a child).
-graydon
From RMcDouga at JetForm.com Tue Dec 2 15:35:45 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
Message-ID:
Thanks to everyone for the replies. I now (think I) understand how this
would be used. This method does require that the person creating the
document specify all the URLs he will be "include"ing at the top of the
file. This is somewhat inconvenient for someone who only inserts things
once into any given document. If the documents are being generated on
the fly from some application, the application may have to perform two
passes to derive a list of filenames, or else "bulk up" the document
with lots of entities that may never be substituted.
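The two-pass generation described above might look roughly like the sketch below. It is only an illustration under assumptions: the `build_document` helper, the generated entity names (`inc0`, `inc1`, ...) and the example URLs are hypothetical and do not come from this thread.

```python
from xml.sax.saxutils import escape

def build_document(root, parts):
    """Build an XML document from ("text", str) and ("include", url) parts."""
    # Pass 1: give each distinct URL one entity name (inc0, inc1, ...).
    names = {}
    for kind, value in parts:
        if kind == "include" and value not in names:
            if '"' in value:
                raise ValueError("URL cannot be quoted as a system literal")
            names[value] = "inc%d" % len(names)
    # The internal subset must come first, so it is emitted before the body.
    decls = "".join('<!ENTITY %s SYSTEM "%s">\n' % (name, url)
                    for url, name in names.items())
    # Pass 2: write the body, replacing each include with an entity reference.
    body = "".join("&%s;" % names[value] if kind == "include" else escape(value)
                   for kind, value in parts)
    return "<!DOCTYPE %s [\n%s]>\n<%s>%s</%s>" % (root, decls, root, body, root)
```

Because the declarations are collected before anything is written, only entities that are actually referenced end up in the internal subset, at the cost of holding the list of parts in memory.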
It would be nice if there was also an "inline" way of doing includes
that would allow the XML parser to validate the resulting content.
Rob
=======================================================
Rob McDougall Phone: (613)751-4800 ext.5232
JetForm Corporation Fax: (613)751-4864
http://www.jetform.com mailto:rmcdouga@jetform.com
=======================================================
>-----Original Message-----
>From: dgd@cs.bu.edu [SMTP:dgd@cs.bu.edu]
>Sent: December 1, 1997 10:34 PM
>To: 'xml-dev@ic.ac.uk'
>Subject: Re: EMBED and validation
>
>
>for example, you might have:
>
>[
>
>]>
>
>.... later on in your document ...
>&boilerplate;
>
>The entity reference to "boilerplate" will include the whole disclaimer
>without
>having to change the DTD, or fill it with weird private information. Is this
>good enough?
>
>------------------------------------------+----------------------------
>David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
>Boston University Computer Science | Dynamic Diagrams
>http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
> | MAPA: mapping for the WW
>
>
From papresco at technologist.com Tue Dec 2 16:13:05 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:15 2004
Subject: Data warehousing and XML
References:
Message-ID: <34843440.98331B25@technologist.com>
Graydon Hoare wrote:
>
> er, you're missing something. The whole point of data mining is admitting
> that all the schemas you will ever establish are in some way flawed, no
> matter what you do. There would be no need for such tools if we were
> simply able to see the future, and know that it's terribly important to
> maintain a count of how many sticks of gum get shipped to guam on tuesdays
> in december.
Right but Len's question was about having a "schema of *some kind*". The
closer your schema is to explicitly recognizing the information you want
to discover, the easier it is to discover the information. If you have
no schema then you are Very Far Away from that goal.
> This is precisely why text retrieval is so hard -- the "schema" that all
> documents are written in is a human written language, and nobody knows how
> to machine-process that. You can chunk it up all you like into logical
> blocks, but you're always going to be missing certain substantive
> information relating to the text.
Certainly, but those who actually do this processing still chunk it up
into logical blocks according to some schema, because that
is the way to get closest to achieving the goal. So in answer to Len's
question I still say that having a schema is better than not having one,
despite the fact that having the schema does not "solve" the problem. It
gets you closer to solving the problem.
Paul Prescod
From SimonStL at classic.msn.com Tue Dec 2 17:43:08 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:15 2004
Subject: Entities and XPointers
Message-ID:
While they don't provide the actuation flexibility or many of the other
features of XML-Link, it may be possible to create external entities that use
XPointers in the URL. Of course, this would require that either the processing
application can cope with XPointers (unlikely in this case), or that the
server can interpret the XPointer and return only the chunk requested.
Provided that I can make the server interpret the XPointer (using | as the
connector, or the XML-XPTR query syntax) and return only the chunk requested,
is this kosher? It's mixing and matching the spec, which I'm not sure is
appropriate.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From ak117 at freenet.carleton.ca Tue Dec 2 19:33:33 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:15 2004
Subject: XML for distributed processing
In-Reply-To: <9712021745.aa26388@salmon.maths.tcd.ie>
References: <9712021745.aa26388@salmon.maths.tcd.ie>
Message-ID: <199712021934.OAA00404@unready.microstar.com>
El Melody Chile writes:
> Bosak says (in his paper "XML, Java, and the future of the Web")
> that "its utility ultimately lies in the fact that a
> computation-intensive process, that would otherwise entail an
> enormous, extended resource hit on the server has been changed into
> a brief interaction with the server followed by an extended
> interaction with the user's own Web client". I can't see how this
> is directly due to XML, would the same process not be possible
> using a Java applet and data written in *any* industry-specific
> representation language? Is there any specific benefit associated
> with using XML to implement this language?
One advantage is the fact that XML has a concept of both physical
(entity) and logical (element) structure. You can put together a
document from many different sources anywhere on the Internet (or any
other network), and produce an entirely different logical structure
for use by your application. For example, here's a document that gets
its first chapter from a hypothetical server in Canada, its second,
from a server at an American university, and the second paragraph of
its third chapter, from a server in Finland (of course, if you're
using a Java applet, your web browser must allow applets to make
TCP/IP connections to multiple hosts):
]>
My Book
&chap01;
&chap02;
This is the third chapter
First paragraph.
&para01;
Third paragraph.
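As a rough sketch of the physical/logical split described above, the following uses Python's `xml.parsers.expat` to stitch external entities into one logical element stream. The URLs, the entity text and the `FAKE_WEB` lookup table are invented stand-ins for real servers, and the declarations shown are an assumption, not the declarations from the original message.

```python
import xml.parsers.expat as expat

# Stand-in for remote servers: system identifier -> entity text.
# These URLs and contents are invented for illustration.
FAKE_WEB = {
    "http://www.example.ca/chap01.xml": "<chapter><title>One</title></chapter>",
    "http://www.example.edu/chap02.xml": "<chapter><title>Two</title></chapter>",
}

DOC = """<!DOCTYPE book [
<!ENTITY chap01 SYSTEM "http://www.example.ca/chap01.xml">
<!ENTITY chap02 SYSTEM "http://www.example.edu/chap02.xml">
]>
<book>&chap01;&chap02;</book>"""

def logical_structure(doc, resources):
    """Return element names in document order, with entities expanded."""
    seen = []
    parser = expat.ParserCreate()

    def start_element(name, attrs):
        seen.append(name)

    def external_entity(context, base, system_id, public_id):
        # Parse the entity text in a child parser; it inherits the handlers.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(resources[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.StartElementHandler = start_element
    parser.ExternalEntityRefHandler = external_entity
    parser.Parse(doc, True)
    return seen
```

The application only ever sees the element sequence; where each piece physically came from is hidden behind the entity handler.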
Another advantage is the fact that many people are using it. You
could invent a different syntax that did the same thing, but why
bother, especially when there's already lots of free and commercial
software supporting XML.
A final advantage is that XML is not language-, software-, or
vendor-specific; instead, it's based on an International Standard, ISO
8879, that has been in widespread enterprise use for over a decade.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From dgd at cs.bu.edu Tue Dec 2 21:36:36 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:15 2004
Subject: Entities and XPointers
Message-ID:
On Dec 2, 5:40pm, Simon St.Laurent wrote:
> Subject: Entities and XPointers
> While they don't provide the actuation flexibility or many of the other
> features of XML-Link, it may be possible to create external entities that use
> XPointers in the URL. Of course, this would require that either the processing
> application can cope with XPointers (unlikely in this case), or that the
> server can interpret the XPointer and return only the chunk requested.
There's no reason that you can't do this at the server side, and a client that
was so sophisticated could interpret the URL (but, practically speaking, I
wouldn't expect many such to exist).
> Provided that I can make the server interpret the XPointer (using | as the
> connector, or the XML-XPTR query syntax) and return only the chunk requested,
> is this kosher? It's mixing and matching the spec, which I'm not sure is
> appropriate.
I'm not sure about using the | connector. It's ideal for the
maybe-client/maybe-server case, but I think those special URLs may be an
XLL-only item -- thus not required to be interpreted by XML parsers. But the
use of query strings (with '?') and smart servers to serve up partial XML data
is part of what we wanted to enable by using URLs in XML in the first place.
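A minimal sketch of such a "smart server", assuming a made-up `?id=...` query convention (this is not XPointer syntax, and nothing in the thread defines it). The `serve_fragment` function and the `documents` table are hypothetical; a real server would read files or a database instead.

```python
import copy
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

def serve_fragment(url, documents):
    """Return the element picked out by a hypothetical ?id=... query."""
    parts = urlsplit(url)
    root = ET.fromstring(documents[parts.path])
    wanted = parse_qs(parts.query).get("id")
    if not wanted:
        # No query string: serve the whole document.
        return ET.tostring(root, encoding="unicode")
    for elem in root.iter():
        if elem.get("id") == wanted[0]:
            chunk = copy.copy(elem)
            chunk.tail = None  # drop text that follows the element
            return ET.tostring(chunk, encoding="unicode")
    raise KeyError(wanted[0])
```

The client sees an ordinary URL and receives a well-formed chunk, so the parser on the other end needs no knowledge of how the selection was made.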
-- David
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From peter at ursus.demon.co.uk Tue Dec 2 23:38:04 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
At 10:29 02/12/97 -0500, Rob McDougall wrote:
>Thanks to everyone for the replies. I now (think I) understand how this
>would be used. This method does require that the person creating the
>document specify all the URLs he will be "include"ing at the top of the
>file. This is somewhat inconvenient for someone who only inserts things
>once into any given document. If the documents are being generated on
>the fly from some application, the application may have to perform two
>passes to derive a list of filenames, or else "bulk up" the document
>with lots of entities that may never be substituted.
If you are going to 'include' binary 'files' (i.e. entities) then it gets
more complex. This is my current analysis. It's probably wrong. (Are there
any Java parsers which manage this?)
]>
This is all required for one GIF. Every GIF requires an ENTITY. There
*must* be an internal subset. There must be a registry for the FPIs, etc.
In XLL I can write a complete document:
(excuse the case insensitivity)
>It would be nice if there was also an "inline" way of doing includes
>that would allow the XML parser to validate the resulting content.
Well, XLL does this ***as long as we agree on the semantics***. HREF (or
IMG/SRC) is so widely used in HTML that people will certainly start doing
their own thing. There are the following possibilities:
- wait for a W3C body to pronounce (won't be this year, I suspect)
- wait and see what commercial browsers do
- invent nine-and-sixty ways of doing it
- use XDEV: as at least a means of coordinating *some* people.
JUMBO will start with the latter, and junk it as soon as anything official
comes along...
[BTW I am not very happy with the idea that FPIs are intended to be human-
but not machine-readable. That makes them useless for things like image/gif.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Dec 2 23:42:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:15 2004
Subject: Entities and XPointers
In-Reply-To:
Message-ID: <3.0.1.16.19971203002645.218f929c@pop3.demon.co.uk>
At 16:35 02/12/97 -0500, David G. Durand wrote:
>On Dec 2, 5:40pm, Simon St.Laurent wrote:
>> Subject: Entities and XPointers
>> While they don't provide the actuation flexibility or many of the other
>> features of XML-Link, it may be possible to create external entities that use
>> XPointers in the URL. Of course, this would require that either the processing
>> application can cope with XPointers (unlikely in this case), or that the
>> server can interpret the XPointer and return only the chunk requested.
>
>There's no reason that you can't do this at the server side, and a client that
>was so sophisticated could interpret the URL (but, practically speaking, I
>wouldn't expect many such to exist).
I am probably missing something, but it seems fairly straightforward to
extract something from another document - the question is whether it's
allowed. For example,
or
could return a chunk of well-formed XML. (JUMBO is capable of the second
form at present). The question is whether
...
&chap3;
is legal in an XML parser. I suspect that this is undefined - however it
must not be 'application-dependent', because otherwise we get different
parser behaviour. (The use of other connectors (| and ?) is presumably
similar - it's the mechanics of how the entity is retrieved.)
>
The only argument I can see against this is that it requires all parser
writers who cope with ENTITYs to resolve XLL - and that is quite a strong
argument :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 3 00:05:49 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:15 2004
Subject: Validation algorithm/code wanted
In-Reply-To:
Message-ID: <3.0.1.16.19971203010145.223f5f34@pop3.demon.co.uk>
This may come as a shock to some, but I would actually like to use
DTD-based validation in JUMBO. The primary purpose is to be able to read in
a document and map the content of each ELEMENT onto the DTD. This is so I
can have a GUI-based authoring tool. [ATTLISTs are relatively easy and I
have already done them, I think].
I would be grateful for some or all of the following:
- a java-based library routine (I think this may be optimistic in 1997)
- an algorithm, or a pointer to one on the WWW
- some wise words about how much effort is involved in writing an algorithm.
[Norbert solved this in NXP by including JACC - a java-based yacc-like
beast - but it is cumbersome for just analysing single content models
against instances].
The operation seems to be somewhere in between a graph matching routine
(which I can do except for the optionality) and a BNF parser (e.g. yacc)
which I certainly can't. My recollection of regexps is that they use a
'maximal munch' of some sort and so I would try to match as many of the
early nodes and then unwind the stack repeatedly if it failed. However,
yacc throws up the 'shift-reduce' conflicts which I imagine still pertain
in XML. (This means there is more than one way of mapping a document onto
the content model, I assume.)
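One possible reading of the regexp idea above, as a sketch only: translate an element-content model into a regular expression over the sequence of child element names, and let the regexp engine do the backtracking. The `compile_model` helper is hypothetical, it is written in Python rather than Java for brevity, and it ignores EMPTY, ANY and parameter entities. Mixed content is treated as a simple "which names are allowed" check.

```python
import re

TOKEN = re.compile(r"\s*(#PCDATA|[A-Za-z_][\w.\-]*|[(),|?*+])")

def compile_model(model):
    """Turn a DTD content model into a test over a list of child names."""
    tokens = TOKEN.findall(model)
    if "#PCDATA" in tokens:
        # Mixed content, e.g. (#PCDATA|S)*: only the set of names matters.
        allowed = {t for t in tokens if t[0].isalpha() or t[0] == "_"}
        return lambda children: all(c in allowed for c in children)
    pieces = []
    for t in tokens:
        if t == "(":
            pieces.append("(?:")
        elif t == ",":
            continue  # a sequence is plain concatenation
        elif t in ")|?*+":
            pieces.append(t)
        else:
            # One child is encoded as its name plus a trailing space.
            pieces.append("(?:%s )" % re.escape(t))
    pattern = re.compile("".join(pieces))
    return lambda children: pattern.fullmatch(
        "".join(c + " " for c in children)) is not None
```

For example, `compile_model("(title, (para | list)+, note?)")` accepts `["title", "para", "list"]` and rejects `["title"]`. This only answers valid or not; mapping each child onto a position in the model, as an authoring tool would need, takes more bookkeeping than is shown here.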
I'd really hate to have to hack this myself - maybe there is a mythical
grad student on this list who really loves writing parsers. If so, I'll
write to her supervisor with a glowing reference :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From cbullard at hiwaay.net Wed Dec 3 00:29:09 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:59:15 2004
Subject: Data warehousing and XML
References: <3482FAC7.9DD9D6E4@technologist.com> <348360F2.407B@hiwaay.net> <34842224.18C52B52@technologist.com>
Message-ID: <3484A795.220@hiwaay.net>
Paul Prescod wrote:
> I don't doubt that there are some people in the world who want to "mine"
> documents, but I think that they are in the minority, and will be for a
> long time. But more important, it makes little sense to me to "mine" XML
> data. Even if you wanted to mine your structured document data it will
> almost always make sense to load that into the mining tool's internal
> data structures.
Umm.. that actually was one of the often requested capabilities
when I was still working on SGML systems. The problem was
precisely that a great deal of the *interesting* information
was not in relational databases. Comparative policy analysis,
for example.
> Once again, XML is great as the transfer format, but when you get down
> to doing your queries, your data mining software should not be parsing
> the XML syntax.
Ok. Hmm? Well, what were the various proposals over the
years for SGML querying systems for?
> > However, let me ask a technical
> > question that you can probably answer with a deeper
> > technical perspective than mine? How well can one query
> > data (or convert it for that matter) for which one
> > has no rigorous schema (of some kind)?
>
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.
That is what I thought to be the case. I remember when we
were doing the GE CASS system we bounced around the idea
of using DTDs as sort of a reversed query, that is, it
gave us a way to figure out what kinds of queries should
be interesting. We never pursued the idea because the
SGML systems of that time were fairly primitive.
len
From gmckenzi at JetForm.com Wed Dec 3 01:01:33 1997
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
Message-ID:
Just some comments on this issue of 'inclusion'. I apologize if this
sounds like a ramble...
I understand the purpose and usefulness of declaring an entity in the
internal DTD subset and employing this mechanism as the proper and valid
way to include some (potentially marked up) text. But, echoing Rob
McDougall's closing statements, for *many* applications it is simply too
difficult for the application to 'predict' these inclusion points and
place a corresponding declaration in the internal DTD subset. In fact,
I would venture to say that most of my customers would walk away from
XML based on this issue alone.
Heavens, so many data processing shops still want to continue writing
data out in fixed length COBOL style records; and while it may be the
nineties, they are resistant to change. As much as it may seem to be a
stretch to bring these types of data producers into the XML world, I
(naively) think it is possible.
So, after reading all the previous submissions (especially Peter's
display of the overhead for setting up a GIF reference via the external
entity method) I too wish to use an XLL based mechanism for expressing
an 'inclusion' linkage, and pine for some agreement on the semantics.
One thing remains unclear, despite the dozens of submissions
I've read: is it or is it not acceptable for an application to choose
to act upon an XLL linkage in a way that causes the target linked
content to be included and validated? Put another way, may I create an XML
derived format, and document that a processor of this derived format
should view a particular usage of an XLL construct as instructions to
"retrieve and include 'inline' the target content, and validate it
against the originating document's DTD as if the target content was part
of the original document"?
I'd much prefer that there was a way to express this in the syntax.
Gavin.
========================================================
Gavin F. McKenzie Vox:+1(613)230-3676 ext 5277
JetForm Corporation Fax:+1(613)594-8886
http://www.jetform.com mailto:gmckenzi@jetform.com
========================================================
From elm at arbortext.com Wed Dec 3 01:28:16 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
Message-ID: <3.0.32.19971202202247.00a9e690@village.doctools.com>
At 07:12 PM 12/2/97 -0500, Peter Murray-Rust wrote:
>At 10:29 02/12/97 -0500, Rob McDougall wrote:
>>Thanks to everyone for the replies. I now (think I) understand how this
>>would be used. This method does require that the person creating the
>>document specify all the URLs he will be "include"ing at the top of the
>>file. This is somewhat inconvenient for someone who only inserts things
>>once into any given document. If the documents are being generated on
>>the fly from some application, the application may have to perform two
>>passes to derive a list of filenames, or else "bulk up" the document
>>with lots of entities that may never be substituted.
>
>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>more complex. This is my current analysis. It's probably wrong. (Are there
>any Java parsers which manage this?)
>
> PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> Multipurpose Internet Mail Extensions::image/gif//EN">
>
>
>
>
>
>
>]>
>
>
>
>
>This is all required for one GIF. Every GIF requires an ENTITY. There
>*must* be an internal subset. There must be a registry for the FPIs, etc.
(One small note: XML does not currently require public IDs to be formal.
This doesn't materially change your point, though...)
>In XLL I can write a complete document:
>
> MIME="image/gif"/>
>
>(excuse the case insensitivity)
It's true that this is another way to do basically the same thing, a way
that relies on not only XML but also XLL. In practice, a lot of SGML shops
don't use the "pure" way either; they just put a pathname in an attribute
value, and use proprietary means to indicate that the named file should be
output as a graphic or whatever. XLL is definitely an improvement on that!
>>It would be nice if there was also an "inline" way of doing includes
>>that would allow the XML parser to validate the resulting content.
This feels a bit apples-to-oranges, because unless you're declaring XML
itself as a "foreign notation" through a NOTATION declaration, you don't
need a lot of the overhead you've shown above:
]>
...
&mycontent;
...
>Well, XLL does this ***as long as we agree on the semantics***. HREF (or
>IMG/SRC) is so widely used in HTML that people will certainly start doing
>their own thing. There are the following possibilities:
> - wait for a W3C body to pronounce (won't be this year, I suspect)
> - wait and see what commercial browsers do
> - invent nine-and-sixty ways of doing it
> - use XDEV: as at least a means of coordinating *some* people.
>
>JUMBO will start with the latter, and junk it as soon as anything official
>comes along...
XLL itself isn't intended to pull in content and have it validated as part
of the same context in which the linking element appears. I think you'd
have to use the DOM to dynamically change your document, and then reparse
if you choose to. E.g., if you were to define a ROLE attribute value that
means "parse me in context once you've pulled me in," you'd have to start
another XML processor pass to do this, and it would be part of your own
application semantics, not those of XLL.
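The "change the document through the DOM, then reparse" approach could be sketched as below. Everything specific here is an assumption made for illustration: the `role="include"` value, the `href` attribute name, the `include_and_reparse` helper and the in-memory `resources` table. Python's standard library does not validate against a DTD, so the second pass in this sketch only re-checks well-formedness.

```python
from xml.dom import minidom

def include_and_reparse(doc_xml, resources):
    """Replace role="include" links by their targets, then parse again."""
    dom = minidom.parseString(doc_xml)
    for link in list(dom.getElementsByTagName("*")):
        if link.getAttribute("role") == "include":
            # Fetch and parse the target (here: a lookup in a dict).
            target = minidom.parseString(resources[link.getAttribute("href")])
            imported = dom.importNode(target.documentElement, True)
            link.parentNode.replaceChild(imported, link)
    # Second processor pass over the rewritten document.
    return minidom.parseString(dom.toxml())
```

The inclusion behaviour lives entirely in the application code, which matches the point that these semantics belong to the application and not to XLL.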
>[BTW I am not very happy with the idea that FPIs are intended to be human-
>but not machine-readable. That makes them useless for things like image/gif.]
>
> P.
Eve
From dgd at cs.bu.edu Wed Dec 3 04:00:17 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
References:
Message-ID:
At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>more complex. This is my current analysis. It's probably wrong. (Are there
>any Java parsers which manage this?)
Actually, I just noticed, it _is_ wrong (I removed > quoting because it's
too gross for SGML examples):
>
This should be:
The notation is attached to the entity, not the citation of the entity.
]>
Finally, this is a bit overstated, the following lines could (and should)
all be included in any reasonable CML DTD:
So the internal subset would have to contain the following to support _one_
gif:
and the document would contain:
Defining a DTD (and its associated stylesheets) generally requires careful
thought about what external notations are required in the intended
application. Predefined notation sets (in the form of external entities
with Public identifiers) are common as dirt in the SGML world, for the
reasons of interchangeability and author sanity.
The only place the FPI need appear is in the shared declaration; the
stylesheet (used to actually render or trigger processing of the non-XML
data) can use the notation name "gif" to detect a GIF file. No FPI is
involved at the "browser end" (non-validating processor augmented with a
CML stylesheet).
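A rough sketch of the point above, assuming a made-up document: the notation name "gif", the entity name "mygif", the element and attribute names, and the public identifier are all hypothetical. It uses Python's expat binding to show that the notation hangs off the entity declaration, so a processor can pick out GIFs by notation name alone, without ever looking at the FPI.

```python
import xml.parsers.expat

# Hypothetical document: every name and identifier here is invented
# for illustration and is not taken from any real CML DTD.
DOC = """<!DOCTYPE doc [
<!NOTATION gif PUBLIC "-//EXAMPLE//NOTATION GIF image//EN">
<!ENTITY mygif SYSTEM "my.gif" NDATA gif>
<!ELEMENT doc (img)>
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
]>
<doc><img src="mygif"/></doc>"""


def find_unparsed_entities(text):
    """Return {entity name: (system id, notation name)} for NDATA entities."""
    entities = {}

    def entity_decl(name, is_parameter, value, base, system_id, public_id, notation):
        # Only unparsed (NDATA) entities carry a notation name.
        if notation is not None:
            entities[name] = (system_id, notation)

    parser = xml.parsers.expat.ParserCreate()
    parser.EntityDeclHandler = entity_decl
    parser.Parse(text, True)
    return entities


def files_with_notation(text, notation):
    """List system ids of entities cited in 'src' attributes with this notation."""
    entities = find_unparsed_entities(text)
    hits = []

    def start(name, attrs):
        cited = attrs.get("src")
        if cited in entities and entities[cited][1] == notation:
            hits.append(entities[cited][0])

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.Parse(text, True)
    return hits


print(files_with_notation(DOC, "gif"))
```

The lookup goes attribute value, then entity declaration, then notation name; the public identifier in the NOTATION declaration is never consulted.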
>In XLL I can write a complete document:
Once you factor out the declarations, this looks about the same (assuming
that you also use ATTLIST declarations in the internal subset to factor out
the redundant attribute values on ):
At 10:29 02/12/97 -0500, Rob McDougall wrote:
>>It would be nice if there was also an "inline" way of doing includes
>>that would allow the XML parser to validate the resulting content.
>
>Well, XLL does this ***as long as we agree on the semantics***.
No, it doesn't. You can define a new stylesheet language (or custom
processor) that does this if you want. Perhaps XSL's re-ordering facilities
will be able to do this, without the validation. Validation is an XML
process, and XML itself does not "include" files except via entities.
> HREF (or
>IMG/SRC) is so widely used in HTML that people will certainly start doing
>their own thing.
There is no question that XSL will support this markup idiom for exactly
those reasons (it probably does now, but I've not finished reading it yet).
>There are the following possibilities:
> - wait for a W3C body to pronounce (won't be this year, I suspect)
> - wait and see what commercial browsers do
> - invent nine-and-sixty ways of doing it
> - use XDEV: as at least a means of coordinating *some* people.
Given that XSL will support this, there is no call to go putting any
formatting gunk into your documents. Whatever stylesheet mechanism you
implement in JUMBO should be able to express this, and that is where you
should do it. Note that hardwiring tag names into your processor is a
stylesheet in my terminology, though admittedly not a very flexible one.
>
>JUMBO will start with the latter, and junk it as soon as anything official
>comes along...
The _only_ thing that I can see XDEV having any utility for is expressing
where to find a stylesheet. Maybe you should think about the fundamental
goals of content markup (_SEPARATION_ of content from processing). Read
Coombs, Renear and DeRose's Comm. ACM article for the details.
>[BTW I am not very happy with the idea that FPIs are intended to be human-
>but not machine-readable. That makes them useless for things like image/gif.]
The fact that they are human readable has nothing to do with whether they
are supposed to be machine readable. Rick Jelliffe was wrong when he
asserted that they are intended to be "fuzzily matched". So don't worry
about that at any rate.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Wed Dec 3 04:01:08 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:15 2004
Subject: Entities and XPointers
In-Reply-To: <3.0.1.16.19971203002645.218f929c@pop3.demon.co.uk>
References:
Message-ID:
At 12:26 AM -0000 12/3/97, Peter Murray-Rust wrote:
>I am probably missing something, but it seems fairly straightforward to
>extract something from another document - the question is whether it's
>allowed. For example,
>
>
>or
>
>could return a chunk of well-formed XML. (JUMBO is capable of the second
>form at present). The question is whether:
>
>...
>&chap3;
>
>is legal in an XML parser.
Sure, it's a legal URL. However, an XML parser is not required to process
fragment IDs, so it's almost certainly a "broken link" in XML parsers that
don't implement XLL.
I think XLL will have to say whether this is supposed to work in XML
parsers that _do_ implement XLL. I would argue that it should (and that,
since XLL should be widely implemented, it will _eventually_ be
sensible to do). At the moment XLL is still soft enough that a hard and fast
judgement on this issue can't really be given, I think.
> I suspect that this is undefined - however it
>must not be 'application-dependent', because otherwise we get different
>parser behaviour. (The use of other connectors (| and ?) is presumably
>similar - it's the mechanics of how the entity is retrieved.)
no, ? is always legal since it's processed at the server, by definition.
Whether it works (as with all URLs) depends on the server's policy for
resolving the URLs sent to it.
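To make the distinction concrete, here is a small sketch using Python's standard urllib; the URL is a made-up example. The query part travels to the server with the request, while the fragment stays with the client, which is why a parser that ignores fragment IDs never sees anything but the whole resource.

```python
from urllib.parse import urlsplit


def split_for_retrieval(url):
    """Return (host, what is sent to the server, what stays on the client)."""
    parts = urlsplit(url)
    request_target = parts.path or "/"
    if parts.query:
        # The query string is part of the request; the server interprets it.
        request_target += "?" + parts.query
    # The fragment is never sent; only the client can act on it.
    return parts.netloc, request_target, parts.fragment


host, target, fragment = split_for_retrieval(
    "http://example.org/book.xml?edition=2#chap3"
)
print(host, target, fragment)
```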
>The only argument I can see against this is that it requires all parser
>writers who cope with ENTITYs to resolve XLL - and that is quite a strong
>argument :-)
I think the evident usefulness of this is another strong argument for
implementing XLL widely, and also for making sure that XLL processors are
defined to affect URI (URL) throughout a document, and not just in XLL
specific elements.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Wed Dec 3 04:02:24 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
Message-ID:
On Dec 1, 11:22am, Rob McDougall wrote:
> Subject: RE: EMBED and validation
> I'm new to XML but this doesn't seem to accomplish what I would be
> looking for as an "include" capability.
no, EMBED is _not_ an include facility. External text entities are.
> Let's say I have a markup language (let's call it RML, "Rob's Markup
> Language"). I create a DTD for it and post it to my public web site.
> All users of RML put the URL for the DTD in the declaration.
> So far so good?
yep.
> Now, if one particular user of RML notices that there's a section that's
> common across every one of their RML documents, they might wish to
> separate it out into a distinct file and insert a link to it. This
> common piece is not a complete document unto itself so it cannot be
> validated, yet the user may wish to have the documents that include it make
> sure that it is valid within the context that it was embedded. Since
> this particular file is unique to this user and not all RML users, it
> does not belong in the common DTD. This would seem to make an external
> text entity undesirable for this case.
Right. That's why XML has the "internal subset". You put any _per-document_
declarations there (inside the square brackets of the doctype), and they
augment the DTD, without removing it.
> Is this correct, or am I missing something? Is there any other way to
> accomplish this using the current XML/XLL specs?
No; Yes; Yes.
for example, you might have:
]>
.... later on in your document ...
&boilerplate;
The entity reference to "boilerplate" will include the whole disclaimer without
having to change the DTD, or fill it with weird private information. Is this
good enough?
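As a sketch of how a parser sees this, assuming an external entity declared in the internal subset: the element names, the file name "disclaimer.xml" and its contents are all hypothetical, and a dictionary stands in for the file system. Python's expat binding does not fetch external entities on its own, so the application supplies the text when the reference is reached.

```python
import xml.parsers.expat

# Hypothetical stand-in for files that would normally be fetched by system id.
FILES = {"disclaimer.xml": "<disclaimer>Use at your own risk.</disclaimer>"}

# The element names and the file name are illustrative assumptions.
DOC = """<!DOCTYPE rml [
<!ENTITY boilerplate SYSTEM "disclaimer.xml">
]>
<rml><section>&boilerplate;</section></rml>"""


def parse_with_entities(text, files):
    """Parse text, expanding external entities from files; return (tags, chars)."""
    tags, chars = [], []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True
    parser.StartElementHandler = lambda name, attrs: tags.append(name)
    parser.CharacterDataHandler = lambda data: chars.append(data)

    def resolve(context, base, system_id, public_id):
        # The child parser inherits the handlers set above, so the included
        # markup is reported as if it had been typed in place of the reference.
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(files[system_id], True)
        return 1  # non-zero tells expat the entity was handled

    parser.ExternalEntityRefHandler = resolve
    parser.Parse(text, True)
    return tags, "".join(chars)


print(parse_with_entities(DOC, FILES))
```

The shared DTD is untouched; only this one document carries the declaration.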
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WW
From dgd at cs.bu.edu Wed Dec 3 04:02:30 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:16 2004
Subject: Re FPIs for RFCs
In-Reply-To: <199712020411.PAA26436@jawa.chilli.net.au>
Message-ID:
XML-dev is not dependable for me. I think this posting was lost.
Summary:
FPIs cannot be assigned in someone else's namespace, the "owner" identifies
a naming authority, not an intellectual property owner. Assertion supported
with verbiage quoted from relevant ISO standards.
At 4:11 AM -0000 12/2/97, Rick Jelliffe wrote:
>
>> From: Terry Allen
>
>> You are asserting an ownership right you cannot back up. That's dangerous
>> for one's legal health. Referring to something by using its URL is one
>> thing, but using that URL to create a name that lies in someone else's
>> name space is another matter entirely.
>
>I have emailed Internic to find out their views. However, I do not
>believe that an FPI is property.
I'm sorry, this is flat out wrong. ISO 9070 is very clear on the subject,
and I quote:
"3.10 Owner name: the portion of a public identifier that names its owner.
NOTES
.... 13 The owner of a public identifier is not necessarily the owner of
the object it identifies"
and from the introduction:
"... and an 'owner name', which identifies the originator of the public
identifier"
The whole point of owners (I can't quote the whole standard, unfortunately)
is to create domains of administration for namespaces, and sub-namespaces.
This just can't work if I'm allowed to make names in _your namespace_
without your permission, just because I'm citing your work. So either there
is a wording goof in 8879, or 9070 is making
> I believe it is common and accepted
>practise to create FPIs for published material using ISBN
I've never heard of this practice, and since it leads to a chaotic and
broken system of public identifiers, we should stamp it out to the extent
that it has been accepted.
> and that
>the IDN can be used in exactly the same way.
IDN is not in 9070 rev 2, and thus is not suitable _de jure_; it is also
unsuitable _de facto_, since domain names can be reused by different
organizations. Unless Internet policies and 9070 have both changed, I think
this is also wrong.
>ISO 8879 says 4.223 owner identifier "The portion of a public identifier
>that identifies the owner or originator of public text".
That definition conflicts with the 9070 definition, but in the context of
the public entity sets in SGML, the confusion of identifier owner and data
owner is understandable. When 8879 was written, the notion of needing to
assign persistent names to _other people's_ computer readable documents was
not foremost in anyone's mind.
>I read that to mean that it would actually be wrong for me to use myself
>in the owner field. The owner means the owner (or originator) of the
>public text, not the originator of the FPI.
This interpretation is explicitly contrary to 9070, though, and 9070 is
more recent, normatively cited by 8879, and edited by the same editor; so I
am inclined to prefer the 9070 reading. Also, the name assignment protocol
you suggest fails to achieve the list of goals in 9070 for the whole public
identifier standard, since it fails to provide a set of rules that will
guarantee that no conflicting PUBLIC identifiers are ever assigned. The 9070
interpretation does provide such rules, based on hierarchical name
assignment.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Wed Dec 3 04:02:38 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
In-Reply-To:
Message-ID:
I've reordered your mail for ease of response.
At 12:56 AM -0000 12/3/97, Gavin McKenzie wrote:
>Just some comments on this issue of 'inclusion'. I apologize if this
>sounds like a ramble...
>Although one thing remains unclear, despite the dozens of submissions
>I've read: Is it, or is it not acceptable for an application to choose
>to act upon an XLL linkage in a way that causes the target linked
>content to be included and validated.
I must have been rambling too much. No. It is not part of XML validation to
perform such a step. Your application is free to implement any checks it
requires (that the linked data is a well-formed XML tree, and that that
subtree would be legal if put in place of the link). XML will give you no
aid in such checking other than maybe providing the code to help you
implement these checks.
For that reason, I would not recommend doing this unless you know exactly
what software will run across your data, or you can write a stylesheet to
perform your required validation.
> Another way, if I create an XML
>derived format, and document that a processor of this derived format
>should view a particular usage of an XLL construct as instructions to
>"retrieved and include 'inline' the target content, and validate it
>against the originating document's DTD as if the target content was part
>of the original document".
The retrieve and view part yes, the validate part is a no, unless you
provide the code for that step. It's not a part of XML. If you need XML
parser-based inclusion you have to use entities.
>I understand the purpose and usefulness of declaring an entity in the
>internal DTD subset and employing this mechanism as the proper and valid
>way to include some (potentially marked up) text. But, echoing Rob
>McDougall's closing statements, for *many* applications it is simply too
>difficult for the application to 'predict' these inclusion points and
>place a corresponding declaration in the internal DTD subset. In fact,
>I would venture to say that most of my customers would walk away from
>XML based on this issue alone.
I don't fully understand why, which is not to say that I don't believe you.
XML does not have another mechanism for textual inclusion. Some already
believe that one mechanism is too many, so I don't know how likely this is
to change.
Why is this a show-stopper for your application?
>Heavens, so many data processing shops still want to continue writing
>data out in fixed length COBOL style records; and while it may be the
>nineties, they are resistant to change. As much as it may seem to be a
>stretch to bring these type of data producers into the XML world, I
>(naively) think it is possible.
fixed length records aren't such a problem for XML, but I suspect you have
something else in mind...
>So, after reading all the previous submissions (especially Peter's
>display of the overhead for setting up a GIF reference via the external
>entity method) I too wish to use an XLL based mechanism for expressing
>an 'inclusion' linkage, and pine for some agreement on the semantics.
Peter's example would be a great deal simpler, if he moved the element and
notation specifications into the DTD, rather than keeping them in the
subset. Since it is a validity constraint, and not a well-formedness
constraint, that NOTATIONs be declared, they can be banished to the DTD,
and the stylesheet can use the notation name to decide processing.
-- David
_________________________________________
David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
Boston University Computer Science \ Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
--------------------------------------------\ http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW \__________________________
From dgd at cs.bu.edu Wed Dec 3 04:02:43 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
Message-ID:
My postings keep bouncing from the list, sorry if this is a duplicate.
On Dec 1, 3:36pm, Simon St.Laurent wrote:
> Subject: RE: EMBED and validation
> >From a DOM perspective, EMBEDded material will almost certainly not be
> >considered part of the document tree containing the EMBED element.
>
> I very much look forward to seeing what the DOM does (or doesn't do) with the
> EMBEDded material. But is this an issue for the DOM in particular, or should
> the XML-Link spec give clearer direction about the nature of EMBEDded
> material? Especially as some of the replies so far have said that an
> application _could_ include the EMBEDded material in the document tree _if_
> the developer so chose - which opens the door to multiple interpretations in a
> large way.
You're getting closer. The document, itself, contains no embedded material: it
contains an "EMBED" (quotation) link to other material. It will not be a
requirement for any XML application to do anything other than include the XLL
attributes in its output for applications that want them. This attribute
information is part of the proper domain of the DOM, as I understand it.
XLL applications will be required to interpret the link as a _connection_
between two points, with "default semantics" of "include as quotation". Whether
that is most conveniently implemented by combining document data structures, as
you and Peter assume, or by some other method that preserves two structures
and renders them in a particular style, is an application implementation
decision.
Regardless of that implementation strategy, XSL stylesheets for XLL constructs
will have to specify how to choose display options for such links. Those
constructs _will_ have to deal with the fact that XLL links _need not_ be to
"well-formed subtrees" of the linked-to document. This is of critical
importance for implementing external markup and arbitrary quotation. So, any
implementation strategy that depends on WFST (Well Formed SubTrees) will fail
for some legal input documents. That's fine, depending on the goals and
limitations of the application.
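A small illustration of the WFST problem, under stated assumptions: the sample document is invented, and a link target is modelled simply as a character span of the source text, which is not how XLL addresses things. A span that covers exactly one element parses on its own; a span that starts inside one element and ends inside another does not, so it cannot simply be grafted into a document tree.

```python
import xml.etree.ElementTree as ET

# Invented sample document.
SOURCE = "<doc><p>One <em>two</em> three.</p><p>Four five.</p></doc>"


def is_well_formed_subtree(fragment):
    """True if the fragment parses as a single well-formed element."""
    try:
        ET.fromstring(fragment)
    except ET.ParseError:
        return False
    return True


# A span covering exactly the first paragraph element.
whole_paragraph = SOURCE[SOURCE.index("<p>"):SOURCE.index("</p>") + len("</p>")]

# A span running from the middle of one paragraph into the next,
# the kind of thing an arbitrary quotation can select.
arbitrary_span = SOURCE[SOURCE.index("two"):SOURCE.index("Four") + len("Four")]

print(is_well_formed_subtree(whole_paragraph))
print(is_well_formed_subtree(arbitrary_span))
```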
It is conceivable that XSL will not give any method for formatting non-WFST
EMBED links. That would also be OK, as people who require the more complex
linking can still create their own (more complex) applications.
However, purely in the form that you've asked the question (i.e. XML parsing
rules), there is no inherent relation between EMBED linking and document
validation -- and this is not an oversight, but a planned strategy to enhance
the reusability of hyperdocuments in the same way that XML enhances the
reusability of single documents -- by late-binding all formatting and display
issues via a stylesheet or other form of processing specification.
The confusion over "application flexibility" is occurring because people are
used to early-binding models like HTML, where the format of documents is
explicitly encoded in the document. To judge application compatibility you
require not only the knowledge of the XML input, but the stylesheet language
(processing model) being applied to the document.
I've seen nothing so far in the XDEV proposals that is not more properly an
issue for XSL.
One way to see that this flexibility is required is to imagine in what sense
there can be interoperability between a web-mapping, or web-indexing
application and a browser display application. They might have very different
strategies for when to do many things (such as expand entities) and attach very
different semantics to those operations (a map might represent entities as a
special type of link, a browser would silently expand them, or perhaps use a
stretchtext view where the entities were buttons that would trigger textual
expansion when clicked). Formatters would select such options based on their
stylesheets. Analysis applications might more often do the same thing via
hard-wired code or configuration files.
> And, of course, I can think of a considerable number of applications where it
> might be useful to be able to apply the DOM to EMBEDded content without
> having to cope with a separate document tree.
That is fine, but that is a decision on processing model that, if taken, will
not handle certain legal XLL-linked documents... You can pick your processing
model, but then you have to live with the consequences.
> Sounds like fun. For the applications I'm proposing, I'd like them in the
> document tree, but of course that isn't appropriate for many situations. I'd
> really rather not see this prohibited, either - it would chop off an entire
> branch of XML development I'm working on. Could be the price of progress.
> We'll see.
I think the price will be some options in XSL (entity expansion rules) that may
seem mysterious at first and second glance, but will enable much more
sophisticated (and controllable) hypertext interaction.
> I guess what I'd love to see is another XML-Link attribute specifying whether
> to include an EMBED in the document tree or not - it seems to be the central
> issue around which this discussion has focused. Failing that, I'll look into
> Peter's proposals for XDEV, since they seem to address the challenges of
> multiple application behaviors directly - if they get implemented by
> application developers, of course.
This is:
1. not XLL's job, as explained above. Whether a processing model includes
the tree is relevant only to that processing model, not the document itself.
2. That attribute would only be legal for the subset of XLL links that select
WFSTs of the destination document -- this is an unreasonable limitation that
removes some useful applications of such links. For an interesting example of
the scholarly use of such markup, the MULTEXT project may be of interest (
http://www.cogsci.ed.ac.uk/~ht/nsldoc/nsldoc.html ).
3. If 2 is deemed a minority view, XSL will support _only_ what you are
requesting, but other processing languages will be able to process such
linking structures.
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com
From papresco at technologist.com Wed Dec 3 04:04:09 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
References:
Message-ID: <3484DAA0.17AEE44C@technologist.com>
Gavin McKenzie wrote:
>
> Although one thing remains unclear, despite the dozens of submissions
> I've read: Is it, or is it not acceptable for an application to choose
> to act upon an XLL linkage in a way that causes the target linked
> content to be included and validated.
The XML spec. puts no constraints on applications, so you can do
whatever you want. It does not define "validation" for hyperdocuments
constructed by gluing together bits through XLL. If you wish to invent a
definition of validation that allows this, for the purposes of your
application, then you may do so. An easy way to do that would be to read
the two entities, and pipe it through a processor as a single XML
document. But no, it is not the processor's role to do this for you,
because the processor need only support the concept of validation
described in the XML spec., which has nothing to do with XLL at all.
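One way an application might do the gluing itself, as a sketch only: the "include" element and its "href" attribute are hypothetical stand-ins and not XLL syntax, and a dictionary replaces real retrieval. Python's standard library has no validating parser, so the final step here only re-checks well-formedness; a validating processor of your choice would be applied to the glued text instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical host document and link target; "include" and "href" are
# invented names for this sketch.
HOST = '<report><title>Q4</title><include href="common.xml"/></report>'
TARGETS = {"common.xml": "<disclaimer>Use at your own risk.</disclaimer>"}


def glue(host_text, targets, link_tag="include"):
    """Replace each link element with the root element of its target."""
    root = ET.fromstring(host_text)
    # Snapshot the elements first so replacements are not revisited.
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == link_tag:
                replacement = ET.fromstring(targets[child.get("href")])
                replacement.tail = child.tail
                parent[index] = replacement
    return ET.tostring(root, encoding="unicode")


glued = glue(HOST, TARGETS)
# "Pipe it through a processor as a single XML document": here that is
# only a second well-formedness parse.
ET.fromstring(glued)
print(glued)
```

Whatever definition of validity is applied to the result belongs to the application, not to XML or XLL.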
Paul Prescod
From ricko at allette.com.au Wed Dec 3 04:10:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:16 2004
Subject: MIME Notations
Message-ID: <199712030408.PAA10380@jawa.chilli.net.au>
There have been several serious complaints about my FPI for MIME.
I have asked for clarification from the relevant standards body.
In the meantime, if we are nervous and need an FPI now, instead of using
we could decide on (if the XML-DEV leaders give their blessing)
something like
I will let this list know what the outcome is. Until then, better
safe than sorry.
Rick Jelliffe
From ricko at allette.com.au Wed Dec 3 04:58:22 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:16 2004
Subject: Re FPIs for RFCs
Message-ID: <199712030456.PAA11940@jawa.chilli.net.au>
> From: Murray Altheim
> Maybe I'm confused by this point, but only the owner has the right to
> create FPIs within their namespace.
Who gives this right? What law or cases say that, if internic.net
makes a file publicly available on an archive-server, I cannot
use its address inside a 9070 identifier? Unlike that English newspaper
case, I am not passing off the thing pointed to as mine, I am
*not* passing it off. ISO standards are not law. FPIs are merely a
statement of fact, in a standard form.
Statements of fact cannot be copyrighted under US law. So a telephone
directory can be taken and reproduced without copyright infringement
(unless there is some non-mechanical uniqueness in the arrangement),
including company names that are also trademarks. This is because
names and addresses are facts, not inventions.
Tell me why it is legal to say:
image/gif is part of the Multipurpose Internet Mail Extensions
given in ds.internic.net/rfc/rfc2046.txt
but somehow illegal to say:
PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
Multipurpose Internet Mail Extensions::image/gif//EN">
The use of Internet Domain Names in Formal Public Identifiers is part
of WebSGML. The text is currently being finalized. I have asked for
clarification.
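For readers following the owner-field argument, here is a rough sketch that splits the identifier quoted above into its fields. It assumes the usual FPI layout (an optional "+" or "-" registration prefix, then owner identifier, then text class and description, then language, all separated by "//") and ignores optional parts such as a display version; the function and key names are my own.

```python
def parse_fpi(fpi):
    """Split a formal public identifier into its main fields (simplified)."""
    # Collapse line breaks and runs of spaces, then split on the "//" separator.
    fields = [field.strip() for field in " ".join(fpi.split()).split("//")]
    if fields[0] in ("+", "-"):
        # "+" marks a registered owner, "-" an unregistered one.
        registered = fields[0] == "+"
        owner, rest = fields[1], fields[2:]
    else:
        # ISO owner identifiers carry no prefix.
        registered = None
        owner, rest = fields[0], fields[1:]
    text_class, _, description = rest[0].partition(" ")
    language = rest[1] if len(rest) > 1 else None
    return {
        "registered": registered,
        "owner": owner,
        "text_class": text_class,
        "description": description,
        "language": language,
    }


fpi = """+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
Multipurpose Internet Mail Extensions::image/gif//EN"""
for key, value in parse_fpi(fpi).items():
    print(key, "=", value)
```

The dispute in this thread is about who may put what in the "owner" field, not about the syntax.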
If it is clarified to say that only owners of public text can make
up FPIs, and that people cannot construct appropriate FPIs using
publicly available facts, then there are a lot of naughty FPIs out
there!
That being said, I must agree with David that ISO 9070 seems clear
(but contradictory to ISO 8879) on it. I will ask WG4 for ISO 8879
to be reconciled with ISO 9070. Even though I certainly do not see
how it can be unlawful, if it is wrongly formed against the rules of
ISO 9070, that is a good enough reason not to do it.
Rick Jelliffe
-----------------------------------
ISO 8879
4.223 "The portion of a public identifier
that identifies the owner or originator of public text".
^^^^
------------------------------------
ISO 9070
"3.10 Owner name: the portion of a public identifier that names its owner.
NOTES
.... 13 The owner of a public identifier is not necessarily the owner of
the object it identifies"
and from the introduction:
"... and an 'owner name', which identifies the originator of the public
identifier"
^^^^^^^^^^
From peter at ursus.demon.co.uk Wed Dec 3 06:02:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
In-Reply-To:
References: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971203065806.2c27004c@pop3.demon.co.uk>
At 22:59 02/12/97 -0500, David G. Durand wrote:
>At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
[...]
>>
>
> This should be:
>
>
>
> The notation is attached to the entity, not the citation of the entity.
Well I was just going by the spec (which I have clearly misread yet again :-).
[WD-xml-971117 - on the public W3 pages]
[53] says we need an ]>
>
>
>
>
>Finally, this is a bit overstated, the following lines could (and should)
>all be included in any reasonable CML DTD:
>
> PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> Multipurpose Internet Mail Extensions::image/gif//EN">
>
>
>
>
>
The whole point here is that you *have* to have a DTD of some sort to
manage this. An external DTD is yet another level of indirection for the
poor DXBH.
>So the internal subset would have to contain the following to support _one_
>gif:
>
>
>
>and the document would contain:
>
>
>
>Defining a DTD (and its associated stylesheets) generally requires careful
>thought about what external notations are required in the intended
>application. Predefined notation sets (in the form of external entities
>with Public identifiers) are common as dirt in the SGML world, for the
>reasons of interchangeability and author sanity.
I accept this - in the SGML world. But in the HTML world - whose case I am
trying to present :-), 'my.gif' usually means a GIF and it works 10^7 times
a day pretty well :-)
>
>The only place the FPI need appear is in the shared declaration, the
>stylesheet (used to actually render or trigger processing of the non-XML
>data), can use the notation name "gif" to detect a GIF file. No FPI is
>involved at the "browser end" (non-validating processor augmented with a
>CML stylesheet).
Since I can't sleep, let's have a little story showing what the hacker has
to do to resolve problem. Lets' assume that we are looking for mentions of
GIFs in an XML document.
With the XLL approach (and hardcoded MIME attribute, we grep for
'MIME="image/gif"' - exactly.)
With the FPI NOTATION approach we have:
Elephant's Child:
Where are the GIFs in this document?
Parser-man (for it is he, and his hat reflects the rays of the SUN in more
than Oriental splendour):
Come here and be spanked for your curtiosity for it is All very Simple.
Find the NOTATIONs and follow their Indirections.
EC:
I have found the NOTATION, but where please (for the Elephant's Child was
always polite) do I go
PM:
Your mygif must be searched in the Hashtable of ENTITYs (Parser-men always
speak in long words)
EC:
I have found the Hashtable of ENTITYs but I am still lost.
PM:
Come here and be spanked again [and he was] Do you not see that
NotationDeclaration on the ENTITY (for the parser man *always* spoke in
Long Words)
EC (who saw the NotationDeclaration but didn't want to be spanked and asked
ever so ever so politely)
Where do I go now?
PM:
You must find the NOTATION and its Formal Public Identifier (because
Parser Men *always* speak in Long Words, Best Beloved)
EC:
And so I want:
What do I do with it? (ever so politely, but he got spanked again for his
'satiable curtiosity).
PM:
You must travel to the deserts in the middle of Australia and speak to the
Big God Rick.
Then ran JUMBO, poor old JUMBO, dusty in the sunshine, very much bewildered
and came to the Big God Rick, and asked 'where do I go from here'?
[... to be continued ...]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From peter at ursus.demon.co.uk Wed Dec 3 08:03:14 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
In-Reply-To:
References: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971203090021.2c27b9ba@pop3.demon.co.uk>
At 22:59 02/12/97 -0500, David G. Durand wrote:
>At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
>>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>>more complex. This is my current analysis. It's probably wrong. (Are there
>>any Java parsers which manage this?)
>
>Actually, I just noticed, it _is_ wrong (I removed > quoting because it's
>too gross for SGML examples):
The Elephant's child has been spanked again...:-)
>
>
> PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> Multipurpose Internet Mail Extensions::image/gif//EN">
>
>
>
>
>
>
>
>
>>
>
> This should be:
>
>
>
> The notation is attached to the entity, not the citation of the entity.
Enlightenment has slowly come. I think we actually need an additional
NOTATION as well as SRC so that the final document reads.
]>
Have I finally got there? It seems to make sense... (The same levels of
indirection still apply, of course).
P.
[BTW I am sorry for the amount of noise during these postings. The genuine
purpose behind it was to write software that processes NOTATION. The spec
is correct AFAIK, but it is not easy for casual authors to write documents
from it. I would still urge people to write and publish examples that
exercise the whole spec.]
We all are, of course, extremely grateful to James Clark for providing the
files to test parsers with. [Those new to SGML may like to know that James
is the author of sgmls, nsgmls, SP, and various other high-performance,
high-fidelity pieces of publicly available SGML software.] I have yet to
try them out, but I shall regard them as the much required 'gold standard'.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From M.H.Kay at eng.icl.co.uk Wed Dec 3 11:04:03 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:16 2004
Subject: Linking: IDREF vs. HREF
Message-ID: <01bcffdb$123dec20$1e09e391@mhklaptop.bra01.icl.co.uk>
There's been a lot of discussion recently on how to achieve "include"
effects in XML; my problem is how to achieve "reference" effects.
I am trying to design an XML DTD for encoding genealogical data sets. The
idea is to use the existing data model of the widely-used GEDCOM standard,
but to replace the encoding of the data model with an XML encoding. (Why?
Well, I hope it might be useful to someone and educational for me.)
I am wondering how to handle the linking. GEDCOM links are used, for
example, to represent relationships between a person and a family, a person
and another person, or an event and a source. They are relationships in the
data modelling sense, with no hint of presentation semantics. Currently in
GEDCOM you can only have links within a single file: the file contains
records with unique identifiers, very like XML attributes of type ID, and
one record can refer to another using its ID, very like an XML attribute of
type IDREF. So ID/IDREF is the obvious and natural representation of the
current data model.
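As a rough illustration of the ID/IDREF reading of the GEDCOM model, here is a minimal Python sketch. The element and attribute names (gedcom, person, family, id, childOf, husband) are invented for the example and are not taken from GEDCOM or from any DTD in this thread. ElementTree does not read attribute types from a DTD, so the sketch simply assumes that an attribute called "id" plays the role of the ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical encoding: names below are illustrative assumptions only.
SAMPLE = """\
<gedcom>
  <person id="I1" name="Ann" childOf="F1"/>
  <person id="I2" name="Bob"/>
  <family id="F1" husband="I2"/>
</gedcom>
"""

def build_id_index(root):
    """Map every 'id' value in the document to its element."""
    index = {}
    for element in root.iter():
        key = element.get("id")
        if key is None:
            continue
        if key in index:
            # IDs must be unique within one document.
            raise ValueError(f"duplicate ID: {key}")
        index[key] = element
    return index

def resolve(index, element, attribute):
    """Follow one IDREF-style attribute; None if the attribute is absent."""
    target = element.get(attribute)
    if target is None:
        return None
    return index[target]  # raises KeyError for a dangling reference

root = ET.fromstring(SAMPLE)
index = build_id_index(root)
family = resolve(index, index["I1"], "childOf")
father = resolve(index, family, "husband")
```

Because each relationship has its own attribute, one element can carry several references at once, which is the property of IDREF described above.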
But I would quite like it to be extensible in the future so a record in one
file can refer to a record in another. This takes one into the domain of the
linking model, where instead of an XML attribute of type IDREF we seem to
need attributes named XML-LINK and HREF. And there are things I can do with
IDREF (like having two IDREF attributes in the same element, to represent
two different relationships) that I can't do with XML-LINKs.
I'm looking for a design that satisfies the immediate requirement, where
references are always to other elements within the same document, but is
naturally extensible to handle references to elements in a different
document (with a "compatible" DTD). Any suggestions from the experts? Should
I, for example, be trying to ensure that the internal links work with a
parser that doesn't support XLL?
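One possible shape for such an extensible reference, sketched in Python purely as an assumption and not as anything XLL defines: treat a bare value as a same-document ID, and a value containing '#' as a document locator plus a record ID. The function names and the load_index callback are hypothetical.

```python
from urllib.parse import urldefrag

def split_reference(value):
    """Return (document, record_id); document is None for local references."""
    if "#" not in value:
        return None, value               # bare IDREF-style value
    document, fragment = urldefrag(value)
    return (document or None), fragment  # "#I7" is also a local reference

def resolve(value, local_index, load_index):
    """Look a record up locally, or in another document's ID index.

    load_index is a caller-supplied function (an assumption of this sketch)
    that returns the ID index of the named document.
    """
    document, record_id = split_reference(value)
    index = local_index if document is None else load_index(document)
    return index[record_id]
```

With this convention, existing single-file data keeps working unchanged, and only software that wants cross-file links needs to supply a loader.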
My current and uninformed impression, by the way, is that the whole linkage
model in XLL (and this includes the "embed" or "include" capability
discussed over recent days) has been thoroughly
crippled by a desire to maintain a compatibility with SGML that will give
very little benefit to most of XML's users. My own instinct would have been
to extend the set of attribute types beyond IDREF to include say XREF to say
that the attribute is an XPOINTER with relationship semantics, HREF that it
is an XPOINTER with hyperlink semantics, IREF to say that it is an XPOINTER
with include semantics, etc. If SGML does not allow the set of attribute
types to be extended, that is a serious weakness that should be fixed rather
than circumvented. It's silly to declare the value as being a "string" when
it is actually something much more specific. (I would also like to define
attributes of type boolean, numeric, or date!) Sorry if that opens up old
wounds.
Regards, Mike Kay, ICL
M.H.Kay@eng.icl.co.uk
From tms at ansa.co.uk Wed Dec 3 11:45:32 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
In-Reply-To: Peter Murray-Rust's message of "Wed, 03 Dec 1997 09:00:21"
References: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk> <3.0.1.16.19971203090021.2c27b9ba@pop3.demon.co.uk>
Message-ID:
A non-text attachment was scrubbed...
Name: not available
Type: text/plain (pgp signed)
Size: 2658 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971203/35bac600/attachment.bin
From gmckenzi at JetForm.com Wed Dec 3 14:30:26 1997
From: gmckenzi at JetForm.com (Gavin McKenzie)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
Message-ID:
>-----Original Message-----
>From: dgd@cs.bu.edu [SMTP:dgd@cs.bu.edu]
>Sent: Tuesday, December 02, 1997 11:00 PM
>To: Gavin McKenzie; 'xml-dev@ic.ac.uk'
>Subject: RE: EMBED and validation
>
>I've reordered your mail for ease of response.
>
>At 12:56 AM -0000 12/3/97, Gavin McKenzie wrote:
>>Just some comments on this issue of 'inclusion'. I apologize if this
>>sounds like a ramble...
>
>>Although one thing remains unclear, despite the dozens of submissions
>>I've read: Is it, or is it not acceptable for an application to choose
>>to act upon an XLL linkage in a way that causes the target linked
>>content to be included and validated.
>
>I must have been rambling too much. No. It is not part of XML validation to
>perform such a step. Your application is free to implement any checks it
>requires (that the linked data is a well-formed XML tree, and that that
>subtree would be legal if put in place of the link). XML will give you no
>aid in such checking other than maybe providing the code to help you
>implement these checks.
>
>[Gavin McKenzie] That's fine. I didn't expect the parser to do this work
>for me.
>
>For that reason, I would not recommend doing this unless you know exactly
>what software will run across your data, or you can write a stylesheet to
>perform your required validation.
>
>[Gavin McKenzie] Well, I know about my software...but I don't want to do
>anything that would make it unduly difficult for somebody else to write their own
>application to process my XML. If I can't follow the letter of XML in this
>respect, because it doesn't define this particular behaviour, then at least I
>wish to stick to the spirit. I don't want somebody who is considering
>writing their own ad-hoc XML processor to operate upon my XML to think,
>"Gee...this Gavin guy really missed the point" and decide it's too difficult
>or doesn't 'feel' like XML.
>
>> Another way, if I create an XML
>>derived format, and document that a processor of this derived format
>>should view a particular usage of an XLL construct as instructions to
>>"retrieve and include 'inline' the target content, and validate it
>>against the originating document's DTD as if the target content was part
>>of the original document".
>
>The retrieve and view part yes, the validate part is a no, unless you
>provide the code for that step. It's not a part of XML. If you need XML
>parser-based inclusion you have to use entities.
>
>[Gavin McKenzie] Understood. Parser based inclusion no, application
>behaviour yes.
>
>>I understand the purpose and usefulness of declaring an entity in the
>>internal DTD subset and employing this mechanism as the proper and valid
>>way to include some (potentially marked up) text. But, echoing Rob
>>McDougall's closing statements, for *many* applications it is simply too
>>difficult for the application to 'predict' these inclusion points and
>>place a corresponding declaration in the internal DTD subset. In fact,
>>I would venture to say that most of my customers would walk away from
>>XML based on this issue alone.
>
>I don't fully understand why, which is not to say that I don't believe you.
>XML does not have another mechanism for textual inclusion. Some already
>believe that one mechanism is too many, so I don't know how likely this is
>to change.
>
>Why is this a show-stopper for your application?
>
>[Gavin McKenzie] see next comments.
>
>>Heavens, so many data processing shops still want to continue writing
>>data out in fixed length COBOL style records; and while it may be the
>>nineties, they are resistant to change. As much as it may seem to be a
>>stretch to bring these type of data producers into the XML world, I
>>(naively) think it is possible.
>
>fixed length records aren't such a problem for XML, but I suspect you have
>something else in mind...
>
>[Gavin McKenzie] I'd prefer that these shops NOT write out fixed length
>records.
>
>I'm seeing the phrase 'XML hyperdocuments' in recent postings, and this seems
>to fit my bill of requirements. Imagine an application that is intended to
>produce a report of hazardous materials -- this application is going to write
>out an XML document that contains the various line items for each package of
>haz-mat and write in a linkage to another XML document that contains safety
>and handling instructions. The wished-for final result is a
>displayed/printed report with the safety instructions interleaved in-situ
>between the line items.
>
>It is very convenient for the application that produces this XML document to
>write out the links along the way, rather than predict them and write them
>into the top of the document as entities. Writing out entities before hand
>would be viewed as unacceptable because it most likely constitutes a second
>pass.
>
>So the links may be written with AUTO/EMBED, but it is up to the application
>to decide to validate the linked content, knowing that this behaviour is not
>defined within the spec.
>
>Anyway, I've concluded that it's ok for my application to resolve these
>linkages (in a manner that seems like inclusion) and create a new virtual
>document for the purpose of display. And, it's ok for my application to
>interpret EMBED as it chooses (gulp).
>
>But, this notion of hyperdocuments...who is working away at this? Pardon my
>ignorance, but is this what the Hy in HyTime stands for? And would a future
>layering of this technology onto XML be the answer to the hyperdocument
>problem? I can only hope it is simple enough for mortals given that this
>group seems to not even expect most XML processing applications to be capable
>of XPointer processing.
>
>>So, after reading all the previous submissions (especially Peter's
>>display of the overhead for setting up a GIF reference via the external
>>entity method) I too wish to use an XLL based mechanism for expressing
>>an 'inclusion' linkage, and pine for some agreement on the semantics.
>
>Peter's example would be a great deal simpler, if he moved the element and
>notation specifications into the DTD, rather than keeping them in the
>subset. Since it is a validity constraint, and not a well-formedness
>constraint that NOTATIONs be declared, they can be banished to the DTD,
>and the stylesheet can use the notation name to decide processing.
>
> -- David
>
>[Gavin McKenzie] Thanks for the help.
>_________________________________________
>David Durand dgd@cs.bu.edu \ david@dynamicDiagrams.com
>Boston University Computer Science \ Sr. Analyst
>http://www.cs.bu.edu/students/grads/dgd/ \ Dynamic Diagrams
>--------------------------------------------\
>http://www.dynamicDiagrams.com/
>MAPA: mapping for the WWW \__________________________
>
>
>
From crism at ora.com Wed Dec 3 16:14:43 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:17 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971203090021.2c27b9ba@pop3.demon.co.uk> (message from
Peter Murray-Rust on Wed, 03 Dec 1997 09:00:21)
Message-ID: <199712031618.LAA14931@geode.ora.com>
The NOTATION type of attribute does *not* apply to entities referenced
by an element. Only the NDATA specification on the entity declaration
does that. The notation attribute refers to the element's *content*:
4.211 notation attribute: An attribute whose value is a _notation
name_ ([41]) that identifies the data content notation of the
element's _content_ ([24]).
The XML specification does not, however, make this point. In fact, it
doesn't appear to say what the effect of a notation attribute is.
-Chris
Copy to the editors.
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From dgd at cs.bu.edu Wed Dec 3 17:22:42 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:17 2004
Subject: EMBED and validation
Message-ID:
On Dec 3, 6:58am, Peter Murray-Rust wrote:
> Subject: RE: EMBED and validation
> At 22:59 02/12/97 -0500, David G. Durand wrote:
> >At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
> [...]
> >>
> >
> > This should be:
> >
> >
> >
> > The notation is attached to the entity, not the citation of the entity.
>
> Well I was just going by the spec (which I have clearly misread yet again
> :-).
> [WD-xml-971117 - on the public W3 pages]
>
> [53] says we need an [54] says we need an AttType in an AttDef
> [55] says I can choose an EnumeratedType for my AttDef
> [58] says this can be a Notation Type
> [... OK so far?]
> [59] has 'NOTATION' in black letters, which I thought meant you had to type
> it in and it has to be followed by a '(' which also needs typing in.
> Then it needs one or more Ntoks
> [60] An Ntok is a Name and from [VC Notation Attributes] it must be a
> notation (which is 'GIF')
> Then it needs a ')' and finally [53] we need a Default which
> [62] can be a #REQUIRED
This is fine if you are using a _NOTATION_ attribute, perhaps to label the
content of the element. In your example, you wanted to include the GIF image,
not a reference to the GIF format, so you need to have an ENTITY attribute
(which cites, rather than including) an entity (which may have been declared
NDATA with an associated notation).
This is all clear in the standard, but worked examples will clearly have to go
into the FAQ!
>
> I am sorry that I have still failed to get it right after 3 goes :-)
No problem, but the next bit makes me a little annoyed, as you totally missed
my point.
> >]>
> >
> >
> >
> >
> >Finally, this is a bit overstated, the following lines could (and should)
> >all be included in any reasonable CML DTD:
> >
> > > PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> > Multipurpose Internet Mail Extensions::image/gif//EN">
> >
> >
> >
> >
> >
>
> The whole point here is that you *have* to have a DTD of some sort to
> manage this. An external DTD is yet another level of indirection for the
> poor DXBH.
Yes, but none of that information need be parsed except when _validating_ the
document. So an author, and a browser implementor need not deal with that
complexity. Furthermore, an application that does read the DTD can determine
_exactly_ what the notation "gif" is supposed to represent by checking the
PUBLIC and SYSTEM IDs given in the DTD. However, you can write a working
stylesheet that need know nothing about this stuff. It strikes me as
absolutely parallel to the case with element declarations, that are more useful
in production than in simple processing by applications that know the DTD in
question.
The same is true of notations. So what is the problem?
> I accept this - in the SGML world. But in the HTML world - whose case I am
> trying to present :-), 'my.gif' usually means a GIF and it works 10^7 times
> a day pretty well :-)
This depends on HTTP MIME typing, and can be implemented by XLL and any
sensible stylesheet language.
> >The only place the FPI need appear is in the shared declaration, the
> >stylesheet (used to actually render or trigger processing of the non-XML
> >data), can use the notation name "gif" to detect a GIF file. No FPI is
> >involved at the "browser end" (non-validating processor augmented with a
> >CML stylesheet).
>
> Since I can't sleep, let's have a little story showing what the hacker has
> to do to resolve problem. Lets' assume that we are looking for mentions of
> GIFs in an XML document.
Cute story clipped due to lack of relevance. You can use XLL and an appropriate
stylesheet, or Entities and an appropriate stylesheet. I thought JUMBO was a
browser based on (at least a partial) DTD. So if you want to use NOTATION, you
can declare _the notations you expect to need_. If an author needs a new
notation, she can _declare_ the notations she needs. Of course, she needs to
either add to your stylesheet, or create a new one that knows what that
notation means, but this is not that hard. It's certainly no _harder_ than
creating a new MIME-type.
We both agree that a simple set of public identifiers for MIME-types would be
useful. So define one. Or, if you prefer the XLL-based mechanism (which does
_not_ require declaration of the MIME-type, and is probably much better
implemented _without_ such a declaration) then use it. If you are going to
declare the type in your document, perhaps because it is essential that you get
the correct format for a multi-format resource, then you might as well use
notation, since the markup is in fact simpler and less-redundant, unless you go
out of your way to complicate it. My modification of your example shows you how
to do this, so there's nothing stopping you.
-- David
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From dgd at cs.bu.edu Wed Dec 3 17:23:54 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:17 2004
Subject: EMBED and validation
Message-ID:
On Dec 3, 9:00am, Peter Murray-Rust wrote:
> Subject: RE: EMBED and validation
> At 22:59 02/12/97 -0500, David G. Durand wrote:
> >At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
> Enlightenment has slowly come. I think we actually need an additional
> NOTATION as well as SRC so that the final document reads.
No. I will correct it again, leaving only _legal_ text in the message.
In the DTD, the following must appear (for XML-validating applications only):
The internal subset would look like this:
]>
>
The notation _does not need_ to appear on the link. At all. It's a property of
the _entity_. Once you declare the entity you are done.
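To make the chain of lookups concrete (ENTITY attribute, then the entity declaration, then its notation name, then the NOTATION declaration), here is a hedged Python sketch using the standard xml.parsers.expat module. The sample document is an assumption written for this illustration; it is not the markup that was posted in this thread, and the names doc, img, src and mygif are invented. Only the public identifier string is taken from the discussion above.

```python
import xml.parsers.expat

# Hypothetical document: element, attribute and entity names are assumptions.
SAMPLE = """<?xml version="1.0"?>
<!DOCTYPE doc [
<!NOTATION gif PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION Multipurpose Internet Mail Extensions::image/gif//EN">
<!ENTITY mygif SYSTEM "my.gif" NDATA gif>
<!ELEMENT doc (img)>
<!ELEMENT img EMPTY>
<!ATTLIST img src ENTITY #REQUIRED>
]>
<doc><img src="mygif"/></doc>
"""

def find_notated_entities(xml_text, entity_attribute="src"):
    """Return (element, system id, notation name, notation public id) tuples."""
    notations = {}  # notation name -> public identifier (may be None)
    entities = {}   # unparsed entity name -> (system id, notation name)
    hits = []

    def on_notation(name, base, system_id, public_id):
        notations[name] = public_id

    def on_entity(name, is_parameter, value, base, system_id, public_id,
                  notation):
        if notation is not None:  # only NDATA (unparsed) entities carry one
            entities[name] = (system_id, notation)

    def on_start(tag, attributes):
        entity = attributes.get(entity_attribute)
        if entity in entities:
            system_id, notation = entities[entity]
            hits.append((tag, system_id, notation, notations.get(notation)))

    parser = xml.parsers.expat.ParserCreate()
    parser.NotationDeclHandler = on_notation
    parser.EntityDeclHandler = on_entity
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return hits

hits = find_notated_entities(SAMPLE)
```

Note that the element carries only the entity name; the notation is recovered from the entity declaration, and an application that only cares about "is it a GIF?" can stop at the notation name without reading the public identifier.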
I _think_ that you can also turn the same instance markup into XLL markup by
using HREF instead of SRC (or can you rename it? I forget), and also changing
the entity declaration.
I have to reread the XLL spec., as it's been a while.
> Have I finally got there? It seems to make sense... (The same levels of
> indirection still apply, of course).
No, you're still making it too complicated. Look at my entity declarations, and
instance markup carefully, that's the only thing I had to change.
------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/ | http://dynamicDiagrams.com/
| MAPA: mapping for the WWW
From RMcDouga at JetForm.com Wed Dec 3 19:48:06 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID:
I'm seeing some disturbing similarities between Peter's problem and
mine. It reflects a general problem that I've seen in many other
languages. There seem to be two schools of thought about where
declarations should go:
(1) Declarations must be performed near the top of the file.
(2) Declarations should be performed near where they are used.
Method (1) works well for declarations that are going to be referenced
many times throughout the file, and is able to accommodate the cases
where a reference only occurs once. Method (2) works well for
declarations that are only referenced once, but works rather poorly for
ones that are referenced many times throughout the file.
Which school is right? I think the trend is to allow either. Take for
example C vs C++. C required you to define all your variables at the
top of a function, but C++ also allows you to define them just before
you use them. I don't think anyone would argue that the additional
flexibility is a bad thing.
Method (1) requires that the user be able to establish "order" in the
file (i.e. make sure the declarations occur at the top). This greatly
hinders creating files with declarations in them "on the fly". In order
to know what declarations will be used, the user must perform a first
pass on the data before writing it out. This is not always possible and
is seldom desirable.
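A small Python sketch of what Method (1) forces on a program that generates XML on the fly. The class and method names are invented for this illustration. The point is only that nothing can be emitted until the last entity reference has been seen, so the body is either buffered in full or produced in a second pass.

```python
import io

class DeferredDeclarationWriter:
    """Buffer the document body so entity declarations can be written first."""

    def __init__(self, root_name):
        self.root_name = root_name
        self._body = io.StringIO()
        self._entities = {}  # entity name -> system identifier

    def write(self, text):
        self._body.write(text)

    def include(self, name, system_id):
        """Remember an external entity and write a reference to it."""
        self._entities.setdefault(name, system_id)
        self._body.write(f"&{name};")

    def finish(self):
        """Only now can the prolog be produced, ahead of the buffered body."""
        lines = ['<?xml version="1.0"?>', f"<!DOCTYPE {self.root_name} ["]
        for name, system_id in self._entities.items():
            lines.append(f'<!ENTITY {name} SYSTEM "{system_id}">')
        lines.append("]>")
        lines.append(self._body.getvalue())
        return "\n".join(lines)

writer = DeferredDeclarationWriter("report")
writer.write("<report>")
writer.include("safety1", "safety1.xml")
writer.write("</report>")
document = writer.finish()
```

For a short report the buffering is harmless; for a long-running producer that streams records as they arrive, it is exactly the extra pass described above.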
I realise this inflexibility is something that has been inherited from
SGML, but I worry that this will impede XML's adoption into the
marketplace. This is the second time I've had to reject using XML's
entity substitution capabilities because of the need to declare all your
entities at the top of the file. I originally had wished to use the
entity substitution as a text substitution, but unfortunately, my users
will want to "re-define" the value of an entity several times throughout
the file. This cannot be done using XML entities.
The entity substitution capabilities within XML seem to get me 50% of
the way to where I want to be on a couple of different issues (file
inclusion and text substitution), but unfortunately, I've had to choose
alternative solutions because they don't get me 100% of the way. :(
Rob
From hubick at medlib.com Thu Dec 4 01:14:37 1997
From: hubick at medlib.com (Chris Hubick)
Date: Mon Jun 7 16:59:17 2004
Subject: PI, XMLDecl, and EncodingPI
Message-ID: <34860337.ACBD09C8@medlib.com>
I am writing a recursive descent XML parser in Java and have
a couple questions....
The XML Working Draft dated 17-November-1997 states:
[24] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[28] Misc ::= Comment | PI | S
[19] PI ::= '<?' Name (S (Char* - (Char* '?>' Char*)))? '?>'
[25] XMLDecl ::= ''
[79] EncodingPI ::= ''
Within a PI is the Name "xml" reserved? If it is, should
there not be a [wfc] on PI stating so?
By the current definition any XMLDecl and EncodingPI is also
a valid PI. In a prolog an XMLDecl is optional, and is followed
by Misc, which includes PI.
Ok, so I can have an XML file with no XMLDecl
(it's optional) followed by "" which
matches PI, in my Misc*. And this is legal? My parser will
take this just fine as such, but I wonder about the others.
It makes detecting a bad XMLDecl impossible! My parser will just
say fine, that wasn't an XMLDecl, and feed it to Misc, which will
most likely match (or possibly spew) it as a PI.
Shouldn't [19] PI have an S? at the end before '?>' ?
Also shouldn't PCData be:
[17] PCData ::= [^<&]+
rather than the current:
[17] PCData ::= [^<&]*
[44] content ::= (element | PCData | Reference | CDSect | PI | Comment)*
because:
This is a test
In my recursive descent parser this parses to:
This is a test
...
And we get infinite matches on a zero length PCData.
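The usual way out, sketched here in Python as a simplified loop and not as a full parser for the draft grammar: keep PCData as [^<&]* but never treat a zero-length match as an item in the content loop. The function name and the crude "markup" token (everything up to the next '>' or ';') are assumptions made for the illustration.

```python
import re

# The draft's [17]: this pattern is allowed to match the empty string.
PCDATA = re.compile(r"[^<&]*")

def parse_content(text, pos=0):
    """Split content into PCData runs and crude markup tokens."""
    items = []
    while pos < len(text):
        match = PCDATA.match(text, pos)
        if match.end() > pos:
            # Non-empty PCData: record it and move on.
            items.append(("PCData", match.group()))
            pos = match.end()
            continue
        # Zero-length PCData: do not record it and do not loop on it.
        # The next character must be '<' or '&', so consume one token.
        closer = ">" if text[pos] == "<" else ";"
        end = text.find(closer, pos)
        if end == -1:
            raise ValueError(f"unterminated markup at offset {pos}")
        items.append(("markup", text[pos:end + 1]))
        pos = end + 1
    return items

items = parse_content("This is <b>a</b> test &amp; more")
```

The rule of thumb is that every iteration of a starred production must consume at least one character; an alternative that matched nothing is skipped instead of counted.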
From tbray at textuality.com Thu Dec 4 01:44:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:17 2004
Subject: PI, XMLDecl, and EncodingPI
Message-ID: <3.0.32.19971203174532.00986d90@pop.intergate.bc.ca>
At 06:11 PM 03/12/97 -0700, Chris Hubick wrote:
> Within a PI is the Name "xml" reserved? If it is, should
>there not be a [wfc] on PI stating so?
In fact, in the latest rev, we wired it right into the grammar.
> By the current definition any XMLDecl and EncodingPI is also
>a valid PI. In a prolog an XMLDecl is optional, and is followed
>by Misc, which includes PI.
> Ok, so I can have an XML file with no XMLDecl
>(it's optional) followed by "" which
>matches PI, in my Misc*. And this is legal?
Nope. And the grammar will getcha, because this no longer matches PI.
>Shouldn't [19] PI have an S? at the end before '?>' ?
No, because Char includes S
>[17] PCData ::= [^<&]+
>rather than the current:
>[17] PCData ::= [^<&]*
>In my recursive descent parser this parses to:
It can't be +, because the empty string must match PCData. You'll just
have to figure out how to stop descending.
-Tim
From ricko at allette.com.au Thu Dec 4 03:20:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID: <199712040320.OAA20074@jawa.chilli.net.au>
> From: Rob McDougall
> (1) Declarations must be performed near the top of the file.
> (2) Declarations should be performed near where they are used.
There has been a proposal for inline declarations recently. They would
use declaration syntax but be inside a processing instruction, e.g.
(This, I believe, will not be in WebSGML, now being finalized. But it
may make it through the big SGML revision which looms.)
> I realise this inflexibility is something that has been inherited from
> SGML, but I worry that this will impede XML's adoption into the
> marketplace. This is the second time I've had to reject using XML's
> entity substitution capabilities because of the need to declare all your
> entities at the top of the file. I originally had wished to use the
> entity substitution as a text substitution, but unfortunately, my users
> will want to "re-define" the value of an entity several times throughout
> the file. This cannot be done using XML entities.
In XML, the system identifier of an entity is a URI. This can include
a query. The query can trigger an update of the value.
There is no way to update the value of an external entity dynamically in
XML, but that is because it is not a programming language. However, you
can mark up that you want updates to take place. For example, if the
text was a running header, you could have an element like
blah
and make your software update the entity every time it was found. If you
want to embed this more clearly into your document, you could use a processing
instruction, for example
Use entities to bring data in and PIs to send messages out.
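A minimal sketch of that pattern, in Python rather than anything from the thread: the document carries a processing instruction wherever the running header should change, and the application updates its own state each time it meets one. The PI target "update-header", the "page" element and the HeaderTracker class are hypothetical names chosen for illustration.

```python
import xml.sax


class HeaderTracker(xml.sax.ContentHandler):
    """Track a running header that the document changes via PIs."""

    def __init__(self):
        super().__init__()
        self.header = None   # current value of the running header
        self.pages = []      # (page number, header in force) pairs

    def processingInstruction(self, target, data):
        # Hypothetical PI target: the document "sends a message out"
        # asking the application to update the header value.
        if target == "update-header":
            self.header = data.strip()

    def startElement(self, name, attrs):
        # Hypothetical element: record which header applies to each page.
        if name == "page":
            self.pages.append((attrs.get("n"), self.header))


document = b"""<doc>
<?update-header Chapter One?>
<page n="1"/><page n="2"/>
<?update-header Chapter Two?>
<page n="3"/>
</doc>"""

tracker = HeaderTracker()
xml.sax.parseString(document, tracker)
pages = tracker.pages
```

The value never changes inside the XML itself; the application simply reacts to each instruction as the parser reports it.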
Rick Jelliffe
From mecom-gmbh at mixx.de Thu Dec 4 12:57:44 1997
From: mecom-gmbh at mixx.de (james anderson too)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
References: <199712040320.OAA20074@jawa.chilli.net.au>
Message-ID: <3486A9E0.CE1A923C@mixx.de>
why do inline declarations need an additional operator? what's wrong with
allowing element - or, in a similar sense, entity declarations - to appear with
their standard syntax?
Rick Jelliffe wrote:
> > From: Rob McDougall
>
> > (1) Declarations must be performed near the top of the file.
> > (2) Declarations should be performed near where they are used.
>
> There has been a proposal for inline declarations recently. They would
> use declaration syntax but be inside a processing instruction, e.g.
>
>
> (This, I believe, will not be in WebSGML, now being finalized. But it
> may make it through the big SGML revision which looms.)
>
From ricko at allette.com.au Thu Dec 4 15:48:03 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID: <199712041547.CAA11063@jawa.chilli.net.au>
> From: james anderson too
> why do inline declarations need an additional operator? what's wrong with
> allowing element - or, in a similar sense, entity declarations - to appear with
> their standard syntax?
That all has to be discussed. (I really shouldn't have mentioned that; I find
it confusing enough trying to keep up with the latest XML draft, let alone
all the suggestions being considered for SGML! There is a big back catalog of
changes that WG4 has approved for the revision. The ones with a specific
correlation to XML have been expedited for WebSGML [the update to SGML]
so that XML and SGML will be in synch.)
Rick Jelliffe
From mtbryan at sgml.u-net.com Thu Dec 4 23:12:20 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:59:17 2004
Subject: Position Statement on FPIs sought
Message-ID: <01bd00f6$d5689e40$LocalHost@default>
Charles
For the record, David Durand has pointed out this week that:
ISO 9070 is very clear on the subject, and I quote:
"3.10 Owner name: the portion of a public identifier that names its owner.
NOTES
.... 13 The owner of a public identifier is not necessarily the owner of
the object it identifies"
and from the introduction:
"... and an 'owner name', which identifies the originator of the public
identifier"
ISO 8879 defines owner identifier as:
"The portion of a public identifier that identifies the owner or originator
of public text"
and defines public text as:
"The text that is known beyond the context of a single document..."
There would seem to be a conflict here. 8879's two rules can be conflated to
read "_identifies the owner of the text_ that is known beyond the context of
a single document" whereas 9070's definitions can be conflated to read "the
portion of a public identifier that _names the owner of a public
identifier_, who is not necessarily the owner of the object it identifies".
These definitions seem to be contradictory.
Additionally David has said:
"IDN is not in 9070 rev 2, and thus is not suitable _de jure_; it is also
unsuitable _de facto_, since domain names can be reused by different
organizations. Unless Internet policies and 9070 have both changed, I think
this is also wrong."
and
"and 9070 is more recent, normatively cited by 8879, and edited by the same
editor; so I am inclined to prefer the 9070 reading."
We need to review the relationship between these two standards.
Martin
From peter at ursus.demon.co.uk Fri Dec 5 10:27:02 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:17 2004
Subject: Vertical bar character
In-Reply-To:
Message-ID: <3.0.1.16.19971205111400.2187b142@pop3.demon.co.uk>
I am building a module to parse DTD content models and have a strange (to
me) problem on java DOS with the vertical bar character in the command
line. I am using W95, JDK1.02 and the DOS prompt window.
I type:
java jumbo.sgml.ContentChunk (A|B)
using the 'vertical bar' character on my keyboard (the 'or' symbol in Java/C).
I assume this has decimal value 124 (from 'man ascii').
Under jview, this character is created with a value of 166.
Under java it is created with a value of -90
If I quote the argument under java (i.e. "(A|B)" ), I get a value of 65446
[corresponds to 2^16 - 90]
A - is this symptomatic of a general problem (e.g. something in Unicode)?
B - how can I 'quote' a '|' symbol in the DOS commandline?
P.
There is also a character 214 whose glyph seems to be a vertical bar with a
break in it. I assume this is unrelated to the problem (even though it is
what my keyboards display :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From M.H.Kay at eng.icl.co.uk Fri Dec 5 11:41:34 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:17 2004
Subject: Vertical bar character
Message-ID: <01bd0172$5fab4000$1e09e391@mhklaptop.bra01.icl.co.uk>
>I am building a module to parse DTD content models and have a strange (to
>me) problem on java DOS with the vertical bar character in the command
>line. I am using W95, JDK1.02 and the DOS prompt window.
Unicode and Latin-1 have:
VERTICAL BAR: 124
BROKEN BAR: 166
I believe that in the original ASCII, 124 was called vertical line, but many
printers displayed it as a broken line. In the IBM PC-DOS code set 850, code
124 became broken line, while in Latin-1 it remained as vertical bar, with the
new code 166 (your minus 90) being allocated to broken bar.
This means that software that is converting files between Latin-1 (or
UNICODE, or Microsoft "ANSI") and PC-DOS code page 850 ought to perform a
conversion on these characters.
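A rough sketch of such a conversion, using Python's standard codec tables as a stand-in for whatever the 1997 tools used (that substitution is an assumption). In those tables the vertical bar sits at 124 in both encodings, while broken bar is 166 in Latin-1 and 221 in code page 850; how a particular DOS font draws code 124 is a separate question that this does not address.

```python
BROKEN_BAR = "\u00a6"
VERTICAL_BAR = "\u007c"


def recode(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes from one code page and re-encode them in another."""
    return data.decode(source).encode(target)


latin1_broken = BROKEN_BAR.encode("latin-1")   # single byte, decimal 166
cp850_broken = BROKEN_BAR.encode("cp850")      # single byte, decimal 221

# Code 124 maps to the same character in both tables.
same_bar = VERTICAL_BAR.encode("latin-1") == VERTICAL_BAR.encode("cp850")

# Reading a Latin-1 byte 166 as if it were code page 850 yields a
# different character: the silent corruption a converter has to prevent.
misread = latin1_broken.decode("cp850")

# A proper conversion goes through the character, not the raw byte value.
converted = recode(cp850_broken, "cp850", "latin-1")
```

The point of the helper is that bytes are never copied across unchanged; each one is first interpreted in its source encoding.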
Mike Kay
From digitome at iol.ie Fri Dec 5 13:27:52 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:17 2004
Subject: MSXML 1.6 problem
Message-ID: <199712051327.NAA00475@mail.iol.ie>
I have just installed MSXML 1.6. I can run the applet viewer etc. from IE 4
but jview is giving me a problem:-
c:\msxml>jview msxml samples\tire.xml
ERROR: java.lang.NoSuchMethodError: com/ms/xml/om/Document: method
setCaseInsenstive(Z)V not found
Any ideas?
Sean Mc Grath
sean at digitome dot com
From adrian at solero.force9.net Fri Dec 5 19:17:56 1997
From: adrian at solero.force9.net (Adrian Orlowski)
Date: Mon Jun 7 16:59:18 2004
Subject: Seems it's all been worthwhile :)
Message-ID: <199712051851.TAA03832@relay2.force9.net>
Some news.
In the July 1997 issue of EXE:The Software Developers'
Magazine there was an article of mine on XML which I ended
with this speculation:
"It's possible to look away from the small print of the XML
proposal to the larger picture of real world documents
perhaps sceptical of the changes being asked for [by W3C].
Arguably though XML is the best attempt yet to move on from
so-called plain text as the lowest common denominator for
document interchange... somewhere in my mind's dark recesses
I recall that Microsoft Word is based on an implicit
structured outline model of documents; what price Word 9 or
10 coming XML-enabled with a DTD to cover all documents ever
produced by versions 1 through 8?"
(If you would like a copy of the article point your whatsit
at http://www.dotexe.co.uk/ or email me and I will send you
the SGML original. Please allow for the fact that it was
written in February based on the 1st XML draft.)
The news is that this scenario might not be that far away:
"Microsoft CEO Bill Gates recently said that XML will be the
data format for Office and HTML will be the display
standard." I have the following reference for this:
Vendors to push XML as all-purpose Web middleware format
http://www.infoworld.com/cgi-bin/displayStory.pl?97121.exml.htm
Some news. If you have that killer XML app in the works,
you'd better start looking to your laurels.
-- adrian
Adrian Orlowski
adrian@solero.force9.net
From Robert.H.Dolin at kp.ORG Fri Dec 5 23:13:50 1997
From: Robert.H.Dolin at kp.ORG (Dolin,Robert H)
Date: Mon Jun 7 16:59:18 2004
Subject: Message Length vs Processing Speed
Message-ID: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org>
Greetings XML-DEV list,
We've been working on an SGML (?XML) syntax for HL7 messages, and one of
the significant issues that has come up is the concern over message
length.
Here's a message I posted to the HL7 SGML/XML Listserver suggesting how
we might try to optimize the length/speed consideration. Would
appreciate any additional comments.
Thanks,
Bob
Bob Dolin, MD
Kaiser Permanente
Robert.H.Dolin@kp.org
-----------------------
>----------
>From: Dolin, Robert H
>Sent: Tuesday, December 02, 1997 11:24 PM
>To: 'HL7-SGML'
>Cc: Dolin,Robert H
>Subject: RE: DATATAG minimization
>
>I appreciate all the feedback on this 'innocent' posting of mine.
>
>Perhaps I should point out that just because we are looking for optimal
>minimization techniques, there need not be anything in a DTD that
>precludes it from being XML-compliant - well, at least this is partly
>correct... Can a DTD be part XML-compliant and part non-XML-compliant,
>and can the non-XML-compliant part be used only by those who need to
>minimize message length??
>
>As an aside, there are several knowledgeable and respected members of
>the HL7 community who continue to feel that message length is of
>significant concern. So, where length is of major concern, we can
>examine minimization techniques. Where the use of XML or where message
>parsing speed is of major concern, fully normalized messages/documents
>can be passed.
>
>And as John Spinosa points out, there may be a tradeoff in message
>length versus speed of parsing/validating messages.
>
>Here's an example DTD based on the HL7 Version 3.0 Draft to show how we
>might possibly enable both - tiny messages where length is important,
>and XML-compliant messages where parsing speed is important:
>
>(This DTD uses SHORTREF minimization, and actually can make messages
>SMALLER than their ER7 representation. The specifications for the
>SHORTREF can be added to an existing (XML-Compliant) DTD without
>changing the portion of the DTD that was already there.)
>
>Example 1: A sample ER7 message (based on a draft of the HL7 Version 3
>specifications) (361 Characters)
>
>Example 2: A sample DTD based on the same message used for Example 1.
>
>Example 3: A fully normalized SGML message conveying the same
>information as in Example 1, based on the DTD in Example 2. (708
>Characters).
>
>Example 4: The same SGML message as in Example 3, minimized using
>SHORTREF. (354 characters).
>
>Example 5: The DTD from Example 2, along with the SHORTREF mappings
>appended, which allow an SGML parser to take the minimized message in
>Example 4 and convert it to the fully normalized message in Example 3.
>
>
> ------------------------------------------------------------------------
>Example 1: A sample ER7 message (based on a draft of the HL7 Version 3
>specifications) (361 Characters) [there may be errors in my use of the
>ER7 syntax]
>
>MSH|~
>PE|X703421|I||~
>BC|IPChoice|I~
>IPE|3|4~
>PADM|Emergency Dept|9708170430|BAPT|{Jones^Houston}~
>PTP|Dallas, TX|HS~
>BL|Acnt~
>PTBA|X746343|198768353|D3|{X3^Trauma}~
>NX~
>PTBA|M1|D|D4|{Martha^Steward}~
>EL|Acnt~
>PCP|ABX1234567|CONS||{Jimmie^Steward}~
>BL|PartProv~
>EP|~
>HCP|19283746X-879||D2|{DD-15264^SNM}~
>NX~
>EP|ISO 8879 SGML~
>HCP|X12-EDI-HL7-XML|1999|X12-13|{F-12345^SNM}~
>EL|Acnt~
>
>
> ------------------------------------------------------------------------
>Example 2: A sample DTD based on the same message used for Example 1.
>
>[DTD declarations lost in archiving; only the attribute fragments
>"A CDATA #IMPLIED" and "B CDATA #IMPLIED" survive.]
>
> ------------------------------------------------------------------------
>Example 3: A fully normalized SGML message conveying the same
>information as in Example 1, based on the DTD in Example 2. (708
>Characters).
>
>
>X703421 I
>
>3 4
>
>Emergency Dept 9708170430 BAPT A="Jones" B="Houston">
>
>
>
>Dallas, TX HS
>
>X746343 198768353 D3
>
>
>M1 D D4
>
>
>ABX1234567 CONS B="Steward">
>
>
>
>
>19283746X-879 D2
>
>
>ISO 8879 SGML
>
>X12-EDI-HL7-XML 1999 X12-13 B="SNM">
>
>
> ------------------------------------------------------------------------
>Example 4: The same SGML message as in Example 3, minimized using
>SHORTREF. (354 characters).
>
>
>|X703421|I||
>
>|3|4|
>
>|Emergency Dept|9708170430|BAPT|
>|Dallas, TX|HS|
>
>|X746343|198768353|D3|
>|M1|D|D4|
>|ABX1234567|CONS||
>||
>
>|19283746X-879||D2|
>|ISO 8879 SGML|
>
>|X12-EDI-HL7-XML|1999|X12-13|
>
> ------------------------------------------------------------------------
>Example 5: The DTD from Example 2, along with the SHORTREF mappings
>appended, which allow an SGML parser to take the minimized message in
>Example 4 and convert it to the fully normalized message in Example 3.
>
>[DTD and SHORTREF declarations lost in archiving; only the attribute
>fragments "A CDATA #IMPLIED" and "B CDATA #IMPLIED" survive.]
>
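For readers unfamiliar with the delimited layout of Example 1, here is a small Python sketch of how such a message might be taken apart. It assumes, from the sample alone, that "~" ends a segment, "|" separates fields and "^" separates components inside braces; real HL7 encoding rules are not modelled, and the function names are hypothetical.

```python
def parse_er7(message: str) -> list[list[str]]:
    """Split an ER7-style message into segments, each a list of fields.

    Assumes '~' terminates a segment and '|' separates fields, as the
    sample message suggests.
    """
    segments = []
    for raw in message.split("~"):
        raw = raw.strip()
        if raw:
            segments.append(raw.split("|"))
    return segments


def split_components(field: str) -> list[str]:
    """Split a '{a^b}' field into its components."""
    return field.strip("{}").split("^")


# Three segments copied from Example 1.
sample = (
    "PE|X703421|I||~\n"
    "IPE|3|4~\n"
    "PADM|Emergency Dept|9708170430|BAPT|{Jones^Houston}~\n"
)

segments = parse_er7(sample)
components = split_components(segments[2][4])
```

Empty fields survive as empty strings, which is how the positional format stays short: a field is identified by its position rather than by a tag.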
From peter at ursus.demon.co.uk Sat Dec 6 10:20:25 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:18 2004
Subject: Vertical bar character
In-Reply-To: <01bd0172$5fab4000$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19971206110359.325f9208@pop3.demon.co.uk>
At 11:38 05/12/97 -0000, Michael Kay wrote:
>>I am building a module to parse DTD content models and have a strange (to
>>me) problem on java DOS with the vertical bar character in the command
>>line. I am using W95, JDK1.02 and the DOS prompt window.
>
>
>Unicode and Latin-1 have:
>VERTICAL BAR: 124
>BROKEN BAR: 166
>
>I believe that in the original ASCII, 124 was called vertical line, but many
>printers displayed it as a broken line. In the IBM PC-DOS code set 850, code
>124 became broken line, while in Latin-1 it remained as vertical bar, with the
>new code 166 (your minus 90) being allocated to broken bar.
Thanks. This helps a good deal. I'm mystified as to why 166 (aka 'Broken
bar') is displayed as a minute formless squiggle and 214 is displayed as a
broken bar but I can survive without that knowledge
>
>This means that software that is converting files between Latin-1 (or
>UNICODE, or Microsoft "ANSI") and PC-DOS code page 850 ought to perform a
>conversion on these characters.
Yes. It performs an unwanted one :-). It looks like a problem between Java
and the DOS commandline. What particularly worried me was that simple Java
code using 'char' translated this character into 65446, which presumably
has a completely different meaning in Unicode. IOW there is a danger that
corruptions could take place.
P.
>
>Mike Kay
>
>
>
>
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From richard at light.demon.co.uk Sat Dec 6 11:15:22 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:18 2004
Subject: Seems it's all been worthwhile :)
In-Reply-To: <199712051851.TAA03832@relay2.force9.net>
Message-ID:
In message <199712051851.TAA03832@relay2.force9.net>, Adrian Orlowski
writes
>...
>The news is that this scenario might not be that far away:
>
>"Microsoft CEO Bill Gates recently said that XML will be the
>data format for Office and HTML will be the display
>standard." I have the following reference for this:
>Vendors to push XML as all-purpose Web middleware format
>http://www.infoworld.com/cgi-bin/displayStory.pl?97121.exml.htm
Excellent news. I make/made a similar plea in 'Presenting XML' ("XML-
Based Authoring", p.44-47).
Richard.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk
From ricko at allette.com.au Sat Dec 6 11:54:21 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:18 2004
Subject: Vertical bar character
Message-ID: <199712061154.WAA29576@jawa.chilli.net.au>
> From: Peter Murray-Rust
> Thanks. This helps a good deal. I'm mystified as to why 166 (aka 'Broken
> bar') is displayed as a minute formless squiggle and 214 is displayed as a
> broken bar but I can survive without that knowledge
> >
> >This means that software that is converting files between Latin-1 (or
> >UNICODE, or Microsoft "ANSI") and PC-DOS code page 850 ought to perform a
> >conversion on these characters.
>
> Yes. It performs an unwanted one :-). It looks like a problem between Java
> and the DOS commandline. What particularly worried me was that simple Java
> code using 'char' translated this character into 65446, which presumably
> has a completely different meaning in Unicode. IOW there is a danger that
> corruptions could take place.
This must be a bug. 65446 = FFA6, but I figure that 166=00A6 which is suspiciously
close. FFA6 is a naughty Korean character, so I guess someone has
programmed wrong.
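The arithmetic is consistent with sign extension. The sketch below is one plausible reading, offered as an assumption and not as a diagnosis of JDK 1.02: the byte 0xA6 is treated as signed (giving -90) and then widened to a 16-bit character without masking. The helper names are hypothetical.

```python
def as_signed_byte(value: int) -> int:
    """Interpret an 8-bit value (0..255) as a signed two's-complement byte."""
    return value - 256 if value >= 128 else value


def widen_to_16_bits(signed_value: int) -> int:
    """Sign-extend a signed value into an unsigned 16-bit code unit."""
    return signed_value & 0xFFFF


broken_bar = 166                        # 0x00A6
signed = as_signed_byte(broken_bar)     # the negative value seen under java
widened = widen_to_16_bits(signed)      # the value seen when quoted

# Masking to 8 bits before widening keeps the intended code point.
masked = signed & 0xFF
```

If that reading is right, the fix on the conversion side is the mask in the last line: widen only after the value has been made unsigned.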
I don't know why 214 = D6 is displayed as a broken bar. Have a look in
the keycaps application to see whether 214 = D6 is indeed a broken bar in the
font you are using. (It is also quite possible for a font designer to
decide to use a broken bar glyph where a single bar is wanted, and vice
versa. If that is the case, change the font to one that isn't broken.)
Rick Jelliffe
From adrian at solero.force9.net Sun Dec 7 18:35:41 1997
From: adrian at solero.force9.net (Adrian Orlowski)
Date: Mon Jun 7 16:59:18 2004
Subject: Seems it's all been worthwhile :)
In-Reply-To: <199712051851.TAA03832@relay2.force9.net>
Message-ID: <199712071815.TAA10431@relay1.force9.net>
On 5 Dec 97 at 19:39, Adrian Orlowski wrote:
> In the July 1997 issue of EXE:The Software Developers'
> Magazine there was an article of mine on XML
> (If you would like a copy of the article point your whatsit
> at http://www.dotexe.co.uk/ or email me and I will send you
Apologies to anyone on a wild goose chase to the above URL.
(It should have been http://www.exe.co.uk -- except you won't
find it there).
It is now available at http://www.solero.force9.co.uk/
-- adrian
Adrian Orlowski
adrian@solero.force9.net
-- ------------------------------------------ --
Adrian Orlowski adrian@solero.force9.net
Information Systems Software Ltd
20 Andover Road, Newbury, Berkshire RG14 6LR, UK
Voice/Fax: +44(0)1635 49574
E-mail: adrian@solero.force9.net
From ht at cogsci.ed.ac.uk Mon Dec 8 12:59:50 1997
From: ht at cogsci.ed.ac.uk (Henry Thompson)
Date: Mon Jun 7 16:59:18 2004
Subject: Message Length vs Processing Speed
In-Reply-To: "Dolin,Robert H"'s message of Fri, 5 Dec 1997 15:12:16 -0800
References: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org>
Message-ID:
In the words of our former president, "We could do that, but it would
be wrong." Your SGML is impeccable, but without understanding why
people care about message length it's very hard to address the larger
issues you raise. Could you elaborate a bit on the numbers and
attitudes involved, i.e. average message size now (is your example
typical?), anticipated traffic volume, size of archives, etc.?
ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/
From dkuhlman at netcom.com Tue Dec 9 00:55:25 1997
From: dkuhlman at netcom.com (G. David Kuhlman)
Date: Mon Jun 7 16:59:19 2004
Subject: MSXML 1.6 problem
In-Reply-To: <199712051327.NAA00475@mail.iol.ie> from "Sean Mc Grath" at Dec 5, 97 01:27:42 pm
Message-ID: <199712090055.QAA17149@netcom.netcom.com>
>
> I have just installed MSXML 1.6. I can run the applet viewer etc. from IE 4
> but jview is giving me a problem:-
>
> c:\msxml>jview msxml samples\tire.xml
>
> ERROR: java.lang.NoSuchMethodError: com/ms/xml/om/Document: method
> setCaseInsenstive(Z)V not found
>
>
> Any ideas?
Add a path to the msxml classes. Something like:
jview /cp d:\msxml\classes msxml -d samples\Tire.xml
By the way, is anyone successfully running msxml under Linux? With
which version of the JDK? 1.1.3? I'm interested in comments on
this.
-- Dave
>
> Sean Mc Grath
> sean at digitome dot com
>
>
>
>
>
>
>
From eric at macweb.com Tue Dec 9 04:44:17 1997
From: eric at macweb.com (Eric Bickford)
Date: Mon Jun 7 16:59:19 2004
Subject: entity arrays/quotes
Message-ID: <1330508586-10638612@macweb.com>
I'm investigating the XML spec for conformance by my CGI, and I have a
couple questions:
1) I was surprised to see that a single quote is valid for attribute
values (as opposed to double quotes). Is this new with XML, or does HTML
also allow single quotes?
2) Is there some standard way to declare an ENTITY that includes an array
of values? To be specific, I'd like to include a list of values in a
document so my parser can build a SELECT menu of OPTIONS.
3) Does anyone have an opinion on how &entities; can best be used with a
database application? For example, assume you declare in your DTD a list
of &entities;, one for each database field name/value. If we are to
expect browsers to parse an xml document with entities, how can a found
table or hit list of values get substituted?
Eric Bickford eric@macweb.com
Web Broadcasting Corporation http://macweb.com/
Web Essentials for FileMaker Pro WEB FM, PICT FM, LOG FM, TAG FM
From tbray at textuality.com Tue Dec 9 04:53:29 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:19 2004
Subject: entity arrays/quotes
Message-ID: <3.0.32.19971208205259.00c54fe0@pop.intergate.bc.ca>
At 08:44 PM 08/12/97 -0800, Eric Bickford wrote:
>1) I was surprised to see that a single quote is valid for attribute
>values (as opposed to double quotes). Is this new with XML, or does HTML
>also allow single quotes?
No, and yes.
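The XML half of that answer is easy to check with any XML parser. A short sketch using Python's standard library (the element and attribute names are hypothetical):

```python
import xml.etree.ElementTree as ET

double_quoted = ET.fromstring('<item name="tire"/>')
single_quoted = ET.fromstring("<item name='tire'/>")

# Either delimiter lets the value contain the other quote character
# without escaping it.
mixed = ET.fromstring("""<item name='the "best" tire'/>""")

same_value = double_quoted.get("name") == single_quoted.get("name")
mixed_value = mixed.get("name")
```

The parser reports the same attribute value either way; the choice of delimiter is not visible to the application.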
>2) Is there some standard way to declare an ENTITY that includes an array
>of values? To be specific, I'd like to include a list of values in a
>document so my parser can build a SELECT menu of OPTIONS.
No, but you could have an entity whose value was
foo bar etc..
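As a sketch of that idea, assuming the list items are wrapped in elements: an internal entity whose replacement text is a run of option elements, expanded by the parser wherever it is referenced. The names "form", "select", "option" and "choices" are illustrative assumptions, not taken from the thread, and Python's standard parser stands in for whatever tool is actually used.

```python
import xml.etree.ElementTree as ET

document = """<?xml version="1.0"?>
<!DOCTYPE form [
  <!ENTITY choices "<option>foo</option><option>bar</option>">
]>
<form><select>&choices;</select></form>"""

# The parser expands &choices; into its replacement text, so the
# application sees ordinary child elements.
root = ET.fromstring(document)
options = [option.text for option in root.iter("option")]
```

The entity is still a single string as far as XML is concerned; the "array" only appears because its replacement text happens to be a sequence of elements.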
>3) Does anyone have an opinion on how &entities; can best be used with a
>database application? For example, assume you declare in your DTD a list
>of &entities;, one for each database field name/value. If we are to
>expect browsers to parse an xml document with entities, how can a found
>table or hit list of values get substituted?
I'm not sure entities are the way to go for this. Why not just generate
your HTML or XML on the fly the way any number of excellent products
do right now? -Tim
From eric at macweb.com Tue Dec 9 06:13:15 1997
From: eric at macweb.com (Eric Bickford)
Date: Mon Jun 7 16:59:19 2004
Subject: entity arrays/quotes
Message-ID: <1330503248-10959761@macweb.com>
>>3) Does anyone have an opinion on how &entities; can best be used with a
>>database application? For example, assume you declare in your DTD a list
>>of &entities;, one for each database field name/value. If we are to
>>expect browsers to parse an xml document with entities, how can a found
>>table or hit list of values get substituted?
>
>I'm not sure entities are the way to go for this. Why not just generate
>your HTML or XML on the fly the way any number of excellent products
>do right now? -Tim
But these database products don't currently comply with XML markup
standards. Granted they're server-side apps so it's not a real problem,
while XML seems largely designed from a client-side perspective. But I
think it important to extend the syntax standards of XML to server-side
apps as well. For example, a database &fieldname; entity could be the
proper markup for a database CGI to insert field data rather than perhaps
an .
Which leads to another question... I doubt declared &entity; names can
have a space (e.g. &field name;)? Can they?
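One way to test this is to hand a parser a declaration with each candidate name. The sketch below uses Python's standard library as the parser; the helper "parses" and the "record" element are hypothetical. XML names may not contain white space, so the first case is expected to be rejected, while underscores and hyphens are allowed.

```python
import xml.etree.ElementTree as ET


def parses(entity_name: str) -> bool:
    """Return True if a document declaring and using this entity name parses."""
    document = (
        f'<!DOCTYPE record [<!ENTITY {entity_name} "value">]>'
        f"<record>&{entity_name};</record>"
    )
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


with_space = parses("field name")
with_underscore = parses("field_name")
with_hyphen = parses("field-name")
```

A database field called "field name" would therefore need a mapped entity name such as field_name or field-name.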
I'd love to hear from anyone with opinions on how best to meld XML
templates with database services.
Eric Bickford eric@macweb.com
Web Broadcasting Corporation http://macweb.com/
Web Essentials for FileMaker Pro WEB FM, PICT FM, LOG FM, TAG FM
From donpark at quake.net Tue Dec 9 07:57:41 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:20 2004
Subject: MSXML 1.6 problem
Message-ID: <004b01bd0477$9cfce280$0100007f@localhost>
Try MSXML 1.8 which just came out. It includes the portability fix.
The only problem with 1.8 is that it is missing an interface, namely
com.ms.xml.util.XMLStreamReader. XMLStreamReader, BTW, is the interface used
to connect to the native input stream class. I told Chris Lovett about it last
night but I suspect he might be on vacation (a well deserved one at that :)
because I have yet to receive a reply. If he did go on vacation, I would
say it was good timing since the new XML spec was released today. Imagine
sitting on a white sandy beach reading some boring spec (no offense, Tim
:-).
Meanwhile, I have attached the java file and class file for XMLStreamReader
interface. I don't think Chris's version is different from mine because
there is nothing much to change.
Merry Christmas to y'all and good luck to your gold cards,
Don "JStud" Park
Master Consultant
donpark@quake.net
Come visit my XML Example Catalog at
http://www.quake.net/~donpark/xmlcat.html
-----Original Message-----
From: G. David Kuhlman
To: digitome@iol.ie
Cc: xml-dev@ic.ac.uk
Date: Monday, December 08, 1997 4:58 PM
Subject: Re: MSXML 1.6 problem
>>
>> I have just installed MSXML 1.6. I can run the applet viewer etc. from IE 4
>> but jview is giving me a problem:
>>
>> c:\msxml>jview msxml samples\tire.xml
>>
>> ERROR: java.lang.NoSuchMethodError: com/ms/xml/om/Document: method
>> setCaseInsenstive(Z)V not found
>>
>>
>> Any ideas?
>
>Add a path to the msxml classes. Something like:
>
> jview /cp d:\msxml\classes msxml -d samples\Tire.xml
>
>By the way, is anyone successfully running msxml under Linux? With
>which version of the JDK? 1.1.3? I'm interested in comments on
>this.
>
>-- Dave
>
>>
>> Sean Mc Grath
>> sean at digitome dot com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: XMLStreamReader.java
Type: application/octet-stream
Size: 249 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971209/5424e304/XMLStreamReader.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: XMLStreamReader.class
Type: application/octet-stream
Size: 317 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971209/5424e304/XMLStreamReader-0001.obj
From ak117 at freenet.carleton.ca Tue Dec 9 12:26:32 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:20 2004
Subject: Public Release: PSGML XML Patches
Message-ID: <199712091226.HAA00286@unready.microstar.com>
I'm happy to announce a new, public version of my XML patches for
Lennart Staflin's PSGML (an SGML mode for Emacs). You can download
the patches from the following URL:
http://home.sprynet.com/sprynet/dmeggins/psgmlxml-19971208.zip
These patches allow you to use PSGML in Emacs as a non-validating XML
editor: all names will be case-sensitive, many (but not all) forbidden
constructions will generate errors, all attribute values will be
quoted, and PSGML will use the variant XML delimiters.
There are also two changes that are useful for full SGML as well as
XML:
- these patches add support for multiple ATTLIST declarations for the
same associated element type
- the variable sgml-namecase-general allows you to make element type names,
attribute names, and keywords case-sensitive in full SGML as well
You will need PSGML 1.0.1 to use these patches:
http://www.lysator.liu.se/projects/about_psgml.html
Install PSGML 1.0.1 first, then install these patches over it. If you
are not using PSGML's Makefile, make certain that you byte-compile
psgml-parse.el before psgml-dtd.el or psgml-edit.el.
Enjoy,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mecom-gmbh at mixx.de Tue Dec 9 16:08:17 1997
From: mecom-gmbh at mixx.de (admin/Mecom)
Date: Mon Jun 7 16:59:20 2004
Subject: MSXML 1.6 problem
Message-ID:
From mecom-gmbh at mixx.de Tue Dec 9 16:08:24 1997
From: mecom-gmbh at mixx.de (admin/Mecom)
Date: Mon Jun 7 16:59:20 2004
Subject: MS XML parser only works with IE...
Message-ID:
From SMANGAT at novell.com Tue Dec 9 17:46:58 1997
From: SMANGAT at novell.com (Satwinder Mangat)
Date: Mon Jun 7 16:59:20 2004
Subject: RTF merge
Message-ID:
Hi,
I'm looking for an RTF merge utility to merge two or more files. The RTF header in these files has to be the same, and it is removed from the second file onwards. I know this is pretty easy to write, but why spend the time if one is already available?
Please let me know if you have one.
Thanks
Satwinder Mangat
From hubick at medlib.com Tue Dec 9 19:44:51 1997
From: hubick at medlib.com (Chris Hubick)
Date: Mon Jun 7 16:59:20 2004
Subject: CharData
Message-ID: <348D9ED8.9E109B4E@medlib.com>
The proposed XML Spec (http://www.w3.org/TR/PR-xml-971208) states:
-----------
In the content of elements, character data is any string of characters which
does not contain the start-delimiter of any markup. In a CDATA section,
character data is any string of characters not including the
CDATA-section-close delimiter, "]]>".
[15] CharData ::= [^<&]* - ([^<&]* ']]>' [^<&]*)
[19] CDSect ::= CDStart CData CDEnd
[21] CData ::= (Char* - (Char* ']]>' Char*))
[43] content ::= (element | CharData | Reference | CDSect | PI | Comment)*
-----------
Why this was changed makes no sense to me. According to the productions,
content can no longer contain "]]>". The text seems to imply that the
CharData production should be used for CDSect as well, but then why is
CDSect not:
[19] CDSect ::= CDStart CharData CDEnd
All this change seems to have done is disallow "]]>" in element content!
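To make the difference between the two productions concrete, here is a minimal sketch in Python of one reading of [15] and [21] as quoted above. The function names are made up, and the check on [21] ignores the restrictions that the Char production itself places on control characters.

```python
def is_chardata(text):
    """Production [15]: any run of characters without '<' or '&',
    minus any string that contains ']]>'."""
    return "<" not in text and "&" not in text and "]]>" not in text


def is_cdata(text):
    """Production [21]: any run of characters, minus any string
    that contains ']]>'. Markup delimiters are allowed here."""
    return "]]>" not in text


samples = ["plain text", "a ]] > b", "a ]]> b", "a < b & c"]
for sample in samples:
    print(repr(sample), is_chardata(sample), is_cdata(sample))
```

Under this reading the two productions differ only in whether '<' and '&' are allowed, which is why substituting CharData into CDSect would not be equivalent: a CDATA section may contain markup delimiters, while element content may not.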
From SimonStL at classic.msn.com Tue Dec 9 21:45:05 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:20 2004
Subject: PR-xml-971208
Message-ID:
Congratulations to the XML Working Group and especially the editors for
yesterday's publication of the Proposed Recommendation.
I've only had a brief few moments to examine the new standard (I'm in Canada
on business, very business-busy), but it doesn't look that different from the
11/17 WD. Are there any significant changes lurking? I haven't seen any so
far, and suspect that there aren't. As the document won't be final until
January at least, it may be a little premature to ask, but I'm curious. The
book's in press already anyway (based on WD 8/7/97 and corrected for WD
11/17/97) but I'd like to know the expected lifetime of the documents I've
created so far.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From cgingell at lavasys.com Tue Dec 9 22:50:48 1997
From: cgingell at lavasys.com (Craig Gingell)
Date: Mon Jun 7 16:59:20 2004
Subject: Microsoft's JScript XML Sample
Message-ID:
I am keen to exploit the potential of XML in a project I am currently
working on.
I have visited the Microsoft website page
http://www.microsoft.com/msdn/sdk/inetsdk/help/itt/xml/overview/Sample_4.htm#Sample_4
and cut and pasted the JScript to my own file. Here is my file -
I then enter
http://www.microsoft.com/standards/xml/samples/Email.xml
as the XML file I wish to parse.
I then get the following error -
An error has occurred on the script on this page
Line 134
Char 5
Error The tag is invalid
Code 0
Has anyone else experienced this problem, or is it just me ?
I am running Microsoft Internet Explorer 4.0 version 4.71.1712.6 on NT
Can anyone help me ?
Craig Gingell
Senior Software Developer
Lava Systems Inc
cgingell@lavasys.com
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971209/98707c7a/xmlp.html
From ak117 at freenet.carleton.ca Wed Dec 10 00:22:09 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:20 2004
Subject: [NEW] AElfred: a small, fast XML Parser
Message-ID: <199712100018.TAA00263@unready.microstar.com>
Microstar Software Ltd. is happy to announce Ælfred (AElfred), a
small, fast, DTD-aware Java-based XML parser, especially suitable for
use in Java applets.
We've designed Ælfred for Java programmers who want to add XML support
to their applets and applications without doubling their size: Ælfred
consists of only two class files, with a total size of approximately
24K, and requires very little memory to run. Ælfred also implements
Java's java.lang.Runnable interface and a zero-argument constructor,
so it's easy to start Ælfred as a separate thread or to adapt it for
use as a JavaBean.
Ælfred is free for both commercial and non-commercial use, and COMES
WITH NO WARRANTY. You can download a copy of version 1.0 (with
source code) from the following URL:
http://www.microstar.com/XML/index.htm
There is also an applet to let you try Ælfred online in your own
browser before downloading it.
*****************
DESIGN PRINCIPLES
*****************
1. Ælfred must be as small as possible, so that it doesn't add too
much to your applet's download time.
STATUS: Ælfred is currently about 24K in total, and we're still
looking for ways to shrink it further.
2. Ælfred must use as few class files as possible, to minimize the number
of HTTP connections necessary for applets.
STATUS: Ælfred consists of only two class files, the main parser
class (XmlParser.class) and a small interface for your own program
to implement (XmlProcessor.class). All other classes in the
distribution are just demonstrations.
3. Ælfred must be compatible with most or all Java implementations
and platforms.
STATUS: Ælfred uses only JDK 1.0.2 features, and we have tested it
successfully with the following Java implementations: JDK 1.1.1
(Linux), jview (Windows NT), Netscape 4 (Linux and Windows NT),
Internet Explorer 3 (Windows NT), and Internet Explorer 4 (Windows
NT).
4. Ælfred must use as little memory as possible, so that it does not take
away resources from the rest of your program.
STATUS: On a P75 Linux system, using JDK 1.1.1, running Ælfred
(with a 4MB XML document) takes only 2MB more memory than running
a simple "Hello world" Java application. Because Ælfred does not
build an in-memory parse tree, you can run it on very large input
files using little or no extra memory.
5. Ælfred must run as fast as possible, so that it does not slow down
the rest of your program.
STATUS: On a P75 Linux system, using JDK 1.1.1 (without a JIT
compiler), Ælfred parses XML test files at about 50K/second. On a
P166 NT workstation, using jview, Ælfred parses XML test files at
about 1MB/second.
6. Ælfred must produce correct output for well-formed and valid
documents, but need not reject every document that is not valid or
not well-formed.
STATUS: Ælfred is DTD-aware, and handles all current XML features,
including CDATA and INCLUDE/IGNORE marked sections, internal and
external entities, proper whitespace treatment in element content,
and default attribute values. It will sometimes accept input that
is technically incorrect, however, without reporting an error (see
README), since full error reporting would make the parser much
larger.
7. Ælfred must provide full internationalisation from the first release.
STATUS: Ælfred supports Unicode to the fullest extent possible in
Java. It correctly handles XML documents encoded using UTF-8,
UTF-16, ISO-10646-UCS-2, ISO-10646-UCS-4 (as far as surrogates
allow), and ISO-8859-1 (ISO Latin 1/Windows). With these
character sets, Ælfred can handle all of the world's major (and
most of its minor) languages.
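The point in principle 4 about not building an in-memory parse tree can be illustrated with any event-driven parser. The sketch below is not Ælfred's API; it uses the expat binding from Python's standard library as a stand-in, and the function name count_elements is made up. The parser calls a handler for each start tag as the input streams past, so memory use depends on what the handler keeps, not on the size of the document.

```python
import xml.parsers.expat as expat


def count_elements(chunks):
    """Count start tags per element name while streaming the input.

    `chunks` is any iterable of strings; no tree is ever built."""
    counts = {}

    def on_start(name, attributes):
        counts[name] = counts.get(name, 0) + 1

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    for chunk in chunks:
        parser.Parse(chunk, False)  # more input may follow
    parser.Parse("", True)          # signal end of input
    return counts


# The document arrives in arbitrary pieces, as it would from a socket.
pieces = ["<play><act><sc", "ene/><scene/></act></play>"]
print(count_elements(pieces))
```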
***********************
ABOUT THE NAME "Ælfred"
***********************
Ælfred the Great (AElfred in ASCII) was king of Wessex, and at least
nominally of all England, at the time of his death in 899AD. Ælfred
introduced a wide-spread literacy program in the hope that his people
would learn to read English, at least, if Latin was too difficult for
them. This Ælfred hopes to bring another sort of literacy to Java,
using XML, at least, if full SGML is too difficult.
The initial "Æ" (AE ligature) is also a reminder that XML is not
limited to ASCII.
Enjoy!
David
---
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ricko at allette.com.au Wed Dec 10 03:33:28 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:20 2004
Subject: entity arrays/quotes
Message-ID: <199712100333.OAA24843@jawa.chilli.net.au>
> From: Eric Bickford
> I'd love to hear from anyone with opinions on how best to meld XML
> templates with databases services.
The system identifier of an entity is a URI. So you can embed a
query in that.
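As a rough sketch of that idea in Python: the external entity below carries a query in its system identifier, and an entity handler answers it. Everything specific here is an assumption made for illustration: the host db.example, the "field" query parameter, the entity name and the in-memory dictionary standing in for a database. A real resolver would fetch the URI instead of looking the value up locally.

```python
import xml.parsers.expat as expat
from urllib.parse import parse_qs, urlsplit

# Stand-in for a real database; the field name and value are made up.
FAKE_DB = {"surname": "Example"}

# The system identifier is a URI with the query embedded in it.
DOCUMENT = """<!DOCTYPE record [
  <!ENTITY surname SYSTEM "http://db.example/lookup?field=surname">
]>
<record>&surname;</record>"""


def expand(document):
    """Parse the document, resolving external entities against FAKE_DB."""
    pieces = []
    parser = expat.ParserCreate()

    def on_text(data):
        pieces.append(data)

    def on_external_entity(context, base, system_id, public_id):
        # Pull the query out of the system identifier and answer it
        # locally instead of making a network request.
        field = parse_qs(urlsplit(system_id).query)["field"][0]
        child = parser.ExternalEntityParserCreate(context)
        child.Parse(FAKE_DB[field], True)
        return 1  # non-zero tells expat the entity was handled

    parser.CharacterDataHandler = on_text
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return "".join(pieces)


print(expand(DOCUMENT))
```

The template itself stays ordinary XML; only the resolver needs to know that the URI points at a database.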
Rick Jelliffe
From Ingo.Macherius at TU-Clausthal.de Wed Dec 10 07:26:19 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:59:20 2004
Subject: msxml 1.8 questions
Message-ID: <199712100726.IAA08523@sinfonix.rz.tu-clausthal.de>
Here's a list of problems with msxml 1.8.
1) Fast mode
Did anyone get msxml 1.8 to work with "-f" set ?
I tried with Sun-JDK 1.1.{2,3} on Linux, Sun-JDK 1.1.5 on Win95
and latest jview. All fail to parse any XML-document.
With Sun-JDK:
[inim@voyager samples]$ java msxml -f Hamlet.xml
java.lang.NoClassDefFoundError: NullElementFactory
at msxml.main(msxml.java)
With jview:
c:\temp\samples> jview msxml -f Hamlet.xml
Error: java.lang.NoClassDefFoundError: NullElementFactory
2) jview vs. Sun-JDK on win95
Called from commandline, jview fails this way:
> echo %CLASSPATH%
d:\devel\msxml;d:\devel\msxml\classes;.
> jview msxml Hamlet.xml
Error: java.lang.NoSuchMethodError: com/ms/xml/Document:
setLoadExternal(Z)V not found
Strangely enough, Sun-JDK 1.1.5 works fine!
Once again clueless,
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From donpark at quake.net Wed Dec 10 09:03:06 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:21 2004
Subject: msxml 1.8 questions
Message-ID: <001501bd0549$ed433c80$0100007f@localhost>
Ingo,
1) Fast mode
Fast mode is not part of the MSXML framework. It is just something the msxml test
harness class offers as an option. In the 1.8 release, the NullElementFactory class
was missing. I have attached it to this e-mail. Place the files where the
msxml.java and msxml.class files are. I have also attached the missing
XMLStreamReader class just in case others missed it. Place its files inside
the com/ms/xml/util directory.
2) jview vs. Sun-JDK on win95
I have no such problem. But then I hand-updated IE 4.0's XML library with
the latest. You might be running into this problem because the older version of
MSXML in IE 4.0 is interfering with your new one.
Nog Nog. Who's there? EggNog!8^P
Don "JStud" Park
Master Consultant
donpark@quake.net
Come visit my XML Example Catalog at
http://www.quake.net/~donpark/xmlcat.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NullElementFactory.java
Type: application/octet-stream
Size: 1214 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971210/02b4dc15/NullElementFactory.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: NullElementFactory.class
Type: application/octet-stream
Size: 643 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971210/02b4dc15/NullElementFactory-0001.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: XMLStreamReader.java
Type: application/octet-stream
Size: 249 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971210/02b4dc15/XMLStreamReader.obj
-------------- next part --------------
A non-text attachment was scrubbed...
Name: XMLStreamReader.class
Type: application/octet-stream
Size: 317 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971210/02b4dc15/XMLStreamReader-0001.obj
From reast at esri.com Wed Dec 10 16:40:13 1997
From: reast at esri.com (Russell East)
Date: Mon Jun 7 16:59:21 2004
Subject: Mixed content not working for me
Message-ID: <348EC4DD.623632@esri.com>
How come the following doesn't work?
I basically want my element a to either form a hierarchy
*or* have some text data.
But it seems I'm forced to have
which I don't really want at all.
--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Russell East mailto:reast@esri.com
_|_| Programmer phn: +1 (909) 793 2853
_|_| ESRI, 380 New York St fax: +1 (909) 307 3067
Redlands CA 92373-8100 http://maps.esri.com/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
From reast at esri.com Wed Dec 10 18:39:59 1997
From: reast at esri.com (Russell East)
Date: Mon Jun 7 16:59:21 2004
Subject: Mixed content not working for me
References: <348EC4DD.623632@esri.com> <199712101717.JAA06472@homeplate.firstfloor.COM>
Message-ID: <348EE119.2BAEC73E@esri.com>
You're right, it should have been
but this is also incorrect according to both the standard
and AELFRED. Curiously, MSXML passes it no problem.....
What I want is stuff like:
text
text
text
but, I want to prevent this:
text
text
==========================================================
Mary Holstege wrote:
>
> Russell East writes:
> > How come the following doesn't work?
> >
> >
> > I basically want my element a to either form an hierarchy
> > *or* have some text data.
> >
> > But it seems I'm forced to have
> >
> >
> > which I don't really want at all.
>
> Try this:
>
>
>
> Yours is ambiguous when you have nothing -- is it a list of a's of length zero
> or is it a #PCDATA with a null string?
>
> //Mary
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Russell East mailto:reast@esri.com
_|_| Programmer phn: +1 (909) 793 2853
_|_| ESRI, 380 New York St fax: +1 (909) 307 3067
Redlands CA 92373-8100 http://maps.esri.com/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
From ak117 at freenet.carleton.ca Wed Dec 10 19:28:05 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:21 2004
Subject: Mixed content not working for me
In-Reply-To: <348EC4DD.623632@esri.com>
References: <348EC4DD.623632@esri.com>
Message-ID: <199712101927.OAA00337@unready.microstar.com>
Russell East writes:
> How come the following doesn't work?
>
>
> I basically want my element a to either form an hierarchy
> *or* have some text data.
>
> But it seems I'm forced to have
>
XML bans this type of mixed content because it has been causing
trouble in full SGML for over a decade. The problem comes with
something like this:
After an SGML parser reads the opening tag, it doesn't know
whether the element will contain #PCDATA or subelements. The first
character it reads is a linefeed -- that's character data, so the
parser assumes that it is reading #PCDATA; when the parser finds the
tag a few characters later it throws an error.
You need to do two things:
1) submit a bug report to Microsoft; and
2) create a new subelement to hold the text:
Now you can have
This is some text
or
This is a subelement
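The declarations and sample markup in this message did not survive archiving, so the following Python sketch only illustrates the shape of the workaround and is not a reconstruction of them. It assumes a hypothetical wrapper element named text and a content model along the lines of (text | a+): each a holds either one text child or one or more nested a elements, and never loose character data of its own.

```python
import xml.etree.ElementTree as ET


def follows_model(a):
    """Return True if <a> holds either a single <text> child or only
    nested <a> children, with no character data of its own."""
    children = list(a)
    # Whitespace between child elements is ignorable; anything else is
    # the loose character data that the workaround is meant to avoid.
    if (a.text or "").strip():
        return False
    if any((child.tail or "").strip() for child in children):
        return False
    if len(children) == 1 and children[0].tag == "text":
        return True
    return len(children) > 0 and all(
        child.tag == "a" and follows_model(child) for child in children
    )


leaf = ET.fromstring("<a><text>This is some text</text></a>")
tree = ET.fromstring("<a>\n  <a><text>This is a subelement</text></a>\n</a>")
mixed = ET.fromstring("<a>loose text<a><text>nested</text></a></a>")
print(follows_model(leaf), follows_model(tree), follows_model(mixed))
```

Because the text now lives in its own element, a parser never has to guess from a leading linefeed whether it is looking at character data or at the start of a list of subelements.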
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Dec 10 20:08:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:21 2004
Subject: XML of Darkness
Message-ID: <199712102007.PAA00841@unready.microstar.com>
I have put online a rough-and-ready version of Conrad's HEART OF
DARKNESS, with an XML 1.0 DTD and markup. You can get at the document
through the following URL:
http://home.sprynet.com/sprynet/dmeggins/texts/
You may, of course, simply download the document and parse it on your
local system; however, if you happen to have an active Internet
connection, it's much more interesting (and much more in line with the
XML philosophy) to parse the document directly from its source URL:
http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
For example, with Ælfred (http://www.microstar.com/XML/), you would
type
java EventDemo http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
and watch the events roll down your screen. I have not tried this yet
with other XML parsers like Lark or MSXML.
For a _really_ fun test in the future, I might put different chapters
of the book on different Internet hosts (you could still parse it
through a single top-level URL). This is where XML can be exciting
for managing distributed information.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From clovett at microsoft.com Wed Dec 10 20:10:35 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:21 2004
Subject: msxml 1.8 questions
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F2E@red-msg-56.dns.microsoft.com>
1) I found the NullElementFactory problem. The msxml.class file in 1.8 is
out of date. Recompile msxml.java and everything should work fine.
NullElementFactory is inside msxml.java.
2) "Error: java.lang.NoSuchMethodError: com/ms/xml/Document:
setLoadExternal(Z)V not found" definitely indicates an install or classpath
problem. Looks like it isn't picking up the new stuff. Try the following:
jview /cp:p "d:\devel\msxml;d:\devel\msxml\classes;" msxml Hamlet.xml
Glad to hear that our parser works fine under Sun-JDK 1.1.5.
> -----Original Message-----
> From: Ingo Macherius [SMTP:Ingo.Macherius@TU-Clausthal.de]
> Sent: Wednesday, December 10, 1997 12:26 AM
> To: xml-dev@ic.ac.uk
> Subject: msxml 1.8 questions
>
> Here's a list with problems regarding msxml 1.1.8.
>
> 1) Fast mode
>
> Did anyone get msxml 1.8 to work with "-f" set ?
> I tried with Sun-JDK 1.1.{2,3} on Linux, Sun-JDK 1.1.5 on Win95
> and latest jview. All fail to parse any XML-document.
>
> With Sun-JDK:
> [inim@voyager samples]$ java msxml -f Hamlet.xml
> java.lang.NoClassDefFoundError: NullElementFactory
> at msxml.main(msxml.java)
>
> With jview:
> c:\temp\samples> jview msxml -f Hamlet.xml
> Error: java.lang.NoClassDefFoundError: NullElementFactory
>
> 2) jview vs. Sun-JDK on win95
>
> Called from commandline, jview fails this way:
>
> > echo %CLASSPATH%
> d:\devel\msxml;d:\devel\msxml\classes;.
> > jview msxml Hamlet.xml
> Error: java.lang.NoSuchMethodError: com/ms/xml/Document:
> setLoadExternal(Z)V not found
>
> Strange enough: Sun-JDK 1.1.5 works fine !
>
>
> Once again clueless,
> ++im
From peter at ursus.demon.co.uk Wed Dec 10 21:38:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:21 2004
Subject: LISTRIVIA (was Re: Microsoft's JScript XML Sample)
In-Reply-To:
Message-ID: <3.0.1.16.19971210221359.2c0f2bdc@pop3.demon.co.uk>
At 17:50 09/12/97 -0500, Craig Gingell wrote:
>I am keen to exploit the potential of XML in a project I am currently
>working on.
Good :-)
>I have visited the Microsoft website page
>http://www.microsoft.com/msdn/sdk/inetsdk/help/itt/xml/overview/Sample_4
>.htm#Sample_4
>and cut and pasted the JScript to my own file. Here is my file -
>
Please do not include attachments in postings to XML-DEV - the mailer and
the hypermail can get confused by them and they don't appear on the latter.
If these are useful resources, find a permanent site for them (we have
volunteers).
P.
(remember also that some people - certainly myself - have to pay personally
for all mail they receive from XML-DEV).
Best of luck,
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 10 21:40:23 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:21 2004
Subject: RTF merge
In-Reply-To:
Message-ID: <3.0.1.16.19971210220932.2c0f4a36@pop3.demon.co.uk>
At 09:42 09/12/97 -0800, you wrote:
>Hi,
>
>I'm looking for an RTF merge utility to merge two or more files. The RTF header
>in these files has to be the same, and it is removed from the second file onwards. I
>know this is pretty easy to write, but why spend the time if one is already available?
This list is essentially for those interested in developing XML
applications :-) and not for general wordprocessing queries. There are
better newsgroups where you are more likely to find an answer.
Best of luck.
P.
>
>Let me know if you have it?
>
>Thanks
>Satwinder Mangat
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 10 21:41:09 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:21 2004
Subject: LISTRIVIA (was Re: msxml 1.8 questions)
In-Reply-To: <001501bd0549$ed433c80$0100007f@localhost>
Message-ID: <3.0.1.16.19971210220657.2c0fb3e8@pop3.demon.co.uk>
At 00:59 10/12/97 -0800, Don Park wrote:
[... useful help with problem...]
>
>
>Attachment Converted: "c:\eudora\attach\NullElem.java"
>
>Attachment Converted: "c:\eudora\attach\NullElem.class"
>
>Attachment Converted: "c:\eudora\attach\XMLStrea.java"
>
>Attachment Converted: "c:\eudora\attach\XMLStrea.class"
>
It's probably a poor idea to attach material that is going to a mailing
list which is then hypermailed. I have instances where non-printables have
crashed the Hypermail system on our machine, and the attachments don't go
anywhere useful.
We have already several volunteers for providing various XML resources and
I am sure some of those would mount material if asked.
P.
BTW it would be extremely useful to collect together all the MSXML-related
material somewhere since I think some of us are now confused by what we
need to download and what to do with it :-). And is there a WORA version
yet :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 10 21:53:35 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:21 2004
Subject: General comments on parsers (was [NEW] AElfred)
In-Reply-To: <199712100018.TAA00263@unready.microstar.com>
Message-ID: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
At 19:18 09/12/97 -0500, David Megginson wrote:
>Microstar Software Ltd. is happy to announce lfred (AElfred), a
>small, fast, DTD-aware Java-based XML parser, especially suitable for
>use in Java applets.
Great!
I have bolted support for (AE)?lfred into JUMBO and tested the last but one
lfred pre-release. Many thanks to Microstar (and David) for having
approached JUMBO.
JUMBO now supports three parsers (in alpha order)
- Lark
- lfred
- NXP
(is MSXML WORA yet??)
They are run with the commandline
java jumbo.sgml.SGMLTree myfile.xml PARSER=AElfred (or whatever)
It has proved relatively easy to bolt these in, but there have been
significant differences in the interfaces offered and I hope that we can
move towards some uniformity - at least in the terminology. I shall post
more on this to XML-DEV.
Specific comments:
>lfred is free for both commercial and non-commercial use, and COMES
^^^^^
I am not sure whether the ligature has disappeared here or whether you have
shortened it to 'lfred' (5 chars). Although I support the use of Unicode,
many mailers don't (this is Eudora).
Note also that I use names for Java classes as well and so do authors, so
we have Lark.class, etc. I doubt whether JDK1.02 supports ligature.
There are 3 possibilities:
7 chars (AElfred)
6 chars (Ælfred)
5 chars (lfred)
I think you need to standardise on ONE!
[... valuable design points omitted...]
>
>6. lfred must produce correct output for well-formed and valid
> documents, but need not reject every document that is not valid or
> not well-formed.
>
> STATUS: lfred is DTD-aware, and handles all current XML features,
I can see several ways a parser can treat the DTD:
- ignore external and internal subsets completely
- read and parse the internal subset and apply ATTLISTs and ENTITYs
- ditto and provide handles for the application to retrieve DTD information
- ditto, but include the external subset
- as above, but validate attribute values
- as above but also validate content
Only the last of these is full validation.
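To make the second level in that list concrete, here is a minimal sketch.
It assumes Python's standard-library ElementTree (which sits on the
non-validating expat parser), not any of the Java parsers discussed in
this thread, and the document, element and entity names are invented for
illustration. The parser reads the internal subset and applies the
ATTLIST default and the ENTITY, but does no validation at all.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the internal subset declares one attribute
# default and one internal entity.
DOC = """<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (p)+>
  <!ATTLIST p kind CDATA "plain">
  <!ENTITY greet "hello, world">
]>
<doc><p>&greet;</p><note>not declared anywhere</note></doc>"""

root = ET.fromstring(DOC)
p = root.find("p")

# The ATTLIST default has been applied although <p> carries no attribute.
print(p.get("kind"))
# The internal entity has been expanded in the character data.
print(p.text)
# No validation: <note> breaks the content model (p)+ but is accepted.
print(root.find("note") is not None)
```

An application that needs the declarations themselves (the third level)
has to ask the parser for more than this tree gives it.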
JUMBO wants to retrieve the DTD information for its authoring process, and
needs the ELEMENT and ATTLIST information. At my last attempt I was unable
to extract ELEMENT information from Lark (but can get ATTLISTs) and I don't
think I could get ELEMENT info from lfred. I haven't looked at NXP, and
perhaps Norbert could update us.
> including CDATA and INCLUDE/IGNORE marked sections, internal and
>
Again, many thanks to Microstar and David, Tim, Norbert (and the MSXML
players when we get the WORA version).
P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Wed Dec 10 22:00:26 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:21 2004
Subject: General comments on parsers (was [NEW] AElfred)
In-Reply-To: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
References: <199712100018.TAA00263@unready.microstar.com>
<3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <199712102159.QAA00524@unready.microstar.com>
Peter Murray-Rust writes:
> There are 3 possibilities:
> 7 chars (AElfred)
> 6 chars (Ælfred)
> 5 chars (lfred)
>
> I think you need to standardise on ONE!
Just for clarification, the proper name is "Ælfred" (with an AE
ligature at the start), but that will not come through older mailers;
the ASCII transliteration is "AElfred", but the point of the AE
ligature is that XML is not limited to ASCII (though many people's
e-mail is). The unimaginative Java class name is
com.microstar.xml.XmlParser, so there's no problem with ligatures
there.
We could type Ælfred, but we'd scare away the Java hackers.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From donpark at quake.net Wed Dec 10 22:18:05 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:22 2004
Subject: LISTRIVIA (was Re: msxml 1.8 questions)
Message-ID: <001201bd05b8$ff3819f0$0100007f@localhost>
>It's probably a poor idea to attach material that is going to a mailing
>list which is then hypermailed. I have instances where non-printables have
>crashed the Hypermail system on our machine, and the attachments don't go
>anywhere useful.
Sorry about that. I have now uploaded XMLStreamReader.java at:
http://www.quake.net/~donpark/XMLStreamReader.java
I will keep it there until the corrected version of MSXML 1.8 is released
(should be RSN).
It turns out that NullElementFactory.java is not needed because it is inside
msxml.java. Just recompile and you should get the NullElementFactory.class
file.
>BTW it would be extremely useful to collect together all the MSXML-related
>material somewhere since I think some of us are now confused by what we
>need to download and what to do with it :-). And is there a WORA version
>yet :-)
I agree but I am short of disk space on my web site. Anyway, I am willing
to take responsibility for XML example files and DTDs.
MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.
Don
From ak117 at freenet.carleton.ca Wed Dec 10 22:20:03 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:22 2004
Subject: New AElfred Release (1.0beta2)
Message-ID: <199712102219.RAA00729@unready.microstar.com>
I have put up a new beta release of Microstar's Java-based XML parser,
Ælfred (AElfred), with two minor bugs fixed:
1) When Ælfred finds "
but not
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mike at datachannel.com Wed Dec 10 22:23:04 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 16:59:22 2004
Subject: Latest news on NXP
Message-ID: <01BD0576.C7D81970@NEMO>
Peter,
Norbert is currently at XML'97, and I'm not sure if he is monitoring this list right now.
Here is a press release talking about the future development efforts and availability of NXP.
http://www.datachannel.com/pressroom/releases/Press32.htm
Here is a page with links to the parsers and samples:
http://www.datachannel.com/products/xml/index.html
Mike D
DataChannel
From richard at light.demon.co.uk Wed Dec 10 22:32:23 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:22 2004
Subject:
Hi,
I notice that the current draft has switched the case of the XML
declaration and its arguments to lower case:
Now that case is significant, this presumably matters. Is there a
particular reason for this? Other PIs will have a PItarget where 'xml'
sits, and this isn't constrained to be any particular case. Wouldn't it
be kinder to make it '<?' ('XML'|'xml') ... ?!
(The DTD declarations (
Message-ID: <3.0.1.16.19971210234937.34e73996@pop3.demon.co.uk>
At 14:14 10/12/97 -0800, Don Park wrote:
[... thanks Don...]
>
>MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.
Does this mean on either or are both necessary? i.e. do I have to download
the MS SDK?
Thanks,
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 10 23:28:06 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject:
Message-ID: <3.0.1.16.19971211001535.29676d76@pop3.demon.co.uk>
At 22:26 10/12/97 +0000, Richard Light wrote:
>
>Hi,
>
>I notice that the current draft has switched the case of the XML
>declaration and its arguments to lower case:
>
>
>
>Now that case is significant, this presumably matters. Is there a
>particular reason for this? Other PIs will have a PItarget where 'xml'
>sits, and this isn't constrained to be any particular case. Wouldn't it
>be kinder to make it '<?' ('XML'|'xml') ... ?!
>
>(The DTD declarations (for compatibility with what SGML systems produce.)
Maybe WG members authorised to speak about this will answer the 'why'
questions :-)
The main problems now facing XML-DEV'ers are:
- to remember what the various cases are in the XML spec. Of course the
parsers will remind us ungently :-) [These are Draconian bomb-out errors
unless I am mistaken :-)]
- to remember what the case sensitivity is in *other people's* DTDs and
documents.
The second promises to be a real problem. (BTW I support the WG's motives
in introducing case sensitivity). I don't know whether we can help
ameliorate it here. This sort of thing:
[bringgg, bringgg]. "Hi Sue, my XML document has bombed out with 'unknown
element FOOBAR'."
"Mary, did you remember the capitals?"
"yes, I put them all in!"
"How many?"
"The whole lot."
"What? Two?"
"No, all SIX".
"Ah, you should only have two."
"Where?"
"The F and the B."
"Oh, well HTML is all caps".
"Yes, but this isn't HTML."
"Well it's a sort of extended HTML, isn't it?."
... and so on ...
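What Mary ran into can be sketched in a few lines. This assumes Python's
standard-library ElementTree as a stand-in for whichever parser bombed
out on her; the element name is the FOOBAR of the conversation and nothing
else is taken from a real DTD.

```python
import xml.etree.ElementTree as ET

# Element names are case-sensitive: these are two different element types.
foobar = ET.fromstring("<FooBar/>")
shouty = ET.fromstring("<FOOBAR/>")
print(foobar.tag == shouty.tag)

# A start-tag and end-tag that differ only in case do not match,
# so the document is not well-formed and the parser stops.
try:
    ET.fromstring("<FooBar>text</FOOBAR>")
    mismatch_rejected = False
except ET.ParseError as err:
    mismatch_rejected = True
    print("rejected:", err)
```

A non-validating parser accepts both spellings as separate element
types; only the mismatched tag pair is a hard error.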
I have no idea how to construct CML cases at present. If I follow the XML
spec I get all-lower-case-with-dashes-between-words. OK, except that '-' is
not a very friendly character for forming Java names from. If I follow the
W3C namespace proposal I get random upper and lower case for namespaces and
for elements. If I follow the RDF I get consistent namespace case and some
capitalisation in names.
So:
PLEA TO W3C
Please, it would help us a lot if at least the W3C could use a consistent
case style in their public-facing documents. At the moment it suggests they
haven't addressed this problem. [I don't believe they don't care.]
If this happened, at least some of the rest of us can follow W3C style.
I doubt we can convince the whole world to use one style, but languages
like Java and C++ do quite a good job of gently persuading people to use a
communal approach. XML/W3C could do so too, if they address it.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From jones at nceas.ucsb.edu Thu Dec 11 00:15:10 1997
From: jones at nceas.ucsb.edu (Matt Jones)
Date: Mon Jun 7 16:59:22 2004
Subject: General comments on parsers (was [NEW] AElfred)
References: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <348F306B.ADB07AFA@nceas.ucsb.edu>
Thanks to the parser writers!
Like Peter, I am working on a project where we are building an XML
editing application in Java and therefore need access to the content
model for determining allowable content. The msxml parser currently
doesn't make its internal representation of the DTD public -- Chris
Lovett suggested using the XML-Data Schemas instead of trying to access
the DTD info directly. When one wants access to the DTD, what is the
recommended method? Is there any consensus? Do any of the available
parsers (Lark, MSXML, NXP, PaxSyn, etc.) plan on offering access to the
DTD through their APIs at some point?
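
For comparison, one way such access can look is sketched below. It
assumes Python's standard-library expat bindings, which expose ELEMENT and
ATTLIST declarations through callbacks; it says nothing about what Lark,
MSXML, NXP or the other Java parsers offer. The DTD is invented for
illustration.

```python
import xml.parsers.expat as expat

# Hypothetical DTD, loosely modelled on examples seen on this list.
DOC = """<!DOCTYPE EXAMPLE [
  <!ELEMENT EXAMPLE (P)+>
  <!ELEMENT P (#PCDATA|S)*>
  <!ELEMENT S (#PCDATA)>
  <!ATTLIST P id ID #IMPLIED>
]>
<EXAMPLE><P>A <S>sentence</S>.</P></EXAMPLE>"""

element_models = {}   # element name -> content model tuple
attlists = []         # (element, attribute, type, default, required)

def on_element_decl(name, model):
    # model is a nested tuple: (type, quantifier, name, children)
    element_models[name] = model

def on_attlist_decl(elname, attname, type_, default, required):
    attlists.append((elname, attname, type_, default, required))

parser = expat.ParserCreate()
parser.ElementDeclHandler = on_element_decl
parser.AttlistDeclHandler = on_attlist_decl
parser.Parse(DOC, True)

def child_names(model):
    """Names of the element types that may appear directly in a model."""
    return [child[2] for child in model[3]]

print(sorted(element_models))
print(child_names(element_models["P"]))
print(attlists)
```

An editor could consult element_models to decide which element types to
offer at the cursor; quantifiers and nested groups would need a fuller
walk of the tuple than child_names() does.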
Standardization of APIs (a la XAPI-J) would make life better as well --
are people working on this (Lark? MSXML? etc?)?
Thanks in advance,
Matt
--
******************************************************************
Matt Jones jones@nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Ph: 805-892-2508 Fax: 805-892-2510
National Center for Ecological Analysis and Synthesis (NCEAS)
******************************************************************
Peter Murray-Rust wrote:
> JUMBO wants to retrieve the DTD information for its authoring process, and
> needs the ELEMENT and ATTLIST information. At my last attempt I was unable
> to extract ELEMENT information from Lark (but can get ATTLISTs) and I don't
> think I could get ELEMENT info from lfred. I haven't looked at NXP, and
> perhaps Norbert could update us.
>
> Again, many thanks to Microstar and David, Tim, Norbert (and the MSXML
> players when we get the WORA version).
>
> P.
From donpark at quake.net Thu Dec 11 01:41:33 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:22 2004
Subject: msxml 1.8 questions
Message-ID: <000f01bd05d5$6ef33f10$0100007f@localhost>
>>
>>MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.
>
>Does this mean on either or are both necessary? i.e. do I have to download
>the MS SDK?
No. Either one should be just fine. Both is also fine. None would pose a
little difficulty. I typically compile using MS Java SDK and run using JDK.
Have fun,
Don
From peter at ursus.demon.co.uk Thu Dec 11 01:52:45 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject: General comments on parsers
In-Reply-To: <348F306B.ADB07AFA@nceas.ucsb.edu>
References: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971211024212.2d87bafc@pop3.demon.co.uk>
At 16:14 10/12/97 -0800, Matt Jones wrote:
>Thanks to the parser writers!
>
>Like Peter, I am working on a project where we are building an XML
>editing application in Java and therefore need access to the content
>model for determining allowable content. The msxml parser currently
Since my last posting I have been hacking AElfred into JUMBO and it does a
nice job of getting almost everything from the DTD *except* the content.
[It seems to require an external DTD for this - it complains about elements
in the internal subset, although this is the pre-beta version :-)]
>doesn't make its internal representation of the DTD public -- Chris
>Lovett suggested using the XML-Data Schemas instead of trying to access
I am going to post something along these lines tomorrow (I hope).
>the DTD info directly. When one wants access to the DTD, what is the
>recommended method? Is there any consensus? Do any of the available
>parsers (Lark, MSXML, NXP, PaxSyn, etc.) plan on offering access to the
>DTD through their APIs at some point?
>
>Standardization of APIs (a la XAPI-J) would make life better as well --
>are people working on this (Lark? MSXML? etc?)?
Yes, please. This list (especially John Tigue) worked hard to come up with
Xapi-J - everyone seemed to think it was a good way forward, but no parsers
implement it. Instead we have an increasing (and rather difficult) variety
of approaches (and especially terminology). For example, it's clear that
AElfred and Lark use 'Entity' in different ways [I'm slightly confused by
Lark's use of Entity].
Parsers are NOT equivalent, and there are many reasons why an application
may wish to use more than one.
- different interfaces, giving different views of the document
- different optimisations of speed, memory, etc.
- different treatment of entities
- different features
It's very tedious to have to implement different interfaces for each
(AElfred has about 30 methods - and they are all valuable). So:
- Chris
- David
- James
- John
- Norbert
- Tim
any comments on a common interface :-)?
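
As a starting point for discussion, the shape of such a common interface
might be sketched as below. Everything here is hypothetical: the class and
method names are invented, this is not Xapi-J, and Python's standard
expat bindings stand in for the parser being wrapped. The point is only
that the application codes against one small callback interface and each
parser gets a thin driver.

```python
import xml.parsers.expat as expat

class DocumentHandler:
    """Hypothetical common callback interface an application codes against."""
    def start_element(self, name, attributes): pass
    def end_element(self, name): pass
    def characters(self, text): pass

class ExpatDriver:
    """Adapter: drives a DocumentHandler from one particular parser."""
    def parse(self, text, handler):
        parser = expat.ParserCreate()
        parser.StartElementHandler = handler.start_element
        parser.EndElementHandler = handler.end_element
        parser.CharacterDataHandler = handler.characters
        parser.Parse(text, True)

class ElementCounter(DocumentHandler):
    """Application code: knows only the common interface."""
    def __init__(self):
        self.counts = {}
    def start_element(self, name, attributes):
        self.counts[name] = self.counts.get(name, 0) + 1

# Other parsers would get their own driver class with the same parse() method.
DRIVERS = {"expat": ExpatDriver}

counter = ElementCounter()
DRIVERS["expat"]().parse("<doc><p>one</p><p>two</p></doc>", counter)
print(counter.counts)
```

Choosing a parser then reduces to choosing a key in DRIVERS, much as
JUMBO's PARSER= option does on the command line.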
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Thu Dec 11 05:45:10 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject: LISTRIVIA
In-Reply-To: <001201bd05b8$ff3819f0$0100007f@localhost>
Message-ID: <3.0.1.16.19971211064248.17ef2e4e@pop3.demon.co.uk>
A gentle reminder to posters to clip quoted material before posting.
Including the whole text of a previous posting is rarely necessary, and
means that (a) the disk space for the list gets filled up and (b) that
people like me who pay for mail out of their own personal pockets have to
pay more.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From smith at interlog.com Thu Dec 11 08:01:03 1997
From: smith at interlog.com (Chris Smith)
Date: Mon Jun 7 16:59:22 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
Message-ID:
I'm part of a group that has decided to use XML as an encoding for
documents which are effectively carrying transactions. Seeing XML make
it to Proposed Recommendation is great, and makes our decision less of
a concern.
Part of this work requires that these documents carry document
authentication information. This, in turn, requires that some regions
of an XML document must be transported *exactly*, and must be received
and checked identically so that the message authentication actually
works. The fact that we are considering the idea of including email
as a transport mechanism doesn't help matters.
There are two questions at hand, largely directed at those creating
parsers. I'd like to know if the application requirements we are
proposing ("what to do with the document") are going to be incredibly
difficult to manage, given what the parsers are providing. I confess
I'm just getting started here - I will get to investigating the
various parsers. For now the questions may be useful anyway.
The first criterion is that message authentication is applied to an
element in the document. This is a start to precisely defining what is
being checked. The second criterion is that the message authentication
must be applied to the XML document as represented in UTF-16 encoding,
with big-endian convention, AS IT IS WRITTEN. This is to prevent us
having to specify a consistent *internal* representation. The XML spec
itself helps define a consistent *external* representation, which we
figure is easier to stick with than dealing with all the
cross-platform issues. The question: can this readily be dealt with?
Is it straight-forward to ask for MessageAuthentication over
... , with all the content included?
The second question is much less firm right now. We would like to make
whitespace handling robust - if someone along the way uses a tool
which breaks a line, we should be able to fix it rather than die.
If we add the following character entities to our DTD,
then it should be possible to use these to represent 'wanted'
whitespace, and thus allow for a simple rule prior to checking message
authentication - that is, remove all 'native' space, tab, LF, and CR
from the #PCDATA and check what remains (whitespace inside tags is
handled in a more draconian fashion). (According to the previous
section, "Hi&spc;there!" will be checked exactly the way you see it
here - not as "Hi there!".) The question: is this distinction (between
e.g. the native 0x0009 and &tab;, which converts to 0x0009) going to be
difficult to keep track of?
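
The rule can be sketched as follows. This is only an illustration under
stated assumptions: it works on the character data as written, before
any entity expansion; SHA-256 from Python's hashlib is a stand-in, since
no authentication algorithm is named above; and the function names are
invented.

```python
import hashlib

NATIVE_WHITESPACE = " \t\n\r"

def canonical_bytes(raw_pcdata):
    """Drop native whitespace, keep entity references as written,
    then serialise as big-endian UTF-16 (no byte-order mark)."""
    kept = "".join(ch for ch in raw_pcdata if ch not in NATIVE_WHITESPACE)
    return kept.encode("utf-16-be")

def digest(raw_pcdata):
    # SHA-256 is a stand-in; the message above names no algorithm.
    return hashlib.sha256(canonical_bytes(raw_pcdata)).hexdigest()

as_sent = "Hi&spc;there!"
rewrapped = "Hi&spc;\r\n  there!"     # a tool broke the line in transit
plain_space = "Hi there!"             # native space instead of &spc;

print(digest(as_sent) == digest(rewrapped))
print(digest(as_sent) == digest(plain_space))
```

The catch is the first assumption: a parser that has already expanded
&spc; and &tab; hands over text in which wanted and unwanted whitespace
can no longer be told apart, so the check needs the unexpanded form.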
---------------------------------------------------------------------------
Chris Smith
From peter at ursus.demon.co.uk Thu Dec 11 09:51:44 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
References:
Message-ID: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
Thanks very much Chris,
I'm probably not going to be much practical help, but I hope your posting
catalyses a practical response from the SGML experts. I'd be surprised if
conventional XML-enhanced SGML tools couldn't handle this problem, but I
have no idea what they would cost. [The last flier I got was 2 orders of
magnitude greater than an impecunious academic could afford.]
At 03:00 11/12/97 -0500, Chris Smith wrote:
>
[... first problem punted ...]
>The second question is much less firm right now. We would like make
>whitespace handling robust - if someone along the way uses a tool
>which breaks a line, we should be able to fix it rather than die.
>
>If we add the following character entities to our DTD,
>
>
>
>
>
>
>then it should be possible to use these to represent 'wanted'
>whitespace, and thus allow for a simple rule prior to checking message
>authentication - that is, remove all 'native' space, tab, LF, and CR
>from the #PCDATA and check what remains (whitespace inside tags is
>handled in a more draconian fashion). (According to the previous
>section, "Hi&spc;there!" will be checked exactly that way you see it
>here - not as "Hi there!" The question? - is this distinction (between
>eg the native 0x0009 and &tab; (which converts to 0x0009) going to be
>difficult to keep track of?
As one of the few authors of a generic native XML application I have to
face this problem and have repeatedly failed to get practical solutions.
The main response is:
Yes, it's a problem and
Yes, it's your problem
As I understand it, your XML document may contain two sorts of white space:
whitespace that matters
whitespace that doesn't matter
The latter may be inserted randomly by authors whose lines don't wrap. From
my very limited experience of SGML I would say your approach looks a
sensible one.
However the major problem is 'where is your application software going to
come from?' I have argued very strongly (and shall continue to do so), that
there need to be generic conventions honoured by common application
programs. Otherwise you have to write your own application for your
problem. At present you have only two options:
- write it yourself (and maintain it)
- pay an SGML house to solve your problem for you
I hope shortly to propose some generic whitespace solutions (implemented in
JUMBO) for certain types of document. I don't know whether they would solve
your problems, but thanks for giving me the chance to think about a real
problem. :-)
As a corollary: Is anyone testing the ESIS output of the current crop of
XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
or the value of xml:space they should all produce identical ESIS (right?)
If not, then one or more is wrong. And all applications should (IMO) be
prepared to work with ESIS which I think is isomorphous with a WF XML
document.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ricko at allette.com.au Thu Dec 11 10:57:59 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
Message-ID: <199712111033.VAA09204@jawa.chilli.net.au>
Attached is a repost summary of the white-space characters available in
XML from ISO 10646. Of course, it is still up to applications to implement
them correctly.
At the moment, spaces and newlines are very overloaded which causes all
sorts of problems. So it would solve many problems to use these characters.
For example, if you want a hard return, use the hard return character
and if you need non-collapsing white-space, use
In this particular case, one thing to do is put an attribute at the top-level element
xml:space="preserve"
to prevent collapsing and stripping of spaces and tabs. As far as CR/LF,
I think the XML spec can only be interpreted to mean that
should be preserved. This is because section 2.11 says:
"To simplify the tasks of applications, wherever an external
parsed entity or the literal entity value of an internal parsed
entity contains either the literal two-character sequence
"#xD#xA" or a standalone literal #xD, an XML processor must
pass to the application the single character #xA. (This behavior
can conveniently be produced by normalizing all line breaks
to #xA on input, before parsing.)"
So normalization *should* apply only to direct characters, not
references. However, I don't think you can trust parsers to
do this. So if you want to send facsimile documents
with whitespace preserved, you might find you have to
use a Unicode private-use-area character to substitute for CR.
Your application at the other end has to replace that character
again to reconstruct the document.
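
The distinction between direct characters and references can be checked
against a given parser with a small probe. The sketch below assumes
Python's standard-library ElementTree (built on expat) and an invented
one-element document; it shows what that one parser does and is no
guarantee about any other.

```python
import xml.etree.ElementTree as ET

# Literal CR LF between "one" and "two"; a character reference to CR
# (&#13;) between "two" and "three".
doc = b"<memo>one\r\ntwo&#13;three</memo>"

text = ET.fromstring(doc).text
print(repr(text))
```

If the literal pair comes back as a single line feed while the referenced
CR survives, the parser follows the reading of 2.11 given above.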
For example, you could use
This is a case where you want to do something that is definitely
contrary to the simplifying rules of XML, so don't be alarmed
that you have to use markup (which you give a significance to)
rather than being able to do it direct.
Rick Jelliffe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space (1).htm
Type: application/octet-stream
Size: 2841 bytes
Desc: space (1).htm (Internet Document (HTML))
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971211/7421f8e2/space1.obj
From ak117 at freenet.carleton.ca Thu Dec 11 11:35:21 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
References:
Message-ID: <199712111134.GAA00411@unready.microstar.com>
Chris Smith writes:
> There are two questions at hand, largely directed at those creating
> parsers. I'd like to know if the application requirements we are
> proposing ("what to do with the document") are going to be incredibly
> difficult to manage, given what the parsers are providing. I confess
> I'm just getting started here - I will get to investigating the
> various parsers. For now the questions may be useful anyway.
>
> The first criteria is that message authentication is applied to an
> element in the document. This is a start to precisely defining what is
> being checked. The second criteria is that the message authentication
> must be applied to the XML document as represented in UTF-16 encoding,
> with big-endian convention, AS IT IS WRITTEN. This is to prevent us
> having to specify a consistent *internal* representation. The XML spec
> itself helps define a consistent *external* representation, which we
> figure is easier to stick with than dealing with all the
> cross-platform issues. The question: can this readily be dealt with?
> Is it straight-forward to ask for MessageAuthentication over
> ... , with all the content included?
It would be possible to use a parser to do authentication by
generating checksums based on a normalised version of each element,
but not to do it based on the external representation. Right now,
parsers must report whitespace in mixed content and sort-of report it
in element content (yech). There is no requirement to report
whitespace within markup, however.
As a result, parsers are very unlikely to report any difference
between the following two examples (assuming that the "idrefs"
attribute is declared as IDREFS in the DTD):
Example 1:
This is a link.
Example 2:
This is a link.
There are many other problems too, including comments, whitespace
outside of the document element, etc., etc.
I'd recommend that you do your checksum validation on any files that
you have transmitted _before_ you parse them; that way, you can use
existing software (it doesn't have to be XML-aware).
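To make the point concrete, here is a small Python sketch using only the standard library. The two sample documents are made up for this illustration (they are not the examples from the message); they differ only in whitespace inside the markup. SHA-256 is used simply as a convenient checksum, not because the thread suggests it.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two invented documents: same content, different whitespace inside tags.
DOC_A = b'<p>This is a <a href="x" id="y">link</a>.</p>'
DOC_B = b'<p>This is a <a href="x"\n   id="y" >link</a >.</p>'


def checksum(raw: bytes) -> str:
    """Checksum of the external representation, taken before parsing."""
    return hashlib.sha256(raw).hexdigest()


def parsed_view(raw: bytes):
    """What an application sees after parsing: names, attributes, text."""
    root = ET.fromstring(raw)
    return [(el.tag, dict(el.attrib), el.text, el.tail) for el in root.iter()]


same_after_parsing = parsed_view(DOC_A) == parsed_view(DOC_B)
same_on_the_wire = checksum(DOC_A) == checksum(DOC_B)
```

The parser reports the same structure for both documents, while the byte-level checksums differ, which is why the checksum has to be computed on the transmitted bytes rather than on parser output.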
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From ak117 at freenet.carleton.ca Thu Dec 11 11:42:29 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
References:
<3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
Message-ID: <199712111141.GAA00445@unready.microstar.com>
Peter Murray-Rust writes:
> As a corollary: Is anyone testing the ESIS output of the current crop of
> XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> or the value of xml:space they should all produce identical ESIS (right?)
> If not, then one or more is wrong. And all applications should (IMO) be
> prepared to work with ESIS which I think is isomorphous with a WF XML
> document.
There are quite a few more XML parsers out there, including at least
one in TCL -- see
http://www.sil.org/sgml/XML.html#xmlSoftware
As for ESIS, there are some problems that we'd have to overcome first:
1) How should empty elements be represented? Right now, Ælfred generates a
startElement event immediately followed by an endElement event.
2) How should the XML declaration be represented? Should it appear as
a processing instruction, or should it be ignored?
3) How should space in element content be handled? According to the
spec, a DTD-aware parser should handle whitespace in element
content differently from whitespace in mixed content (Ælfred just
ignores whitespace in element content right now).
4) DTD-aware and non-DTD-aware parsers will handle whitespace in
attribute values differently. Non-DTD-aware parsers will treat all
attributes as CDATA, but DTD-aware parsers will treat tokenised
attributes specially, by stripping all leading and trailing
whitespace, and normalising internal whitespace to single spaces.
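Point 4 can be sketched in a few lines of Python. This is only a model of the difference described above, not the behaviour of any particular parser; the function name and the default of "CDATA" for an undeclared attribute are assumptions for the example.

```python
import re

# Attribute types that the XML spec calls tokenized.
TOKENIZED_TYPES = {
    "ID", "IDREF", "IDREFS", "ENTITY", "ENTITIES", "NMTOKEN", "NMTOKENS",
}

# XML whitespace only: space, tab, CR, LF.
_XML_SPACE = re.compile(r"[ \t\r\n]+")


def report_attribute(value: str, declared_type: str = "CDATA") -> str:
    """Return the value as a parser would report it to the application.

    A non-DTD-aware parser never learns the declared type, so it calls
    this with the default "CDATA" and the value passes through as-is.
    """
    if declared_type in TOKENIZED_TYPES:
        # Collapse internal runs to one space, then trim both ends.
        return _XML_SPACE.sub(" ", value).strip(" ")
    return value


raw = "  a1   b2\tc3 "
dtd_aware = report_attribute(raw, "IDREFS")
not_dtd_aware = report_attribute(raw)
```

With the same source text, the DTD-aware call yields `a1 b2 c3` while the other returns the string untouched, so the two kinds of parser would produce different ESIS for the same document.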
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mecom-gmbh at mixx.de Thu Dec 11 13:38:38 1997
From: mecom-gmbh at mixx.de (james anderson too)
Date: Mon Jun 7 16:59:23 2004
Subject:
Message-ID: <348FEDF2.FD454815@mixx.de>
i think the "proposed recommendation" drafters agree with you. to wit (from
http://www.w3.org/TR/PR-xml-971208):
[17] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[18] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
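Production [18] reads as "any Name except the three letters x, m, l in any mix of cases". A minimal Python sketch of that check follows; the function name is made up, and the Name pattern is an ASCII-only simplification of the spec's full Name production, so treat it as an illustration rather than a conforming test.

```python
import re

# ASCII-only approximation of the XML Name production (an assumption:
# the real production also admits many non-ASCII letters).
_ASCII_NAME = re.compile(r"[A-Za-z_:][A-Za-z0-9._:\-]*")


def is_pi_target(name: str) -> bool:
    """True if `name` may be used as the target of an ordinary PI."""
    if _ASCII_NAME.fullmatch(name) is None:
        return False
    # Production [18]: exclude "xml" in every combination of cases.
    return name.lower() != "xml"


checks = {name: is_pi_target(name) for name in ("xml", "XML", "Xml", "xml-stylesheet", "php")}
```

Note that only the exact three-letter name is excluded by this production; a longer name that merely begins with those letters, such as `xml-stylesheet`, passes the check above.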
Richard Light wrote:
> Hi,
>
> I notice that the current draft has switched the case of the XML
> declaration and its arguments to lower case:
>
>
>
> Now that case is significant, this presumably matters. Is there a
> particular reason for this? Other PIs will have a PItarget where 'xml'
> sits, and this isn't constrained to be any particular case. Wouldn't it
> be kinder to make it '<?' ('XML'|'xml') ... ?!
>
> (The DTD declarations ( for compatibility with what SGML systems produce.)
>
> Richard.
>
> Richard Light
> SGML/XML and Museum Information Consultancy
> richard@light.demon.co.uk
>
From peter at ursus.demon.co.uk Thu Dec 11 13:39:31 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To: <199712111141.GAA00445@unready.microstar.com>
References: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
<3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971211143739.37172be2@pop3.demon.co.uk>
At 06:41 11/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > As a corollary: Is anyone testing the ESIS output of the current crop of
> > XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> > or the value of xml:space they should all produce identical ESIS (right?)
> > If not, then one or more is wrong. And all applications should (IMO) be
> > prepared to work with ESIS which I think is isomorphous with a WF XML
> > document.
>
>There are quite a few more XML parsers out there, including at least
>one in TCL -- see
>
> http://www.sil.org/sgml/XML.html#xmlSoftware
Apologies to anyone I missed. I am a great fan of tcl and wrote costwish in
it to sit on top of Joe English's CoST...
>
>As for ESIS, there are some problems that we'd have to overcome first:
Are there? How does a WF document differ from the corresponding ESIS
stream? IOW if I do the transformation:
WF -> ESIS -> WF shouldn't I be able to recover the original?
>
>1) How should empty elements be represented? Right now, Ælfred generates a
> startElement event immediately followed by an endElement event.
Yes - and JUMBO is happy with that. As far as JUMBO is concerned
and are processed in the same way and I will need a very
clear argument to convince me that it should do different.
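For what it is worth, the same start/end pairing can be observed with an event-based parser today. The sketch below uses Python's standard xml.sax module purely as a stand-in (it is not Ælfred or JUMBO), and the element names are invented for the example.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Collect start/end events in the order the parser reports them."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def element_events(raw: bytes):
    recorder = EventRecorder()
    xml.sax.parseString(raw, recorder)
    return recorder.events


# Empty-element tag versus an explicit start-tag/end-tag pair.
short_form = element_events(b"<doc><empty/></doc>")
long_form = element_events(b"<doc><empty></empty></doc>")
```

Both forms yield the same event list, a start event immediately followed by an end event for the empty element, so an application built on such events cannot tell them apart.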
>
>2) How should the XML declaration be represented? Should it appear as
> a processing instruction, or should it be ignored?
JUMBO regards it as a PI. I hang all PIs off the preceding ELEMENT (not
PCDATA). In that way the tree can be processed with these intact. JUMBO
understands namespace PIs, PIs and will also store the
others. It's useful to store them in case one wants to compare trees. BTW -
although it is nowhere stated most people seem to create PIs as name-value
pairs and JUMBO expects this.
>
>3) How should space in element content be handled? According to the
> spec, a DTD-aware parser should handle whitespace in element
> content differently from whitespace in mixed content (Ælfred just
> ignores whitespace in element content right now).
This is a critical area for the parser writers to agree on. I assume that
for the DTD-aware stuff there has to be a validating parser (i.e. one that
matches contentspec against element content). I am not sure what algorithms
are being used - JUMBO wants a java one for its birthday, please - but I
can imagine that with certain contentspecs they might get different answers.
>
>4) DTD-aware and non-DTD-aware parsers will handle whitespace in
> attribute values differently. Non-DTD-aware parsers will treat all
> attributes as CDATA, but DTD-aware parsers will treat tokenised
> attributes specially, by stripping all leading and trailing
> whitespace, and normalising internal whitespace to single spaces.
In this case presumably only the TYPE in the ATTLIST is needed.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Thu Dec 11 15:08:00 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject:
References:
<348FEDF2.FD454815@mixx.de>
Message-ID: <199712111506.KAA00685@unready.microstar.com>
james anderson too writes:
> i think the "proposed recommendation" drafters agree with you. to wit (from
> http://www.w3.org/TR/PR-xml-971208):
>
> [17] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
> [18] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
No, not at all. At least as I read it (and I'm not in the WG or the
SIG), you _must_ begin the XML declaration with lowercase "
The situation is complicated by the fact that W3C is working on and has not
yet released its own version of Java XML Object Model. Since it will be
difficult to have all existing Java XML parsers to conform to a single
object model, I think the best approach is for someone to write a new Java
parser framework which provides a reasonable object model and acts as the
Universal XML Parser (UXP?:-).
UXP should use some kind of simple registry scheme and a UI to allow users
to plug in new UXP compatible parsers. Writing UXP adapters for each of
existing Java XML parsers should not be too hard. Once UXP is in place, new
parsers will start to conform. When W3C XML API is out, all we need to do
is write two adapters:
1) UXP to W3C adapter so programs using W3C XML API can use UXP parsers
(i.e. JavaScript).
2) W3C to UXP adapter so programs using UXP can use any XML parsers
providing W3C XML API.
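The registry-plus-adapter idea might look roughly like the following Python sketch. Everything here is hypothetical: "UXP" exists only as a proposal in this message, so the registry, the event shape and the adapter names are invented for illustration, with two standard-library parsers playing the role of "existing parsers".

```python
import xml.sax
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Tuple

Event = Tuple[str, str]                      # ("start" | "end", element name)
Adapter = Callable[[bytes], List[Event]]     # the common interface

REGISTRY: Dict[str, Adapter] = {}            # hypothetical parser registry


def register(name: str):
    """Decorator that plugs an adapter into the registry under `name`."""
    def decorator(adapter: Adapter) -> Adapter:
        REGISTRY[name] = adapter
        return adapter
    return decorator


@register("sax")
def sax_adapter(raw: bytes) -> List[Event]:
    events: List[Event] = []

    class Handler(xml.sax.ContentHandler):
        def startElement(self, name, attrs):
            events.append(("start", name))

        def endElement(self, name):
            events.append(("end", name))

    xml.sax.parseString(raw, Handler())
    return events


@register("etree")
def etree_adapter(raw: bytes) -> List[Event]:
    events: List[Event] = []

    def walk(element):
        events.append(("start", element.tag))
        for child in element:
            walk(child)
        events.append(("end", element.tag))

    walk(ET.fromstring(raw))
    return events


def parse_with(parser_name: str, raw: bytes) -> List[Event]:
    """Application code picks a parser by name and sees one event shape."""
    return REGISTRY[parser_name](raw)


via_sax = parse_with("sax", b"<a><b/></a>")
via_etree = parse_with("etree", b"<a><b/></a>")
```

The application only ever calls `parse_with`, so swapping parsers becomes a matter of changing a registry key, for example from a stored user preference.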
BTW, I have taken a look at Xapi-J and W3C OM API and, frankly, I am not
satisfied with either of them. Enumeration by index is problematic and
callbacks are either not supported or primitive. Not that I can offer any
better in the near future. Call me a stuck-up critic, if you will.
Don
>Yes, please. This list (especially John Tigue) worked hard to come up with
>Xapi-J - everyone seemed to think it was a good way forward, but no parsers
>implement it. Instead we have an increasing (and rather difficult) variety
>of approaches (and especially terminology). For example, it's clear that
>AElfred and Lark use 'Entity' in different ways [I'm slightly confused by
>Lark's use of Entity].
>
>Parsers are NOT equivalent, and there are many reasons why an application
>may wish to use more than one.
> - different interfaces, giving different views of the document
> - different optimisations of speed, memory, etc.
> - different treatment of entities
> - different features
>
>It's very tedious to have to implement different interfaces for each
>(AElfred has about 30 methods - and they are all valuable). So:
> - Chris
> - David
> - James
> - John
> - Norbert
> - Tim
>any comments on a common interface :-)?
>
> P.
>
>Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
>
From M.H.Kay at eng.icl.co.uk Thu Dec 11 17:09:15 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:23 2004
Subject: MSXML 1.8 Viewer Applet problem
Message-ID: <01bd0657$6f259fa0$1e09e391@mhklaptop.bra01.icl.co.uk>
I'm using the XML Viewer applet in MSXML 1.8.
I'm having trouble because there doesn't seem to be any way of closing the
file after you've finished with it, so all subsequent attempts to edit the
XML file after viewing it fail with "file in use".
Mike Kay
From dima at paragraph.com Thu Dec 11 17:20:00 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:24 2004
Subject: Newbie Q: NXP attrribute validation
Message-ID: <2.2.32.19971211172105.0091a70c@dream.paragraph.com>
Please help a newbie DTD writer :) I am trying to validate with NXP
attribute ID in element Foo :
With the following DTD:
As a result I get "Attribute has not be declared : ID" error. What am I
doing wrong ?
Thanks,
Dima
---
NXP output:
NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
"
"
Error :
Attribute has not be declared : ID
"
"
"
"
Error :
Parsing stopped with exception :
java.util.EmptyStackException
Parsing finished - Time : 490 msec.
---------------------------
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From Patrice.Bonhomme at loria.fr Thu Dec 11 18:02:08 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:24 2004
Subject: BUG : msxml 1.6
In-Reply-To: Your message of "Mon, 01 Dec 1997 09:33:28 PST."
<2F2DC5CE035DD1118C8E00805FFE354C099E93@red-msg-56.dns.microsoft.com>
Message-ID: <199712111801.TAA06638@chimay.loria.fr>
I have downloaded msxml 1.8 and run it on my sample files, and it seems
that the EXTENTITYDCL problem has not been fixed. I still get a "stammering"
(doubled) inclusion of the external data!
What's wrong ?
Thanks,
Pat.
DOCUMENT
|---XMLDECL
| +---CDATA " VERSION="1.0" "
|---WHITESPACE 0xa
|---DOCTYPE NAME="EXAMPLE"
| |---WHITESPACE 0xa
| |---ELEMENTDECL EXAMPLE (P)+
| |---WHITESPACE 0xa
| |---ELEMENTDECL P (#PCDATA|S)*
| |---WHITESPACE 0xa
| |---ELEMENTDECL S (#PCDATA)*
| |---WHITESPACE 0xa
| +---EXTENTITYDCL incs
| |---ELEMENT S
| | +---PCDATA "a third."
| +---PCDATA "a third. " <--- HERE
|---WHITESPACE 0xa
|---ELEMENT EXAMPLE
| |---WHITESPACE 0xa
| |---ELEMENT P
| | |---ELEMENT S
| | | +---PCDATA "A sentence."
| | |---ELEMENT S
| | | +---PCDATA "An another."
| | +---ENTITYREF incs "a third.a third. " <--- AND HERE
| +---WHITESPACE 0xa
+---WHITESPACE 0xa
[] Chris Lovett said:
[]---------------------------------
] Thanks, I have a fix already, and will be posting it shortly.
]
] > -----Original Message-----
] > From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
] > Sent: Saturday, November 29, 1997 1:37 AM
] > To: Chris Lovett
] > Subject: BUG : msxml 1.6
] >
] >
] > Hi,
] >
] > I found a bug in msxml 1.6 relative to the External Entity checking.
] >
] > Main file (test-ent.xml):
] >
] >
] >
] >
] >
] >
] >
] > ]>
] >
] >
] > a sentence. an another.
] >
] > &inc-s;
] >
] >
] > Auxiliary file (inc-s.xml):
] > a third.
] >
] > And I've got this message:
] >
] > % java msxml -i -d test-ext-ent.xml
] > Invalid element 'PCDATA' in content of 'P'. Expected [S]
] > Location: file:test-ext-ent.xml(14,5)
] > Context:
] >
] > The parser should make a difference between ENTITYREF and SYSTEM
] > ENTITYREF.
] >
] > Pat.
] > --
] > ==============================================================
] > bonhomme@loria.fr | Office : B.228
] > http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
] > --------------------------------------------------------------
] > * Projet Aquarelle : http://aqua.inria.fr
] > * Serveur Silfide : http://www.loria.fr/Projet/Silfide
] > ==============================================================
] >
[]---------------------------------
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================
From peter at ursus.demon.co.uk Thu Dec 11 18:18:24 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
In-Reply-To: <000801bd0650$77334140$0100007f@localhost>
Message-ID: <3.0.1.16.19971211183623.166722a2@pop3.demon.co.uk>
At 08:18 11/12/97 -0800, Don Park wrote:
>The situation is complicated by the fact that W3C is working on and has not
>yet released its own version of Java XML Object Model. Since it will be
Is this the same as DOM? If so, is there any timescale.
Not being part of the DOM process I am now somewhat confused. Does this
mean that there is a formal program to produce an API for XML parsers? If
so, what is the timescale? I'm sure there are some readers who are involved
;-)
I'm an impatient beast and I worry about waiting for things like this to
happen if it's going to be a long time. During that time we'll have another
5-10 Java based parsers, all with different terminology. In another
proposal I will try to address the terminology :-)
>difficult to have all existing Java XML parsers to conform to a single
>object model, I think the best approach is for someone to write a new Java
>parser framework which provides a reasonable object model and acts as the
>Universal XML Parser (UXP?:-).
Is this a short-term or long term solution? If long term, what is the
difference/benefit between this and the OM?
>
>UXP should use some kind of simple registry scheme and a UI to allow users
Please [ignorance] what does a registry scheme entail?
>to plug in new UXP compatible parsers. Writing UXP adapters for each of
>existing Java XML parsers should not be too hard. Once UXP is in place, new
>parsers will start to conform. When W3C XML API is out, all we need to do
>is write two adapters:
>
>1) UXP to W3C adapter so programs using W3C XML API can use UXP parsers
>(i.e. JavaScript).
>2) W3C to UXP adapter so programs using UXP can use any XML parsers
>providing W3C XML API.
>
>BTW, I have taken a look at Xapi-J and W3C OM API and, frankly, I am not
Where is the reference for W3C OM API?
>satisfied with either of them. Enumeration by index is problematic and
>callbacks are either not supported or primitive. Not that I can offer any
>better in the near future. Call me a stuck-up critic, if you will.
>
I take a very simple approach and find that the AElfred approach gives me
almost everything I want. It allows me to extract the components of the
document (start/end/content, PIs, entities) and it allows me to get almost
everything from the DTD (except the contentspec). I don't think that *I*
need anything more. I just don't want - and don't intend to write 30
adapter functions for every new parser. If everyone had
getContentSpec(String elementType) that is the level I am quite happy with :-)
P.
>Don
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From mike at datachannel.com Thu Dec 11 18:39:19 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
Message-ID: <01BD0620.5FC70300@NEMO>
I do believe that the Java XML Object Model referred to is the same as the W3C DOM. However, the DOM is programming language independent.
I don't know the timeframe for final acceptance, however, XML parser writers are free to read up on the working draft and align their code with the defined functionality.
The W3C DOM page is here: http://www.w3.org/DOM/
The DOM Spec is here: http://www.w3.org/TR/WD-DOM/
"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents."
Mike D
DataChannel
-----Original Message-----
From: Peter Murray-Rust [SMTP:peter@ursus.demon.co.uk]
Sent: Thursday, December 11, 1997 10:36 AM
To: xml-dev@ic.ac.uk
Subject: Re: General comments on parsers
At 08:18 11/12/97 -0800, Don Park wrote:
>The situation is complicated by the fact that W3C is working on and has not
>yet released its own version of Java XML Object Model. Since it will be
Is this the same as DOM? If so, is there any timescale.
Not being part of the DOM process I am now somewhat confused. Does this
mean that there is a formal program to produce an API for XML parsers? If
so, what is the timescale? I'm sure there are some readers who are involved
;-)
I'm an impatient beast and I worry about waiting for things like this to
happen if it's going to be a long time. During that time we'll have another
5-10 Java based parsers, all with different terminology. In another
proposal I will try to address the terminology :-)
>difficult to have all existing Java XML parsers to conform to a single
>object model, I think the best approach is for someone to write a new Java
>parser framework which provides a reasonable object model and acts as the
>Universal XML Parser (UXP?:-).
Is this a short-term or long term solution? If long term, what is the
difference/benefit between this and the OM?
>
>UXP should use some kind of simple registry scheme and a UI to allow
From donpark at quake.net Thu Dec 11 18:45:34 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
Message-ID: <000c01bd0664$7a677a20$0100007f@localhost>
>At 08:18 11/12/97 -0800, Don Park wrote:
>>The situation is complicated by the fact that W3C is working on and has not
>>yet released its own version of Java XML Object Model. Since it will be
>
>Is this the same as DOM? If so, is there any timescale.
>
>Not being part of the DOM process I am now somewhat confused. Does this
>mean that there is a formal program to produce an API for XML parsers? If
>so, what is the timescale? I'm sure there are some readers who are involved
>;-)
Sorry about the confusion. I am pretty careless with names and stuff. I
was referring to DOM level-one XML, which btw is out already in draft form
(reality lag) at http://www.w3.org/TR/WD-DOM/level-one-xml-971209.html.
They also have one for HTML, so I should be able to get through another
weekend without buying a book to read.
So, we could probably implement the UXP based on XML DOM (gosh, I am
improvising terms left and right).
>I'm an impatient beast and I worry about waiting for things like this to
>happen if it's going to be a long time. During that time we'll have another
>5-10 Java based parsers, all with different terminology. In another
>proposal I will try to address the terminology :-)
That was the shortest wait ever, eh?
>Is this a short-term or long term solution? If long term, what is the
>difference/benefit between this and the OM?
Long term solution. No difference now since we have better outline of XML
DOM to work with.
>Please [ignorance] what does a registry scheme entail?
I don't know how your JUMBO allows different parsers to be used but I was
talking about registry for storing current user preferences as far as which
parser to use in your application. It could even involve some migrating DOM
liaison classes for enhancing visual representation of XML documents.
Currently, I have this vexing problem of trying to figure out how to
represent an XML document as a tree of objects where each object is
something more than a tag. CDF has a Channel object which contains
attributes which are represented as tags as well as contents of tags. Exposing
those attributes as a tree node would be too distracting, especially since I
have a perfectly nice object inspector to show the attributes in.
>Where is the reference for W3C OM API?
See above. Sorry again about the confucious glibbing (here I go again,
making sense only to myself).
>I take a very simple approach and find that the AElfred approach gives me
>almost everything I want. It allows me to extract the components of the
>document (start/end/content, PIs, entities) and it allows me to get almost
>everything from the DTD (except the contentspec). I don't think that *I*
>need anything more. I just don't want - and don't intend to write 30
>adapter functions for every new parser. If everyone had
>getContentSpec(String elementType) that is the level I am quite happy with :-)
Is this a different song? Hmm, I swear I heard something else before...;-)
Don
From digitome at iol.ie Thu Dec 11 19:12:44 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers (was [NEW] AElfred)
Message-ID: <199712111912.TAA25575@GPO.iol.ie>
> Chris
>Lovett suggested using the XML-Data Schemas instead of trying to access
>the DTD info directly. When one wants access to the DTD, what is the
>recommended method?
>
You can use the XML-Data approach but maintain the ability to work
with the standard DTD syntax by using msxml to spit out the XML-Data encoding
of the DTD info and then re-parse it.
Maybe this is what you meant though... If so, sorry. If not, hope this helps.
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com
From dima at paragraph.com Thu Dec 11 21:57:02 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
Message-ID: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
Hi,
Does anybody know of any free *validating* XML parsers written in Java?
With NXP I haven't managed to validate the following:
With FooBar.dtd file in the same directory :
I have the following output :
java NXP.Cl -v -f test/test.xml
NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
"
"
Error :
Attribute has not be declared : ID
"
"
Parsing finished - Time : 1260 msec.
Any help is most welcome!
Dima
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From ak117 at freenet.carleton.ca Thu Dec 11 22:03:53 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
In-Reply-To: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
References: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
Message-ID: <199712112202.RAA05850@unready.microstar.com>
Dmitri Kondratiev writes:
> Does anybody know of any free *validating* XML parsers written in Java?
There is a serious problem right now with the XML terminology. There
are at least four Java-based XML parsers right now that will parse a
DTD:
- Lark
- MSXML
- NXP (a little out of date)
- Ælfred
Of these, I think that only MSXML claims to be validating. Do you
need full validation, or do you just need a DTD-driven parser that
will pick up entity declarations, default attribute values, etc? We
really need to invent some better terms, since validation and
DTD-awareness are really separate concepts.
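To make the distinction concrete, here is a minimal sketch. It assumes the JAXP API bundled with later JDKs rather than any of the parsers listed above, and the document, the "kind" attribute and the class name DtdAwareVsValidating are invented for illustration. A non-validating but DTD-aware parse picks up the default attribute value and stays silent about the missing #REQUIRED attribute; a validating parse reports it.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class DtdAwareVsValidating {
    // Hypothetical document: ID is #REQUIRED but missing, "kind" has a default.
    static final String XML =
        "<!DOCTYPE Foo [\n"
        + "  <!ELEMENT Foo (#PCDATA)>\n"
        + "  <!ATTLIST Foo ID ID #REQUIRED kind CDATA \"plain\">\n"
        + "]>\n"
        + "<Foo>text</Foo>";

    // Parses XML and collects validity errors in the given list.
    static Document parse(boolean validating, List<String> errors) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(validating);
        DocumentBuilder builder = factory.newDocumentBuilder();
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { errors.add(e.getMessage()); }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        return builder.parse(new InputSource(new StringReader(XML)));
    }

    public static void main(String[] args) throws Exception {
        List<String> errors = new ArrayList<>();
        Document doc = parse(false, errors);
        System.out.println("DTD-aware only: kind="
            + doc.getDocumentElement().getAttribute("kind")
            + ", errors=" + errors.size());
        errors.clear();
        parse(true, errors);
        System.out.println("validating: errors=" + errors.size());
    }
}
```

The same input goes through both runs; only the factory's validating flag differs, which is the point of keeping the two concepts apart.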
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Thu Dec 11 22:23:33 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:25 2004
Subject: AElfred 1.0beta3 release
Message-ID: <199712112222.RAA06459@unready.microstar.com>
There is a new release of Microstar's Ælfred XML parser at
http://www.microstar.com/XML/
The new version is still interface-compatible with the first two
public betas, but it adds the ability to query for content models and
enumerated attribute types (both returned as normalised strings, with
whitespace removed and parameter entities resolved).
With the new query routines, Ælfred is now capable of producing a
normalised version of an XML document's DTD; in fact, the distribution
now includes a new demonstration class, DtdDemo.java, that does
exactly that.
Enjoy!
David (on behalf of Microstar)
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Thu Dec 11 23:19:24 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: Newbie Q: NXP attribute validation
In-Reply-To: <2.2.32.19971211172105.0091a70c@dream.paragraph.com>
Message-ID: <3.0.1.16.19971211185135.2007d876@pop3.demon.co.uk>
At 20:21 11/12/97 +0300, Dmitri Kondratiev wrote:
Hi Dima,
>Please help a newbie DTD writer :) I am trying to validate with NXP
>attribute ID in element Foo :
>
>
>
>
>
>
>
This is not a well-formed document and the last line should (probably) be:
instead of
[...]
>
>As a result I get "Attribute has not be declared : ID" error. What am I
>doing wrong ?
>
One of the problems with XML parsers (rather like compilers) is that it can
be quite difficult to produce error messages that tell you precisely what
is wrong. So I can't tell you *why* you got this message, but most error
messages are 'somewhere near' the error.
Sometimes it can be helpful to run more than one parser because they often
give different clues.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From dima at paragraph.com Thu Dec 11 23:50:23 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:25 2004
Subject: Newbie Q: NXP attribute validation
Message-ID: <2.2.32.19971211234901.006d9474@dream.paragraph.com>
At 18:51 11.12.97, Peter Murray-Rust wrote:
...
>
>
>This is not a well-formed document and the last line should (probably) be:
>
>
instead of
>
>[...]
>>
>>As a result I get "Attribute has not be declared : ID" error. What am I
>>doing wrong ?
>>
>One of the problems with XML parsers (rather like compilers) is that it can
>be quite difficult to produce error messages that tell you precisely what
>is wrong. So I can't tell you *why* you got this message, but most error
>messages are 'somewhere near' the error.
>
>Sometimes it can be helpful to run more than one parser because they often
>give different clues.
>
Peter,
Thanks for your help. With my stupid bug ( to )
corrected, I still have the same error ! The following NXP output shows that
it uses the FooBar.dtd, as specified in test.xml file. What can be wrong then ?
--Dima
NXP output:
NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
"
"
Error :
Attribute has not be declared : ID
"
"
Parsing finished - Time : 1260 msec.
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From peter at ursus.demon.co.uk Fri Dec 12 00:11:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: General comments on parsers
In-Reply-To: <000c01bd0664$7a677a20$0100007f@localhost>
Message-ID: <3.0.1.16.19971212005949.190f27fa@pop3.demon.co.uk>
At 10:39 11/12/97 -0800, Don Park wrote:
>Sorry about the confusion. I am pretty careless with names and stuff. I
>was referring to DOM level-one XML which btw is out already in draft form
>(reality lag) at http://www.w3.org/TR/WD-DOM/level-one-xml-971209.html.
>They also have one for HTML so I should be able to get through another
>weekend without buying a book to read.
I have had a spook through it this evening... I appreciate that an API may
come out of it.
[...]
>
>
>That was the shortest wait ever, eh?
???
>
>
>I don't know how your JUMBO allows different parsers to be used but I was
Very simple. I read the interfaces, try to understand what they are
talking about, try to configure JUMBO so it reads them, see if I understand
the results, and take it from there. Unfortunately this has to be done for
every parser :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Dec 12 00:19:19 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
In-Reply-To: <199712112202.RAA05850@unready.microstar.com>
References: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
<2.2.32.19971211215544.006caa0c@dream.paragraph.com>
Message-ID: <3.0.1.16.19971212010654.2aefd570@pop3.demon.co.uk>
At 17:02 11/12/97 -0500, David Megginson wrote:
[...]
>Of these, I think that only MSXML claims to be validating. Do you
>need full validation, or do you just need a DTD-driven parser that
>will pick up entity declarations, default attribute values, etc? We
>really need to invent some better terms, since validation and
>DTD-awareness are really separate concepts.
Terminology is really critical here and I shall address it later. If
everyone agrees on the terms half the problems will be solved. :-)
P
Wait for the XML-based hyperglossary of XML terminology (next week)
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From clovett at microsoft.com Fri Dec 12 04:32:56 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: BUG : msxml 1.6
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F41@red-msg-56.dns.microsoft.com>
You're right. I'll look into it.
> -----Original Message-----
> From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
> Sent: Thursday, December 11, 1997 10:01 AM
> To: Chris Lovett
> Cc: xml-dev Mailing List
> Subject: Re: BUG : msxml 1.6
>
>
>
> I have downloaded msxml 1.8 and tried to run it on my sample files and it
> seems that the EXTENTITYDCL has not been fixed. I still get a "stammering"
> inclusion of the external data!
>
> What's wrong ?
>
> Thanks,
>
> Pat.
>
> DOCUMENT
> |---XMLDECL
> | +---CDATA " VERSION="1.0" "
> |---WHITESPACE 0xa
> |---DOCTYPE NAME="EXAMPLE"
> | |---WHITESPACE 0xa
> | |---ELEMENTDECL EXAMPLE (P)+
> | |---WHITESPACE 0xa
> | |---ELEMENTDECL P (#PCDATA|S)*
> | |---WHITESPACE 0xa
> | |---ELEMENTDECL S (#PCDATA)*
> | |---WHITESPACE 0xa
> | +---EXTENTITYDCL incs
> | |---ELEMENT S
> | | +---PCDATA "a third."
> | +---PCDATA "a third. " <--- HERE
> |---WHITESPACE 0xa
> |---ELEMENT EXAMPLE
> | |---WHITESPACE 0xa
> | |---ELEMENT P
> | | |---ELEMENT S
> | | | +---PCDATA "A sentence."
> | | |---ELEMENT S
> | | | +---PCDATA "An another."
> | | +---ENTITYREF incs "a third.a third. " <--- AND HERE
> | +---WHITESPACE 0xa
> +---WHITESPACE 0xa
>
>
> [] Chris Lovett said:
> []---------------------------------
> ] Thanks, I have a fix already, and will be posting it shortly.
> ]
> ] > -----Original Message-----
> ] > From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
> ] > Sent: Saturday, November 29, 1997 1:37 AM
> ] > To: Chris Lovett
> ] > Subject: BUG : msxml 1.6
> ] >
> ] >
> ] > Hi,
> ] >
> ] > I found a bug in msxml 1.6 relative to the External Entity checking.
> ] >
> ] > Main file (test-ent.xml):
> ] >
> ] > ] >
> ] >
> ] >
> ] >
> ] >
> ] >
> ] > ]>
> ] >
> ] >
> ] > a sentence. an another.
> ] >
> ] > &inc-s;
> ] >
> ] >
> ] > Auxiliary file (inc-s.xml):
> ] > a third.
> ] >
> ] > And I've got this message:
> ] >
> ] > % java msxml -i -d test-ext-ent.xml
> ] > Invalid element 'PCDATA' in content of 'P'. Expected [S]
> ] > Location: file:test-ext-ent.xml(14,5)
> ] > Context:
> ] >
> ] > The parser should make a difference between ENTITYREF and SYSTEM
> ] > ENTITYREF.
> ] >
> ] > Pat.
> ] > --
> ] > ==============================================================
> ] > bonhomme@loria.fr | Office : B.228
> ] > http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
> ] > --------------------------------------------------------------
> ] > * Projet Aquarelle : http://aqua.inria.fr
> ] > * Serveur Silfide : http://www.loria.fr/Projet/Silfide
> ] > ==============================================================
> ] >
> []---------------------------------
>
>
> --
> ==============================================================
> bonhomme@loria.fr | Office : B.228
> http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
> --------------------------------------------------------------
> * Projet Aquarelle : http://aqua.inria.fr
> * Serveur Silfide : http://www.loria.fr/Projet/Silfide
> ==============================================================
>
>
>
From clovett at microsoft.com Fri Dec 12 07:02:36 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: MSXML 1.8 Viewer Applet problem
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F4F@red-msg-56.dns.microsoft.com>
Wow, I don't get this at all. It should close the file immediately after
it's finished parsing it. Try reinstalling, there was a bad xmlinst up
there for a couple of days.
> -----Original Message-----
> From: Michael Kay [SMTP:M.H.Kay@eng.icl.co.uk]
> Sent: Thursday, December 11, 1997 9:09 AM
> To: xml-dev@ic.ac.uk
> Subject: MSXML 1.8 Viewer Applet problem
>
> I'm using the XML Viewer applet in MSXML 1.8
>
> Having trouble because there doesn't seem to be any way of closing the
> file
> after you've finished with it, so all subsequent attempts to edit the XML
> file after viewing it fail saying "file in use".
>
> Mike Kay
>
>
From clovett at microsoft.com Fri Dec 12 07:11:50 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F50@red-msg-56.dns.microsoft.com>
The Microsoft XML Parser for Java is available for download from
http://www.microsoft.com/standards/xml/xmlparse.htm and I've tested it on
your example below and it works just fine. The exact terms of use are
described in the License Agreement in
http://www.microsoft.com/standards/xml/xmllic.htm, which I think you'll find
to be very open ended.
> -----Original Message-----
> From: Dmitri Kondratiev [SMTP:dima@paragraph.com]
> Sent: Thursday, December 11, 1997 1:56 PM
> To: xml-dev@ic.ac.uk
> Subject: *Validating* XML Parser written in Java ?
>
> Hi,
>
> Does anybody know of any free *validating* XML parsers written in Java?
> With NXP I haven't managed to validate the following:
>
>
>
>
>
>
>
>
> With FooBar.dtd file in the same directory :
>
>
>
>
>
> ID ID #REQUIRED>
>
>
> I have the following output :
>
> java NXP.Cl -v -f test/test.xml
>
> NXP - Norbert's XML Parser 0.97 - 05.08.1997
>
> Fetch file : test/test.xml
> Start parsing ...
> Validate : true
> Fetch file : test/FooBar.dtd
>
> "
> "
>
> Error :
> Attribute has not be declared : ID
>
> "
> "
>
> Parsing finished - Time : 1260 msec.
>
> Any help is most welcome!
> Dima
>
>
> -----------------
> Dmitri Kondratiev
> dima@paragraph.com
> 102401.2457@compuserve.com
> http://www.geocities.com/SiliconValley/Lakes/3767/
> tel: 07-095-464-9241
>
>
From digitome at iol.ie Fri Dec 12 08:50:27 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:25 2004
Subject: Classification of XML Parsers
Message-ID: <199712120850.IAA18479@GPO.iol.ie>
The real truth behind XML's simplicity and ease of implementation is being badly
let down by the haziness with which parsers are classified:-
Well Formed
Valid
Type Valid (In the DOM level 1 spec.)
Tag Valid (ditto)
DTD Aware (Aelfred)
Then there is a bevy of terminology to do with what the parsers do and do
not provide to the application
- Comments
- Expansion of general entities
- Access to element type declarations
etc.
Given that it is on this list that most of the implementors hang out, I think
we could usefully attempt to put together a classification.
Also, from a quick reading of the DOM there does not seem to be a node type
for an unexpanded general entity. How come?
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com
From mecom-gmbh at mixx.de Fri Dec 12 11:29:27 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:25 2004
Subject: external dtd subset content
Message-ID: <34912136.2C5BEBBA@mixx.de>
we're trying to understand the necessary form for the external dtd
subset.
in particular two questions have arisen.
1? since the external subset contains markup declarations only, it
would appear that it establishes no constraint on the root element. is
it legitimate to use the same dtd for various xml documents, each with a
different root element?
2? among the example DTD's we've found, some begin with an
form. others don't. isn't that form excluded from being a PI and thus
from being a markupdecl?
From fussellm at alumni.caltech.edu Fri Dec 12 14:22:25 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:25 2004
Subject: Java DOM ObjectBuilder
Message-ID:
I have a first pass at an ObjectBuilder that generates objects based on
the W3C Java DOM Interfaces[*]. So any XML-Parser with a BuilderClient
[currently MS-XML and Aelfred] can generate DOM objects, including the
Model information itself. It is also easy to modify both the objects and
the construction process to be different from the DOM specific ones (e.g.
"Tag" specific objects instead of generic Elements). This applies to the
DTD objects (Use a different Node, ElementDefinition, or any other
interface/class) as well as the normal Element content.
If enough people are interested I will try to make a specific release of
this code and the minimum amount of MONDO that is needed to make it work
(see below for size information), otherwise I will include it as an
example in the next MONDO release. The rest of this mail just discusses
the details a bit more.
-------------------------------------------------------
The DOM ObjectBuilding process can generally be 1-pass (direct) from the
parser, except for the DTD which parsers digest first and must be
'redescribed' to the builder. For Aelfred, it looks something like this:
+---------------+
XmlParser->| XmlProcessor |-->DOMObBuilder->SpecificFactory->>DOMObject
| BuilderClient |^ or BeanFactory v
+---------------+ \<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Where the '->>' indicates the Factory actually creates an object (at
least conceptually) and the '<<<' is a return line for that object to be
used in the subsequent recipe. Generally the arrows to the right are the
response to an ESIS type of event, but the ordering for building is
sometimes a little different (attribute processing occurs inside an
object's context, not before). You can think of Recipes as a more
general ESIS event model with a feedback loop.
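The start/finish protocol with its feedback loop can be sketched roughly as follows. This is only an illustration and not MONDO's actual code: the class RecipeBuilderSketch, the nested Built type and the addValue method are hypothetical, and only the method names startObject, startParameter, finishParameter and finishObject are taken from the doctypeDecl listing in this message. A finished object is fed back as a value of whichever parameter of the enclosing object is currently open.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a recipe-driven builder; not MONDO's classes.
class RecipeBuilderSketch {
    // One object under construction: its recipe name and named parameter values.
    static final class Built {
        final String recipe;
        final Map<String, List<Object>> parameters = new LinkedHashMap<>();
        Built(String recipe) { this.recipe = recipe; }
    }

    private final Deque<Built> objects = new ArrayDeque<>();
    private final Deque<String> openParameters = new ArrayDeque<>();
    private Built result;

    void startObject(String recipe) { objects.push(new Built(recipe)); }

    void startParameter(String name) {
        openParameters.push(name);
        objects.peek().parameters.computeIfAbsent(name, k -> new ArrayList<>());
    }

    // Adds a value to the currently open parameter of the current object.
    void addValue(Object value) {
        objects.peek().parameters.get(openParameters.peek()).add(value);
    }

    void finishParameter() { openParameters.pop(); }

    // Feedback step: a finished nested object becomes a value of the
    // enclosing object's open parameter; the outermost one is the result.
    void finishObject() {
        Built done = objects.pop();
        if (objects.isEmpty()) {
            result = done;
        } else {
            addValue(done);
        }
    }

    Built result() { return result; }

    public static void main(String[] args) {
        RecipeBuilderSketch b = new RecipeBuilderSketch();
        b.startObject("DocumentType");
        b.startParameter("externalSubset");
        b.startObject("ElementDefinition");
        b.startParameter("name");
        b.addValue("Foo");
        b.finishParameter();
        b.finishObject();
        b.finishParameter();
        b.finishObject();
        System.out.println(b.result().recipe + " with "
            + b.result().parameters.get("externalSubset").size() + " definition(s)");
    }
}
```

The sketch assumes nested objects are always started inside an open parameter; a real builder would dispatch each finished recipe to a factory instead of keeping a generic map.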
Sending the DTD across is the one exception in terms of the ESIS analogy,
because it is not a part of the event flow. It needs to be redescribed
as soon as it is available. For Aelfred, the DTD is sent to the builder
at the 'doctypeDecl' which looks like this at the moment [We are in the
XmlProcessor/BuilderClient]:
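  // Called at the DOCTYPE declaration: the DTD the parser has already
  // digested is redescribed to the builder as a DocumentType recipe.
  public void doctypeDecl (XmlParser p, String name, String pubid,
                           String sysid)
  {
    this.startObject(DOM_DOCUMENT_TYPE_RECIPE);
    this.startParameter("externalSubset");
    // One element definition is built per element type declared in the DTD.
    Enumeration elementNames = p.declaredElements();
    while (elementNames.hasMoreElements()) {
      String elementName = (String) elementNames.nextElement();
      buildObjectForElementDefNamed_in(elementName, p);
    }
    this.finishParameter();
    this.finishObject();
  }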
Conceptually, the recipe for a DocumentType looks like:
-----------------
occurrence =
tokens = (
)
>
>
...
-----------------
or in an XML-Recipe form it would look like:
-----------------
...
-----------------
The DTD recipe and the normal Element content recipes are shipped to the
ObjectBuilder which has the necessary factories to build objects from the
recipe. For the DTD recipes it builds pre-known and very specific
classes: "Document", "ElementDefinition", "ModelGroup", etc. For the
Element content the ObjectBuilder currently builds a generic Element
hierarchy. The construction process for both the DTD and the Elements
can be easily (and almost arbitrarily) changed. The two semi-constants
are the DOM recipes which are encoded into the DOM-oriented BuilderClient
and the source document itself. It is also easy to turn on and off the
DTD generation in the BuilderClient, and the result of a document without
a DTD is a DOM Document object with a null DocumentType.
SIZE and Other Info
===================
The total amount of MONDO-oriented DOM Building code is about 10K. This
is divided into 6 factories for the enumerated types, 1 factory for
Document, and 1 main builder. The rest of the DOM was done with a Bean
Factory. The BuilderClient is another 9K for a stack-based version
(Aelfred) and a bit less for an object-based version (MS-XML).
BuilderClients are pretty easy to write, about two hours or so for me,
but I haven't gotten around to the other parsers yet.
MONDO itself is a bit large (~100K + requires ~100K general library)
but I am trying to produce a version (mindo) that only includes what is
needed for this type of task which may be 50K for mindo and 40-60K for
the general library.
The DOM interfaces are about 10K and the skeleton classes are 16K. The
classes only serve the purpose of construction and printing (i.e.
dumping). More interesting classes would be quite a bit larger.
[*] Note that I modified the DOM interfaces to: (1) fix what I thought
were bugs or deprecated behavior, (2) provide some extra services (e.g.
Integer objects for the 'int's), (3) collapse specific types into more
generic Map and List collections, and (4) add a naming convention (i.e.
suffixing an interface which only has constants in it with 'Constants').
Changing things back into the original form should be easy (I have the
originals from the spec also) and should have little significance to the
rest of the process.
==========================================
For more information on MONDO see
http://www.chimu.com/projects/mondo
Part of the design document is in HTML now and for this particular topic
(XML->DOM Objects), you might want to look at Chapters 2&4 at:
http://www.chimu.com/projects/mondo/design/part0002.html
http://www.chimu.com/projects/mondo/design/part0004.html
--Mark
mark.fussell@chimu.com
From ak117 at freenet.carleton.ca Fri Dec 12 15:10:10 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
In-Reply-To: <199712120850.IAA18479@GPO.iol.ie>
References: <199712120850.IAA18479@GPO.iol.ie>
Message-ID: <199712121508.KAA00821@unready.microstar.com>
Sean Mc Grath writes:
> The real truth behind XML's simplicity and ease of implementation is being badly
> let down by the haziness with which parsers are classified:-
>
> Well Formed
> Valid
> Type Valid (In the DOM level 1 spec.)
> Tag Valid (ditto)
> DTD Aware (Aelfred)
I'd suggest that there are at least three logically separate realms
here, all of which we've been overloading onto the same single set of
terminology. Here's what I suggest:
Realm #1: Functionality
a) Scanning
This type of parser simply skips the DOCTYPE declaration (using
regular expressions) and parses the markup in the document
instances. It is not required to handle any but the built-in
entities, and as a result, does not include any external entities.
For the purposes of whitespace handling, it assumes that all
specified attributes are CDATA and that all elements have mixed
content.
Optionally, a scanning parser may attempt to extract some
information from the DOCTYPE declaration, such as entity
declarations and attribute default values.
b) DTD-driven
This type of parser reads the DTD (both internal and external
subsets) to obtain entity declarations, attribute declarations, and
element-type declarations. It handles any entities declared in the
DTD (internal or external), and provides default values when
attributes are not specified. For the purposes of whitespace
handling, it uses the declared type for each attribute, and
distinguishes between element types with element content and
elements with mixed content.
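As a small illustration of the DTD-driven behaviour, the sketch below declares a general entity in the internal subset and lets the parser expand it. It assumes the JAXP DOM API from later JDKs (not one of the parsers discussed here); the class name DtdDrivenEntities is invented, and the entity name and text are borrowed from the msxml bug report elsewhere in this digest.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class DtdDrivenEntities {
    // The internal subset declares the entity that the instance refers to.
    static final String XML =
        "<!DOCTYPE EXAMPLE [\n"
        + "  <!ENTITY incs \"A third in a new paragraph.\">\n"
        + "]>\n"
        + "<EXAMPLE><P>&incs;</P></EXAMPLE>";

    // Returns the text of the root element after entity expansion.
    static String textOfRoot() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOfRoot());
    }
}
```

A scanning parser that skipped the DOCTYPE declaration would have no replacement text for &incs; at all, which is why entity handling belongs to the functionality realm rather than to validation.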
Realm #2: Validation
a) Non-validating
This type of parser assumes that its input document is both
well-formed and valid, and is not required to report any errors at
all.
Optionally, a non-validating parser may report some lexical or
DTD-related errors, but it does not qualify as a well-formed or
validating parser unless it reports _all_ relevant errors.
b) Well-formed
This type of parser reports any lexical errors in an XML document
(including well-formedness constraints in the spec), but is not
required to report DTD-related errors (such as attribute-type
mismatches, elements out of context, etc.). A well-formed parser
must report an error for all 141 tests in James Clark's test suite.
Optionally, a well-formed parser may report some DTD-related
errors, but it does not qualify as a validating parser unless it
reports _all_ DTD-related errors.
c) Validating
A validating parser must report all of the errors reported by a
well-formed parser, together with all DTD-related errors ("validity
constraints" in the spec), such as elements in contexts not allowed
by the current content model, attempts to change #FIXED attributes,
failure to specify #REQUIRED attributes, unresolved IDREFS, and
attribute-type-mismatches.
Validating parsers must provide DTD-driven functionality.
Realm #3: Interface
a) Event-based
An event parser returns a series of XML document events,
such as character data or the start or end of an element, usually
through call-backs to user-defined handlers. Events are returned
in the order that they occur in the XML source document.
b) Tree-based
A tree-based parser builds an in-memory tree of an entire
document, then provides some means for the user to navigate the
tree. The user is not constrained to navigating the tree in the
   order that it was parsed. Tree-based parsers are often built on
top of an event-based layer.
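The two interface styles might look like this in practice. This is a hedged sketch using the SAX and DOM APIs that ship with later JDKs, not the interface of any parser named in this thread; the class EventVsTree and the sample document are made up. The event-based half logs callbacks in document order, while the tree-based half jumps straight to the last sentence.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class EventVsTree {
    static final String XML =
        "<EXAMPLE><P><S>A sentence.</S><S>Another.</S></P></EXAMPLE>";

    // Event-based: element events are delivered to call-backs in source order.
    static List<String> elementEvents() throws Exception {
        List<String> log = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                log.add("start " + qName);
            }
            @Override
            public void endElement(String uri, String local, String qName) {
                log.add("end " + qName);
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(XML)), handler);
        return log;
    }

    // Tree-based: the whole document is in memory, so any node can be visited first.
    static String lastSentence() throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(XML)));
        Element p = (Element) doc.getDocumentElement().getFirstChild();
        return p.getLastChild().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(elementEvents());
        System.out.println(lastSentence());
    }
}
```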
According to this classification, Ælfred is a DTD-driven,
non-validating, event-based XML parser.
There are other realms, including the type of information delivered by
a parser (simple ESIS-like production information, or full information
for an XML editor, such as comments, ignored whitespace, etc.), but I
think that we would do best to standardise a few basic terms first.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Fri Dec 12 15:14:09 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
In-Reply-To: <34912136.2C5BEBBA@mixx.de>
References: <34912136.2C5BEBBA@mixx.de>
Message-ID: <199712121513.KAA00842@unready.microstar.com>
james anderson writes:
> we're trying to understand the necessary form for the external dtd
> subset.
> in particular two questions have arisen.
>
> 1? since the external subset contains markup declarations only, it
> would appear that it establishes no constraint on the root element. is
> it legitimate to use the same dtd for various xml documents, each with a
> different root element?
Yes -- that's standard practice in the SGML world (you can use the
same external DTD for an entire book or for just one chapter of it).
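A quick way to see this, sketched with the JAXP validating parser from later JDKs: the same set of declarations is used with two different root elements, and both documents validate. The declarations, the class name SharedDtd and the helper validationErrors are assumptions for the example, and the declarations are pasted into an internal subset here only to keep the sketch self-contained; with an external DTD file the idea is the same.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class SharedDtd {
    // Hypothetical declarations shared by whole books and single chapters.
    static final String DECLS =
        "<!ELEMENT book (chapter+)>\n"
        + "<!ELEMENT chapter (#PCDATA)>\n";

    // Validates body against DECLS with the given DOCTYPE name; returns the error count.
    static int validationErrors(String doctypeName, String body) throws Exception {
        String xml = "<!DOCTYPE " + doctypeName + " [\n" + DECLS + "]>\n" + body;
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        int[] count = {0};
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) { count[0]++; }
            public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
        });
        builder.parse(new InputSource(new StringReader(xml)));
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(validationErrors("book", "<book><chapter>One</chapter></book>"));
        System.out.println(validationErrors("chapter", "<chapter>One</chapter>"));
    }
}
```

The only tie between the declarations and the root element is the name given in the DOCTYPE declaration of each document, which is why a mismatch there is reported as a validity error.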
> 2? among the example DTD's we've found, some begin with an
> form. others don't. isn't that form excluded from being a PI and thus
> from being a markupdecl?
No, on two counts:
1) The grammatical production for markupdecl [30] explicitly includes
processing instructions.
2) The declaration that you see at the beginning of the external subset
is not a processing instruction but a text declaration (which is
similar but not identical to an XML declaration). For example, if
my external subset were encoded in ISO Latin 1, I would be required
to put the following declaration at the top:
If it were in ASCII, however, I could just let the encoding default
to UTF-8.
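The effect of the encoding declaration can be tried out with a sketch like the one below. Note the assumptions: it uses the JAXP parser from later JDKs, it puts the declaration on a small document entity rather than on an external DTD subset, and the class EncodingDeclaration and the method parseLatin1 are invented names. The same ISO Latin 1 bytes parse when the encoding is declared and are rejected when the parser is left to default to UTF-8; pure ASCII input needs no declaration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class EncodingDeclaration {
    // Encodes the text as ISO Latin 1 bytes, parses them, and returns the root's text.
    static String parseLatin1(String xml) throws Exception {
        byte[] bytes = xml.getBytes(StandardCharsets.ISO_8859_1);
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(bytes))
            .getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Declared encoding: the non-ASCII character is decoded correctly.
        System.out.println(parseLatin1(
            "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>caf\u00e9</a>"));
        // ASCII only: the UTF-8 default is sufficient.
        System.out.println(parseLatin1("<a>cafe</a>"));
        // Undeclared Latin 1: the lone 0xE9 byte is not valid UTF-8.
        try {
            parseLatin1("<a>caf\u00e9</a>");
        } catch (Exception e) {
            System.out.println("rejected: " + e.getClass().getSimpleName());
        }
    }
}
```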
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Fri Dec 12 15:37:28 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
Message-ID: <3.0.32.19971212073419.00967b00@pop.intergate.bc.ca>
At 12:34 PM 12/12/97 +0100, james anderson wrote:
>1? since the external subset contains markup declarations only, it
>would appear that it establishes no constraint on the root element. is
>it legitimate to use the same dtd for various xml documents, each with a
>different root element?
That's right.
>2? among the example DTD's we've found, some begin with an
>form. others don't. isn't that form excluded from being a PI and thus
>from being a markupdecl?
I think that should be OK since a DTD is an external parsed entity.
But I've put your mail in the errata file to make sure it's clear enough.
-Tim
From tbray at textuality.com Fri Dec 12 15:37:31 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
Message-ID: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
At 10:08 AM 12/12/97 -0500, David Megginson wrote:
>Realm #1: Functionality
>
>a) Scanning
> This type of parser simply skips the DOCTYPE declaration (using
> regular expressions) and parses the markup in the document
> instances.
This is not a conformant XML processor per the spec.
There are certain things a processor is required to do with the internal
subset, including parse it and check it for syntax.
>b) DTD-driven
There are a whole range of behaviors. Parsers may, not must, read
external markup declarations and external parsed entities.
>Realm #2: Validation
>
>a) Non-validating
> This type of parser assumes that its input document is both
> well-formed and valid, and is not required to report any errors at
> all.
No such animal is envisioned in the standard. If it doesn't check for
WF problems, it's not an XML processor.
I'll stop here. I suggest you go back and re-work your (potentially helpful)
list based on a re-reading of the specification. -Tim
From mecom-gmbh at mixx.de Fri Dec 12 15:41:35 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
References: <34912136.2C5BEBBA@mixx.de> <199712121513.KAA00842@unready.microstar.com>
Message-ID: <34915C46.21E8527A@mixx.de>
aha!
i was missing the relation between external parsed entity and external subset.
to the drafters: a link from the discussion between [30] and [31] down to the
discussion concerning parsed entities ([78]+) would help here.
thanks.
David Megginson wrote:
> james anderson writes:
>
> > 2? among the example DTD's we've found, some begin with an
> > form. others don't. isn't that form excluded from being a PI and thus
> > from being a markupdecl?
>
> No, on two counts:
>
> ...
> 2) The that you see at the beginning of the external subset
> is not a processing instruction but a text declaration (which is
> similar but not identical to an XML declaration). For example, if
> my external subset were encoded in ISO Latin 1, I would be required
> to put the following declaration at the top:
From ak117 at freenet.carleton.ca Fri Dec 12 17:19:13 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
In-Reply-To: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
Message-ID: <199712121717.MAA01762@unready.microstar.com>
Tim Bray writes:
> >a) Scanning
> > This type of parser simply skips the DOCTYPE declaration (using
> > regular expressions) and parses the markup in the document
> > instances.
>
> This is not a conformant XML processor per the spec.
>
> There are certain things a processor is required to do with the internal
> subset, including parse it and check it for syntax.
Quite right; to my knowledge, however, there exist no XML processors
that do so, except possibly for James's new one (I haven't tried it).
In particular, few handle UTF-8 correctly. As I've mentioned in
private e-mail, even the 1997-12-08 spec is not currently well-formed,
since it uses ISO-8859-1 encoding without saying so in its encoding
declaration, so any conforming processor would have to reject it.
More generally, this requirement makes no provision for the desperate
Perl hacker who has played such a central role in XML discussions.
Creating a truly well-formed parser is very, very difficult, because
of the enormous number of constraints imposed both explicitly and
implicitly by the grammar (I could probably write a full SGML parser
with about the same level of effort, especially if I limited myself to
a single, simple SGML declaration).
For example, both Ælfred and Lark fail to report the two errors in the
following document:
This is a ]]> paragraph.
I could support complete well-formedness error reporting in Ælfred,
but its size would bloat to about 35-40K (entity-boundary checking, in
particular, would be messy), while I still want to get it down to
under 20K so that Java applet writers can use it. I did have a
version that passed the first 101 of James Clark's 141 tests, but it
was already at about 30K, and I was aware of many other cases that he
wasn't testing for.
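
One of the errors in the sample above can be reproduced with a parser
that does check well-formedness. The sketch below uses Python's standard
xml.etree.ElementTree module, which is built on expat. The surrounding
<p> element is an assumption, since the markup of the sample did not
survive in the archive, and only the literal "]]>" in character data is
demonstrated; the helper name is_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the expat-backed parser accepts the document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# A literal "]]>" may not appear in character data.
print(is_well_formed("<p>This is a ]]> paragraph.</p>"))
# Escaping the ">" makes the same text acceptable.
print(is_well_formed("<p>This is a ]]&gt; paragraph.</p>"))
```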
> >b) DTD-driven
>
> There are a whole range of behaviors. Parsers may, not must, read
> external markup declarations and external parsed entities.
Yes, you control that using the standalone declaration. I am
recommending that parsers that do not handle the full DTD (internal
and external) be referred to as "scanning parsers", while parsers that
handle everything be referred to as "DTD-driven parsers". If
necessary, we could always add another degree in the middle.
> >Realm #2: Validation
> >
> >a) Non-validating
> > This type of parser assumes that its input document is both
> > well-formed and valid, and is not required to report any errors at
> > all.
>
> No such animal is envisioned in the standard. If it doesn't check for
> WF problems, it's not an XML processor.
I am aware of the constraints in the spec, but I believe that this is
a serious strategic error. Ælfred is a non-conforming XML processor,
as are Lark, MSXML, and all others that I have had a chance to try:
Ælfred will produce correct output for valid and well-formed XML
documents, but will not necessarily report errors for documents that
are not valid/well-formed.
If the XML spec does not make allowance for software tools like these,
then it will have little to distinguish it from full SGML except for a
bit of marketing hype.
> I'll stop here. I suggest you go back and re-work your
> (potentially helpful) list based on a re-reading of the
> specification. -Tim
Thank you very much for your comments. I am grateful for the work
that you and the rest of the WG have done with the spec, and I hope
that you find my comments constructive rather than confrontational.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Fri Dec 12 19:22:04 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:26 2004
Subject: LISTRIVIA
In-Reply-To: <2F2DC5CE035DD1118C8E00805FFE354C099F41@red-msg-56.dns.microsoft.com>
Message-ID: <3.0.1.16.19971212200149.30077e3a@pop3.demon.co.uk>
At 20:32 11/12/97 -0800, [several people] wrote:
A very short message
>but
>included
>a
>great
>deal
>of
>unnecessary
>quoted
>material
Please try to cut down the volume of material you quote :-) As I said
before I have to pay for this personally. Some people are charged by volume
of e-mail.
Any mailer that quotes is also able to delete material. Good quoting is
not only courteous, but it makes what you write more valuable to read.
Unlike many lists, one of the purposes of XML is to produce attractive
documents :-)
P.
From peter at ursus.demon.co.uk Fri Dec 12 23:34:56 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
In-Reply-To: <199712121717.MAA01762@unready.microstar.com>
References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
<3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213000403.3177233e@pop3.demon.co.uk>
At 12:17 12/12/97 -0500, David Megginson wrote:
>Tim Bray writes:
[.. extremely important discussion deleted ...]
I also (unfortunately) have sympathy with David's view that it's harder to
write a conforming parser than appears on first reading. I agree that there
are few if any fully conforming parsers at present.
> > I'll stop here. I suggest you go back and re-work your
> > (potentially helpful) list based on a re-reading of the
> > specification. -Tim
>
>Thank you very much for your comments. I am grateful for the work
>that you and the rest of the WG have done with the spec, and I hope
>that you find my comments constructive rather than confrontational.
>
I am sure this is not a confrontational issue. I think David has made an
excellent first pass at defining what we need to do. WG and SIG discussions
(which David has not seen) are confidential, but it's clear from the
relatively recent introduction of 'standalone' that this issue has been
thought about.
I do not believe this problem is solved yet. I have always felt that until
we get working prototypes we shall not uncover all the difficult semantic
problems. It is exactly now that they will start to appear with a 'stable'
spec and a crop of new software. If you think 'no need to write a new
parser, it's all been done' that's probably optimistic.
The problem is that the semantics are very hidden and depend on what your
background is. You may use SGML as a marker and it would be *logical* to
design an XML parser to do exactly what an SGML one does. However, XML
deliberately introduces flexibility into the spec, and in so doing
introduces fuzziness. If anyone thinks this isn't a fuzzy area, state
precisely what you think of David's classification (amended if necessary).
Only if most of the 'XML experts' agree, can we say it isn't fuzzy.
There will be worse fuzziness introduced if it isn't clear to
'non-XML-experts' what to do. IMO there are still areas of difficulty and
different authors will introduce different 'features' - often without
realising it.
I suspect that a useful way forward will be to attach commandline options
to parsers. They are already potentially required for 'may' clauses.
Perhaps we should identify the areas where there are two schools of thought
(e.g. 'assume document is WF'/'check for WF error') and add a switch. Then the
newcomers will understand that there is an area they have to think about.
These may also help to clarify the drafters' minds if necessary.
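
The kind of switch suggested above might look like the following Python
sketch. Everything named here is hypothetical: the tool name xmlparse,
the option names --wf and --external-dtd, and their values are invented
for illustration and do not describe any existing parser.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Command-line switches for points where two schools of thought exist."""
    cli = argparse.ArgumentParser(prog="xmlparse")  # hypothetical tool name
    cli.add_argument(
        "--wf",
        choices=["check", "assume"],
        default="check",
        help="check for well-formedness errors, or assume the input is WF",
    )
    cli.add_argument(
        "--external-dtd",
        choices=["read", "skip"],
        default="read",
        help="read or skip the external DTD subset (a 'may' clause)",
    )
    cli.add_argument("document", help="path of the XML document")
    return cli


# Parse an explicit argument list so the sketch runs without a shell.
options = build_parser().parse_args(["--wf", "assume", "example.xml"])
print(options.wf, options.external_dtd, options.document)
```

Making each choice an explicit option with a stated default is the point
of the sketch: a newcomer reading the help text sees that there is a
decision to be made.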
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Dec 12 23:35:53 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
In-Reply-To: <34912136.2C5BEBBA@mixx.de>
Message-ID: <3.0.1.16.19971213001510.30077718@pop3.demon.co.uk>
At 12:34 12/12/97 +0100, james anderson wrote:
>we're trying to understand the necessary form for the external dtd
>subset.
>in particular two questions have arisen.
>
>1? since the external subset contains markup declarations only, it
>would appear that it establishes no constraint on the root element. is
>it legitimate to use the same dtd for various xml documents, each with a
>different root element?
Good point! I have never really understood why it's necessary to have
consistency between the root element and the doctypedeclName. For example,
if I am authoring HTML 2.0 (assume there is an official XML DTD) and I write:
This is a para
that is presumably valid, but:
This is a para
is invalid. Is this what the WG intends? If so, what's the rationale?
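
The constraint being questioned can be checked mechanically. The Python
sketch below uses the standard xml.parsers.expat module to compare the
name in the DOCTYPE declaration with the name of the first element. The
two sample documents are assumptions, because the markup of the examples
above did not survive in the archive; the function name
root_matches_doctype and the system identifier html.dtd are hypothetical.
The external DTD is never fetched here.

```python
import xml.parsers.expat


def root_matches_doctype(document: str) -> bool:
    """True if the first element's name equals the DOCTYPE name."""
    seen = {"doctype": None, "root": None}

    def on_doctype(name, system_id, public_id, has_internal_subset):
        seen["doctype"] = name

    def on_start(name, attributes):
        if seen["root"] is None:  # only the first start tag is the root
            seen["root"] = name

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen["doctype"] is not None and seen["doctype"] == seen["root"]


# Assumed stand-ins for the two cases discussed above.
matching = '<!DOCTYPE HTML SYSTEM "html.dtd"><HTML><P>This is a para</P></HTML>'
mismatched = '<!DOCTYPE HTML SYSTEM "html.dtd"><P>This is a para</P>'

print(root_matches_doctype(matching))
print(root_matches_doctype(mismatched))
```

Both documents are well-formed; the mismatch only matters to a
validating processor.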
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From Jon.Bosak at eng.Sun.COM Fri Dec 12 23:59:22 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:26 2004
Subject: XML 1.0 Proposed Recommendation
Message-ID: <199712122357.PAA21803@boethius.eng.sun.com>
XML 1.0 is now a W3C Proposed Recommendation:
http://www.w3.org/TR/PR-xml
The announcement was formally made at the SGML/XML '97 Conference in
Washington, D.C. on Monday, December 8, 1997. This is the same
conference at which the first Working Draft of the XML specification
was released in November, 1996.
W3C member organizations now have about six weeks to vote on the PR.
Organizations may vote yes; yes, with comments; no, unless specified
deficiencies are corrected; or no, this Proposed Recommendation should
be abandoned. During this voting period, the XML Working Group
expects to resolve minor technical issues and communicate its results
to the W3C Director. After this time, the Director will announce the
disposition of the document; it may become a W3C Recommendation
(possibly with minor changes), revert to Working Draft status, or may
be dropped as a W3C work item.
While the disposition of the Proposed Recommendation is entirely at
the discretion of the Director, the XML Working Group considers its
work on XML 1.0 to be complete and does not expect to be making
substantive changes to the proposal as it now stands. There have been
a number of requests for enhancement to the specification that will be
considered for XML 1.1, but at this time the WG is strongly inclined
to delay work on XML 1.1 until some experience has been gained with
implementations of XML 1.0. In the meantime, the WG will continue its
work on XLL, the part of the XML family of specifications that deals
with linking and addressing.
Jon Bosak
Chairman, W3C XML Working Group
----------------------------------------------------------------------
Jon Bosak, Online Information Technology Architect, Sun Microsystems
----------------------------------------------------------------------
901 San Antonio Road, MPK17-101 | Best is he that inuents,
Palo Alto, California 94303 | the next he that followes
ISO/IEC JTC1/WG4::NCITS V1::SGML Open | forth and eekes out a good
Davenport Group::W3C XML WG and SIG | inuention.
----------------------------------------------------------------------
From dima at paragraph.com Sat Dec 13 00:18:46 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:26 2004
Subject: ?SGML decl. for XML to run in NSGMLS?
Message-ID: <2.2.32.19971213001737.006c0330@dream.paragraph.com>
I am trying to validate my xml with James Clark's nsgmls. When I use xml.dcl
that SP distribution has together with nsgmls, I get lots of the following
error messages :
SPAM\BIN\NSGMLS.EXE:spam\pubtext\xml.dcl:48:20:E: there is no unique
character in the document character set corresponding to character number
12288 in the syntax reference character set
What SGML declaration for XML should I use ? Any help is most welcome!
Thanks,
Dima
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241
From tbray at textuality.com Sat Dec 13 00:24:48 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:27 2004
Subject: ?SGML decl. for XML to run in NSGMLS?
Message-ID: <3.0.32.19971212162547.0099cb50@pop.intergate.bc.ca>
At 03:17 AM 13/12/97 +0300, Dmitri Kondratiev wrote:
>I am trying to validate my xml with James Clark's nsgmls. When I use xml.dcl
>that SP distribution has together with nsgmls, I get lots of the following
>error messages :
First of all, you should use James' nsgmlsu (the u is for Unicode).
Second, you might want to try the attached for an SGML declaration,
I'm not sure James has finished polishing it, but it's what we use
for the XML spec and it's close. -Tim
-------------- next part --------------
"
PIC "?>"
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
-- Quantities are not restricted in XML --
ATTCNT 99999999
ATTSPLEN 99999999
-- BSEQLEN NOT USED --
-- DTAGLEN NOT USED --
-- DTEMPLEN NOT USED --
ENTLVL 99999999
GRPCNT 99999999
GRPGTCNT 99999999
GRPLVL 99999999
LITLEN 99999999
NAMELEN 99999999
-- NORMSEP NO NEED TO CHANGE IT --
PILEN 99999999
TAGLEN 99999999
TAGLVL 99999999
FEATURES
MINIMIZE
DATATAG NO
OMITTAG NO
RANK NO
-- SHORTTAG is the only allowed feature. It is required. --
SHORTTAG YES -- SHORTTAG is needed for NET --
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL NO
APPINFO NONE -- ??? Do we want some APPINFO ??? --
>
From tbray at textuality.com Sat Dec 13 00:53:03 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:27 2004
Subject: external dtd subset content
Message-ID: <3.0.32.19971212165211.009a78c0@pop.intergate.bc.ca>
At 12:34 PM 12/12/97 +0100, james anderson wrote:
>2? among the example DTD's we've found, some begin with an
>form. others don't. isn't that form excluded from being a PI and thus
>from being a markupdecl?
I just got around to checking the spec, and it's pretty clear. Section 4.3.2
makes it clear that an external PE can begin with an
At 12:17 PM 12/12/97 -0500, David Megginson wrote:
>Creating a truly well-formed parser is very, very difficult, because
>of the enormous number of constraints imposed both explicitly and
>implicitly by the grammar (I could probably write a full SGML parser
>with about the same level of effort, especially if I limited myself to
>a single, simple SGML declaration).
To start with, "full SGML parser" is directly contradictory to "a single
SGML declaration" - abstract syntax in fact being one of the things
that makes a full parser hard to write.
As to David's main point, that a WF parser is hard to write, I don't
agree; most of the work can be done in the low-level lexer, the number
of constraints that require ad-hoc code is pretty small. Two things
are in fact hard, it seems:
1. handling multiple input encodings, and
2. making it run real fast while you're doing #1.
These don't really bother me that much as we are in the infancy of
learning what the right way is to build truly internationalized
software; for example, I can parse the UTF16 Japanese version of the
XML spec in a few seconds; then it takes the best part of a minute
to load the .ttf for the Unicode font so you can look at anything;
so we have a few problems in this area.
Having said that, I am now in the middle of coding up validation for
Lark, and there are a TREMENDOUS NUMBER of irritating little
details about that. No rocket science at all, but the code is going
to be substantially larger than the rest of Lark and it's all real
code; more than half of Lark is compressed parser tables.
Mind you, the validator is in a separate package and can be bypassed, so
Lark effectively need be no larger. But still; I wonder if validation
is intrinsically hard or we could have found a better 80/20 point? -Tim
From peter at ursus.demon.co.uk Sat Dec 13 01:08:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:27 2004
Subject: LISTRIVIA and Re: ?SGML decl. for XML to run in NSGMLS?
In-Reply-To: <3.0.32.19971212162547.0099cb50@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213015731.323f1db4@pop3.demon.co.uk>
I would be grateful if posters to xml-dev did not attach documents since
they do not appear on the hypermail and can also cause problems with the
software. I suspect that there are a considerable number of people who read
the XML-DEV list through the hypermail system rather than subscribing.
I have therefore included the attachment in clear in this message. FWIW the
word 'Alphbet' is an unusual spelling, and since it occurs in an FPI is
presumably significant.
P.
At 16:26 12/12/97 -0800, Tim Bray wrote:
[... human-readable text deleted ...]
>
>Attachment Converted: "c:\eudora\attach\xml.dcl"
"
PIC "?>"
SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
-- Quantities are not restricted in XML --
ATTCNT 99999999
ATTSPLEN 99999999
-- BSEQLEN NOT USED --
-- DTAGLEN NOT USED --
-- DTEMPLEN NOT USED --
ENTLVL 99999999
GRPCNT 99999999
GRPGTCNT 99999999
GRPLVL 99999999
LITLEN 99999999
NAMELEN 99999999
-- NORMSEP NO NEED TO CHANGE IT --
PILEN 99999999
TAGLEN 99999999
TAGLVL 99999999
FEATURES
MINIMIZE
DATATAG NO
OMITTAG NO
RANK NO
-- SHORTTAG is the only allowed feature. It is required. --
SHORTTAG YES -- SHORTTAG is needed for NET --
LINK
SIMPLE NO
IMPLICIT NO
EXPLICIT NO
OTHER
CONCUR NO
SUBDOC NO
FORMAL NO
APPINFO NONE -- ??? Do we want some APPINFO ??? --
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From jjc at jclark.com Sat Dec 13 06:03:07 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:27 2004
Subject: Classification: XML Parser Features
References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca> <199712121717.MAA01762@unready.microstar.com>
Message-ID: <3492CB04.6A99DD70@jclark.com>
David Megginson wrote:
>
> Tim Bray writes:
>
> > >a) Scanning
> > > This type of parser simply skips the DOCTYPE declaration (using
> > > regular expressions) and parses the markup in the document
> > > instances.
> >
> > This is not a conformant XML processor per the spec.
> >
> > There are certain things a processor is required to do with the internal
> > subset, including parse it and check it for syntax.
>
> Quite right; to my knowledge, however, there exist no XML processors
> that do so, except possibly for James's new one (I haven't tried it).
> In particular, few handle UTF-8 correctly. As I've mentioned in
> private e-mail, even the 1997-12-08 spec is not currently well-formed,
> since it uses ISO-8859-1 encoding without saying so in its encoding
> declaration, so any conforming processor would have to reject it.
The spec says that not specifying the right encoding is merely an error
(which means a processor is not required to detect it) rather than a
fatal error. In general a processor can't detect whether the specified
encoding is correct or not (consider ISO-8859-1 v ISO-8859-2).
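
The ISO-8859-1 versus ISO-8859-2 point can be seen in a few lines of
Python. The byte string below is an invented example: it decodes without
complaint under either label and simply yields different text, so a
processor has nothing to detect. The same bytes are not valid UTF-8,
which is why a Latin-1 document with no encoding label is a case a
processor can often catch.

```python
raw = b"caf\xe9 \xe8"  # the same bytes, with no label attached

as_latin1 = raw.decode("iso-8859-1")
as_latin2 = raw.decode("iso-8859-2")

# Both decodes succeed; only the resulting characters differ.
print(as_latin1 == as_latin2)

# Under the UTF-8 default the bytes are malformed, so decoding fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```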
> More generally, this requirement makes no provision for the desperate
> Perl hacker who has played such a central role in XML discussions.
The desperate Perl hacker doesn't require his code to be blessed as a
conforming XML processor. One reason for requiring conforming parsers
to detect and report errors is to avoid the situation we see now with
HTML where it has become extremely difficult to create a production
quality HTML processor because users have come to expect an HTML
processor to accept almost any random garbage they throw at it.
Personally I would have preferred to see XML allow conforming processors
to continue processing in the presence of errors, but I think the
decision to require that errors be detected and reported was the right
one.
> Creating a truly well-formed parser is very, very difficult, because
> of the enormous number of constraints imposed both explicitly and
> implicitly by the grammar (I could probably write a full SGML parser
> with about the same level of effort, especially if I limited myself to
> a single, simple SGML declaration).
I think that assessment is way off base. My xmlwf processor aims to
catch all well-formedness errors. There are a couple of cases I know
the current version doesn't catch and there are probably a few cases
I've missed, but I think it is pretty close. I wouldn't say writing it
was very, very difficult. However it's certainly not trivial, and does
require considerable attention to detail. I think having a test suite
should help here. Getting good performance also requires effort.
There are a couple of things in this area I would like to see 1.1
change:
- for well-formedness almost any character should be allowed as a name
character; detailed checking of a character against the table of name
characters should be a validity check;
- whitespace in the prolog shouldn't be handled in the grammar, but
should instead be regularised (still compatible with ISO 8879 of course)
and handled at a lexical level.
A fully conforming SGML parser (even one limited to a single SGML
declaration) is substantially more difficult. For example, in order to
enforce the RS/RE ignoring rules a parser has to determine whether an
element is an inclusion or not, which in turn requires it to do content
checking.
> I did have a
> version that passed the first 101 of James Clark's 141 tests, but it
> was already at about 30K, and I was aware of many other cases that he
> wasn't testing for.
Additional test cases are welcome. (By the way, test 088.xml was
overtaken by events and is now well-formed.)
> > >b) DTD-driven
> >
> > There are a whole range of behaviors. Parsers may, not must, read
> > external markup declarations and external parsed entities.
>
> Yes, you control that using the standalone declaration. I am
> recommending that parsers that do not handle the full DTD (internal
> and external) be referred to as "scanning parsers", while parsers that
> handle everything be referred to as "DTD-driven parsers". If
> necessary, we could always add another degree in the middle.
The intent (at least as I understand it) was to enable the following two
classes of parser:
- standalone parsers which can handle only the internal subset (and
hence which are able to produce the correct parse only for documents
which specify or could specify standalone="yes")
- full parsers which can parse the complete DTD.
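
The two classes above can be illustrated with Python's standard
xml.parsers.expat module, which has a mode that reads external
declarations unless the document declares itself standalone. This is a
sketch only: the handler records what would be loaded and fetches
nothing, the function name external_subset_requests and the system
identifier doc.dtd are hypothetical, and the declarations use the
spelling of the published XML 1.0 Recommendation.

```python
import xml.parsers.expat as expat


def external_subset_requests(document: str) -> list:
    """Return the system IDs expat asks us to load while parsing."""
    requested = []

    def on_external_entity(context, base, system_id, public_id):
        requested.append(system_id)  # record only; nothing is fetched
        return 1  # non-zero tells expat to carry on

    parser = expat.ParserCreate()
    parser.SetParamEntityParsing(
        expat.XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE
    )
    parser.ExternalEntityRefHandler = on_external_entity
    parser.Parse(document, True)
    return requested


standalone_doc = (
    '<?xml version="1.0" standalone="yes"?>'
    '<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'
)
dependent_doc = (
    '<?xml version="1.0" standalone="no"?>'
    '<!DOCTYPE doc SYSTEM "doc.dtd"><doc/>'
)

print(external_subset_requests(standalone_doc))
print(external_subset_requests(dependent_doc))
```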
James
From jjc at jclark.com Sat Dec 13 10:10:06 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:28 2004
Subject: Test cases and xmltok updated
Message-ID: <3493068A.57CB5B4A@jclark.com>
I've updated my collection of test cases at
ftp://ftp.jclark.com/pub/test/xmltest.zip
I changed one test case (088.xml) to reflect a change in the XML spec
and added some more tests. There are now 164 test cases which all fail
to be well formed according to the XML Proposed Recommendation.
I've also updated my XML tokenizer/well-formedness checker at
ftp://ftp.jclark.com/pub/test/xmltok.zip
I believe this is now up to date for the XML Proposed Recommendation. I
know of one well-formedness violation it fails to detect: when the
encoding is UTF-8 it fails to detect illegal characters whose encoding
requires more than one byte (ie 0xFFFF, 0xFFFE, surrogates and
characters >= 0x10000). If you find any others, please let me know.
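
The missing check is small once the input has been decoded. The Python
sketch below tests code points against the Char production as it appears
in the published XML 1.0 Recommendation; the Proposed Recommendation
discussed here may differ in detail, so the ranges are an assumption.
The function names are hypothetical, and the "surrogatepass" error
handler is used only so that encoded surrogates reach the check instead
of being rejected by the decoder.

```python
def is_xml_char(code_point: int) -> bool:
    """True if code_point matches the Char production of XML 1.0."""
    return (
        code_point in (0x9, 0xA, 0xD)
        or 0x20 <= code_point <= 0xD7FF
        or 0xE000 <= code_point <= 0xFFFD
        or 0x10000 <= code_point <= 0x10FFFF
    )


def first_illegal_char(data: bytes):
    """Decode UTF-8 and return the first code point that is not a Char."""
    for ch in data.decode("utf-8", errors="surrogatepass"):
        if not is_xml_char(ord(ch)):
            return ord(ch)
    return None  # every character was legal


# U+FFFE and a lone surrogate, each encoded as three UTF-8 bytes.
print(first_illegal_char(b"ok \xef\xbf\xbe"))
print(first_illegal_char(b"\xed\xa0\x80"))
```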
James
From rsiera at steunpunt.be Sat Dec 13 13:42:34 1997
From: rsiera at steunpunt.be (Robrecht Siera)
Date: Mon Jun 7 16:59:28 2004
Subject: XML software for Visual Basic
Message-ID: <34937a0a.957564@mailhost.innet.be>
It is getting more and more interesting to start programming
applications using XML. Until now Java has received the most attention
as the language for this programming (for obvious reasons).
But some programming routines for Visual Basic would also be very
welcome, because we would like to develop a data management
and data exchange application where XML is used as the file format.
Is anybody capable of developing such parser routines or an API usable
in Visual Basic?
Groetjes,
Robrecht Siera
------------------------------------------------
In Petto - Jeugddienst Informatie en Preventie
In Petto - National Youth Service for Youth Information and Prevention
Diksmuidelaan 50, 2600 Berchem, Belgium
tel +32/3/366.15.20, +32/3/366.45.45
fax +32/3/366.11.58
email: inpetto@cybco.be
www : http://www.cybco.be/inpetto
------------------------------------------------
From peter at ursus.demon.co.uk Sat Dec 13 14:22:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:28 2004
Subject: Classification: XML Parser Features
In-Reply-To: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk>
At 17:08 12/12/97 -0800, Tim Bray wrote:
>At 12:17 PM 12/12/97 -0500, David Megginson wrote:
>>Creating a truly well-formed parser is very, very difficult, because
^^^^^^^^^^^^^^^^^^^^
I think I would rephrase this - like TimB - to read something like:
"Creating a WF parser is a *lot* of work with a large number of small
decisions where the author may not always get help from the spec."
The author has to make (small) decisions which may appear intuitive to her
but may be interpreted differently by others. These decisions may not
matter in the vast majority of cases.
There is/was a measure for XML that a 'mythical computer science graduate
student' could hack up a parser in a couple of weeks. Armed with this
promise I set about writing a recursive descent parser (which still exists
in JUMBO and is the default). But I have stopped working on it because (a)
others have written much better ones and (b) it's a lot more work than it
looks. Not difficult, I suspect, (it *was* difficult with the early version
of PEs) but lots of unrelated niggles.
As an example I started writing an editor for WF XML, including editing
elementTypes and attributes. I suddenly realised that I had to check for
Name validity - as highlighted by James Clark. This requires validating
characters against Appendix B of the spec. I applaud and support the WG's
concentration on Internationalization (i18n) but when confronted with
Appendix B at midnight, the heart sinks. The tendency is just to insert
'This document is not yet i18n-conformant' and get on with more exciting
things (like why the program crashes).
In writing JUMBO I have come across a large number of these little things
which I don't feel the spec resolves. I am very happy to leave the
parser-related things to those people who do it better (than me). But
SeanM/DavidM correctly raise the question of what a parser emits. I am
still not sure what the distinction between a parser, a processor and an
application is - I keep asking and have failed to get a reply. This is
dangerous because (a) 'processor' is used in the spec but 'parser' isn't
(b) it's quite clear from discussions on this list that:
- some people think processor and parser are synonyms.
   ----------------------             --------------------
   |Parser aka Processor| ----------> |    Application   |
   ----------------------             --------------------
- some people think parser and processor are completely separate
   --------          ------------             --------------------
   |Parser| -------> | Processor| ----------> |    Application   |
   --------          ------------             --------------------
- some people think that a processor is a unit which contains a parser but
has additional integrated facilities.
   ---------------------------------
   |          Processor            |
   |         -----------           |             --------------------
   |         | Parser  |           | ----------> |    Application   |
   |         -----------           |             --------------------
   ---------------------------------
*** I suggest that the first time anyone uses the word 'parser' or
'processor' in this discussion they indicate what they think a processor
is. Unless we have some ideas of each other's ontologies we shall have
serious problems.
The problems with what a parser is, are tricky but nothing compared with
the semantic difficulties of passing the output of 'a processor' to 'an
application'. The spec gives no help with this, except to highlight some
areas of difficulty and - effectively - to say 'this is up to you'. I'd
like it to be partly 'up to XML-DEV', which is why this discussion is *so*
important.
Please don't think that anyone raising problems here is simply unable to
understand the spec or hasn't read it properly. Those involved in writing
the spec have a combined weight of perhaps 500 years of working with SGML
and other document processing tools. Many of the readers of this list are
coming to these discussions with different backgrounds and do not pick up
the 'implied' or 'given' semantics in the spec. I'm one, and I think that
if someone genuinely can't *implement* the spec because of semantic
uncertainties, there is a problem. [I am also clear, and have said so all
along, that many problems will *only* come to light when people try to
implement them.]. However, it's also important to realise that the spec is
written with very great care, very great precision and many sentences need
to be read very carefully and repeatedly. [In this alone I doubt that many
MCSGS can effectively understand all the concepts in the spec in less than
two weeks. And most DPHs and DumbXMLBrowserHackers (like me) will miss a
lot of the subtlety, through cursory reading.]
>>of the enormous number of constraints imposed both explicitly and
>>implicitly by the grammar (I could probably write a full SGML parser
>>with about the same level of effort, especially if I limited myself to
>>a single, simple SGML declaration).
I think the problems are different. SGML is complex, but precise. A year or
two back someone estimated on comp.text.sgml that SGML defined something
like 2^16 variants. I think that XML is one such variant, and one of the
simplest. Writing a full SGML parser is very hard, with the result that
very few complete standalone parsers were ever written. In one sense that
was very valuable because people like me would just run their document
through sgmls - if it crashed, the document was wrong. [I have no idea
whether there are parsers which take a semantically different view of 8879
from sgmls. However, even sgmls did not implement all the hairy options in
SGML, and many of these are not covered in many textbooks].
The XML process is very different. The syntax is trivial to write a parser
for. But the freedom of WF documents presents difficult and unresolved
problems of semantics. Therefore the time writing an XML parser is not in
coding the BNF, but worrying about what to do with the code. In particular
the question of 'validity' is fuzzy and crops up repeatedly. Where features
are optional in an XML document (e.g. the DOCTYPE statement) does its
*presence* (not its content) imply anything about how the software should
behave. I don't find this easy, but it's a very different sort of
difficulty from the difficulty of coding a validating algorithm for content
in full SGML.
[Tim's areas of difficulty]
>1. handling multiple input encodings, and
>2. making it run real fast while you're doing #1.
>
>These don't really bother me that much as we are in the infancy of
>learning what the right way is to build truly internationalized
>software; for example, I can parse the UTF16 Japanese version of the
>XML spec in a few seconds; then it takes the best part of a minute
>to load the .ttf for the Unicode font so you can look at anything;
>so we have a few problems in this area.
Because this is uncharted territory it's certain to throw up problems.
>
>Having said that, I am now in the middle of coding up validation for
>Lark, and there are a TREMENDOUS NUMBER of irritating little
^^^^^^^^^^^^^^^^^
Yup, yup, yup.
Each of these is 'small'. Let's assume that 95% of people agree with your
interpretation for each one in precise implementation (e.g. implementation
of Name), and let's assume that you have 20 such problems. 0.95^20 is about
0.36; so about 36% of people will think that Lark is totally conforming and
does exactly what they want. This is a possibly naughty way of addressing
the problem, but it can only (IMO) be resolved by identifying those niggling
problems and agreeing communally either the 'right' way, or adding a switch
to the operation. Leaving each parser writer to make personal decisions is
a guarantee that parsers will behave differently.
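The arithmetic is easy to check. The following is a minimal Java sketch; the class and method names are made up for the illustration, and it assumes the 20 decisions are independent of each other, which is the same simplification the estimate above relies on. For p = 0.95 and n = 20 it gives roughly 0.358.

```java
final class AgreementOdds {
    /** Chance that one reader agrees with all n decisions, each agreed with probability p (independence assumed). */
    static double allAgree(double p, int n) {
        return Math.pow(p, n);
    }

    public static void main(String[] args) {
        // 20 small decisions, each with 95% agreement
        System.out.printf("%.3f%n", allAgree(0.95, 20));
    }
}
```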
This is why JUMBO can use multiple parsers. DavidD suggested that it was
because they had bugs. In a sense that's exactly right ('features' is
probably more accurate). [It's also because no one has - yet - got a
complete Java implementation of a 'parser'.]
The thing that really frustrates me is that we lost the communal will to
create an API for parsers. Why, why, why - can't we do this?
I'm going to suggest a slightly revised approach. AElfred comes close to it.
I'll write another msg, rather than make this too long.
[...]
>
>Mind you, the validator is in a separate package and can be bypassed, so
>Lark effectively need be no larger. But still; I wonder if validation
>is intrinsically hard or we could have found a better 80/20 point? -Tim
You're going to find out whether it's hard when you try to implement it
:-). I have no idea whether it's *really hard*. I think I could do content
validation in a week on a desert island. I would probably use a completely
stupid approach.
However I have received a gift of a validator (not in Java, but many
thanks) and please keep them coming. We need more than one, precisely to
see whether we all agree :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Sat Dec 13 14:25:43 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:28 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213151941.3f37ed14@pop3.demon.co.uk>
In case anyone has missed my postings over the last 10 months, I would like
an API for XML parsers :-). JUMBO has been interfaced to 3 publicly
available Java parsers (besides its Mus Michaelis one) and finds it
sufficiently hard grunt work adding more because of the inconsistency of
what's presented through the existing APIs. Note that all three parsers
(Lark0.97, NXP97-09, AElfred1.0beta) provide EventDriven interfaces. I have
not tried - and do not at present intend to try - to interface with someone
else's Tree or Grove model. [Lark builds a tree if required - the others
don't. NXP has classes for a CompleteGrove - I haven't used them.]
*** Please understand that any apparent frustration below is NOT criticism
of these three parsers and their authors - all of whom have made an
extremely important contribution. Nor is the omission of MSXML, tcl-based
parsers and JamesC's software anything other than lack of time ***
It's also clear that none of the three allow me to get at all the
information in the document I want, though I think AElfred is almost there
[I haven't looked at the latest version.] Let's assume I want the Name in
the DOCTYPE [29] - the root elementType.
In Lark097:
public boolean doDoctype(Entity e, String rootType, String publicID, String
systemID);
OK - I can manage this, but I have no idea what the Entity class is in any
of Lark's calls. "Those names Element, Attribute and Entity are obvious in
their function.". This is just another example of my Dumbness, but it's a
reality. I don't have time to explore precisely what it is - and I can't
actually print it out.
In NXP97-09-05 (I think) I can grep and find (XML.java):
final public String doctypedecl();
Since the code is autogenerated by JACC I haven't the first idea what the
contents of the String are (I would have to experiment). If it goes by the
spec it's the whole String contents of all the subsets, I assume.
In AElfred1.0beta:
public abstract void doctypedecl(XmlParser parser, String name, String
pubid, String sysid);
This is fully documented in javadoc.
[Note: javadoc is free, comes with the system, is relatively easy to use
after you have fought the classpath and there is no good reason not to use
it.]
So three parsers, three quite different interfaces, three more midnight
hacks for JUMBO. I haven't looked at MSXML but I would be amazed if there
wasn't yetanotherinterface.
All of this makes JUMBO very tired.
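To make the cost concrete, here is a minimal Java sketch of the glue needed just for the DOCTYPE callback. DoctypeSink is a hypothetical common interface invented for this example; the two handler interfaces only mirror the Lark and AElfred signatures quoted above, with Object standing in for Lark's Entity and AElfred's XmlParser, so none of this is the parsers' real code.

```java
/** Hypothetical common callback; not part of Lark, NXP or AElfred. */
interface DoctypeSink {
    void doctype(String rootType, String publicId, String systemId);
}

/** Stand-in mirroring the Lark signature quoted above; Object replaces Lark's Entity. */
interface LarkStyleHandler {
    boolean doDoctype(Object entity, String rootType, String publicID, String systemID);
}

/** Stand-in mirroring the AElfred signature quoted above; Object replaces XmlParser. */
interface AElfredStyleHandler {
    void doctypedecl(Object parser, String name, String pubid, String sysid);
}

final class DoctypeAdapters {
    /** Wraps the common sink so that it can be handed to a Lark-style parser. */
    static LarkStyleHandler forLark(DoctypeSink sink) {
        return (entity, rootType, publicID, systemID) -> {
            sink.doctype(rootType, publicID, systemID);
            return false; // placeholder: what the return value means to Lark is not assumed here
        };
    }

    /** Wraps the common sink so that it can be handed to an AElfred-style parser. */
    static AElfredStyleHandler forAElfred(DoctypeSink sink) {
        return (parser, name, pubid, sysid) -> sink.doctype(name, pubid, sysid);
    }
}
```

One such adapter is needed per callback per parser, which is where the midnight hacks go.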
There seem to be several reasons for this lethargy in producing an API -
we've been at this since February. Since there is relatively little
discussion I am guessing these reasons from "vibes". :-)
- it's too early to do anything - the language spec has only been
published this week.
- it's all in the spec - if you can't work out what to do properly that's
not our problem.
- a proper grove plan takes care of this. Anything simpler is inadequate.
- this will all be sorted out by the DOM, so let's do nothing until this
happens.
- parsers are unlikely to be interoperable anyway.
- this is an area which should be left to the software houses - the W3C is
primarily to develop markets for its members.
- it's in our interests to have non-interoperability because we'll protect
our markets that way.
- it's too difficult and I'm not paid to spend the time thinking about it.
So - as a first step - I make the following proposal and ask for
constructive comments. I am quite prepared to be shown it's shallow and
unworkable.
*Simple* Java interfaces are usually built by identifying the objects
involved and using a consistent style for naming objects, methods,
interfaces and related hooks. An example is Java Beans, where getXyz() and
setXyz() have semantics which the Beans reflection mechanism can identify.
The XML spec has very precise definitions of the components that are
required in an interface.
My proposal is simply that we should use these two approaches wherever
possible in naming classes and methods, and that we should list the
functions in the interface. That's all :-).
If I want the rootType of the document I refer to [29] and see that it is a
Name. Therefore I could do all I want with code like:
    /** extract the string directly from the document [29] */
    public String Document.getDoctypedeclName()
OR:
    /** or have a class for Doctypedecl [29] */
    public Doctypedecl Document.getDoctypedecl();
    public String Doctypedecl.getName();
To get the contentspec and default attribute value for the Bar attribute
name of the Element Foo: (note the differences in capitalisation of the
string 'decl' in the spec);
    Enumeration elementdecls = Document.getElementdecls(); /*[29-30]*/
    while (elementdecls.hasMoreElements()) {
        Elementdecl elementdecl = (Elementdecl) elementdecls.nextElement();
        if (elementdecl.getName().equals("Foo")) { /*[45]*/
            String contentspec = elementdecl.getContentspec();
        }
    }
    Enumeration attlistdecls = Document.getAttlistDecls(); /*[29, 30]*/
    while (attlistdecls.hasMoreElements()) {
        AttlistDecl attlistDecl = (AttlistDecl) attlistdecls.nextElement();
        if (attlistDecl.getName().equals("Foo")) {
            Vector attDefVector = attlistDecl.getAttdefs(); /*[52]*/
            for (int i = 0; i < attDefVector.size(); i++) {
                AttDef attDef = (AttDef) attDefVector.elementAt(i);
                if (attDef.getName().equals("Bar")) { /*[53]*/
                    String value = attDef.getDefault(); /*[54]*/
                }
            }
        }
    }
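For anyone who wants to try the naming idea out, the following is a minimal self-contained Java sketch of the lookup above. Every class in it is a hypothetical stand-in written for the example (no parser provides them), it uses List where the fragment above uses Enumeration and Vector, and the contentspec and default value are simply kept as Strings.

```java
import java.util.ArrayList;
import java.util.List;

final class Yaxpapi {
    /** An element declaration: a Name plus its contentspec, kept as a String. */
    static final class Elementdecl {
        final String name;
        final String contentspec;
        Elementdecl(String name, String contentspec) { this.name = name; this.contentspec = contentspec; }
        String getName() { return name; }
        String getContentspec() { return contentspec; }
    }

    /** One attribute definition: attribute Name and default value. */
    static final class AttDef {
        final String name;
        final String defaultValue;
        AttDef(String name, String defaultValue) { this.name = name; this.defaultValue = defaultValue; }
        String getName() { return name; }
        String getDefault() { return defaultValue; }
    }

    /** An attribute-list declaration for one element type. */
    static final class AttlistDecl {
        final String name;
        final List<AttDef> attDefs = new ArrayList<>();
        AttlistDecl(String name) { this.name = name; }
        String getName() { return name; }
        List<AttDef> getAttdefs() { return attDefs; }
    }

    /** Holds the declarations a parser would have collected from the DOCTYPE. */
    static final class Document {
        final List<Elementdecl> elementdecls = new ArrayList<>();
        final List<AttlistDecl> attlistDecls = new ArrayList<>();

        /** Returns the contentspec declared for elementName, or null if there is no declaration. */
        String getContentspec(String elementName) {
            for (Elementdecl e : elementdecls) {
                if (e.getName().equals(elementName)) return e.getContentspec();
            }
            return null;
        }

        /** Returns the default value of attName on elementName, or null if none is declared. */
        String getDefault(String elementName, String attName) {
            for (AttlistDecl a : attlistDecls) {
                if (!a.getName().equals(elementName)) continue;
                for (AttDef d : a.getAttdefs()) {
                    if (d.getName().equals(attName)) return d.getDefault();
                }
            }
            return null;
        }
    }
}
```

Returning null for "not declared" is only one possible convention; the proposal itself does not say how absence should be reported.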
If something is defined in the spec, it has a clear place where it is
defined, and a clear term. Why not use this? It should only take a few
hours to go through the 82 productions and decide which of them returned
anything useful (we are unlikely to require [26], for example :-); - many
productions are irrelevant to the parsed, normalised document. The
semantics are clear (at least as clear as the spec can provide), and can be
precisely pinpointed.
We have to decide which components require classes and which are simply
Strings. In some cases capitalisation is a problem. Java strongly urges
initial caps so I would write:
public Prolog getProlog()/*[23]*/
(I am not sure whether there are name collisions separated only by case).
In some cases the names clash with existing java classes, so in [59] we
might have to write:
public jumbo.parser.Enumeration getEnumeration();
since there is a java.util.Enumeration.
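A minimal Java sketch of how the clash plays out in practice. The nested Enumeration class and its getTokens() method are hypothetical and exist only for this example; the JDK calls are java.util.Collections.enumeration and the java.util.Enumeration interface.

```java
import java.util.Arrays;
import java.util.List;

final class JumboParserSketch {
    /** Hypothetical class for the spec's Enumeration production [59]; it hides java.util.Enumeration by simple name. */
    static final class Enumeration {
        private final List<String> tokens;
        Enumeration(String... tokens) { this.tokens = Arrays.asList(tokens); }
        List<String> getTokens() { return tokens; }
    }

    /** Inside this class "Enumeration" means the nested class, so the JDK type has to be written out in full. */
    static java.util.Enumeration<String> walk(Enumeration e) {
        return java.util.Collections.enumeration(e.getTokens());
    }
}
```

So the clash is survivable, but one of the two names always has to be qualified wherever both are in use.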
In some cases there are repeatable values [e.g. [58] ] where we might need:
public String[] NotationType.getNames();
or we may choose to have Vector, etc.
The use of many classes might make the parsers too large or slow, so maybe
some other style might be useful.
This is simple, and is easy to implement. Dumb hackers like me can
understand it by reading the spec - they don't need to know about groves,
DOM or whatever. I expect that it's not comprehensive - there is no error
model for example - but I can't see much that I need from a document that
isn't in the spec. Anything else would be parser-specific flags, or perhaps
retrieval of unnormalised input.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Sat Dec 13 15:45:47 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:28 2004
Subject: Classification: XML Parser Features
In-Reply-To: <3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk>
References: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca>
<3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk>
Message-ID: <199712131544.KAA00367@unready.microstar.com>
Peter Murray-Rust writes:
> - some people think processor and parser are synonyms.
> - some people think that a processor is a unit which contains
> a parser but has additional integrated facilities.
The problem is a misalignment in terminology. In SGML, an "SGML
application" is a DTD together with other support information (such as
documentation, conventions, etc.). And although the terms are not
formally defined, SGML people often use 'parser' to describe the
logical component that translates the external representation of a
document into some sort of abstract internal format, and 'processor'
(or 'processing software', or 'formatter', in some cases), to describe
the logical component that acts on the information delivered by the
parser.
In XML, the spec confusingly defines 'processor' to fill the same
logical role as 'parser' in normal SGML usage, and 'application' to
fill the same logical role as 'processor' or 'processing software' in
normal SGML usage. Of course, this confusion will exist only for
people who are already used to SGML.
I prefer 'parser', because it is at least unambiguous for both sides,
even if slightly unfamiliar for XML-only people; if I use 'processor',
I risk causing confusion for the sake of being strictly XML
conformant.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From eliot at isogen.com Sat Dec 13 16:28:57 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:59:28 2004
Subject: XML software for Visual Basic
Message-ID: <3.0.32.19971213102401.00dac840@swbell.net>
At 01:42 PM 12/13/97 GMT, Robrecht Siera wrote:
>It is getting more and more interesting to start programming
>applications using XML. Until now Java gets the most attention to
>do this programming in (for obvious reasons).
>
>But to have some programming routines for Visual Basic would be very
>welcome also. Because we would like to develop a data management
>and data exchange application where XML is used as file format.
>
>Is anybody capable of developing such parser routines or API usable
>in Visual Basic ?
Part of the Jade package (James' DSSSL Engine) is the groveoa.dll, an OLE
Automation DLL that you can use easily with Visual Basic to operate on SGML
documents. I don't know if James has enabled the XML parsing mode that
he's putting into SP, but it probably wouldn't be too hard to hack it to do
it. The grove that groveoa.dll creates reflects the SGML property set as
defined in the DSSSL and HyTime standards, rather than the DOM design,
although the two designs are close enough that code developed for one
should be easily adapted to the other. I've created a little toy
application, GroveView, that demonstrates using the groveoa.dll. You can
find it at "http://www.isogen.com/demos/groveview.html". Source code is
available upon request. Jade is available at "http://www.jclark.com".
Cheers.
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From tbray at textuality.com Sat Dec 13 17:58:18 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:28 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213095945.0079eb00@pop.intergate.bc.ca>
At 03:19 PM 13/12/97, Peter Murray-Rust wrote:
I agree with Peter that we should just buckle down and get on with what used
to be known as XAPI.
But my approach would be quite different. I think that the first step
should be the end-user's API, the kind of thing that someone using a SMIL
or RDF processor would need. Such a person really doesn't want to wrestle
with entities and references and PIs and marked sections; all they want
is elements and attributes and the basic doctype info; they want the
processor to deal with entities and refs and quote marks and white space in
markup and encodings and so on.
This would go a long way to address the whinings of the RDF & SMIL type
people, who thought XML just meant elements and attributes. I think that
from their point of view, it should be; all the other stuff in the syntax
is strictly to support authoring and management convenience.
It should come in event-stream flavor and tree flavor.
Minimal event stream API:
1. Doctype, returns: root type, external subset system/public idents
2. Element start, returns: type, element name-value pairs, whether it's empty
3. Text
4. End Element, returns: type
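The four events above could be written down as a Java interface along these lines. This is only a sketch: the interface, the method names and the use of a Map for the name-value pairs are assumptions made for the example, not Lark's actual interface, and EventDemo emits the events by hand in place of a real parser.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical rendering of the four events listed above. */
interface MinimalEvents {
    void doctype(String rootType, String systemId, String publicId);
    void startElement(String type, Map<String, String> attributes, boolean empty);
    void text(String chars);
    void endElement(String type);
}

/** Records each event as a short token so that the stream can be inspected. */
final class EventLog implements MinimalEvents {
    final StringBuilder log = new StringBuilder();

    public void doctype(String rootType, String systemId, String publicId) {
        log.append("D(").append(rootType).append(')');
    }
    public void startElement(String type, Map<String, String> attributes, boolean empty) {
        log.append("S(").append(type).append(attributes).append(empty ? "/" : "").append(')');
    }
    public void text(String chars) {
        log.append("T(").append(chars).append(')');
    }
    public void endElement(String type) {
        log.append("E(").append(type).append(')');
    }
}

final class EventDemo {
    /** Stands in for a parser: emits the events for a one-paragraph document by hand. */
    static void emit(MinimalEvents h) {
        h.doctype("EXAMPLE", "example.dtd", null);
        h.startElement("EXAMPLE", Collections.<String, String>emptyMap(), false);
        h.startElement("P", Collections.singletonMap("id", "p1"), false);
        h.text("A sentence.");
        h.endElement("P");
        h.endElement("EXAMPLE");
    }
}
```

Whether an empty element is followed by an End Element event is left open here, as it is in the list above.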
Minimal tree API:
1. Document, with methods: root type, system ID, public ID, root element
2. Element, with methods: parent, children, attributeValueByName, allAttributes
3. Attribute, with methods: name, value
4. Text (presumably hiding lazy evaluation)
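A minimal Java sketch of the four tree classes, with the method names taken from the list above. Everything else is an assumption for the sake of the example: children are held in a plain List of Object (Text or Element) and told apart with instanceof, nothing is evaluated lazily, and the add and setAttribute helpers exist only so that a tree can be built by hand.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MinimalTree {
    static final class Text {
        final String value;
        Text(String value) { this.value = value; }
    }

    static final class Attribute {
        final String name;
        final String value;
        Attribute(String name, String value) { this.name = name; this.value = value; }
    }

    static final class Element {
        final String type;
        private Element parent;
        private final List<Object> children = new ArrayList<>();            // Text or Element
        private final Map<String, Attribute> attributes = new LinkedHashMap<>();

        Element(String type) { this.type = type; }

        Element parent() { return parent; }
        List<Object> children() { return children; }
        List<Attribute> allAttributes() { return new ArrayList<>(attributes.values()); }
        String attributeValueByName(String name) {
            Attribute a = attributes.get(name);
            return a == null ? null : a.value;
        }

        // Builder helpers, not part of the proposed read-only API.
        void setAttribute(String name, String value) { attributes.put(name, new Attribute(name, value)); }
        void add(Object child) {
            if (child instanceof Element) ((Element) child).parent = this;
            children.add(child);
        }
    }

    static final class Document {
        final String rootType;
        final String systemId;
        final String publicId;
        final Element rootElement;
        Document(String rootType, String systemId, String publicId, Element rootElement) {
            this.rootType = rootType;
            this.systemId = systemId;
            this.publicId = publicId;
            this.rootElement = rootElement;
        }
    }

    /** Example client: concatenates all text below an element, using instanceof to tell children apart. */
    static String textOf(Element e) {
        StringBuilder sb = new StringBuilder();
        for (Object child : e.children()) {
            if (child instanceof Text) sb.append(((Text) child).value);
            else if (child instanceof Element) sb.append(textOf((Element) child));
        }
        return sb.toString();
    }
}
```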
I acknowledge this is grossly insufficient for basing an editor on. You want
that, use the DOM. Only a few choices have design implications:
1. How are children returned; possibilities would be to have Element and
Text crammed into the same class with a method for asking which is which,
or have separate Text and Element classes, then children returns an Object
array or a Vector, and you can find out what kind of child each member
is using the instanceof operator. I favor the latter; Lark does this.
2. Whether it's worthwhile putting children into, as opposed to a native
array or Vector, a special ChildList class with enumerator and indexing
so you can hide a lazy-evaluation behind it. I favor the latter, the
DOM does this but Lark doesn't.
3. Whether the processor should be required to coalesce adjacent Text
objects. Suppose you have foo bar &ref; baz ,
it's immensely less work if the processor can give this to the app
as 4 Text chunks. I think most of the processors do this now.
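If the processor is allowed to deliver separate chunks, an application that wants one string per run can merge them itself. A minimal Java sketch, assuming (for the example only) that text chunks arrive as String objects in a child list and that anything else is an element to be passed through:

```java
import java.util.ArrayList;
import java.util.List;

final class TextCoalescer {
    /** Merges runs of adjacent String chunks; any non-String child is passed through untouched. */
    static List<Object> coalesce(List<Object> children) {
        List<Object> out = new ArrayList<>();
        StringBuilder run = null;                  // text accumulated since the last non-text child
        for (Object child : children) {
            if (child instanceof String) {
                if (run == null) run = new StringBuilder();
                run.append((String) child);
            } else {
                if (run != null) { out.add(run.toString()); run = null; }
                out.add(child);
            }
        }
        if (run != null) out.add(run.toString()); // flush a trailing run of text
        return out;
    }
}
```

This keeps the cost on the applications that care about it, which is the trade-off choice 3 is asking about.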
If I formalized and published this, it would look a lot like part of
Lark's interface, but I bet all the other parsers could implement it.
Should I? -Tim
From ak117 at freenet.carleton.ca Sat Dec 13 18:46:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:28 2004
Subject: Error Reporting: XML vs ISO 8879
Message-ID: <199712131844.NAA00394@unready.microstar.com>
This has been a fascinating discussion on what XML conformance means
in an XML processor -- I think that it has helped people like me (who
are not in the SIG or in the WG) to understand more of the WG's
reasoning on the very strict rules for XML conformance.
SGML PARALLELS
--------------
I recognise James's concern that explicitly allowing
non-error-reporting XML processors could cause non-conforming variants
of XML to become common -- given the unfortunate history of HTML, I am
not prepared to dismiss that concern lightly. It is surprising,
however, that although some proponents (not James) claim XML as "a
simplified form of SGML," XML is actually much more rigid than full
SGML on this point. Let me quote from a (non-normative) note to the
SGML standard, ISO 8879:1986, clause 15.4:
NOTE -- A conforming SGML system need not have a validating SGML
parser. Implementors can therefore decide whether to incur the
overhead of validation in a given system. A user whose text editing
system allowed the validation and correction of SGML documents, for
example, would not require the validation process to be repeated
when the documents are processed by a formatting system.
In other words, if I have read the standard correctly (something that
all of us fail to do at times), full SGML allows parsers that do not
report errors, but XML does not.
It is ironic that we can call PSGML a "conforming, non-validating"
SGML editor, but that we must call it a "non-conforming" XML editor
(even with my XML patches).
CODE SIZE AND THE INTERNET
--------------------------
This inflexibility on XML's part is especially surprising given that
XML is designed for the Internet, where code size (whether for Java
applets or ActiveX controls) is _much_ more critical than it is in a
closed system.
Imagine a Java programmer who has just written a 100K applet, and is
considering adding XML support as an extra feature. I am concerned
that we could not convince that programmer to add even a 24K XML
parser like Ælfred (especially after she's spent three weeks
optimising for size); we certainly will not convince her to add 50K or
100K of class files for a full error-reporting XML parser, doubling
the size of the applet. As it stands, however, her applet will be
non-conforming unless it uses a conforming parser, so strictly
speaking, the programmer will not be able to claim XML support if she
uses a smaller XML parser like Ælfred.
Ideally, I'd like to get Ælfred to under 10K to help with acceptance
in the Java community; practically, I'll be thrilled if I can get it
down to under 20K. I cannot justify bloating it to 40K or 50K.
PRAGMATISM AND DEVIANT BEHAVIOUR
--------------------------------
The strongest argument, however, comes from pragmatism. A W3C
recommendation has relatively little moral force compared even to an
IETF RFC, much less an International Standard, so if conformance is
too difficult, most people just won't bother conforming (look at some
of the widely-ignored HTML drafts that have come out).
It makes sense, then, for XML to try to channel and regulate deviant
behaviour rather than simply looking away and denying its existence.
Instead of declaring every simple, non-error-reporting processor
"non-conforming" (and thus, not regulating it at all), why not define
a standard behaviour for those parsers as well, and create standard
terms for labelling them? At least then, people will know what
they're getting.
GUARDING THE GRAIL
------------------
Like a former rebel who has just found a job, bought a house, or
become a new parent, the XML WG now has something to protect, and they
are naturally adopting precisely the conservatism that a vocal
minority of XML supporters used to attack in the SGML establishment
(and sometimes, as in the case of error-reporting, they have outdone
the SGML community in their conservatism).
This is a normal and expected development, but I expect that
privately, at least, some of the original XML evangelists must be
starting to look more sympathetically at what they used to consider
unnecessary rigidity and purism in the SGML community.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From digitome at iol.ie Sat Dec 13 18:57:49 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:29 2004
Subject: OFE
Message-ID: <199712131857.SAA31062@GPO.iol.ie>
Does anyone happen to know if OFE-Open Financial Exchange (currently SGML)
will be XML in the future?
I cannot find anything in the OFE spec or websites about it yet it is
linked to from a number of XML resource pages.
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com
From peter at ursus.demon.co.uk Sat Dec 13 19:18:11 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213095945.0079eb00@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213193641.3af79528@pop3.demon.co.uk>
At 09:59 13/12/97 -0800, Tim Bray wrote:
>At 03:19 PM 13/12/97, Peter Murray-Rust wrote:
>
>I agree with Peter that we should just buckle down and get on with what used
>to be known as XAPI.
>
>But my approach would be quite different. I think that the first step
I'm missing something :-) Your approach below seems almost identical to
what I was suggesting.
>should be the end-user's API, the kind of thing that someone using a SMIL
>or RDF processor would need. Such a person really doesn't want to wrestle
>with entities and references and PIs and marked sections; all they want
Agreed - and they wouldn't be in what I wanted as well.
>is elements and attributes and the basic doctype info; they want the
yup
yup
yup
>processor to deal with entities and refs and quote marks and white space in
>markup and encodings and so on.
>
>This would go a long way to address the whinings of the RDF & SMIL type
>people, who thought XML just meant elements and attributes. I think that
>from their point of view, it should be, all the other stuff in the syntax
>is strictly to support authoring and management convenience.
>
>It should come in event-stream flavor and tree flavor.
>
>Minimal event stream API:
>
>1. Doctype, returns: root type, external subset system/public idents
I would like the elements as well. If the parser doesn't do them, we just
return null. But if it does...
>2. Element start, returns: type, element name-value pairs, whether it's empty
is "type" the elementType? This is the sort of terminological problem we
have.
>3. Text
>4. End Element, returns: type
>
>Minimal tree API:
>
>1. Document, with methods: root type, system ID, public ID, root element
>2. Element, with methods: parent, children, attributeValueByName, allAttributes
>3. Attribute, with methods: name, value
>4. Text (presumably hiding lazy evaluation)
Sounds OK.
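A minimal sketch of how the two flavors listed above might look in present-day Java. Every name here (MinimalEventHandler, ElementNode, TextNode) is hypothetical and chosen for illustration; none of it is Lark's or any other parser's actual interface.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Event-stream flavor: one callback per item in the minimal list (hypothetical names).
interface MinimalEventHandler {
    void doctype(String rootType, String systemId, String publicId);
    void startElement(String type, Map<String, String> attributes, boolean empty);
    void text(char[] chars, int length);
    void endElement(String type);
}

// Tree flavor: Text and Element are separate classes, so children() holds a mix
// of both and callers tell them apart with instanceof.
final class TextNode {
    private final String value;
    TextNode(String value) { this.value = value; }
    String value() { return value; }
}

final class ElementNode {
    private final String type;
    private final ElementNode parent;
    private final Map<String, String> attributes = new LinkedHashMap<>();
    private final List<Object> children = new ArrayList<>();

    ElementNode(String type, ElementNode parent) {
        this.type = type;
        this.parent = parent;
        if (parent != null) parent.children.add(this);  // link into the tree
    }
    String type() { return type; }
    ElementNode parent() { return parent; }
    List<Object> children() { return children; }
    String attributeValueByName(String name) { return attributes.get(name); }
    Map<String, String> allAttributes() { return attributes; }
    void setAttribute(String name, String value) { attributes.put(name, value); }
    void addText(String value) { children.add(new TextNode(value)); }
}
```

The Document object and the doctype information are left out to keep the sketch short.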
>
>I acknowledge this is grossly insufficient for basing an editor on. You want
I don't want much for an editor. Just the attribute stuff and contentspec.
I don't want PE's, comments, marked sections and so on.
>that, use the DOM. Only a few choices have design implications:
>
>1. How are children returned; possibilities would be to have Element and
> Text crammed into the same class with a method for asking which is which,
> or have separate Text and Element classes, then children returns an Object
> array or a Vector, and you can find out what kind of child each member
> is using the instanceof operator. I favor the latter, Lark does this
I'm easy - **as long as we all agree**
>
>2. Whether it's worthwhile putting children into, as opposed to a native
> array or Vector, a special ChildList class with enumerator and indexing
> so you can hide a lazy-evaluation behind it. I favor the latter, the
which is 'the latter'? :-)
> DOM does this but Lark doesn't.
>
>3. Whether the processor should be required to coalesce adjacent Text
> objects. Suppose you have foo bar &ref; baz ,
> it's immensely less work if the processor can give this to the app
> as 4 Text chunks. I think most of the processors do this now.
I don't have a problem here...
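If the processor is allowed to deliver several Text chunks, an application that wants one string can join them itself. A minimal sketch, assuming a chunked text callback of the char[]-plus-count kind; TextCoalescer is a hypothetical helper, not part of any parser discussed here.

```java
// Hypothetical helper: buffers the chunks the processor delivers and
// hands the application a single joined string.
final class TextCoalescer {
    private final StringBuilder pending = new StringBuilder();

    // Called once for each chunk of text the processor reports.
    void text(char[] chars, int length) {
        pending.append(chars, 0, length);
    }

    // Called at the next element boundary; returns the joined text and resets the buffer.
    String flush() {
        String joined = pending.toString();
        pending.setLength(0);
        return joined;
    }
}
```

This keeps the cost of joining with the applications that need it, which is the trade-off point 3 describes.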
>
>If I formalized and published this, it would look a lot like part of
>Lark's interface, but I bet all the other parsers could implement it.
>Should I? -Tim
I bet they could. It is very important, however, that everyone agrees on
the terminology.
I have never seen this as a difficult problem. I think it would take a week
to come up with a reasonable working draft. I hope that XML-DEVers will see
the value of a simple interface and not - as has happened before - keep
getting more and more complex. The three parsers we have are simple - it's
a slightly depressing situation that we haven't got an interface for them
to use.
I suggest that Tim goes ahead, but I'll also produce my interface from the
spec. After all, that will show what the *consumer* (i.e. JUMBO) would
like. As always I shall be happy to junk anything I do if it helps us make
progress :-)
It might also be useful for us to set ourselves a deadline.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tbray at textuality.com Sat Dec 13 20:30:28 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: OFE
Message-ID: <3.0.32.19971213122935.009a1950@pop.intergate.bc.ca>
At 07:27 PM 13/12/97 +0000, Sean Mc Grath wrote:
>Does anyone happen to know if OFE-Open Financial Exchange (currently SGML)
>will be XML in the future?
It's normally acronymed OFX I think. Good question. I believe there have
been public statements of intent to go XML, I'd think that Microsoft would be
in the leadership position on this one. -Tim
From donpark at quake.net Sat Dec 13 22:05:36 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <005801bd0812$a1af84b0$0100007f@localhost>
Tim and Peter,
From: Tim Bray
>It should come in event-stream flavor and tree flavor.
>
>Minimal event stream API:
>
>1. Doctype, returns: root type, external subset system/public idents
>2. Element start, returns: type, element name-value pairs, whether it's empty
>3. Text
>4. End Element, returns: type
>
>Minimal tree API:
>
>1. Document, with methods: root type, system ID, public ID, root element
>2. Element, with methods: parent, children, attributeValueByName, allAttributes
>3. Attribute, with methods: name, value
>4. Text (presumably hiding lazy evaluation)
IMHO, it would be a major mistake to combine XML parser client API and service
provider API. I would much rather see something like Swing's TreeModel
interface used as XML parser service provider API with opaque objects.
public interface XmlTreeModel {
public Object getRoot ();
public Object getParent (Object child);
...
}
public interface XmlEventModel {
public String getElementName (Object event);
...
}
public interface XmlEventProducer {
public void addConsumer (XmlEventConsumer c);
public void removeConsumer (XmlEventConsumer c);
...
}
public interface XmlEventConsumer {
public void elementStarted (XmlElementEvent evt);
public void elementEnded (XmlElementEvent evt);
...
}
XmlEvent is part of the client API which is mostly convenience class
framework:
public class XmlEvent extends EventObject {
protected XmlEventModel model;
protected Object object;
...
}
public class XmlElementEvent extends XmlEvent {
public String getElementName () {
return model.getElementName(object);
}
...
}
>I acknowledge this is grossly insufficient for basing an editor on. You want
>that, use the DOM. Only a few choices have design implications:
I think editing should be supported with another layer of interfaces so that
basic interface can remain simpler.
public interface MutableXmlTreeModel {
public Object newElement (String name, ...);
public void addAttribute(Object elem, String name, String value);
...
}
The XML parser service provider API is mostly just interfaces and deals with
opaque objects returned by XML parser implementations. The XML parser client
API consists of DOM classes that use opaque objects to drive parser
implementations (see XmlElementEvent above).
Don Park
From tbray at textuality.com Sat Dec 13 22:45:50 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213144717.009ba3f0@pop.intergate.bc.ca>
At 02:00 PM 13/12/97 -0800, Don Park wrote:
>IMHO, it would be major mistake to combine XML parser client API and service
>provider API. I would much rather see something like Swing's TreeModel
>interface used as XML parser service provider API with opaque objects.
Hmm, your proposal is coherent, but why is it better? It's certainly a
bit more complex than what I proposed, and I'd need to see evidence that
my proposal fails to meet the needs of the basic application programmer.
One of the things I did with Lark was hook it up to the Swing Tree Renderer/
JTree package, got a nice little XML document tree-walker, even works with
Unicode fonts; I only needed calls like the ones I outlined and it was
no big deal. - Tim
From tbray at textuality.com Sat Dec 13 22:55:26 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: XML vs the Dreaded Whitespace
Message-ID: <3.0.32.19971213145704.0096f9d0@pop.intergate.bc.ca>
At 03:00 AM 11/12/97 -0500, Chris Smith wrote:
>Part of this work requires that these documents carry document
>authentication information. This, in turn, requires that some regions
>of an XML document must be transported *exactly*, and must be received
>and checked identically so that the message authentication actually
>works. The fact that we are considering the idea of including email
>as a transport mechanism doesn't help matters.
So your proposal is:
(1) transcode into UTF-16 if necessary
(2) digitally sign what you get after (1).
I think this is a sensible way to go. Obviously, there are
anomalies;
will not be the same as
which is surprising, but trying to find solutions may well not be
cost-effective.
You *might* want to consider losing the prologue and start checking
just at the root element.
You *might* want to consider normalizing namespace prefixes.
You *might* want to normalize whitespace in markup.
You *might*, etc etc etc etc; unless you are willing to commit to
a full grove/property-set model a la SGML's extended facilities, you
may well be better off signing the instance as it sits.
In particular, I think there are lots of things that would be easier
and less trouble-prone to work around than line-breaking, which is well
known to be highly error-prone. For example, in the line-break HERE->
how many space characters that you can't see follow the ">"?
There might be a useful halfway point as follows; run it through an
XML processor and sign just the combination of element type, attribute
name-value pairs, and textual content that the processor emits; this
allows you to finesse a lot of quoting/white-space/line-end issues;
also it allows authors to use tricks like default attributes and
internal entities that don't "really" change the content.
On the other hand, I'd say that off the top, just digitally signing the
UTF-i-fied characters as they sit is a reasonable way to go. -Tim
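The halfway point described above can be sketched with the SAX and MessageDigest classes that ship with present-day Java. This is only an illustration under stated assumptions: the class name ContentDigest, the choice of SHA-256, the token markers, and the decision to sort attributes by name are all mine, not part of the proposal, and a digest is only the input to a signature, not a signature itself.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: digests what the processor emits, not the raw characters.
final class ContentDigest {
    // Feed characters to the digest as UTF-16 code units, high byte first.
    private static void update(MessageDigest md, char[] ch, int start, int length) {
        for (int i = start; i < start + length; i++) {
            md.update((byte) (ch[i] >> 8));
            md.update((byte) ch[i]);
        }
    }

    // U+0000 cannot occur in XML text, so it delimits markup tokens unambiguously.
    private static void token(MessageDigest md, char kind, String value) {
        String t = "\u0000" + kind + value + "\u0000";
        update(md, t.toCharArray(), 0, t.length());
    }

    static byte[] digest(String xml) {
        try {
            final MessageDigest md = MessageDigest.getInstance("SHA-256");
            DefaultHandler handler = new DefaultHandler() {
                @Override
                public void startElement(String uri, String local, String qName, Attributes atts) {
                    token(md, 'S', qName);
                    // Attribute order is not significant, so sort by name first.
                    TreeMap<String, String> sorted = new TreeMap<>();
                    for (int i = 0; i < atts.getLength(); i++) {
                        sorted.put(atts.getQName(i), atts.getValue(i));
                    }
                    for (Map.Entry<String, String> e : sorted.entrySet()) {
                        token(md, 'A', e.getKey());
                        token(md, 'V', e.getValue());
                    }
                }
                @Override
                public void endElement(String uri, String local, String qName) {
                    token(md, 'E', qName);
                }
                @Override
                public void characters(char[] ch, int start, int length) {
                    update(md, ch, start, length);  // chunk boundaries do not affect the result
                }
            };
            SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return md.digest();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Two documents that differ only in quote style, attribute order, or white space inside tags produce the same digest here, while any change to the textual content changes it. White space in content is still significant in this sketch.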
From ak117 at freenet.carleton.ca Sat Dec 13 23:00:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:29 2004
Subject: XML Architectural Forms
Message-ID: <199712132258.RAA00384@unready.microstar.com>
I don't remember seeing an announcement here (apologies if I'm
mistaken), but Eliot Kimber and James Clark have announced on
comp.text.sgml a proposed amendment to ISO 10744 that will make it
possible to use Architectural Forms in XML. You can find the text of
the amendment at the following URL:
http://www.ornl.gov/sgml/wg8/document/1957.htm
Here's Eliot's example of a simple, well-formed XML document that uses
the base architecture "isobase":
This is very exciting, because if accepted, the amendment will make
it possible to solve the XML namespace problem with an International
Standard, instead of forcing the W3C to throw together a consortium
standard. Base architectures also provide a simple and elegant
solution to multiple inheritance; for example, here's Eliot's example
modified to implement _two_ base architectures:
The element corresponds to in the isobase
namespace and to in the mslbase namespace at the
same time.
Even more interesting is the ability to embed the architectural
attributes in a DTD, so that they do not appear in the document
instance at all. For example, you can create an external DTD like
this:
Now, every XML document that uses this DTD will implement the two
architectures automatically, with no additional markup required:
Authors won't even have to know that they're using architectural
forms.
Congratulations are due to Eliot and James for taking the time to
start this process.
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Sat Dec 13 23:27:09 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <005801bd0812$a1af84b0$0100007f@localhost>
Message-ID: <3.0.1.16.19971214000336.3af777c6@pop3.demon.co.uk>
I am listing the main calls from Lark and AElfred that I find useful. As
you can see there is a great similarity - I confess that I find the AElfred
ones slightly easier to understand.
I suggest that Tim, David, Norbert if he's free, me and *anyone else who
has written a java parser* decide on a synthesis of this lot. I think
everyone has to be slightly flexible. If I were to suggest, I like the
AElfred model for accessing the DOCTYPE stuff - it's simple and fairly close
to the spec. I'd change the names where possible to be spec-compliant. I
think Lark may have more precision on Entities.
There is nothing difficult here - we don't need anything more - we just
need to do it. I don't see why we can't iterate on these and come up with
something in a week.
I will undertake to hack JUMBO so it uses the resultant interface by choice.
Let's get our act together!
P.
AElfred - document instance related stuff
attribute(XmlParser, String, String, boolean)
data(XmlParser, String)
doctypeDecl(XmlParser, String, String, String)
error(XmlParser, String, String, String, URL, int)
processingInstruction(XmlParser, String, String)
resolveEntity(XmlParser, String, String, URL)
startDocument(XmlParser, String, URL)
endDocument(XmlParser, int)
startElement(XmlParser, String)
endElement(XmlParser, String)
XmlParser()
XmlParser(String, URL)
------
Lark
public boolean doAttlist(Entity e, Object[] parts)
public boolean doDoctype(Entity e, String rootType,
String publicID, String systemID)
public boolean doEntityReference(Entity e, String name)
public boolean doETag(Entity e, Element element)
public boolean doInternalEntity(Entity e, String name, char[] value)
public boolean doPI(Entity e, String PI)
public boolean doSTag(Entity e, Element element)
public boolean doSyntaxError(Entity e, String message, int c)
public boolean doSystemBinaryEntity(Entity e, String name,
String extID, String notation)
public boolean doSystemTextEntity(Entity e, String name, String extID)
public boolean doText(Entity ent, Element el, char[] text, int length)
public boolean doWarning(Entity e, String message)
public Element element()
public class Attribute {
public Attribute(String name, String value)
public Attribute(String name, Text text)
public String name()
public void setName(String name)
public String value()
public void setValue(String value)
public void setValue(Text text)
}
public class Element {
public String type();
public Attribute[] allAttributes()
public void setAllAttributes(Attribute[] attributes)
public Attribute attribute(String name)
public void setAttribute(String name, String value)
public Vector children()
public Element parent()
}
class Text {
public void addSegment(Segment segment)
public Vector segments() { return mSegments; }
public String string()
}
----------------------
AElfred - DTD related stuff
declaredAttributes(String)
declaredElements()
declaredEntities()
declaredNotations()
getAttributeDefaultValue(String, String)
getAttributeDefaultValueType(String, String)
getAttributeEnumeration(String, String)
getAttributeExpandedValue(String, String)
getAttributeType(String, String)
getElementContentModel(String)
getElementContentType(String)
getEntityNotationName(String)
getEntityPublicId(String)
getEntitySystemId(String)
getEntityType(String)
getEntityValue(String)
getNotationPublicId(String)
getNotationSystemId(String)
getProcessor()
getPublicId()
getSystemId()
run()
run(XmlProcessor)
setProcessor(XmlProcessor)
setPublicId(String)
setSystemId(URL)
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Sat Dec 13 23:32:03 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <005801bd0812$a1af84b0$0100007f@localhost>
Message-ID: <3.0.1.16.19971214001236.5b0f1eac@pop3.demon.co.uk>
At 14:00 13/12/97 -0800, Don Park wrote:
>Tim and Peter,
[...]
>
>IMHO, it would be major mistake to combine XML parser client API and service
>provider API. I would much rather see something like Swing's TreeModel
>interface used as XML parser service provider API with opaque objects.
I think it's clear that we are not going to see just one API. Your
suggestion, the grove plan, Xapi-J are all viable ways forward. The point
is that Tim, DavidM, Norbert and I have all - independently - come up with
fairly simple models for APIs which have a large degree of communality.
They have the merit of being fairly simple for newcomers. None are required
to be tree-structured.
>
>public interface XmlTreeModel {
> public Object getRoot ();
> public Object getParent (Object child);
> ...
>}
>
>public interface XmlEventModel {
> public String getElementName (Object event);
> ...
>}
>
>public interface XmlEventProducer {
> public void addConsumer (XmlEventConsumer c);
> public void removeConsumer (XmlEventConsumer c);
> ...
>}
>
>public interface XmlEventConsumer {
> public void elementStarted (XmlElementEvent evt);
> public void elementEnded (XmlElementEvent evt);
> ...
I have looked at TreeModel in Swing and even implemented a simple JUMBO
display on it. I have to confess that, being a Dumb Browser Hacker, I found
it quite tough going. If the only interfaces to XML parsers are based on
this level of abstraction a lot of people will find them hard.
We have been part way down this road before - look through XML-DEV
discussions 6+ months ago. I think it's essential we home in on a
moderately simple parser NOW - we know what we need to do - we simply need
to agree on the precise components and the terminology.
[...]
>
>>I acknowledge this is grossly insufficient for basing an editor on. You want
>>that, use the DOM. Only a few choices have design implications:
>
All I want is to get the DOCTYPE stuff from the file. AElfred now provides
exactly what I want - we just need to agree it.
>
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tbray at textuality.com Sun Dec 14 00:05:36 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
At 12:03 AM 14/12/97, Peter Murray-Rust wrote:
>I am listing the main calls from Lark and AElfred that I find useful. As
>you can see there is a great similarity - I confess that I find the AElfred
>ones slightly easier to understand.
OK, let's get concrete. I think that the AElfred callbacks each having
an XMLParser argument is a good idea. Also AElfred's names are better,
the "Do*" prefix in Lark is silly. So on the event-stream stuff, I'd
go with the AElfred model modulo the following changes:
> attribute(XmlParser, String, String, boolean)
It seems completely wrong to have an attribute event separate from
start-element events. To start with, it suggests that the order of
attributes is significant, which is incorrect. Secondly, since much
element-specific processing depends on what attributes are there, it is
less convenient for the application programmer. Third, if the processor
(as it must) does defaulting, he's going to have to do some attribute
list wrangling anyhow, so it can't really be extra work.
What's the boolean? I don't think the application author should
have to deal with anything but the name and value of attributes.
Anyhow, I'd go with
startElement(XmlParser processor, String type, Attribute[] attributes);
and lose the attribute() method.
> data(XmlParser, String)
I feel that the 2nd argument should not be a String. It is a recipe
for disastrous inefficiency if the processor has to cook up a
java.lang.String object for every little chunk of text. Lark uses two
arguments, a char[] array and a character count; the app can
make a String if it needs to. If you find this awkward, create
a new data type called Text so that if you need a String you
can make it with lazy-evaluation in Text.toString(), but if you
don't need it you don't build it.
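A minimal sketch of such a Text type, assuming the parser hands over its own character buffer and a count. LazyText is a hypothetical name, and the rule that the object is only valid during the callback unless toString() has been called is an assumption of this sketch.

```java
// Hypothetical Text wrapper: holds the parser's buffer and builds a String only on demand.
final class LazyText {
    private final char[] chars;   // the parser's buffer, not copied
    private final int length;     // number of valid characters in the buffer
    private String cached;        // created by the first toString() call, then reused

    LazyText(char[] buffer, int length) {
        this.chars = buffer;
        this.length = length;
    }

    int length() { return length; }

    char charAt(int index) {
        if (index < 0 || index >= length) throw new IndexOutOfBoundsException();
        return chars[index];
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(chars, 0, length);  // the only allocation, paid once
        }
        return cached;
    }
}
```

An application that only counts or scans characters never pays for a String; one that needs it calls toString() inside the callback.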
Also, it shouldn't be named "data" - it should be named
characterData or charData or text or some such term that can
be mapped directly to the spec.
> resolveEntity(XmlParser, String, String, URL)
I don't think entities have any place in the first cut of this
interface. The processor exists to make these problems go away.
Generalities:
Lark has a thing where if any callback returns 'true', the
parser drops out of its loop... which is awfully useful and easy
I think. Lark will also re-enter, but this need not be a requirement.
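A toy sketch of that convention, assuming a driver that simply walks a list of element types; StoppableHandler and EventLoop are hypothetical names and this is not Lark's code.

```java
import java.util.List;

// Hypothetical callback: returning true asks the driver to stop delivering events.
interface StoppableHandler {
    boolean startElement(String type);
}

final class EventLoop {
    // Delivers element types in order and returns how many events were delivered.
    static int run(List<String> elementTypes, StoppableHandler handler) {
        int delivered = 0;
        for (String type : elementTypes) {
            delivered++;
            if (handler.startElement(type)) {
                break;  // the application has seen enough
            }
        }
        return delivered;
    }
}
```

An application that only wants a document's title can return true from the callback as soon as it sees the title element, and the rest of the document is never processed.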
Also, for application programmers, especially dealing with smallish
objects, a tree interface is very natural. I've written both
event-stream and tree apps using Lark, and the trees are a lot
easier to use for anything even moderately complex. So the API
should have Element, Attribute, and Text classes.
And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
API for XML? Maybe SAX-J for the Java bindings. -Tim
From donpark at quake.net Sun Dec 14 00:09:37 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <000f01bd0824$0c981420$0100007f@localhost>
>Hmm, your proposal is coherent, but why is it better? It's certainly a
>bit more complex than what I proposed, and I'd need to see evidence that
>my proposal fails to meet the needs of the basic application programmer.
From the parser writer's point of view, they would rather create their own
object model than change the code to produce W3C DOM objects which will be
incompatible with the version deployed in IE 4.0. My proposal allows
parsers like MSXML to remain unchanged and still support W3C DOM.
Furthermore it allows application programmers to access MSXML objects for
features not supported by W3C DOM.
public class XmlObject {
Object peer;
public Object getPeer () { return peer; }
}
public class XmlDocument extends XmlObject { ... }
XmlDocument obj;
Object peer = obj.getPeer();
if (peer instanceof com.ms.xml.om.Document) {
com.ms.xml.om.Document elem = (com.ms.xml.om.Document)peer;
elem.setOutputStyle(XMLOutputStream.PRETTY);
...
}
My proposal makes it easier for parser writers to support the standard API
and it does not limit applications programmers to the functionalities in the
standard API. I have designed object-oriented software for fifteen years
and I have learned from past mistakes that, while what I propose might seem
more complex, it will meet the harsh reality of the marketplace better.
>One of the things I did with Lark was hook it up to the Swing Tree Renderer/
>JTree package, got a nice little XML document tree-walker, even works with
>Unicode fonts; I only needed calls like the ones I outlined and it was
>no big deal. - Tim
The reason I mentioned Swing's TreeModel was to point out the way it allows
any tree structure to be used as model for JTree. It is true that you can
use JTree's default model but then you end up with two models: XML document
tree and JTree's default model tree which is resource intensive.
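A minimal sketch of the single-model approach, using the TreeModel interface and the org.w3c.dom classes as they exist in present-day Java. DomTreeModel is a hypothetical adapter written for illustration; it is read-only and ignores listeners.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.swing.event.TreeModelListener;
import javax.swing.tree.TreeModel;
import javax.swing.tree.TreePath;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Adapter: a JTree can read the document nodes directly, so no second tree is built.
final class DomTreeModel implements TreeModel {
    private final Node root;

    DomTreeModel(Node root) { this.root = root; }

    // Convenience for the sketch: parse a string and wrap its root element.
    static DomTreeModel parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return new DomTreeModel(doc.getDocumentElement());
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    @Override public Object getRoot() { return root; }

    @Override public Object getChild(Object parent, int index) {
        return ((Node) parent).getChildNodes().item(index);
    }

    @Override public int getChildCount(Object parent) {
        return ((Node) parent).getChildNodes().getLength();
    }

    @Override public boolean isLeaf(Object node) {
        return getChildCount(node) == 0;
    }

    @Override public int getIndexOfChild(Object parent, Object child) {
        for (int i = 0; i < getChildCount(parent); i++) {
            if (((Node) getChild(parent, i)).isSameNode((Node) child)) return i;
        }
        return -1;
    }

    // Read-only view: edits and change notifications are left out of this sketch.
    @Override public void valueForPathChanged(TreePath path, Object newValue) { }
    @Override public void addTreeModelListener(TreeModelListener listener) { }
    @Override public void removeTreeModelListener(TreeModelListener listener) { }
}
```

Passing such a model to new JTree(model) displays the document without copying it into JTree's default node classes.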
Don
From donpark at quake.net Sun Dec 14 00:42:32 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <002201bd0828$a85d4200$0100007f@localhost>
Peter,
>I think it's clear that we are not going to see just one API. Your
>suggestion, the grove plan, Xapi-J are all viable ways forward. The point
>is that Tim, DavidM, Norbert and I have all - independently - come up with
>fairly simple models for APIs which have a large degree of communality.
>They have the merit of being fairly simple for newcomers. None are required
>to be tree-structured.
First, I do not see the need for simple API. Having a simple API now will
definitely help control proliferation of proprietary XML parser API but, in
the long run, it will restrict application programmers to the set of
functionalities supported by the simple API.
Second, the cat is already out of the bag. For example, MSXML is already in
IE 4.0 and it is being used by JScript and Java applet programmers.
>I have looked at TreeModel in Swing and even implemented a simple JUMBO
>display on it. I have to confess that, being a Dumb Browser Hacker, I found
>it quite tough going. If the only interfaces to XML parsers are based on
>this level of abstraction a lot of people will find them hard.
My proposal was mainly for the parser writers and not the application
writers. Application writers will not be using XmlTreeModel but DOM
objects. My point was that interfaces like XmlTreeModel should be used to
write DOM framework so that the framework can support all existing and
future XML parsers.
>WE have been part way down this road before - look through XML-DEV
>discussions 6+ months ago. I think it's essential we home in on a
>moderately simple parser NOW - we know what we need to do - we simply need
>to agree on the precise components and the terminology.
I was not here 6+ months ago and I do not believe that the existence of
previous discussions makes my proposal any less worthy. Frankly, I
am disappointed by the fact that there was no immediate understanding of the
advantages my proposal offers. It is partly my fault since I am pretty bad
at explaining things. However, I am disturbed that, while there is a wealth
of SGML and XML knowledge present in this mailing list, there seems to be a
lack of object-oriented design knowledge. I do not say this insultingly but
with concern. I apologize if anyone took my opinion negatively.
>All I want is to get the DOCTYPE stuff from the file. AElfred now provides
>exactly what I want - we just need to agree it.
All one wants is not necessarily what everyone wants and will want. Design
of a standard API should be approached more carefully and with future in
mind.
I am sorry if my comments upset you in any way. It was not my intention.
Sincerely,
Don Park
From tbray at textuality.com Sun Dec 14 00:59:06 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213170037.009997e0@pop.intergate.bc.ca>
At 04:38 PM 13/12/97 -0800, Don Park wrote:
>First, I do not see the need for simple API.
That's where we part company. Generations of hypertext theorists
saw no need for anything as simple as HTML/HTTP, then generations of
SGML implementors saw no need for anything as simple as XML. I
agree that in the general case, you need something quite a bit more
sophisticated than what we're proposing; that's what the DOM is for.
We're getting a lot of static in the XML project from people who feel
that XML is already too complicated and they want to see elements
'n' attributes and that's all they want to see. I happen to think
they're right; when I'm writing XML apps, that's all I care about 99%
of the time.
So why not create a simple API that will give them what they want?
I should point out that what we're talking about could be implemented
on top of the DOM in about 15 minutes. And on top of the MS IE4
machinery.
And as for those who are currently tying themselves to Microsoft's
proprietary interfaces, especially given that Microsoft is saying in
public that they plan on full DOM compatibility (even if at the same time
they are encouraging everyone to start using "Dynamic HTML" right now)
they'll get what they deserve. -Tim
From ak117 at freenet.carleton.ca Sun Dec 14 02:03:00 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
References: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
Message-ID: <199712140201.VAA00351@unready.microstar.com>
Tim Bray writes:
> > attribute(XmlParser, String, String, boolean)
>
> It seems completely wrong to have an attribute event separate from
> start-element events.
I have worried about this myself. My design goal with Ælfred has been
to limit myself to two class files: one for the parser itself, and one
for the interface for the callbacks -- hence the separate event for
attributes. This decision has forced some pretty severely hacked-up
internal code accompanied by very careful documentation.
I could send a hashtable of attribute names and values with the
startElement() callback, and let users look up types (etc.) with my
query methods, but I would have to lose a bit on two counts:
1) Allocating a new hashtable for every start tag will slow down the
parser a fair bit.
2) I'd have no way to show which attributes were specified and which
were defaulted (see below).
> What's the boolean? I don't think the application author should
> have to deal with anything but the name and value of attributes.
The boolean tells whether the attribute was specified or defaulted. I
include this to allow people to do useful XML-to-XML transformations.
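As a rough illustration of why the specified/defaulted flag matters for XML-to-XML work, the sketch below re-serializes a start tag while leaving out attributes that were supplied by DTD defaults. The names `StartTagWriter`, `attribute`, and `finish` are hypothetical and are not Ælfred's actual interface; only the (name, value, isSpecified) shape of the callback is taken from the discussion above.

```java
// Hypothetical handler, not Ælfred's real interface: collects attribute
// events for one element and writes back only the specified ones.
final class StartTagWriter {
    private final StringBuilder out = new StringBuilder();

    StartTagWriter(String elementName) {
        out.append('<').append(elementName);
    }

    // Mirrors the shape of attribute(XmlParser, String, String, boolean),
    // minus the parser argument, which this sketch does not need.
    void attribute(String name, String value, boolean isSpecified) {
        if (!isSpecified) {
            return; // defaulted by the DTD: leave it out of the output
        }
        out.append(' ').append(name)
           .append("=\"").append(escape(value)).append('"');
    }

    // Returns the start tag built so far, closed with '>'.
    String finish() {
        return out.toString() + ">";
    }

    // Escapes the characters that cannot appear literally in a
    // double-quoted attribute value.
    private static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;");
    }
}
```

Without the boolean, a transformation like this would write every defaulted attribute into the output document as if the author had typed it.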
> > data(XmlParser, String)
>
> I feel that the 2nd argument should not be a String. It is a recipe
> for disastrous inefficiency if the processor has to cook up a
> java.lang.String object for every little chunk of text.
The overhead isn't that bad with Ælfred because I coalesce my data
into the largest chunks possible before allocating the String. I
think that returning a char[] array would be confusing for users, and
would lead to many bugs in their code as they ignored our warnings not
to rely on the value in the char[] array outlasting the callback.
> Lark uses two
> arguments, a char[] array and a character count; the app can
> make a String if it needs to. If you find this awkward, create
> a new data type called Text so that if you need a String you
> can make it with lazy-evaluation in Text.toString(), but if you
> don't need it you don't build it.
Again, I'm reluctant to create new classes beyond XmlParser and
XmlProcessor.
> Also, it shouldn't be named "data" - it should be named
> characterData or charData or text or some such term that can
> be mapped directly to the spec.
Agreed. I will not change Ælfred now, but I think that this is a good
idea.
> > resolveEntity(XmlParser, String, String, URL)
>
> I don't think entities have any place in the first cut of this
> interface. The processor exists to make these problems go away.
Normally, you should just return the URL argument; however, this
callback gives users a chance to do public-identifier resolution, URL
substitution, etc., and to return a different URL if desired. For
example, if we had a DTD at
http://www.microstar.com/XML/msldoc.dtd
and you had a local copy, you could substitute a local URL on your own
computer. Likewise, you could do a catalogue lookup on the public
identifier "-//microstar//DTD Microstar Sample Document//EN" and
choose a different system identifier than the default supplied in the
document.
That said, I agree that this probably doesn't belong in the common
event API.
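The substitution described above might look roughly like the following. This is only a sketch: `LocalCatalog` and its methods are hypothetical, identifiers are kept as plain strings rather than `URL` objects to avoid exception handling, and the local file paths in the usage note are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver, not part of Ælfred: maps public or system
// identifiers to local replacements and otherwise keeps the default.
final class LocalCatalog {
    private final Map<String, String> byPublicId = new HashMap<>();
    private final Map<String, String> bySystemId = new HashMap<>();

    void mapPublicId(String publicId, String localUri) {
        byPublicId.put(publicId, localUri);
    }

    void mapSystemId(String systemId, String localUri) {
        bySystemId.put(systemId, localUri);
    }

    // A public-identifier match wins, then a system-identifier match;
    // with no match the system identifier from the document is returned,
    // which corresponds to "just return the URL argument".
    String resolveEntity(String publicId, String systemId) {
        if (publicId != null && byPublicId.containsKey(publicId)) {
            return byPublicId.get(publicId);
        }
        if (bySystemId.containsKey(systemId)) {
            return bySystemId.get(systemId);
        }
        return systemId;
    }
}
```

For example, after `mapSystemId("http://www.microstar.com/XML/msldoc.dtd", "file:///usr/local/dtd/msldoc.dtd")`, a request for the remote DTD would be answered with the local copy, while any unmapped identifier passes through unchanged.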
> Generalities:
> Lark has a thing where if any callback returns 'true', the
> parser drops out of its loop... which is awfully useful and easy
> I think. Lark will also re-enter, but this need not be a requirement.
Awfully easy with a DFA-driven parser, but trickier with a
recursive-descent parser like Ælfred. I'd probably have to throw an
exception, and could not allow any kind of re-entry.
> Also, for application programmers, especially dealing with smallish
> objects, a tree interface is very natural. I've written both
> event-stream and tree apps using Lark, and the trees are a lot
> easier to use for anything even moderately complex. So the API
> should have Element, Attribute, and Text classes.
Perhaps -- I may have to give in and allow Ælfred to use more than one
class file; or alternatively, these would be an optional extra, along
with the SAX-J layer.
> And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
> API for XML? Maybe SAX-J for the Java bindings. -Tim
How about RUSTY?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Sun Dec 14 02:21:24 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
At 09:01 PM 13/12/97 -0500, David Megginson wrote:
>I have worried about this myself. My design goal with Ælfred has been
>to limit myself to two class files: one for the parser itself, and one
>for the interface for the callbacks -- hence the separate event for
>attributes. This decision has forced some pretty severely hacked-up
>internal code accompanied by very careful documentation.
Hmm, isn't this what JAR and so on are for? Seems like an awfully
severe design constraint. I certainly agree with "small" as a design
goal, but it seems like limiting class file count carries a pretty
high price. - Tim
From tbray at textuality.com Sun Dec 14 02:39:32 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213184107.009a3d60@pop.intergate.bc.ca>
At 09:01 PM 13/12/97 -0500, David Megginson wrote:
> > What's the boolean? I don't think the application author should
> > have to deal with anything but the name and value of attributes.
>The boolean tells whether the attribute was specified or defaulted. I
>include this to allow people to do useful XML-to-XML transformations.
No. Not of interest to people who just want to see elements and
attributes. The whole point of using an XML processor is that it
takes care of these details for the application programmer. Leave it
out for now. If you want XML-to-XML you need a lot more, go use the
DOM.
> > > data(XmlParser, String)
> > I feel that the 2nd argument should not be a String. It is a recipe
> > for disastrous inefficiency if the processor has to cook up a
> > java.lang.String object for every little chunk of text.
>
>The overhead isn't that bad with Ælfred because I coalesce my data
>into the largest chunks possible before allocating the String. I
>think that returning a char[] array would be confusing for users
That's a fair point; the correct solution per design principles
is to have a Text class that could give you a String if you
asked it; since many applications will ignore the content of many
elements, it seems vital not to have an interface that makes
lazy evaluation impossible. So I think you have to go for either
the char[] trick or another class.
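A minimal sketch of the "another class" option, assuming a wrapper over the parser's character buffer that builds a String only on request. The name `LazyText` and its methods are hypothetical and do not come from Lark or Ælfred.

```java
// Hypothetical lazy text wrapper: holds a view onto the parser's buffer
// and allocates a String only when toString() is first called.
final class LazyText {
    private final char[] buffer;
    private final int offset;
    private final int length;
    private String cached; // stays null until a String is requested

    // The buffer is not copied. If the parser reuses it after the
    // callback returns, the view is only valid during the callback,
    // unless toString() was called while the data was still intact.
    LazyText(char[] buffer, int offset, int length) {
        this.buffer = buffer;
        this.offset = offset;
        this.length = length;
    }

    int length() {
        return length;
    }

    char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer[offset + index];
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(buffer, offset, length);
        }
        return cached;
    }
}
```

An application that ignores the content of an element never calls `toString()`, so no String is allocated for that chunk; one that needs the text pays for the allocation once.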
> > Lark has a thing where if any callback returns 'true', the
> > parser drops out of its loop... which is awfully useful and easy
> > I think. Lark will also re-enter, but this need not be a requirement.
>
>Awfully easy with a DFA-driven parser, but trickier with a
>recursive-descent parser like Ælfred.
But it seems completely unreasonable, if I call the parser mainline,
not to have a way to get control back. I guess you could get the
client callback to throw an exception... blecch. If exceptions
are going to be thrown, it's better to hide all this stuff within
the processor and not make each application do it. -Tim
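One way to read this suggestion is sketched below: the callback simply returns true to ask for a stop, and a recursive parser turns that into a private exception that it catches itself at the top level. Everything here (`ElementHandler`, `ToyParser`, `Elem`, `StopParsing`) is hypothetical, and the "parser" walks a pre-built tree rather than real XML, purely to show the control flow.

```java
import java.util.List;

// Hypothetical callback interface: returning true asks the parser to stop.
interface ElementHandler {
    boolean startElement(String name);
}

// Toy recursive walker standing in for a recursive-descent XML parser.
final class ToyParser {
    // Private signal used to unwind the recursion; callers never see it.
    private static final class StopParsing extends RuntimeException {
        StopParsing() {
            super(null, null, false, false); // no message, no stack trace
        }
    }

    // Stand-in for parsed input: an element name plus child elements.
    static final class Elem {
        final String name;
        final List<Elem> children;

        Elem(String name, Elem... children) {
            this.name = name;
            this.children = List.of(children);
        }
    }

    // Returns true if the handler stopped the parse early.
    static boolean parse(Elem root, ElementHandler handler) {
        try {
            descend(root, handler);
            return false;
        } catch (StopParsing stop) {
            return true;
        }
    }

    private static void descend(Elem e, ElementHandler handler) {
        if (handler.startElement(e.name)) {
            throw new StopParsing();
        }
        for (Elem child : e.children) {
            descend(child, handler);
        }
    }
}
```

The application only ever deals with a boolean return value; the exception stays inside the processor. Re-entry after a stop is not supported in this sketch.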
From peter at ursus.demon.co.uk Sun Dec 14 07:34:39 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971214083114.5b3f006e@pop3.demon.co.uk>
At 16:07 13/12/97 -0800, Tim Bray wrote:
>At 12:03 AM 14/12/97, Peter Murray-Rust wrote:
>>I am listing the main calls from Lark and AElfred that I find useful. As
>>you can see there is a great similarity - I confess that I find the AElfred
>>ones slightly easier to understand.
>
>OK, let's get concrete. I think that the AElfred callbacks each having
>an XMLParser argument is a good idea. Also AElfred's names are better,
>the "Do*" prefix in Lark is silly. So on the event-stream stuff, I'd
>go with the AElfred model modulo the following changes:
This seems eminently reasonable - if DavidM is listening I suggest we can
get this sorted very quickly.
>
>> attribute(XmlParser, String, String, boolean)
>
>It seems completely wrong to have an attribute event separate from
>start-element events. To start with, it suggests that the order of
>attributes is significant, which is incorrect. Secondly, since much
>element-specific processing depends on what attributes are there, it is
>less convenient for the application programmer. Third, if the processor
>(as it must) does defaulting, he's going to have to do some attribute
>list wrangling anyhow, so it can't really be extra work.
I cut the documentation out to save space on the list.
boolean isSpecified
(although this doesn't match with the documentation for the Parameters,
David...)
>
>What's the boolean? I don't think the application author should
>have to deal with anything but the name and value of attributes.
>
>Anyhow, I'd go with
>
>startElement(XmlParser processor, String type, Attribute[] attributes);
So would I.
>
>and lose the attribute() method.
>
>> data(XmlParser, String)
>
>I feel that the 2nd argument should not be a String. It is a recipe
>for disastrous inefficiency if the processor has to cook up a
>java.lang.String object for every little chunk of text. Lark uses two
>arguments, a char[] array and a character count; the app can
>make a String if it needs to. If you find this awkward, create
>a new data type called Text so that if you need a String you
>can make it with lazy-evaluation in Text.toString(), but if you
>don't need it you don't build it.
Seems reasonable.
>
>Also, it shouldn't be named "data" - it should be named
>characterData or charData or text or some such term that can
>be mapped directly to the spec.
>
>> resolveEntity(XmlParser, String, String, URL)
>
>I don't think entities have any place in the first cut of this
>interface. The processor exists to make these problems go away.
Lark has entities:
public boolean doSystemTextEntity(Entity e, String name, String extID)
and two others...
>
>Generalities:
>Lark has a thing where if any callback returns 'true', the
>parser drops out of its loop... which is awfully useful and easy
>I think. Lark will also re-enter, but this need not be a requirement.
>
>Also, for application programmers, especially dealing with smallish
>objects, a tree interface is very natural. I've written both
>event-stream and tree apps using Lark, and the trees are a lot
>easier to use for anything even moderately complex. So the API
>should have Element, Attribute, and Text classes.
I won't quarrel with this. I would be very happy for a tree interface,
because JUMBO is based on trees. However I didn't want to subclass Lark's
trees if we decided on a different one, because unlike an event stream,
that could take a major rewrite of JUMBO. IFF we can standardise now, I'll
be very happy.
>
>And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
Of course it shouldn't - I would second the use of Simple somewhere in it.
>API for XML? Maybe SAX-J for the Java bindings. -Tim
>
Sounds great. Let's make sure we get 100% of the way this time.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Sun Dec 14 09:37:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: XML-DEV (was Re: YAXPAPI)
In-Reply-To: <002201bd0828$a85d4200$0100007f@localhost>
Message-ID: <3.0.1.16.19971214102740.5b0ff64a@pop3.demon.co.uk>
In replying to Don I'm taking the opportunity to re-iterate and refine some
ideas about the role of XML-DEV.
Thanks Don,
I think I know how you feel and will try to address it. There is no
suggestion that your ideas are not valuable. Since I try to develop this
list as a collaborative communal arena, I'll outline my underlying ideas.
There is only one formal route for XML discussion - XML-WG, with about 10
members chosen from the W3C. They are supported by a larger virtual body of
about 100 experts (XML-SIG). The WG asks the SIG to consider proposals for
XML and related things (XLL, XSL), listens very carefully to what the
XML-SIG says, makes changes, and has regular votes to firm up on the spec.
This culminated in last week's PR.
One of the important ideas of XML was that it should be *simple*. Design
goal 4 in the spec is:
"It shall be simple to write programs which process XML documents".
This was exemplified by the 'Mythical CompSci grad student' who could hack
a non-validating XML parser in 2 weeks. [This person is still quite
mythical :-)] There has also been an assumption that the 'desperate Perl
hacker' (DPH) is an important feature of the emerging XML scene. This
person doesn't necessarily use XML tools to manage XML documents - if they
wanted to change a tag they'd just use:
s///g
and most of the time this type of approach works.
I was invited to be part of the XML-SIG although I am not an SGML expert
and have never read 8879. My role has emerged as representing the DPH (or
worse). I have been described as a 'bellwether' and a Dumb XML Browser
Hacker in both of which I take pride as it legitimises my self-appointed
role. This is, very simply, to represent the 99% of future XML users who
know nothing about SGML, objects, DTDs, parameter entities, etc. BUT who
(at least in my vision) want to be more than passive consumers of
shrink-wrapped systems. I felt that HTML (actually HTTP) was an enormous
liberating force because it allowed people to publish for the first time.
The great success of HTML was that anyone could play - you could create
HTML documents after a few hours' experimentation. It was easy - we have
discovered that ease has its price - I feel it's an acceptable one. XML
also has the capability to make publishing available for 'everyone', but
only if it is made simple enough to be a self-replicating idea ('meme').
So I - as 'webhacker' - have consistently argued for simplicity in XML. At
the other end of the spectrum are SGML experts who want XML to provide WWW
support for any current SGML application. The WG has to find a practicable
way forward, and we are accustomed to 'disappointment'. Personally I think
XML is too complex and too difficult to understand - I have made my views
known here :-) [I have argued for the removal of dual quoting,
, NOTATION (which I *still* do not fully understand). I have
argued that the WG should address whitespace more proactively. I have said
that XLL is too abstract and needs further elaboration.] I know that the WG
considers all suggestions and perhaps 1% of what I say has some effect on
the final spec. I'll settle for that.
It has been made very clear that the WG will not address implementation
issues. They understand them, and make decisions based on them, but they do
not want to constrain how people use XML. I applaud this, because XML will
not be the vision of what its creators have now (in 1997) but the
accumulated experimentation of the world over the next few years. What the
WG has addressed is a language which is both robust and flexible - two
extremely difficult things to bring together. I am sure that everyone
involved in the XML process thinks "they have got bits wrong" but we are
all prepared to work with what emerges.
So - to XML-DEV. There is a clear vacuum between the spec and working
applications of XML, and XML-DEV was offered as a way to fill it. It has no
formal status - it's supported by the goodwill of Henry Rzepa and myself
(both molecular scientists - Henry does theoretical calculations on
molecules and my 'day job' is to help people learn how to design new
drugs). We have a not very hidden agenda in wishing XML to prosper, but we
feel we represent an average vertical XML community in the future.
Personally I find SGML very hard. Perhaps this is because I don't use it
every day and because I think in concrete terms (being an experimental
scientist). Words like 'entity' do not bring immediate enlightenment. I do
not fully understand XLL, I do not understand groves, I do not understand
formal design of interfaces, I do not understand the DSSSL spec, I do not
(at least yet) understand the DOM. But I represent 99% of future XML users.
I do not feel I and others should be disenfranchised - that may be
unrealistic and Quixotic, but at least I enjoy the windmills.
In setting up XML-DEV I assumed that lots of people would be developing
software (initially prototypes) for XML, and would need a discussion forum.
I've been surprised how little software there has so far been. Not
disappointed - I'm never disappointed in the virtual arena - what happens,
happens. But personally I think the ratio of talk to action is too high -
maybe that's my scientific background.
I get a small amount of private mail that suggests that XML-DEV has a
useful role, and that continuing to highlight the simple approach is
valuable. There is also general support for a public collaborative forum.
My ideal is to see communal activities arise out of XML-DEV - rather like
the tcl, Linux, LaTeX, Perl and other efforts. I see the WWW as a
biological system - lots of new species evolve and only a very few survive.
Not always the apparently 'best'.
We've had several goes at creating an API on this list. Take it as
axiomatic that everyone has slightly different ideas - some are radically
different. We catalysed the formation of Xapi-J (from John Tigue) -
unfortunately no-one uses it because (I think) they are all waiting for the
DOM. I am too impatient to wait for the DOM; I am revising JUMBO and want to
get out the next snapshot.
Those of us who have written simple systems feel we have an urgent need to
rationalise their interfaces. What we (or at least JUMBO) don't want is yet
6 more incompatible parsers. We believe that this is achievable in a short
time. If so, it will give impetus to the communal approach.
History will tell whether this is valuable :-)
At 16:38 13/12/97 -0800, Don Park wrote:
[...]
>First, I do not see the need for simple API. Having a simple API now will
^^^^^^^^^^^^
I do. Remember, I'm Dumb :-)
>definitely help control proliferation of proprietary XML parser API but, in
>the long run, it will restrict application programmers to the set of
>functionalities supported by the simple API.
There was never any suggestion it would be the only API. Let's assume
there are 3 APIs.
- simple
- Object based
- grove based
JUMBO uses the first. If someone says "I would really like JUMBO to sit on
top of groves", I will appeal to the world for someone to have JumboGroves.
[JUMBO is offered as a public communal project.] If no one comes to the
party, too bad :-)
>
>Second, the cat is already out of the bag. For example, MSXML is already in
>IE 4.0 and it is being used by JScript and Java applet programmers.
I am publicly neutral about any software produced by commercial
organisations. There have been some very good de facto standards in the
past, a lot of adequate ones, and some awful ones. History will decide.
My ideal - as stated above - is to provide an environment where the general
mass of XML users have a chance to affect the design and implementation of
XML systems. Maybe this is unrealistic? Please feel free to join in the
software effort :-)
>
>>I have looked at TreeModel in Swing and even implemented a simple JUMBO
>>display on it. I have to confess that, being a Dumb Browser Hacker, I found
>>it quite tough going. If the only interfaces to XML parsers are based on
>>this level of abstraction a lot of people will find them hard.
>
>
>My proposal was mainly for the parser writers and not the application
>writers. Application writers will not be using XmlTreeModel but DOM
>objects. My point was that interfaces like XmlTreeModel should be used to
*This* application writer uses NXP, Lark and AElfred because the DOM ain't
ready and because he doesn't yet understand it :-)
>write DOM framework so that the framework can support all existing and
>future XML parsers.
>
>>WE have been part way down this road before - look through XML-DEV
>>discussions 6+ months ago. I think it's essential we home in on a
>>moderately simple parser NOW - we know what we need to do - we simply need
>>to agree on the precise components and the terminology.
>
>I was not here 6+ months ago and I do not believe that just because there
The list is archived on http://www.lists.ic.ac.uk/hypermail/xml-dev. I am
not suggesting that it's all worth reading, but you might find the stuff
about API useful.
>has been previous discussions makes my proposal any less worthy. Frankly, I
No one has doubted the worthiness of your proposal :-). If you can find
people on XML-DEV who wish to take it up and implement it, I'd be
*delighted*. Really.
All that has happened is that three parser writers have decided to propose
a particular way forward.
>am disappointed by the fact that there was no immediate understanding of the
>advantages my proposal offers. It is partly my fault since I am pretty bad
No, Don. It's the inertia and the time pressures. For me, it would take me
a week to understand. I don't understand the Consumers, etc. in the rest of
Java very well. I don't see where an EventConsumer is required in what I
want to do.
I understand the proposal strategically because it has the same look and
feel of other things in Java. In a similar way I didn't understand John
Tigue's API with ParserFactorys and so on - but those who did seemed to
think they were a good way to do things. So - hope that someone less Dumb
than me picks up on your idea :-)
>at explaining things. However, I am disturbed that, while there is a wealth
>of SGML and XML knowledge present in this mailing list, there seem to be a
>lack of object-oriented design knowledge. I do not say this insultingly but
We all have concerns. My concern is that there aren't enough people who are
actively writing code and making it publicly available. My advice would be
to go out and write something that you think does something useful and show
people that it's a GoodThing. That's what I have done with JUMBO - very
much the Dumb person's tool (you wouldn't like to look inside JUMBO - no
Factories, no Consumers, etc.). If you or anyone would like to rewrite
JUMBO properly I'd be *delighted* :-)
>with concern. I apologize if anyone took my opinion negatively.
One of the very positive aspects of XML/SGML is the incredible patience and
politeness of people. There are no flamewars. If people get things formally
wrong they are gently educated in a better way to do it. If their ideas are
way off beam, they often won't get a response of any kind, but if they do
it will be polite and helpful.
>
>>All I want is to get the DOCTYPE stuff from the file. AElfred now provides
>>exactly what I want - we just need to agree it.
>
>
>All one wants is not necessarily what everyone wants and will want. Design
>of a standard API should be approached more carefully and with the future in
>mind.
I don't disagree with this :-) You have your opportunity to convince
people, right here. My own suggestion is that working software is a useful
part of an argument.
>
>I am sorry if my comments upset you in any way. It was not my intention.
I don't get upset in virtual environments :-). [I did once :-), in a
situation so bizarre it could have come straight out of a Shakespearean
comedy. It's not polite to retell it.] Passion is important. People's
ontologies are very dear to them. Flame wars arise from colliding ontologies.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From h.rzepa at ic.ac.uk Sun Dec 14 09:44:00 1997
From: h.rzepa at ic.ac.uk (Rzepa, Henry)
Date: Mon Jun 7 16:59:30 2004
Subject: XML-DEV list errors on weekends
Message-ID:
As the person receiving all the list errors (undelivered mail etc) I try my
best to delete all the ones that seem permanent (a significant proportion
of people who try to subscribe do so with mail addresses that subsequently
bounce). But increasingly, I notice that a large number of errors
(undelivered mail) seem to occur only on weekends. I get perhaps 200-300
such errors each weekend, but fewer on weekdays.
Coming from a university background where we run 7 days a week,
I am wondering whether in commerce, companies might implement
policies where mail routers etc perhaps are taken down over weekends?
Is there anyone out there who thinks there might be such a reason
why weekends are problematic?
Henry Rzepa. +44 171 594 5774 (Office) +44 594 5804 (Fax)
From ak117 at freenet.carleton.ca Sun Dec 14 11:37:02 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
References: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
Message-ID: <199712141135.GAA00310@unready.microstar.com>
Tim Bray writes:
> At 09:01 PM 13/12/97 -0500, David Megginson wrote:
> >I have worried about this myself. My design goal with Ælfred has been
> >to limit myself to two class files: one for the parser itself, and one
> >for the interface for the callbacks -- hence the separate event for
> >attributes. This decision has forced some pretty severely hacked-up
> >internal code accompanied by very careful documentation.
>
> Hmm, isn't this what JAR and so on are for? Seems like an awfully
> severe design constraint. I certainly agree with "small" as a design
> goal, but it seems like limiting class file count carries a pretty
> high price. - Tim
It is a painfully high price, especially in terms of coding
difficulty; if NS 3.*, NS 4.*, MSIE 3.*, MSIE 4.*, and HotJava all
accepted the JAR files (or any other archive format), then I wouldn't
worry. As it stands, however, that is not the case, and it is
essential that Ælfred be easy to use in existing browsers as well as
future ones. That is the same reason that I didn't use any JDK 1.1
features, despite the fact that I _like_ JDK 1.1.
I am willing to be convinced that an extra couple of class files won't
make a difference to Java applet writers (with no special interest in
XML), but I will need to hear that from them.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Sun Dec 14 12:18:20 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: Corrected Examples: XML Architectural Forms
Message-ID: <199712141216.HAA00392@unready.microstar.com>
Here are corrected examples for XML architectural forms, using the
proposed amendment (note also the corrected spelling) to ISO 10744:
Simple XML document with one base architecture:
Simple XML document with two base architectures:
DTD for simple XML document with two base architectures:
Simple XML document with two base architectures hidden in the DTD:
(Note that I have added quotation marks, in line with XML's handling
of attribute values). The rest of my original message still applies.
Thank you to Robin Cover for gently pointing out my first mistake, and
for being too genteel to point out my second (my second-year Medieval
English teacher told me that if I studied too much Medieval English,
I'd never be able to spell again).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From fussellm at alumni.caltech.edu Sun Dec 14 12:18:41 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI and DOM
Message-ID:
I am a bit confused by the recent statements about the "complexity of
DOM" and the proposed simpler alternatives for an object model. The DOM
model seems as simple and direct as all the proposed alternatives. I
could see suggesting changes (I have myself) but it would seem these
should be relative to the DOM as it is or have some significantly new
features. The core DOM 'content' information classes [with the read part
of their interfaces[*1] ] are:
public interface Node {
    public int /*NodeType*/ getNodeType();
    public Node getParentNode();
    public NodeList getChildren();
}
public interface Element extends Node {
    public String getTagName();
    public NodeList getAttributes();
}
public interface Attribute extends Node {
    public String getName();
    public NodeList getValue();
    public boolean isSpecified();
}
public interface Text extends Node {
    public String getData();
    public boolean isIgnorableWhitespace();
}
I can't see how you could get much simpler in the number of classes and
the concept for each class[*2].
So if we have the Grove model and the DOM, of what value is another
similar, less-standard standard object model?
--------
I can see a different problem though: it may be that no model will be
useful to standardize for the actual interfaces. Each application will
want slightly different object models that have very small changes that
are very significant to it. Two examples I have in the above are both
from the same type of problem: restricted Typing. In the above
interfaces I would much rather have NodeList->List [the JDK 1.2 interface
for a general indexed collection] because I have many more
implementations and functionality to use for manipulating lists than I do
for NodeList [I could wrapper and delegate all the functionality but that
is much more effort and less maintainable for no real benefit]. Likewise
I would rather have Attribute's value be an Object or a String than a
NodeList. These minor changes make the DOM interfaces themselves
impossible to use: I can have interfaces just like them but they will
have to be my own version.
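The wrap-and-delegate option mentioned above can be sketched as follows. This is only an illustration: the `Node` and `NodeList` types here are hypothetical stand-ins (with assumed `getLength()` and `item(int)` accessors), `NodeListView` is an invented name, and the sketch uses the `java.util` collections as they eventually shipped. It gives a read-only view; a mutable one would need more code.

```java
import java.util.AbstractList;
import java.util.List;

// Hypothetical stand-ins for the draft DOM types; only what the sketch needs.
interface Node {}

interface NodeList {
    int getLength();
    Node item(int index);
}

// Read-only List view that delegates every access to the wrapped NodeList.
final class NodeListView extends AbstractList<Node> {
    private final NodeList nodes;

    NodeListView(NodeList nodes) {
        this.nodes = nodes;
    }

    @Override
    public Node get(int index) {
        if (index < 0 || index >= nodes.getLength()) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return nodes.item(index);
    }

    @Override
    public int size() {
        return nodes.getLength();
    }
}

class NodeListViewDemo {
    public static void main(String[] args) {
        final Node[] backing = { new Node() {}, new Node() {} };
        NodeList raw = new NodeList() {
            public int getLength() { return backing.length; }
            public Node item(int i) { return backing[i]; }
        };
        // Everything written against List (iteration, indexOf, contains) now applies.
        List<Node> view = new NodeListView(raw);
        System.out.println(view.size());
        System.out.println(view.indexOf(backing[1]));
    }
}
```

The wrapper still has to be written and maintained separately from the DOM interfaces, which is the maintenance cost the paragraph above objects to.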
I suspect this may always be the case. I have helped build many large
and small information system models and none of them committed to using
exactly somebody else's code for the DomainModel[*3]. Having control
over the model of the information your application works with is crucial
to both good design and good/maintainable implementations. This isn't to
say you can't use someone else's designs: that works excellently (e.g.
Design Patterns and Analysis Patterns). You can even start with someone
else's code but you will almost certainly need to modify that model ever
so slightly (or majorly) at some point.
An approach that works better than defining an exact ObjectModel
(i.e. exact Types) to implement is to think from outside the Model: to
the client and supplier points of views. From the outside people only
care about limited interfaces and protocols that a DomainModel must
support to work with them. This is how Swing's TreeModel works (as long
as you support the TreeModel interface you are worthy) and other 'M's in
the MVC pattern. This is also how Java Beans work, but with a runtime
signature-binding approach. In all these cases, the client/supplier
requirements come first and you can decide if you want to work with them
by suitably designing and implementing your DomainModel.
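As a rough illustration of the "outside the Model" view, the sketch below adapts a domain class to Swing's `TreeModel` protocol. `DocElement`, `DocTreeModel`, and `TreeModelDemo` are hypothetical names invented for the example; only the `javax.swing.tree.TreeModel` interface itself is real. The sketch is read-only and leaves out change notification.

```java
import javax.swing.event.TreeModelListener;
import javax.swing.tree.TreeModel;
import javax.swing.tree.TreePath;
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class; it knows nothing about Swing.
final class DocElement {
    final String name;
    final List<DocElement> children = new ArrayList<>();

    DocElement(String name) { this.name = name; }

    DocElement add(DocElement child) {
        children.add(child);
        return this;
    }

    @Override
    public String toString() { return name; }
}

// Adapter exposing the domain objects through the protocol a JTree expects.
final class DocTreeModel implements TreeModel {
    private final DocElement root;

    DocTreeModel(DocElement root) { this.root = root; }

    public Object getRoot() { return root; }
    public Object getChild(Object parent, int index) {
        return ((DocElement) parent).children.get(index);
    }
    public int getChildCount(Object parent) {
        return ((DocElement) parent).children.size();
    }
    public boolean isLeaf(Object node) {
        return ((DocElement) node).children.isEmpty();
    }
    public int getIndexOfChild(Object parent, Object child) {
        return ((DocElement) parent).children.indexOf(child);
    }
    // Read-only sketch: edits and listener bookkeeping are intentionally empty.
    public void valueForPathChanged(TreePath path, Object newValue) {}
    public void addTreeModelListener(TreeModelListener l) {}
    public void removeTreeModelListener(TreeModelListener l) {}
}

class TreeModelDemo {
    // A client that sees only the TreeModel protocol, never DocElement.
    static void print(TreeModel model, Object node, String indent, StringBuilder out) {
        out.append(indent).append(node).append('\n');
        for (int i = 0; i < model.getChildCount(node); i++) {
            print(model, model.getChild(node, i), indent + "  ", out);
        }
    }

    public static void main(String[] args) {
        DocElement root = new DocElement("EXAMPLE")
                .add(new DocElement("P").add(new DocElement("S")));
        StringBuilder out = new StringBuilder();
        print(new DocTreeModel(root), root, "", out);
        System.out.print(out);
    }
}
```

The domain class keeps whatever types suit the application; only the adapter has to follow the client's interface.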
So I suspect all of the following are true:
(1) The DOM interfaces will be exactly suitable to some
applications
(2) There are many applications for which the DOM interfaces
(as exact code) will not be suitable
(3) The DOM model is a good design model and template for
a good number of these applications
(4) It would be good to suggest possible modifications to the
DOM to either make it better or as possible alternatives
for people in situation (2)
(5) There are many good reasons to start defining the possible
clients and services that (Document) DomainModels may want
to use. [*4]
(6) There is no reason to have a similar model to the DOM
and make it a semi-standard
(7) Frequently (2) will turn to using (3), (4), and (5) to make a
suitable model, so these will be very valuable.
So it would seem good to focus on all of (1)-(5) in the above but not on
(6) except as it helps to understand the others[*5].
--Mark
mark.fussell@chimu.com
[1] I made a couple of minor stylistic/convention changes (e.g. 'is' for
booleans) to these interfaces.
[2] I coded a skeleton implementation (able to construct, inspect, and
print objects) of the level-1 DOM model (i.e. including the DocumentType
classes) in a part of an evening and offered to provide it as source in a
previous email.
[3] Except, for a while, when the model can be extended without changing
the source (a Smalltalk/ENVY feature).
[4] As an example of (5) in DOM, the DOM interfaces are generally Java
Bean compatible. This is very useful in Java: the MONDO DOM
ObjectBuilder had exactly one line of code to specify how to take a
recipe for a (for example) ModelGroup and build a ModelGroup object:
addBeanFactoryFor_toBuilder(ModelGroupClass.class,builder);
[5] In the above I am not referring to an event oriented API, but will
respond to that in a different email.
From peter at ursus.demon.co.uk Sun Dec 14 15:29:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <199712141135.GAA00310@unready.microstar.com>
References: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
<3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971214152950.315f45c6@pop3.demon.co.uk>
At 06:35 14/12/97 -0500, David Megginson wrote:
>Tim Bray writes:
[...]
> >
> > Hmm, isn't this what JAR and so on are for? Seems like an awfully
> > severe design constraint. I certainly agree with "small" as a design
> > goal, but it seems like limiting class file count carries a pretty
> > high price. - Tim
The following assertions are based on ignorance and hearsay...
As I understand it, if Java wants a method in a class, it loads the whole
class into the virtual machine. Therefore if you have a large complex
class you have a constant large overhead in terms of (a) HTTP connections
(b) JVM space. I have a number of very large classes (e.g. > 100 member
functions, some quite crunchy) so I have been thinking of doing the exact
reverse to DavidM - i.e. splitting up my classes into smaller bits. Thus
my MOLNode implements Drawable routines, Linkable (XLL), Editable,
Validatable at least. If I have a very simple application it will still
download all these functions (am I right?) and also keep them in the JVM so
long as there is a reference to an object of the class (am I still right?)
So I am thinking of splitting these into smaller chunks, such as
DrawableMethods, etc which don't need to be loaded if not used. Would the
same apply to AElfred? Thus if you had two chunks - DTD.class and
Instance.class (or whatever) and the document instance had no DTD, you'd
never need to load the DTD class, right?
Poor old JUMBO comes to 500 Kbytes at least if it's all there. That
includes things like matrix.diagonalise(), ProteinSequence.Align() and
Bivariate.display(Axes). I am assuming that (a) things will speed up (b)
classes can be cached client-side (c) the excitement of finally getting the
display will hold the reader in her seat long enough. I'm certainly
assuming that JAR files will happen (or equivalent). IOW I'm not designing
for speed, but functionality.
P.
>
>It is a painfully high price, especially in terms of coding
>difficulty; if NS 3.*, NS 4.*, MSIE 3.*, MSIE 4.*, and HotJava all
>accepted the JAR files (or any other archive format), then I wouldn't
>worry. As it stands, however, that is not the case, and it is
>essential that Ælfred be easy to use in existing browsers as well as
>future ones. That is the same reason that I didn't use any JDK 1.1
>features, despite the fact that I _like_ JDK 1.1.
I would assume it's possible to re-route the client to a non-JAR applet if
required.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From donpark at quake.net Sun Dec 14 17:13:57 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <002201bd08b3$2bf38630$0100007f@localhost>
Peter,
>Poor old JUMBO comes to 500 Kbytes at least if it's all there. That
>includes things like matrix.diagonalise(), ProteinSequence.Align() and
>Bivariate.display(Axes). I am assuming that (a) things will speed up (b)
>classes can be cached client-side (c) the excitement of finally getting the
>display will hold the reader in her seat long enough. I'm certainly
>assuming that JAR files will happen (or equivalent). IOW I'm not designing
>for speed, but functionality.
The problem with relying on cached Java classes is that a typical browser user
will flush the cache quite frequently (every day in my case, because one day
of work leaves me with about 25 to 50 MB of useless web pages and images in
my cache). I would prefer to leave the Java classes in the cache, but the
current crop of browsers offers little control when it comes to cache
content.
My advice is to solve the download problem from the user-perception angle.
Users expect applets to download fast (1 to 5 minutes) because they are
expecting to see the applet as part of a web page. Their focus is on the
content and not the code. They do not realize emotionally that content must
be rendered by applets and applets take time to download. On the other
hand, when they are asked to manually download something and install it,
they display more patience because they know they are downloading software
and not content. They are already familiar with the timescale of getting
and installing new software, so a wait of 10 minutes to 1 hour is not going to
tick them off. One added bonus is that, since you can install into the
browser's classpath, you get higher security clearance.
If you really need to go the download-on-demand applet route, you can divide
up your classes into two parts.
The first part is a small set of classes with the following objectives:
1. Put something up to grab the user's attention. Amuse him with something or
render a non-editable view.
2. Prefetch resources such as XML files and the second part.
The second part is the full set of classes. The point is that something
like your XML browser applet will usually display some XML files which are
not fetched until all the classes are downloaded unless they are prefetched
using a scheme like the above.
Hope this helps,
Don Park
From simeons at allaire.com Sun Dec 14 21:55:22 1997
From: simeons at allaire.com (Simeon Simeonov)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <01bd08db$a2559660$4a15b5cd@sim.allaire.com>
I come to this discussion late (4:30pm EST on Sunday :) so my set of
assorted notes is addressed at no one in particular.
I like the acronym SAX. It's short and sweet.
In principle I agree with the idea that an API simpler than what DOM exposes
will be useful. This is especially true in the short run--until fully DOM
compliant implementations with a variety of language bindings become readily
available. I absolutely agree with the need for both event-driven and
tree-based interfaces. My product, the Cold Fusion Application Server, needs
both. And it really only needs to know about text, elements, and attributes.
All else is currently of no interest to the tens-of-thousands of web
application developers that use CFAS.
A note of caution. I hope that in your mind SAX is not the same as SAX-J.
Some of the API proposals I have seen have a very strong Java flavor. For
example, I see the need for an API that does not require runtime type
information. The equivalent of instanceof in C++ is the dynamic_cast()
operator. It requires the enabling of RTTI which imposes an immediate and
quite noticeable size and performance penalty. IMHO, runtime type
information is necessary only when the object model of a system is
undergoing continuous change. I don't see this being the case with SAX.
I cannot invest the time in writing an XML parser in C++ right now, but I'd
be more than happy to contribute to this discussion to make sure that SAX is
a C++-friendly API.
Regards,
Simeon Simeonov
Allaire
From peter at ursus.demon.co.uk Sun Dec 14 22:57:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <01bd08db$a2559660$4a15b5cd@sim.allaire.com>
Message-ID: <3.0.1.16.19971214234149.51f7a68e@pop3.demon.co.uk>
At 17:00 14/12/97 -0500, Simeon Simeonov wrote:
Thanks very much Simeon,
>I come to this discussion late (4:30pm EST on Sunday :) so my set of
>assorted notes is addressed at no one in particular.
>
>I like the acronym SAX. It's short and sweet.
So do I.
>
[...]
>
>A note of caution. I hope that in your mind SAX is not the same as SAX-J.
>Some of the API proposals I have seen have a very strong Java flavor. For
I agree with your point - personally I have no idea how to write a language
independent API, but for this one I suspect it's fairly straightforward
because of the relative simplicity.
>example, I see the need for an API that does not require runtime type
>information. The equivalent of instanceof in C++ is the dynamic_cast()
>operator. It requires the enabling of RTTI which imposes an immediate and
>quite noticeable size and performance penalty. IMHO, runtime type
>information is necessary only when the object model of a system is
>undergoing continuous change. I don't see this being the case with SAX.
This seems to make sense. I think the main area where this might be used is
in children, where a child could be either an Element or PCDATA, and you
found out which by asking it. I assume it can be managed with strong typing
as well.
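One way the "strong typing" route might look is double dispatch: each child calls the callback for its own kind, so the client never asks what type it holds. The sketch below is in Java for consistency with the rest of the thread, but it uses only ordinary virtual calls, which map onto C++ virtual functions without RTTI. All names (`Child`, `ChildVisitor`, `ElementChild`, `TextChild`) are hypothetical and come from no SAX or DOM draft.

```java
import java.util.ArrayList;
import java.util.List;

// Callbacks, one per kind of child. Hypothetical names throughout.
interface ChildVisitor {
    void element(String tagName);
    void text(String data);
}

interface Child {
    // Each kind of child invokes the matching callback itself.
    void accept(ChildVisitor visitor);
}

final class ElementChild implements Child {
    private final String tagName;

    ElementChild(String tagName) { this.tagName = tagName; }

    public void accept(ChildVisitor visitor) { visitor.element(tagName); }
}

final class TextChild implements Child {
    private final String data;

    TextChild(String data) { this.data = data; }

    public void accept(ChildVisitor visitor) { visitor.text(data); }
}

class TypedDispatchDemo {
    // Walks the children without a single instanceof test or cast.
    static List<String> describe(List<Child> children) {
        final List<String> out = new ArrayList<>();
        ChildVisitor visitor = new ChildVisitor() {
            public void element(String tagName) { out.add("element:" + tagName); }
            public void text(String data) { out.add("text:" + data); }
        };
        for (Child child : children) {
            child.accept(visitor);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Child> kids = new ArrayList<>();
        kids.add(new ElementChild("S"));
        kids.add(new TextChild("A sentence."));
        System.out.println(describe(kids));
    }
}
```

The cost is that adding a new kind of child means adding a method to the visitor interface, which is acceptable only if the set of node kinds is stable.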
>
>I cannot invest the time in writing an XML parser in C++ right now, but I'd
>be more than happy to contribute to this discussion to make sure that SAX is
>a C++-friendly API.
I think that's a very useful offer :-)
I have been thinking as we go how we manage other languages like tcl and
Perl (I know tcl, but not Perl). I assume some parts of the interface can
almost be translated algorithmically, but others may be tricky. [Even I am
not going to ask for a FORTRAN interface :-)]
P.
>
>Regards,
>
>Simeon Simeonov
>Allaire
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From eliot at isogen.com Sun Dec 14 22:58:07 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:59:31 2004
Subject: XML Architectural Forms
Message-ID: <3.0.32.19971214165523.006a86d0@swbell.net>
At 05:58 PM 12/13/97 -0500, David Megginson wrote:
>I don't remember seeing an announcement here (apologies if I'm
>mistaken), but Eliot Kimber and James Clark have announced on
>comp.text.sgml a proposed ammendment to ISO 10744 that will make it
>possible to use Architectural Forms in XML. You can find the text of
>the ammendment at the following URL:
Dave,
Thanks for the announce. Unfortunately, my original post contained an
error, which was inadvertently carried forward into your post. The
examples should read:
And
And finally,
I apologize for any confusion my original error has caused.
Cheers,
Eliot
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From fussellm at alumni.caltech.edu Sun Dec 14 23:57:36 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.1.16.19971214152950.315f45c6@pop3.demon.co.uk>
Message-ID:
On Sun, 14 Dec 1997, Peter Murray-Rust wrote:
> The following assertions are based on ignorance and hearsay...
>
> As I understand it, if Java wants a method in a class, it loads the whole
> class into the virtual machine. Therefore if you have a large complex
> class you have a constant large overhead in terms of (a) HTTP connections
> (b) JVM space.
Actually, for a given amount of functionality there is more overhead to
have it divided into multiple files than in a single file. First, the
number of HTTP connections will go up as the number of files goes up and
HTTP connection setup time is significant. Second, the size of a '.class'
file is usually about evenly divided between the symbol table (containing
method names) and the actual method bytecodes. If you split a file you
need to duplicate the inter-file method-name symbols. You may also need
to duplicate some helper methods. Third, to resolve a class (which is
required to use its methods) you need to resolve all of the types that the
class [as a whole] uses, which will cause you to load in any related
classes. This makes it unlikely that only a single smaller class would be
loaded if a big class is divided. [Typing to interfaces instead of
classes reduces this chaining problem.]
So for XMLParser, which is currently ~24K, splitting it into (say) four 7K
files (6K+1K repetition) will probably increase the overall download time
noticeably (>20%). A compressed JAR would remove all of these issues but
David said he wants Aelfred to be backward compatible, so a single
'class' file is a good (probably the best) approach.
>..I have a number of very large classes (e.g. > 100 member
> functions, some quite crunchy) so I have been thinking of doing the exact
> reverse to DavidM - i.e. splitting up my classes into smaller bits. Thus
> my MOLNode implements Drawable routines, Linkable (XLL), Editable,
> Validatable at least. If I have a very simple application it will still
> download all these functions (am I right?) and also keep them in the JVM so
> long as there is a reference to an object of the class (am I still right?)
Yes, class objects and their code are maintained until there are no
references to the class object (and instances are obviously one of those
references). It is definitely a good idea to compose larger classes
(frequently the root domain objects) out of more focused pieces and
loosely connect them through delegation interfaces. It is also good to
separate presentation (everything but validation above) from the core
information of the model. This is sometimes hard to remember for
information systems and is certainly more difficult when you view your
information as inherently display-oriented (i.e. "documents").
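A minimal sketch of that composition, assuming invented names throughout (`MolData`, `FormulaValidator`, `TextRenderer`, `MolNode`); it is not JUMBO's actual design. The root object holds its collaborators by interface type and forwards to them, and the core data class carries no presentation code.

```java
// Delegation interfaces: the only types the root object depends on.
interface Drawable { String draw(); }
interface Validatable { boolean isValid(); }

// Core information of the model, with no presentation code.
final class MolData {
    final String formula;

    MolData(String formula) { this.formula = formula; }
}

// Focused piece: validation only.
final class FormulaValidator implements Validatable {
    private final MolData data;

    FormulaValidator(MolData data) { this.data = data; }

    public boolean isValid() {
        return data.formula != null && !data.formula.isEmpty();
    }
}

// Focused piece: presentation only.
final class TextRenderer implements Drawable {
    private final MolData data;

    TextRenderer(MolData data) { this.data = data; }

    public String draw() { return "[" + data.formula + "]"; }
}

// The root domain object forwards each role to a delegate typed by interface.
final class MolNode implements Drawable, Validatable {
    private final Validatable validator;
    private final Drawable renderer;

    MolNode(Validatable validator, Drawable renderer) {
        this.validator = validator;
        this.renderer = renderer;
    }

    public boolean isValid() { return validator.isValid(); }
    public String draw() { return renderer.draw(); }
}

class DelegationDemo {
    public static void main(String[] args) {
        MolData water = new MolData("H2O");
        MolNode node = new MolNode(new FormulaValidator(water), new TextRenderer(water));
        System.out.println(node.isValid() + " " + node.draw());
    }
}
```

Because `MolNode` names only the two interfaces, a different renderer can be supplied without touching it.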
> So I am thinking of splitting these into smaller chunks, such as
> DrawableMethods, etc which don't need to be loaded if not used. Would the
> same apply to AElfred? Thus if you had two chunks - DTD.class and
> Instance.class (or whatever) and the document instance had no DTD, you'd
> never need to load the DTD class, right?
But AElfred may need to use either at potentially any time. It is very
likely that a reference to the DTDParser.class would crop into the
InstanceParser.class and make it necessary to load both. I would
certainly think it is another significant constraint to try to manage
avoiding any of these cross-references, even if the DTD and the rest of
the instance are conceptually well separated.
> I would assume it's possible to re-route the client to a non-JAR applet if
> required.
This would be up to the deploying applet maker. Also, generally it is
best to JAR everything together that will be needed together so this would
be a different (larger) JAR than just an Aelfred specific JAR.
That does lead me to a question: David, are you assuming that XmlParser is
always standalone? If someone packages it with their application/applet
it is only about 11K in a compressed JAR so you are pretty close to your
size design goal from that perspective.
--Mark
mark.fussell@chimu.com
From ak117 at freenet.carleton.ca Mon Dec 15 01:10:55 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To:
References: <3.0.1.16.19971214152950.315f45c6@pop3.demon.co.uk>
Message-ID: <199712150109.UAA00340@unready.microstar.com>
Mark L. Fussell writes:
> That does lead me to a question: David, are you assuming that XmlParser is
> always standalone? If someone packages it with their application/applet
> it is only about 11K in a compressed JAR so you are pretty close to your
> size design goal from that perspective.
No, I have designed it mostly to be hidden deep within applets and
applications (possibly without the user even being aware that the
applet has an XML parser). I am glad to hear that Ælfred compresses
down to 11K in a JAR file; now if only Bill and Scott would kiss and
make up, and the entire user community would celebrate the moment by
upgrading their browsers...
On a different note, I have given Ælfred a zero-argument constructor,
and have provided some public setters and accessors, but I have not
implemented the Serializable interface (because it does not exist in
Java 1.0.2). I am curious whether Ælfred will work in the Beanbox as
a simple invisible bean, but I haven't had time to download the
Beanbox and check.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Mon Dec 15 01:18:09 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:31 2004
Subject: Java question
Message-ID: <3.0.32.19971214171946.009acb90@pop.intergate.bc.ca>
O Java gureaux, the recent discussions about compactness have started me
thinking. I'm now debugging the validation code for Lark. I've got it
in a separate package, textuality.validator; the validation code is
half-again as large as the basic WF code: 60+ K as opposed to 45K; it
has classes like 'DTD' and 'Attlist' and 'Validator' and so on.
I'd like for people who want to use Lark as just a WF checker to
avoid the overhead of downloading 60K of validation rubbish. Lark now
has a method called lark.validate(boolean) and if it's not turned on,
none of those textuality.validator classes will ever get invoked.
However, is an applet loader going to pull 'em all in over the
network regardless?
I suppose if this is the case, I could create two different Lark
distributions, using the trick documented in the O'Reilly book
where I say
private static final boolean sVALIDATE = false;
and then bracket all refs to validation classes with
if (sVALIDATE)
{
}
which won't get compiled.
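For reference, a self-contained sketch of that trick. `HeavyValidator` and `ParserBuild` are hypothetical stand-ins, not Lark classes. Because `VALIDATE` is a compile-time constant, a compiler is allowed to drop the guarded block entirely, and javac does so in practice, leaving `ParserBuild.class` with no reference to the validator class.

```java
// Hypothetical stand-in for a class from the validation package.
class HeavyValidator {
    static boolean check(String doc) {
        return doc.startsWith("<");
    }
}

class ParserBuild {
    // Compile-time constant: set to true and recompile for the validating build.
    private static final boolean VALIDATE = false;

    static String parse(String doc) {
        if (VALIDATE) {
            // Dead code while VALIDATE is false; the compiler may omit it,
            // so no reference to HeavyValidator needs to survive in the class file.
            if (!HeavyValidator.check(doc)) {
                return "invalid";
            }
        }
        return "parsed";
    }

    public static void main(String[] args) {
        System.out.println(parse("<doc/>"));
    }
}
```

The two distributions then differ only in the value of the constant, at the price of keeping two builds in step.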
Or, should I provide stubbed-out class files for the only two classes
that are directly referenced, DTD and Validator?
Or, is this worth worrying about? Or is there a standard way to
achieve this effect? Wisdom welcome. -Tim
From fussellm at alumni.caltech.edu Mon Dec 15 02:39:23 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:31 2004
Subject: Java question
In-Reply-To: <3.0.32.19971214171946.009acb90@pop.intergate.bc.ca>
Message-ID:
On Sun, 14 Dec 1997, Tim Bray wrote:
> I'd like for people who want to use Lark as just a WF checker to
> avoid the overhead of downloading 60K of validation rubbish. Lark now
> has a method called lark.validate(boolean) and if it's not turned on,
> none of those textuality.validator classes will ever get invoked.
> However, is an applet loader going to pull 'em all in over the
> network regardless?
For most VMs, if the code executed encounters the class as a variable type
it will ask the ClassLoader to load it. If you want to prevent a class
from loading you need to prevent any references to the class from occurring
except in the exact context when you want it to be loaded.
> Or, should I provide stubbed-out class files for the only two classes
> that are directly referenced, DTD and Validator?
>
> Or, is this worth worrying about? Or is there a standard way to
> achieve this effect? Wisdom welcome. -Tim
The simplest approach would be to define an interface or abstract class
for the DTD and validator that has the minimum your main class needs.
This will always be loaded with the main class. Then you can
implement/subclass off this interface to put the full validation
functionality in. Finally, have exactly one method in the main class that
constructs an object of the full implementation (assigning it to a
variable of the general type) and only call this method when you need to.
The full implementation classes should only be loaded when this
particular method is called, so (for your example) only when validation
is turned on.
You can verify what is happening with loading by turning on 'java' verbose
mode to see if everything is working ok. Some VMs behave differently
(delayed loading is not part of the Java spec, just common), but I think
most VMs behave this way.
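The arrangement described above might be sketched like this. `DocValidator`, `FullValidator`, and `LazyParser` are hypothetical names, and the trivial validation rule is a placeholder; in a real layout the full implementation would sit in its own class file.

```java
// Minimal interface: always loaded along with the main class.
interface DocValidator {
    boolean validate(String doc);
}

// Full implementation; placeholder logic stands in for real validation.
class FullValidator implements DocValidator {
    public boolean validate(String doc) {
        return doc.indexOf('<') >= 0;
    }
}

class LazyParser {
    // Typed by the small interface only.
    private DocValidator validator;

    // The single method that names the full implementation.
    private DocValidator createValidator() {
        return new FullValidator();
    }

    void setValidating(boolean on) {
        if (on && validator == null) {
            validator = createValidator();
        }
        if (!on) {
            validator = null;
        }
    }

    boolean parse(String doc) {
        // Without a validator, every document is accepted.
        return validator == null || validator.validate(doc);
    }

    public static void main(String[] args) {
        LazyParser parser = new LazyParser();
        System.out.println(parser.parse("plain text"));
        parser.setValidating(true);
        System.out.println(parser.parse("plain text"));
    }
}
```

Running it as `java -verbose:class LazyParser` lists classes as they are loaded, which shows whether a given VM really defers `FullValidator` until `setValidating(true)` is called.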
Hope that helps.
--Mark
mark.fussell@chimu.com
From donpark at quake.net Mon Dec 15 02:58:17 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:31 2004
Subject: Java question
Message-ID: <000f01bd0904$cf823ca0$0100007f@localhost>
Tim,
JStud here :-). Since my cape is at the dryer, I will be your average Java
guru and see if I can answer your questions. I must warn you that my ego is
positively radioactive.
>I'd like for people who want to use Lark as just a WF checker to
>avoid the overhead of downloading 60K of validation rubbish. Lark now
>has a method called lark.validate(boolean) and if it's not turned on,
>none of those textuality.validator classes will ever get invoked.
>However, is an applet loader going to pull 'em all in over the
>network regardless?
The rules on that issue are very hairy, depending on which browser and which
version is used. The Netscape browser, for instance, will download
referenced classes if there is no method involved.
a = b where b is an instance of class B.
a instanceof B
will cause the code verifier to download class B. However, a = b.foo(); will
not. Their recommendation is that code like the above should be wrapped inside
a method like this:
public class B {
    B foo () { return this; }
    boolean bar(Object a) { return a instanceof B; }
}
Frankly, I find all these details troublesome and not worth two tablets of
Advil. I recommend staying well clear of them.
>I suppose if this is the case, I could create two different Lark
>distributions, using the trick documented in the O'Reilly book
>where I say
>
>private static final boolean sVALIDATE = false;
>
>and then bracket all refs to validation classes with
>
>if (sVALIDATE)
>{
>
>}
>
>which won't get compiled.
The above trick causes too many code-management problems and is typically used
for debugging purposes only. I wouldn't recommend it either.
>Or, should I provide stubbed-out class files for the only two classes
>that are directly referenced, DTD and Validator?
There is no need for stubbing. You can create an interface for the
Validator and combine that with
Class.forName(). For example:
public interface XmlValidator {
    void doThis(XmlParser ctx);
    void doThat(XmlParser ctx);
}
public class XmlParser {
    public void validate () {
        ...
        XmlValidator validator =
            (XmlValidator)Class.forName("lark.SuperXmlValidator");
        validator.doThis(this);
        validator.doThat(this);
    }
}
You will need to wrap the Class.forName code in some exception handlers, but
the above code basically allows late binding to the SuperXmlValidator class,
and only when it is actually used. You will still need to have the
lark.SuperXmlValidator code in the same Zip or Jar file since not all
browsers support multiple Zip/Jar files. The trick of using invisible applets
to load multiple archives does not work either, because the browser will use
a different classloader if the Zip/Jar specification attribute is different.
Anyway, the interface + Class.forName scheme is better for code management,
since you will only have to change the make file rather than the Java source
files.
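The late-binding idea can be sketched with the exception handling filled in. This is only an illustration under assumptions: Validator, NoOpValidator and ValidatorLoader are made-up names rather than Lark classes, and getDeclaredConstructor().newInstance() is the current replacement for the Class.newInstance() call that JDK 1.1 code would use.

```java
// Hypothetical names for illustration only; nothing here is Lark code.
interface Validator {
    boolean validate(String document);
}

// Stand-in for an optional validator class that may or may not be installed.
class NoOpValidator implements Validator {
    public boolean validate(String document) {
        return true;
    }
}

final class ValidatorLoader {
    /**
     * Loads a validator by class name at run time.
     * Returns null when the class is missing, cannot be instantiated,
     * or does not implement Validator.
     */
    static Validator load(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (Validator) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            return null;
        }
    }
}
```

A caller would check the result for null and simply skip validation when no validator is available, so the parser itself never names the implementation class in its source.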
>Or, is this worth worrying about? Or is there a standard way to
>achieve this effect? Wisdom welcome. -Tim
I think it is worth worrying about but there is no standard way. I am
afraid that the browser war left the applet world still pretty much in the
wild and wacky west state.
Hope this helps,
Don "JStud" Park
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From donpark at quake.net Mon Dec 15 03:11:44 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:31 2004
Subject: Java question
Message-ID: <002301bd0906$b0e05eb0$0100007f@localhost>
Oops! My code had some errors. Here is the correct version:
public class XmlParser {
    public void validate () {
        ...
        Class clazz = Class.forName("lark.SuperXmlValidator");
        XmlValidator validator = (XmlValidator)clazz.newInstance();
        validator.doThis(this);
        validator.doThat(this);
    }
}
Gosh, I really need my cape back ;-).
I hope you all are having nice Sunday dinners rather than sitting in front
of the computer like I am.
Don "Capeless JStud" Park
From fussellm at alumni.caltech.edu Mon Dec 15 07:45:40 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3494D2F5.C8070627@javalab.uoregon.edu>
Message-ID:
[Did you post these replies also? I don't yet see them from the list.]
> Mark Fussell wrote:
> > ... If you split a file you
> > need to duplicate the inter-file method-name symbols. You may also need
> > to duplicate some helper methods.
Sean Russell wrote:
> Is this significant? Probably, but it isn't severe. The following is a
> list of a set of trivial classes...
[showing no class size difference when splitting].
What I was referring to is if the classes have to call between each other
when they are split. So test3 calls test4's method and test4 calls
test3's method. The number of cross-calls will increase the total file
size because of the duplicated symbols (method names) in the table.
But as you indicated: the less coupled the classes the better the design,
so this duplication "penalty" is actually a reasonable metric on how well
a large class was divided into logical smaller classes.
> If you can look at your code and see Objects, you should go ahead and extract
> them out into smaller classes. OOP is going to benefit you and anyone
> else who is going to be reading and modifying your code in six months.
Although I would normally encourage this I think we are [or I am] on a
slightly more esoteric "size-management" optimization topic for a very
particular case of an ultra-small, fast-download, XML parser. In any
other case, ignore what I said about file sizes and just design really
nice clean OO software.
--Mark
mark.fussell@chimu.com
From mecom-gmbh at mixx.de Mon Dec 15 10:31:55 1997
From: mecom-gmbh at mixx.de (james anderson too)
Date: Mon Jun 7 16:59:31 2004
Subject: external subset syntax
Message-ID: <348EC8EE.62C2BA81@mixx.de>
do i understand the (PR-xml-971208) document correctly, that it is
possible, by means of distinct document type declarations, to refer to
different aspects of a document type definition?
according to
[31] extSubset ::= ( markupdecl | conditionalSect | PEReference
| S )
there is no requirement that the external subset begin with a document
type declaration.
which would mean that the only indication of the root element type comes
from the referring declaration. which would mean that distinct xml
documents could well specify different root elements with respect to the
same external subset.
is this intentional?
From jjc at jclark.com Mon Dec 15 10:58:00 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:31 2004
Subject: external subset syntax
References: <348EC8EE.62C2BA81@mixx.de>
Message-ID: <34950C6F.707DF3DC@jclark.com>
james anderson too wrote:
> do i understand the (PR-xml-971208) document correctly, that it is
> possible, by means of distinct document type declarations, to refer to
> different aspects of a document type definition?
Yes.
> according to
> [31] extSubset ::= ( markupdecl | conditionalSect | PEReference
> | S )
>
> there is no requirement that the external subset begin with a document
> type declaration.
It is a requirement that the external subset *not* begin with a document
type declaration.
> which would mean that the only indication of the root element type comes
> from the referring declaration. which would mean that distinct xml
> documents could well specify different root elements with respect to the
> same external subset.
>
> is this intentional?
Yes.
James
From tms at ansa.co.uk Mon Dec 15 12:37:50 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:31 2004
Subject: Corrected Examples: XML Architectural Forms
In-Reply-To: David Megginson's message of "Sun, 14 Dec 1997 07:16:41 -0500"
References: <199712141216.HAA00392@unready.microstar.com>
Message-ID:
A non-text attachment was scrubbed...
Name: not available
Type: text/plain (pgp signed)
Size: 1628 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971215/305a0984/attachment.bin
From eliot at isogen.com Mon Dec 15 14:41:32 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:59:31 2004
Subject: Corrected Examples: XML Architectural Forms
Message-ID: <3.0.32.19971215083801.00e35c94@swbell.net>
At 12:36 PM 12/15/97 +0000, Toby Speight wrote:
>-----BEGIN PGP SIGNED MESSAGE-----
>
>David> David Megginson
>
>> In article <199712141216.HAA00392@unready.microstar.com>, David
>> wrote:
>
>David> Here are corrected examples for XML architectural forms, using
>David> the proposed amendment (note also the corrected spelling) to
>David> ISO 10744:
>David>
>David>
>David> Simple XML document with one base architecture:
>David>
>David>
>David>
>David>
>David>
>
>Will that enable us to write DSSSL stylesheets as XML?
Yes.
Cheers,
E.
--
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004
www.isogen.com
From crism at ora.com Mon Dec 15 15:04:42 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:31 2004
Subject: CharData
In-Reply-To: <348D9ED8.9E109B4E@medlib.com> (message from Chris Hubick on Tue,
09 Dec 1997 12:41:12 -0700)
Message-ID: <199712151508.KAA12097@geode.ora.com>
[Chris Hubick]
> All this change seems to have done is disallow "]]>" in element
> content!
Since no one's replied yet (at least not on-list):
You are correct; ']]>' is forbidden in element content, as it should
be. This is cruft from SGML; the msc/mdc combination (marked section
close = ']]', markup declaration close = '>') is always recognized as
a delimiter (see Figure 3, ISO 8879). As a result, XML mandates that
this combination always be escaped using ']]&gt;' unless it actually
closes a marked section.
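As a rough illustration of what that rule means for code that writes element content, here is a small sketch of an escaper. The class and method names are hypothetical; the only behaviour it relies on is that '&' and '<' must always be escaped in character data, and that the '>' which would complete "]]>" must be written as "&gt;".

```java
// Hypothetical helper for illustration; not taken from any parser discussed here.
final class CharDataEscaper {
    /** Escapes text for use as element content (outside a CDATA section). */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                case '>':
                    // Only the '>' that completes "]]>" has to be escaped.
                    if (i >= 2 && text.charAt(i - 1) == ']' && text.charAt(i - 2) == ']') {
                        out.append("&gt;");
                    } else {
                        out.append('>');
                    }
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }
}
```

Many writers simply escape every '>' instead, which is also legal and avoids the look-behind.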
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA
From crism at ora.com Mon Dec 15 15:44:23 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:31 2004
Subject: (message from Richard Light
on Wed, 10 Dec 1997 22:26:57 +0000)
Message-ID: <199712151547.KAA12838@geode.ora.com>
[Richard Light]
> I notice that the current draft has switched the case of the XML
> declaration and its arguments to lower case:
>
>
>
> Now that case is significant, this presumably matters.
Absolutely. Note the disappearance of the "match/exactly match"
dichotomy from the terminology section of the specification; *all*
matches must be exact.
> Is there a particular reason for this?
It was a posted WG decision to use lower-case for all possible
keywords.
> Other PIs will have a PItarget where 'xml' sits, and this isn't
> constrained to be any particular case.
Which ones? The WG-produced specifications will use lower-case, and
other applications are forbidden from using [Xx][Mm][Ll] per the prose
in the section on names.
> Wouldn't it be kinder to make it '' ('XML'|'xml') ... ?!
Not really. It would introduce an instance of case-insensitivity into
an otherwise completely case-sensitive specification, making education
more difficult.
> (The DTD declarations ( presumably for compatibility with what SGML systems produce.)
Not for compatibility - it's required by ISO 8879. The NAMECASE
GENERAL NO parameter in the SGML declaration only affects processing
of the DTD and document instance; keywords declared in the declaration
itself are still folded to upper case. This means that any such
keywords (like ELEMENT, DOCTYPE, ATTLIST, NOTATION) must be uppercase.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA
From crism at ora.com Mon Dec 15 15:58:36 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:31 2004
Subject: General comments on parsers
In-Reply-To: <3.0.1.16.19971211024212.2d87bafc@pop3.demon.co.uk> (message from
Peter Murray-Rust on Thu, 11 Dec 1997 02:42:12)
Message-ID: <199712151602.LAA13427@geode.ora.com>
[Peter Murray-Rust]
> It's very tedious to have to implement different interfaces for each
> (AElfred has about 30 methods - and they are all valuable). So:
> - Chris
[...]
> any comments on a common interface :-)?
Not the Chris you were looking for, but the DOM is standardizing
access to XML DTDs, according to Lauren Wood's presentation at
SGML/XML '97.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA
From mtbryan at sgml.u-net.com Mon Dec 15 16:13:22 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:59:32 2004
Subject: XML interface to Oracle
Message-ID: <01bd0962$c6b5f360$LocalHost@default>
I have just been asked by the European Commission Statistics Office if
anyone has yet hooked an XML parser directly to an Oracle database. (They
need to transfer large amounts of data between heterogeneous databases and
are considering using XML as a possible database neutral format, with data
from specific XML elements being loaded into predefined Oracle fields.)
Any advice on how to do this would be welcome.
Martin Bryan
From Per-Ake.Ling at uab.ericsson.se Mon Dec 15 16:31:13 1997
From: Per-Ake.Ling at uab.ericsson.se (Per-Ake Ling)
Date: Mon Jun 7 16:59:32 2004
Subject: external subset syntax
Message-ID: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
> From jjc@jclark.com Mon Dec 15 11:59:21 1997
> james anderson too wrote:
...[snip]
> > according to
> > [31] extSubset ::= ( markupdecl | conditionalSect | PEReference
> > | S )
> >
> > there is no requirement that the external subset begin with a document
> > type declaration.
>
> It is a requirement that the external subset *not* begin with a document
> type declaration.
>
If it were permitted, it would mean that there is a doctype declaration
within a doctype declaration, which is clearly nonsense. It is a common
misunderstanding that DTD means "document type declaration" instead of
"document type definition".
> > which would mean that the only indication of the root element type comes
> > from the referring declaration. which would mean that distinct xml
> > documents could well specify different root elements with respect to the
> > same external subset.
> >
> > is this intentional?
>
> Yes.
>
Not only that, it is an underexploited feature in SGML that this is the
case. The only indication of real use of this feature in SGML comes from
Eliot Kimber, but I believe that it would be even more valuable in XML.
Per-Åke
--
Per-Åke Ling (note: Per-Åke, transliteration Per-Ake)
email: Per-Ake.Ling@uab.ericsson.se phone: +46 8 727 5674
Ericsson Utvecklings AB mobile: +46 70 790 2446
AXE Research and Development fax: +46 8 727 3463
From M.H.Kay at eng.icl.co.uk Mon Dec 15 16:36:09 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:32 2004
Subject: XML interface to Oracle
Message-ID: <01bd0977$71a77500$1011e391@mhklaptop.bra01.icl.co.uk>
-----Original Message-----
From: Martin Bryan
To: xml-dev@ic.ac.uk
Date: 15 December 1997 16:14
Subject: XML interface to Oracle
>I have just been asked by the European Commission Statistics Office if
>anyone has yet hooked an XML parser directly to an Oracle database. (They
>need to transfer large amounts of data between heterogeneous databases ...
Then why specify Oracle in the question?
There are at least two obvious ways of representing a table in XML, the
main decision is whether to represent data values as attributes or as
content. No doubt, given the richness of XML, the experts could come up
with many less obvious representations as well. Using any of the parsers
I have looked at, any of these formats could be trivially translated into
the kind of input formats (e.g. CSV files) that existing RDBMSs will
accept.
Or perhaps by "directly" you want to avoid the intermediate CSV file: well
that's not difficult either but it's more work and I don't see much
benefit in it.
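To make the attribute-style representation concrete, here is a hedged sketch of the XML-to-CSV step. Everything specific in it is an assumption for illustration: the class name XmlTableToCsv, the idea that each table row is one element whose columns are attributes, and the use of the JDK's standard JAXP DOM parser, which postdates the parsers discussed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical converter: one element per row, one attribute per column.
final class XmlTableToCsv {
    static String toCsv(String xml, String rowTag, String... columns) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML input", e);
        }
        NodeList rows = doc.getElementsByTagName(rowTag);
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < rows.getLength(); i++) {
            Element row = (Element) rows.item(i);
            for (int c = 0; c < columns.length; c++) {
                if (c > 0) {
                    csv.append(',');
                }
                // A missing attribute comes back as "" and becomes an empty field.
                csv.append(quote(row.getAttribute(columns[c])));
            }
            csv.append('\n');
        }
        return csv.toString();
    }

    // Quote every field and double embedded quotes, the usual CSV convention.
    private static String quote(String value) {
        return '"' + value.replace("\"", "\"\"") + '"';
    }
}
```

For the content-style representation the inner loop would read child elements instead of attributes; the rest of the structure would stay the same.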
There's a class in the MSXML distribution that gives ADO access to an XML
file, which illustrates what can be done, though of course this is
Microsoft proprietary.
But it all begs the question, what are you (or they) trying to achieve? Is
there really a practical problem with transferring data between relational
databases that XML can solve? People do it all the time with various
flavours of CSV, so why bother?
Mike Kay
From Kenneth.J.Meltsner at jci.com Mon Dec 15 17:05:26 1997
From: Kenneth.J.Meltsner at jci.com (Meltsner, Kenneth J)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
Message-ID: <8625656E.005D4668.00@Corpnotes.JCI.Com>
David Winer, developer of the Frontier scripting language, has gone
over to the XML camp.
His website (www.scripting.com) is now keeping all of its changes in
XML format. Also, the last couple of messages from his mailing list
are centered on how great XML is.
I've included one (without permission) to show what's going on:
From Scripting News... It's DaveNet! Released on 12/15/97; 6:43:15 AM PST
-------------------------------------
Everyone talks in hushed tones about XML. Shhh. It's exciting! But
what does it do?
I can't get involved with something without immediately trying to
ground it with an application. How else could I know if it's worth
exploring?
Luckily, I had an application waiting for XML.
***siteChanges.xml
Jump to this page, and if necessary, view source to see a real
application of XML.
A scheduled script produces that file, running every night at 12
midnight Pacific. It scans our server for new pages, or pages that
were modified in the last 24 hours.
A search engine like Alta Vista, InfoSeek, Newsbot or Excite could
read this page every night at 9AM GMT. They wouldn't have to crawl the
whole site to find the pages that changed, as they do now, they could
just load the pages that have changed since the last time they
visited.
***How was the file generated?
It doesn't matter!
One bit of software can talk to another, and all they need to agree
about is the format of the data they want to exchange. There's nothing
interesting about how the information is generated (mod dates are a
common feature of all current operating systems). What matters is
that there's a format that can be understood on all operating systems.
So even if we use PERL running on Solaris to create the XML-based
info, you can read it on Windows 95 running Microsoft Access, or on an
IBM mainframe running Oracle, or Rhapsody running Sybase, or an
ancient CP/M box running dBASE II.
It doesn't matter. That's the magic of XML.
***It's about relationships
XML allows sites to easily establish an ongoing relationship with
search engines. Each machine does what it's good at doing. Network
traffic is minimized. The best picture of our site assembles itself on
the search engine server.
On our LAN it's easy and fast to find all the pages that changed. Why
should a search engine struggle to find this information when we can
easily generate it? There's no need.
We've been waiting for an agreed-to format for this functionality. I
think every webmaster will recognize the value of maintaining this
information. We need agreement with the search engine companies. If
you represent such a company, please send me email so you can be
included in the discussion.
I think we need a practical example of real-world XML. This may be the
first one?
***Is XML just hype?
No, it's not. Unlike some other industry initiatives, there are
applications waiting for XML. I wouldn't be involved if that weren't
true.
XML can go somewhere. First search engines, then caches running on
your LAN, then sandboxes running on each machine.
We're going somewhere...
Let's have fun!
Dave Winer
From Ingo.Macherius at TU-Clausthal.de Mon Dec 15 17:55:49 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
In-Reply-To: <8625656E.005D4668.00@Corpnotes.JCI.Com>
Message-ID: <199712151755.SAA02724@sinfonix.rz.tu-clausthal.de>
> From: "Meltsner, Kenneth J"
> Date: Mon, 15 Dec 1997 10:56:09 -0600
> Subject: XML application
> David Winer, developer of the Frontier scripting language, has gone
> over to the XML camp.
[...]
>
It is not valid XML, because of 3 things:
1) the PI is missing.
2) is used case-insensitive
3) Being a wellformed document, a parser considers as the
document (root) element. xmlwf considers anything after as
junk. Need to introduce a different root element.
Further it reads:
[...]
frontier5/download.html
Sat, 13 Dec 1997 23:01:36 GMT
frontier5/fasttrack/aboutThisSite.html
Sat, 13 Dec 1997 00:54:49 GMT
[...]
With numbered tag names it's impossible to give a DTD, as
there is a potentially infinite set of tags. Why not only use
and leave numbering to the receiving application? Or give the number
as an attribute? Why is there a need for numbering at
all?
++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From Ingo.Macherius at TU-Clausthal.de Mon Dec 15 18:04:15 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
In-Reply-To: <199712151755.SAA02724@sinfonix.rz.tu-clausthal.de>
References: <8625656E.005D4668.00@Corpnotes.JCI.Com>
Message-ID: <199712151804.TAA03231@sinfonix.rz.tu-clausthal.de>
> >
> It is not valid XML, because of 3 things:
More exact: It is neither well-formed nor valid XML ...
> 1) the PI is missing.
Ouch. This is not well-formed at all.
Use as to the latest PR.
++im
TODO: Learn to proofread mail before sending. Sorry.
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)
From ddb at criinc.com Mon Dec 15 18:42:43 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun 7 16:59:32 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971215103901.00ad61f0@mailhost.criinc.com>
At 09:01 PM 12/13/97 -0500, David Megginson wrote:
>Tim Bray writes:
>
> > > attribute(XmlParser, String, String, boolean)
> >
> > It seems completely wrong to have an attribute event separate from
> > start-element events.
[snip]
>I could send a hashtable of attribute names and values with the
>startElement() callback, and let users look up types (etc.) with my
>query methods, but I would have to lose a bit on two counts:
>
>1) Allocating a new hashtable for every start tag will slow down the
> parser a fair bit.
>
>2) I'd have no way to show which attributes were specified and which
> were defaulted (see below).
>
> > What's the boolean? I don't think the application author should
> > to have to deal with anything but the name and value of attributes.
I can imagine two relatively simple solutions.
1) Have your element event callback return a switch (boolean) to indicate
whether attributes are wanted or not. Have separate code for each option
(minor bloating in exchange for speed) and also have a finished-start-tag
event which is only fired when the processor (event handler) asked for
attributes. I am not sure of the utility of this, since every SGML/XML
application I have written uses attributes on many of its component elements.
2) Rather than using a hash table, use a recursive parse routine over the
attributes which unwinds to put the attribute names and values into a pair
of String[]s. The recursive parse is so that you can use the stack as a
temporary holding place for the names and values until you know how many
there are. Alternatively, you could have one common pair of String[]s which
are used by every element event. You allocate them to a default size and
grow them if need be. You never really need to shrink them. The bloat to
your current code is more significant for this technique, but it increases
usability with little performance hit.
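The second variant, one shared pair of String[]s that is reused for every start tag and only ever grows, might look roughly like this. AttributeBuffer and its method names are invented for the sketch and do not come from any of the parsers under discussion.

```java
// Hypothetical reusable attribute store: two parallel arrays that only grow.
final class AttributeBuffer {
    private String[] names = new String[8];
    private String[] values = new String[8];
    private int count;

    /** Called at each start tag; the arrays are kept, so nothing is reallocated. */
    void clear() {
        count = 0;
    }

    void add(String name, String value) {
        if (count == names.length) {
            names = grow(names);
            values = grow(values);
        }
        names[count] = name;
        values[count] = value;
        count++;
    }

    int size() {
        return count;
    }

    String name(int index) {
        return names[check(index)];
    }

    String value(int index) {
        return values[check(index)];
    }

    // Reject indexes beyond the attributes of the current element,
    // even though the backing arrays may be longer.
    private int check(int index) {
        if (index < 0 || index >= count) {
            throw new IndexOutOfBoundsException("attribute index " + index);
        }
        return index;
    }

    private static String[] grow(String[] old) {
        String[] bigger = new String[old.length * 2];
        System.arraycopy(old, 0, bigger, 0, old.length);
        return bigger;
    }
}
```

The parser would call clear() and add() while reading a start tag and then hand the same buffer to the element callback, which must copy anything it wants to keep past the end of the callback.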
-derek
Derek E. Denny-Brown II || ddb@criinc.com
"Reality is that which, || Seattle, WA USA
when you stop believing in it, || WWW/SGML/HyTime/XML
doesn't go away." -- P. K. Dick || Java/Perl/Scheme/C/C++
From elm at arbortext.com Mon Dec 15 19:08:53 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
Message-ID: <3.0.32.19971215141136.00b2f390@village.doctools.com>
At 02:02 PM 12/15/97 -0500, Ingo Macherius wrote:
>
>> >
>
>> It is not valid XML, because of 3 things:
>More exact: It is neither well-formed nor valid XML ...
>
>> 1) the PI is missing.
>
>Ouch. This is not well-formed at all.
>Use as to the latest PR.
Note that the XML declaration PI isn't strictly required (though it's
recommended), so it's not a WF error for it to be missing. The other
comments you made were right on.
Eve
From rsiera at steunpunt.be Mon Dec 15 21:06:38 1997
From: rsiera at steunpunt.be (Robrecht Siera)
Date: Mon Jun 7 16:59:32 2004
Subject: 3 APIs ! GO FOR IT !!
In-Reply-To: <002201bd0828$a85d4200$0100007f@localhost>
References: <002201bd0828$a85d4200$0100007f@localhost>
Message-ID: <34989262.6770596@mailhost.innet.be>
There has been some discussion about the API again.
I like it, but ... please start now.
Peter Murray wrote
>We all have concerns. My concern is that there aren't enough people who are
>actively writing code and making it publicly available.
Yes, and I'm afraid I'm one of them.
I'm just a simple VB programmer in a non-profit organisation who
sees the value of using XML in data management, but I can't produce
anything valuable if I don't have an API or DLL (oops, I don't know
the difference, I'm afraid. I'm dumb too, you see :-) which is
supported by a larger part of the XML community.
I need something simple to start working with.
On the other hand, I fully agree with Don Park's concern
that object-oriented design knowledge should find its place in
the XML future that he thinks, along with me, it deserves.
*** I'm not quoting Don here and I can't express myself very well in
English. So bringing those two aspects together might result
in me missing the ball completely with this sentence :-)
***
That is why I fully support Peter's proposal :
>There was never any suggestion it would be the only API.
>Let's assume there are 3 APIs.
> - simple
> - Object based
> - grove based
GO FOR IT. NOW.
Regards,
Robrecht Siera
------------------------------------------------
In Petto - Jeugddienst Informatie en Preventie
In Petto - National Youth Service for Youth Information and Prevention
Diksmuidelaan 50, 2600 Berchem, Belgium
tel +32/3/366.15.20, +32/3/366.45.45
fax +32/3/366.11.58
email: inpetto@cybco.be
www : http://www.cybco.be/inpetto
------------------------------------------------
From ser at javalab.uoregon.edu Mon Dec 15 22:11:38 1997
From: ser at javalab.uoregon.edu (Sean Russell)
Date: Mon Jun 7 16:59:32 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To:
Message-ID: <199712152210.OAA28359@jersey.uoregon.edu>
On 14 Dec, Mark L. Fussell wrote:
> [Did you post these replies also? I don't yet see them from the list.]
Uh, probably not. Sorry.
> > If you can look at your code and see Objects, you should go ahead and extract
> > them out into smaller classes. OOP is going to benefit you and anyone
> > else who is going to be reading and modifying your code in six months.
>
> Although I would normally encourage this I think we are [or I am] on a
> slightly more esoteric "size-management" optimization topic for a very
> particular case of an ultra-small, fast-download, XML parser. In any
> other case, ignore what I said about file sizes and just design really
> nice clean OO software.
Yes, especially when we're dealing with web distributed packages.
The argument against using jar files because of their non-portability (in
that not everybody supports them yet) has weight, but will become
increasingly insignificant as platforms become Java 1.1 conformant.
Unless I am mistaken, jar file support is part of the required core Java
distribution.
--
|.. --------------------- Sean Russell ----------------------
<|> ser@javalab.uoregon.edu <-> http://jersey.uoregon.edu/ser
/|\ ------- [ Software Engineer ] --------
/| [ PGP info available from my web site ]
From tyler at infinet.com Tue Dec 16 04:41:14 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:32 2004
Subject: XML interface to Oracle
References: <01bd0977$71a77500$1011e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <341A12CA.F1DACB37@infinet.com>
Michael Kay wrote:
> -----Original Message-----
> From: Martin Bryan
> To: xml-dev@ic.ac.uk
> Date: 15 December 1997 16:14
> Subject: XML interface to Oracle
>
> >I have just been asked by the European Commission Statistics Office if
> >anyone has yet hooked an XML parser directly to an Oracle database. (They
> >need to transfer large amounts of data between heterogeneous databases ...
>
> Then why specify Oracle in the question?
>
> There are at least two obvious ways of representing a table in XML, the main
> decision
> is whether to represent data values as attributes or as content. No doubt,
> given the
> richness of XML, the experts could come up with many less obvious
> representations
> as well. Using any of the parsers I have looked at, any of these formats
> could be trivially
> translated into the kind of input formats (e.g. CSV files) that existing
> RDBMSs will accept.
> Or perhaps by "directly" you want to avoid the intermediate CSV file: well
> that's not difficult
> either but it's more work and I don't see much benefit in it.
>
For relational databases, this could be done easily (I would surmise) by first
using the DTD to generate the table structure of the database, mapping each
child's foreign key to its parent container (which is represented as a table).
This would eliminate the need to map each DTD's structure manually. Then you would
just insert content into each table based upon its type, along with the
appropriate foreign key value. Java Blend maps Java objects to a relational
database, so why not just map XML objects to a relational database? Mapping XML
DTDs is, I feel, far simpler, but the implementations are pretty much the same.
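The mapping described above (one table per element type, each child row carrying a foreign key to its parent) can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name DtdToTables, the parent map standing in for a parsed DTD, and the column names id, content and parent_id are all hypothetical, and a real DTD with mixed content or several possible parents would need more than this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: one table per element type, child rows point at their parent table. */
final class DtdToTables {

    /**
     * Builds one CREATE TABLE statement per element type.
     *
     * @param parentOf maps each element name to its parent element name (null for the
     *                 root); this map is an assumed stand-in for a parsed DTD
     */
    static List<String> createStatements(Map<String, String> parentOf) {
        List<String> ddl = new ArrayList<>();
        for (Map.Entry<String, String> entry : parentOf.entrySet()) {
            String table = entry.getKey();
            String parent = entry.getValue();
            StringBuilder sql = new StringBuilder();
            sql.append("CREATE TABLE ").append(table)
               .append(" (id INTEGER PRIMARY KEY, content VARCHAR(255)");
            if (parent != null) {
                // The foreign key records which parent row contains this element.
                sql.append(", parent_id INTEGER REFERENCES ").append(parent).append("(id)");
            }
            sql.append(")");
            ddl.add(sql.toString());
        }
        return ddl;
    }

    public static void main(String[] args) {
        // Illustrative element names only.
        Map<String, String> parentOf = new LinkedHashMap<>();
        parentOf.put("EXAMPLE", null);
        parentOf.put("P", "EXAMPLE");
        parentOf.put("S", "P");
        for (String statement : createStatements(parentOf)) {
            System.out.println(statement);
        }
    }
}
```

Inserting a document would then be a walk over the parsed tree, writing one row per element and passing the parent's id down as parent_id.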
> But it all begs the question, what are you (or they) trying to achieve? Is
> there really a practical
> problem with transferring data between relational databases that XML can
> solve? People do
> it all the time with various flavours of CSV, so why bother?
>
I would not be surprised if Oracle and just about any other database vendor will
be supporting XML in the near future as doing so would require little
implementation time, while the benefits could be enormous.
Tyler
From cathy at bd748.pku.edu.cn Tue Dec 16 09:12:56 1997
From: cathy at bd748.pku.edu.cn (Chang Ming)
Date: Mon Jun 7 16:59:32 2004
Subject: Any XSL tool!
Message-ID: <3496458B.4C47@bd748.pku.edu.cn>
I think XSL is not off-topic in this list.
I would like to know whether any work has been done on XSL, something like an
interpreter.
The only known tool seems to be the converter from XSL to DSSSL.
Thanks
Chang Ming
From mecom-gmbh at mixx.de Tue Dec 16 10:38:21 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
References: <8625656E.005D4668.00@Corpnotes.JCI.Com> <199712151804.TAA03231@sinfonix.rz.tu-clausthal.de>
Message-ID: <34965B31.4F97F345@mixx.de>
how about
i've yet to understand why, but isn't that the way it needs to be?
Ingo Macherius wrote:
> > >
>
> > It is not valid XML, because of 3 things:
> More exact: It is neither well-formed nor valid XML ...
>
> > 1) the PI is missing.
>
> Ouch. This is not well-formed at all.
> Use as to the latest PR.
>
> ++im
>
From Ingo.Macherius at TU-Clausthal.de Tue Dec 16 11:24:57 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:59:32 2004
Subject: XML application
In-Reply-To: <34965B31.4F97F345@mixx.de>
Message-ID: <199712161124.MAA21015@sinfonix.rz.tu-clausthal.de>
xml-dev folks,
I'm *very* sorry for the confusion my sloppy mail has caused. Thanks
to the individuals who notified me (there was quite a lot of private
mail). A good sign, people care.
> > > 1) the PI is missing.
> >
> > Ouch. This is not well-formed at all.
> > Use as to the latest PR.
This is wrong again. *bummer*
> how about
>
This is wrong, too. "xml" must be lower-case.
> i've yet to understand why, but isn't that the way it needs to be?
Why ? Productions [24] and [25] in section 2.8 !
[24] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[25] VersionInfo ::= S 'version' Eq ('"VersionNum"' | "'VersionNum'")
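As a rough illustration of how these two productions constrain the declaration, here is a small checker. It is a sketch, not a parser: the class name XmlDeclCheck is hypothetical, the optional EncodingDecl and SDDecl parts are deliberately left out, and the character set used for VersionNum is taken from XML 1.0 as I recall it, so treat that detail as an assumption.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch: accepts only the smallest declaration form, "<?xml" plus VersionInfo. */
final class XmlDeclCheck {
    // S ::= (#x20 | #x9 | #xD | #xA)+
    private static final String S = "[ \\t\\r\\n]+";
    // Eq ::= S? '=' S?
    private static final String EQ = "[ \\t\\r\\n]*=[ \\t\\r\\n]*";
    // VersionNum ::= ([a-zA-Z0-9_.:] | '-')+  (assumed from XML 1.0)
    private static final String VERSION_NUM = "[a-zA-Z0-9_.:-]+";

    private static final Pattern MINIMAL_DECL = Pattern.compile(
        "<\\?xml" + S + "version" + EQ
        + "(?:\"" + VERSION_NUM + "\"|'" + VERSION_NUM + "')"
        + "[ \\t\\r\\n]*\\?>");

    /** True if the whole string is a declaration consisting only of the version part. */
    static boolean isMinimalDecl(String text) {
        return MINIMAL_DECL.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isMinimalDecl("<?xml version=\"1.0\"?>"));
        System.out.println(isMinimalDecl("<?XML version=\"1.0\"?>"));
    }
}
```

The pattern is case-sensitive, so an upper-case "XML" target is rejected, and a declaration with no version part is rejected as well.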
So the minimal correct PI is:
++im
--
Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://home.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)
From ak117 at freenet.carleton.ca Tue Dec 16 12:14:17 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:32 2004
Subject: Common event-based parser API
Message-ID: <199712161212.HAA00516@unready.microstar.com>
Tim and I have taken some of the gritty details of our discussion
offline, and we have not yet managed to agree on how to return
character data; however, I do agree that attributes should be returned
somehow in the startElement() event rather than passed as separate
events, so I'd like to propose this:
startElement(XmlParser p, String elname, java.util.Dictionary attributes)
Note the use of the Dictionary abstract base class here. Hashtable is
derived from Dictionary, as is my Trie class that I released a
couple of years ago. There are a couple of advantages to this
approach:
1) It can be implemented efficiently, without requiring allocation of
a hash table: you could return anything derived from
java.util.Dictionary, including a simple sequential lookup in an
array of attributes.
2) It makes the users' work much easier, since they can just use
/**
* Handle the start of an element.
*/
startElement (XmlParser p, String elname, Dictionary attributes)
{
String id = (String)attributes.get("id");
String role = (String)attributes.get("role");
[...]
}
(instead of)
/**
* Handle the start of an element.
*/
startElement (XmlParser p, String elname, Attribute attributes[])
{
String id = null;
String role = null;
for (int i = 0; i < attributes.length; i++) {
if (attributes[i].getName().equals("id")) {
id = attributes[i].getValue();
} else if (attributes[i].getName().equals("role")) {
role = attributes[i].getValue();
}
}
[...]
}
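Point 1 above (anything derived from java.util.Dictionary will do, including a sequential lookup in an array) might look roughly like this. It is a sketch under assumptions: the class name ArrayAttributes and the read-only behaviour are hypothetical choices, not part of the proposal, and it uses the generic form of Dictionary found in current JDKs.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Dictionary;
import java.util.Enumeration;

/** Hypothetical sketch: a Dictionary backed by two parallel arrays, with no hash table. */
final class ArrayAttributes extends Dictionary<String, String> {
    private final String[] names;
    private final String[] values;

    /** The arrays are used as given (not copied), so the caller must not change them. */
    ArrayAttributes(String[] names, String[] values) {
        if (names.length != values.length) {
            throw new IllegalArgumentException("names and values differ in length");
        }
        this.names = names;
        this.values = values;
    }

    @Override public int size() { return names.length; }

    @Override public boolean isEmpty() { return names.length == 0; }

    @Override public Enumeration<String> keys() {
        return Collections.enumeration(Arrays.asList(names));
    }

    @Override public Enumeration<String> elements() {
        return Collections.enumeration(Arrays.asList(values));
    }

    /** Linear scan; cheap for the handful of attributes a typical element carries. */
    @Override public String get(Object key) {
        for (int i = 0; i < names.length; i++) {
            if (names[i].equals(key)) {
                return values[i];
            }
        }
        return null;
    }

    // Mutation is refused in this sketch; a real interface would have to decide this.
    @Override public String put(String key, String value) {
        throw new UnsupportedOperationException("read-only");
    }

    @Override public String remove(Object key) {
        throw new UnsupportedOperationException("read-only");
    }
}
```

A handler written against Dictionary, as in the first listing, would call attributes.get("id") on such an object without knowing how it is stored.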
Comments?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From Ingo.Macherius at TU-Clausthal.de Tue Dec 16 13:04:19 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun 7 16:59:32 2004
Subject: Case in SystemLiterals
Message-ID: <199712161304.OAA26431@sinfonix.rz.tu-clausthal.de>
This is a question about the ambiguous mapping of URLs used as
SystemLiterals:
Consider two files. A document "Test.xml":
and in the same directory a file called "Test.dtd".
XML-PR states: "[The SystemLiteral ...] is a URI, which may be used to
retrieve the entity". RFC1866 (section 3.1) declares file:-URL
as strictly system dependent, so what ?
Depending on processing context, the SystemLiteral may be
interpreted as either a relative URL of type "file:" (not necessarily
case sensitive) or type "http:" (case sensitive) !
So if interpreted as "file:" the above parses fine on Windows and
fails on Unix. If the same pair of files would have been served via
"http:", the error would have become obvious on both platforms (404
File Not Found).
Should XML try to overcome this (e.g. by requiring case-sensitivity in
file:-URLs, regardless of the underlying OS), or is this out of scope?
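To make the case problem concrete, here is a small sketch of resolving a relative system identifier against the document's own location with java.net.URI. The class name, the example.org host and the file paths are made up for illustration. The resolved identifier keeps whatever case the author typed; whether a retrieval then succeeds depends on the server or file system, which is the ambiguity in question.

```java
import java.net.URI;

/** Hypothetical sketch: resolve a relative system identifier against the document URI. */
final class SystemLiteralCase {

    static URI resolve(String documentUri, String systemLiteral) {
        return URI.create(documentUri).resolve(systemLiteral);
    }

    public static void main(String[] args) {
        // The document says "test.dtd", but the file on disk is called "Test.dtd".
        URI viaHttp = resolve("http://example.org/docs/Test.xml", "test.dtd");
        URI viaFile = resolve("file:/home/user/docs/Test.xml", "test.dtd");
        System.out.println(viaHttp);
        System.out.println(viaFile);
        // URI comparison is case-sensitive in the path, so these name different resources.
        System.out.println(viaHttp.equals(URI.create("http://example.org/docs/Test.dtd")));
    }
}
```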
++im
--
Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de http://home.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Zappa)
From fussellm at alumni.caltech.edu Tue Dec 16 13:06:59 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:32 2004
Subject: Common event-based parser API
In-Reply-To: <199712161212.HAA00516@unready.microstar.com>
Message-ID:
On Tue, 16 Dec 1997, David Megginson wrote:
> events, so I'd like to propose this:
>
> startElement(XmlParser p, String elname, java.util.Dictionary attributes)
Personally, it would save me allocating a collection to collect the ESIS
style pre-element attributes for Aelfred, so that would be nice. I am
ultimately trying to call into a very similar interface:
ObjectBuilder:
Object createObjectFromName_parameters(String recipeName,
Map parameters);
Where the parameters are just a bit more general in type than attributes.
A mild question would be: are you planning on being able to modify the
Dictionary after you have given it out, or can the client assume it is a
constant after the startElement?
> Note the use of the Dictionary abstract base class here. Hashtable is
> derived from Dictionary, as is my Trie class that I released a
> couple of years ago.
Just a heads-up in case anyone doesn't know. In JDK 1.2, Hashtable is
obsolete:
NOTE: This class is obsolete. New implementations should implement
the Map interface, rather than extending this class.
The problem with Hashtable is that it is an abstract class instead of an
interface so although you can have different implementations they are
still pretty restricted in their implementation approach. This has been a
known problem for a long-long time, and the 1.2 collections are finally
interface based. The basic 'get', 'put' operations for Map are the same
though, so it is just a type-ing problem.
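A small sketch of the typing issue: java.util.Hashtable extends Dictionary and, from JDK 1.2 on, also implements Map, so the same object can be handed to code written against either type, and the get call looks identical on both. The class and method names here are hypothetical.

```java
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.Map;

/** Hypothetical sketch: the same lookup written against Dictionary and against Map. */
final class DictionaryVersusMap {

    static String idViaDictionary(Dictionary<String, String> attributes) {
        return attributes.get("id");
    }

    static String idViaMap(Map<String, String> attributes) {
        return attributes.get("id");
    }

    public static void main(String[] args) {
        Hashtable<String, String> attributes = new Hashtable<>();
        attributes.put("id", "x1");
        // One object, two static types; only the declared parameter type differs.
        System.out.println(idViaDictionary(attributes));
        System.out.println(idViaMap(attributes));
    }
}
```

An interface that declares its parameter as Dictionary still accepts a Hashtable, but it cannot accept a Map implementation that does not also extend Dictionary.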
Not that this should weigh very heavily if you are trying to support
1.0 and 1.1 based browsers, but I suspect the 1.2 release will be migrated
to pretty rapidly (in the Spring->Summer).
--Mark
mark.fussell@chimu.com
From ak117 at freenet.carleton.ca Tue Dec 16 13:54:38 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:32 2004
Subject: Common event-based parser API
In-Reply-To:
References: <199712161212.HAA00516@unready.microstar.com>
Message-ID: <199712161352.IAA00393@unready.microstar.com>
Mark L. Fussell writes:
> A mild question would be: are you planning on being able to modify the
> Dictionary after you have given it out, or can the client assume it is a
> constant after the startElement?
This is undecided right now. My current test implementation gives you
a new Dictionary, so you can do what you want with it, but the common
interface might impose more restrictions (Tim almost certainly won't
want to allocate a new object each time).
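The trade-off can be sketched as follows, assuming a parser that reuses a single table between events instead of allocating a new one. Everything here (the class name, nextElement, snapshot) is hypothetical and only illustrates why a client would need to know which rule applies.

```java
import java.util.Dictionary;
import java.util.Enumeration;
import java.util.Hashtable;

/** Hypothetical sketch: a parser that reuses one attribute table for every element. */
final class ReusedAttributes {
    private final Hashtable<String, String> shared = new Hashtable<>();

    /** Simulates a startElement event; the returned object is only valid until the next call. */
    Dictionary<String, String> nextElement(String idValue) {
        shared.clear();
        shared.put("id", idValue);
        return shared;
    }

    /** What a client must do if it wants to keep the attributes beyond the event. */
    static Hashtable<String, String> snapshot(Dictionary<String, String> attributes) {
        Hashtable<String, String> copy = new Hashtable<>();
        for (Enumeration<String> keys = attributes.keys(); keys.hasMoreElements();) {
            String key = keys.nextElement();
            copy.put(key, attributes.get(key));
        }
        return copy;
    }
}
```

If the interface promised a fresh Dictionary per event, the snapshot step would be unnecessary, at the cost of one allocation per element.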
> > Note the use of the Dictionary abstract base class here. Hashtable is
> > derived from Dictionary, as is my Trie class that I released a
> > couple of years ago.
>
> Just a heads-up in case anyone doesn't know. In JDK 1.2, Dictionary is
> obsolete:
> NOTE: This class is obsolete. New implementations should implement
> the Map interface, rather than extending this class.
>
> The problem with Dictionary is that it is an abstract class instead of an
> interface so although you can have different implementations they are
> still pretty restricted in their implementation approach. This has been a
> known problem for a long-long time, and the 1.2 collections are finally
> interface based. The basic 'get', 'put' operations for Map are the same
> though, so it is just a type-ing problem.
(I have substituted "Dictionary" for "Hashtable" in the above
caution).
Thanks for the warning -- I have always been annoyed by the fact that
java.util.Dictionary was an abstract base class instead of an
interface, so I am happy to see that they are finally getting around
to changing it.
That makes agreeing on a common event-based interface a little more
difficult, though.
> Not that this should weigh very heavily if you are trying to support
> 1.0 and 1.1 based browsers, but I suspect the 1.2 release will be migrated
> to pretty rapidly (in the Spring->Summer).
Yes, but many users haven't even upgraded to Netscape 3 yet, so it
will be years before we can count on a general user base that will be
able to handle this (including a local copy of Map.class is a clumsy
work-around, and it could sabotage other parts of an applet or
application).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From M.H.Kay at eng.icl.co.uk Tue Dec 16 14:13:30 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:32 2004
Subject: Character references
Message-ID: <01bd0a2c$aee00e40$1e09e391@mhklaptop.bra01.icl.co.uk>
The latest XML spec gives a bit more detail on the rules
for a character reference, but it is still incomplete.
In particular, are there any rules on the use of leading
zero digits?
At present MSXML seems to permit "ª" but to
reject "ª" (not to mention "ª" and
"ª").
I can't see any justification for this in the spec - what is
the authors' intention?
Mike Kay, ICL
From crism at ora.com Tue Dec 16 15:00:02 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:32 2004
Subject: Case in SystemLiterals
In-Reply-To: <199712161304.OAA26431@sinfonix.rz.tu-clausthal.de>
(Ingo.Macherius@TU-Clausthal.de)
Message-ID: <199712161504.KAA16395@geode.ora.com>
[Ingo Macherius]
> So if interpreted as "file:" the above parses fine on Windows and
> fails on Unix. If the same pair of files would have been served via
> "http:", the error would have become obvious on both platforms (404
> File Not Found).
Are you sure about that? I think that Windows servers will attempt to
find a non-case-matching file. But in any case:
> Should XML try to overcome this (e.g. by requiring case-sensitivity
> in file:-URL, despite of the underlaying OS), or is this out of
> scope ?
It's out of scope. Case screw-ups are only one way you can break your
URLs; I don't see why XML should try and define resolution for this
one case, *especially* since the rest of the language is case-
sensitive. A very simple rule: "Broken URLs in XML documents are
broken." I think we can all live with that one.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From tbray at textuality.com Tue Dec 16 15:05:27 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:32 2004
Subject: Character references
Message-ID: <3.0.32.19971216070244.00a88b74@pop.intergate.bc.ca>
At 02:12 PM 16/12/97 -0000, Michael Kay wrote:
>At present MSXML seems to permit "ª" but to
>reject "ª" (not to mention "ª" and
>"ª").
>
>I can't see any justification for this in the spec - what is
>the authors' intention?
I think the spec is clear, and no authors-intention clarifications are
relevant. To quote:
If the character reference begins with "&#x", the digits and letters
up to the terminating ";" provide a hexadecimal representation of the
character's value in ISO/IEC 10646
No reasonable interpretation of this could rule out any of ª or
ª or ª - I'm sure msxml will get around to fixing this. -Tim
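A minimal sketch of that reading, assuming the hexadecimal form "&#x...;" and the decimal form "&#...;": the digits are simply the number, so leading zeros cannot change which character is meant. The class name CharRef is hypothetical, and the check that the resulting code point is a legal XML character is left out.

```java
/** Hypothetical sketch: decode a numeric character reference to its code point. */
final class CharRef {

    /**
     * Accepts "&#xHH...;" (hexadecimal) or "&#DD...;" (decimal).
     * Throws IllegalArgumentException for anything else.
     */
    static int decode(String reference) {
        if (!reference.startsWith("&#") || !reference.endsWith(";")) {
            throw new IllegalArgumentException("not a character reference: " + reference);
        }
        String body = reference.substring(2, reference.length() - 1);
        boolean hex = body.startsWith("x");
        String digits = hex ? body.substring(1) : body;
        // Only plain ASCII digits (and a-f/A-F for hex) are allowed; no sign, no spaces.
        if (!digits.matches(hex ? "[0-9a-fA-F]+" : "[0-9]+")) {
            throw new IllegalArgumentException("bad digits in: " + reference);
        }
        // Leading zeros are ordinary digits and do not affect the parsed value.
        return Integer.parseInt(digits, hex ? 16 : 10);
    }

    public static void main(String[] args) {
        System.out.println(decode("&#xAA;"));
        System.out.println(decode("&#x00AA;"));
    }
}
```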
From tbray at textuality.com Tue Dec 16 15:05:37 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:32 2004
Subject: Case in SystemLiterals
Message-ID: <3.0.32.19971216065822.00a88b74@pop.intergate.bc.ca>
At 02:02 PM 16/12/97 +0000, Ingo Macherius wrote:
>Should XML try to overcome this (e.g. by requiring case-sensitivity in
>file:-URL, despite of the underlaying OS), or is this out of scope ?
Out of scope, I'd say. file: URL's are a monster pain in the ass,
especially given Microsoft Operating System ideas about case-mapping...
I suspect that file: urls will only ever be useful in a local authoring
environment... if you want to ship a multi-part chunk, relative URLs
are way safer and have a chance of working correctly if you can just
get to the document entity. -Tim
From papresco at technologist.com Tue Dec 16 16:29:16 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData
References: <199712151508.KAA12097@geode.ora.com>
Message-ID: <3496A1DC.E5A6B196@technologist.com>
Chris Maden wrote:
> You are correct; ']]>' is forbidden in element content, as it should
> be. This is cruft from SGML; the msc/mdc combination (marked section
> close = ']]', markup declaration close = '>') is always recognized as
> a delimiter (see Figure 3, ISO 8879). As a result, XML mandates that
> this combination always be escaped using ']]&gt;' unless it actually
> closes a marked section.
The reason for this is not just SGML compatibility, it is robustness. A
floating MDC is almost certainly an error in the document. If I
accidentally end a CDATA marked section twice, I want to *know*.
Paul Prescod
From papresco at technologist.com Tue Dec 16 16:29:51 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:33 2004
Subject: external dtd subset content
References: <3.0.1.16.19971213001510.30077718@pop3.demon.co.uk>
Message-ID: <3496A793.47965960@technologist.com>
Peter Murray-Rust wrote:
> Good point! I have never really understood why it's necessary to have
> consistency between the root element and the doctypedeclName.
The docTypeDeclName exists specifically to state the root element type.
>
>
> This is a para
>
> is invalid.
Think about what the code above means in *HTML*:
This is a para
Now I suspect you understand why the docTypeDeclName exists and in XML
must always be the same as the type of the explicitly tagged root
element. Since XML has no minimization, it is redundant of course.
WebSGML allows you to use the keyword #IMPLIED (but XML does not) to
remove that redundancy.
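The constraint can be observed with the validating parser that ships with the JDK. This is a sketch under assumptions: the class name RootNameCheck and the doc/para element names are invented for illustration, and the exact wording of the reported error is parser-specific, so the code only collects messages rather than matching them.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

/** Hypothetical sketch: collect validity errors, such as a root that differs from the DOCTYPE name. */
final class RootNameCheck {

    static List<String> validationErrors(String xml) {
        final List<String> errors = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true); // DTD validation, not just well-formedness
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) { errors.add(e.getMessage()); }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
        return errors;
    }

    public static void main(String[] args) {
        String mismatched = "<!DOCTYPE doc [<!ELEMENT doc (para)><!ELEMENT para (#PCDATA)>]>"
            + "<para>This is a para</para>";
        // Prints whatever validity errors the parser reports for the mismatched root.
        System.out.println(validationErrors(mismatched));
    }
}
```

The mismatched document is well-formed, so a non-validating parser would accept it; only validation flags the root element.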
Paul Prescod
From M.H.Kay at eng.icl.co.uk Tue Dec 16 17:00:28 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData
Message-ID: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
-----Original Message-----
From: Paul Prescod
To: xml-dev@ic.ac.uk
Date: 16 December 1997 16:30
Subject: Re: CharData
>Chris Maden wrote:
>> You are correct; ']]>' is forbidden in element content, as it should
>> be. >
>The reason for this is not just SGML compatibility, it is robustness. A
>floating MDC is almost certainly an error in the document.
I don't think you're being particularly user-friendly here. The most
likely reason for a floating "]]>" is that the software-writer was
lazy and forgot to escape it.
If we assume that most XML will be software-generated, then it
appears the only purpose of CDATA is to allow the software-writer
to copy in a chunk of text without bothering to convert the <'s and
&'s to &lt; and &amp;. But since he still has to check for any "]]>"
in the text, and has no clear course of action if he finds one,
it's not at all clear that it achieves this aim. As one who is
currently writing software to generate XML, I have no intention
of deliberately generating CDATA, and the need to avoid
doing so by mistake is a complication I could do without.
In practice I will just get round it by escaping all my >'s
as well as my <'s.
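The workaround in the last sentence can be sketched in a few lines: escape '>' together with '<' and '&', and the sequence "]]>" can never appear in the output, so no CDATA handling is needed. The class name XmlEscape is hypothetical; attribute values would additionally need their quote characters escaped, which this sketch leaves out.

```java
/** Hypothetical sketch: escape character data so that "]]>" can never occur in the output. */
final class XmlEscape {

    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break; // not strictly required, but rules out "]]>"
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("if (a[b[0]]>1 && c<2)"));
    }
}
```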
Mike Kay, ICL
From crism at ora.com Tue Dec 16 17:21:26 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData
In-Reply-To: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
(M.H.Kay@eng.icl.co.uk)
Message-ID: <199712161725.MAA20422@geode.ora.com>
[Paul Prescod]
> Chris Maden wrote:
> > You are correct; ']]>' is forbidden in element content, as it
> > should be.
>
> The reason for this is not just SGML compatibility, it is
> robustness. A floating MDC is almost certainly an error in the
> document.
That's true; but I find it easier to argue from standards than from
philosophy, especially in cases like this where others will disagree:
[Michael Kay]
> I don't think you're being particularly user-friendly here. The most
> likely reason for a floating "]]>" is that the software-writer was
> lazy and forgot to escape it.
Then the writer *must* be warned. See _The SGML FAQ Book_, question
2.9; the potential messy ramifications of stray marked section end
delimiters are many, and the potential damage quite high.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From liamquin at interlog.com Tue Dec 16 17:43:31 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData and escaping ]]>
In-Reply-To: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:
Michael Kay at ICL wrote:
> If we assume that most XML will be software-generated, then it
> appears the only purpose of CDATA is to allow the software-writer
> to copy in a chunk of text without bothering to convert the <'s and
> &'s to < and &. But since he still has to check for any "]]>"
> in the text, and has no clear course of action if he finds one,
> it's not at all clear that it achieves this aim.
I'd say firstly that if you are writing software that works a character
at a time, it is generally easier to avoid CDATA marked sections and to
escape every < and & directly. If you use a marked section, you need up
to 3 characters of lookahead, and you need to make sure that all of the
following sequences pass through unscathed:
]]]]]]]]]]]
]>
a]b]]c]]]d
Secondly, the simplest way to escape ]]> is to insert a Unicode
zero-width non-printing non-combining space between the ] and the >.
This might be a pain for some applications, though.
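For comparison, here is a sketch of a character-at-a-time writer that touches only the '>' of a "]]>" sequence and lets the three bracket sequences listed above pass through unchanged. Note that it swaps in a different technique from the zero-width-space idea: it replaces the '>' with "&gt;", and it remembers how many ']' characters it has just written instead of looking ahead. The class name is hypothetical, and '<' and '&' are assumed to be escaped by a separate step.

```java
/** Hypothetical sketch: escape only the '>' that would complete a "]]>" sequence. */
final class MarkedSectionCloseEscape {

    static String escapeClose(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        int brackets = 0; // length of the run of ']' characters written immediately before
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '>' && brackets >= 2) {
                out.append("&gt;"); // this '>' would otherwise close a marked section
            } else {
                out.append(c);
            }
            brackets = (c == ']') ? brackets + 1 : 0;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeClose("a]b]]c]]]d"));
        System.out.println(escapeClose("x]]>y"));
    }
}
```

Keeping a running count avoids buffering input, which suits a writer that sees one character at a time.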
> In practice I will just get round it by escaping all my >'s
> as well as my <'s.
That's what I would do too.
Lee
--
Liam Quin -- the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval
IRC: Learn about XML/SGML/XSL/XLL/DSSSL on irc.dragonnet.org in #xml
email address: l i a m q u i n, at host: i n t e r l o g dot c o m
From ser at javalab.uoregon.edu Tue Dec 16 18:01:58 1997
From: ser at javalab.uoregon.edu (Sean Russell)
Date: Mon Jun 7 16:59:33 2004
Subject: Any XSL tool!
References: <3496458B.4C47@bd748.pku.edu.cn>
Message-ID: <3496C3A7.7438332A@javalab.uoregon.edu>
Chang Ming wrote:
> I think XSL is not off-topic in this list.
>
> I would like to know if there is any work done on XSL ,something like a
> interpreter.
> The only known tool seems the converter from XSL to DSSSL.
Which converter are you talking about? Have you looked at docproc?
http://javalab.uoregon.edu/ser/software/docproc_2/docs/index.xml
I've been having a nightmarish time with the Java Web Server, which for some reason
doesn't want to stay running for more than 24 hours at a time. If the
above link is down when you try it, please try back later. I have to go in and
restart the server every once in a while.
--- SER
From mecom-gmbh at mixx.de Tue Dec 16 19:02:53 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:33 2004
Subject: Message Length vs Processing Speed
References: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org>
Message-ID: <3496D174.A62E26A0@mixx.de>
i'm not sure whether it falls within the list's scope to pose questions of the
sort "what's the rationale for this?", but i hope so. i'm not sure where else
would be more appropriate and as someone implementing a parser, when i discover
stipulations which are non-intuitive i'm at least curious about the rationale for
some of the stipulated "conforming parser" behaviour and welcome the opportunity
to at least ask why things are the way they are.
today's question concerns dtd compactness
Dolin,Robert H wrote:
> Greetings XML-DEV list,
>
> We've been working on an SGML (?XML) syntax for HL7 messages,...
i've read through the related hl7sgm3 document and discovered one concern which
we share.
among other things the document discusses whether attribute definitions
should be repeated as necessary or should be attached to an intermediate "type"
element.
where sgml permitted something like
[53x] AttlistDecl ::= '
[53y] Nameopt ::= Name (S '|' S Name)*
xml allows only
[53] AttlistDecl ::= '
which forces one, as noted below (i trust the excerpt is, for discussion
purposes, permitted), to introduce extraneous elements.
when i consider the relative effort of getting a parser to accept a name list
and coding applications to treat the interposed elements as transparent, i don't
understand why this sgml feature was not carried over.
OPTION 1
OPTION 2
COMMENTS
- Example DTDs are currently using Option 1.
ISSUES
- Option 1:
  - Able to express more Required Value constraints in DTD.
  - Easier to parse?
- Option 2:
  - Define HL7 V2.3 data types just once, for all message DTDs. May be easier to
    maintain DTDs as data type definitions change.
  - Receiving application can determine the data type of previously unknown data
    elements.
From mecom-gmbh at mixx.de Tue Dec 16 19:24:54 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
Message-ID: <3496D6AB.E55D721@mixx.de>
greetings,
perhaps it's time for a new role to complement the mcsgs, namely the npw - or
niggling parser writer - not rebelling, just niggling.
i admit to that fault.
my problem is, whenever i come to a point in the proposed recommendation at
which a parser is required to report an error and "must not continue normal
processing" even though the result which the stream would denote would be
sufficiently unambiguous if allowed, then i feel compelled to ask, "why does one
have to exclude this"?
which does not mean "in which production does the standard exclude or prescribe
it", but rather why does the standard exclude or prescribe it. what is the
useful purpose? particularly when excluding it makes the parser more complex and
the document encoding more exacting.
more than likely, when i've followed discussions of similar questions, the
design goal #3 gets hoisted like a commandment: "XML shall be compatible with
SGML". as a npw i tend to adhere more to #'s 1,4, 6, and 9: it should be easy to
generate, easy to program, and easy to read. SGML processors are already pretty
complex, so an argument to increase the complexity of XML strictly in order to
keep SGML processors simpler is difficult to accept on logical terms. (i know
i'm being naive here, and i'm ignoring the past, but i would wager that the
future is going to bear me out...)
the simplest thing would have been a document form which distinguished inline
definitions, external references (ie XLL built-in), content, and (maybe) a
declaration (autorecognition of encoding being the criteria on the latter). it
is true, that that is all there, but the standard requires at least twice as
many syntactic forms as are necessary. so despite having read mr murray-rust's
note on background to the list itself (re: XML-DEV (was Re: YAXPAPI)) which
gave me some sense of the effort which has gone into the proposed
recommendation, the distance between the simple form of the denoted data and the
complexity of the syntactic form often leads me to ask "why?"
one such example concerns the external subset, xml declaration, doctype
declaration, and text declaration. in particular, the productions
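[24] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[29] doctypedecl ::= '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | PEReference | S)* ']' S?)? '>'
[78] TextDecl ::= '<?xml' VersionInfo? EncodingDecl S? '?>'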
[80] ExtPE ::= TextDecl? extSubset
i observe that, while one can well label the XMLDecl and TextDecl productions
differently, lexically speaking they are not disjoint, and practically speaking
there is no difference between their situation and that concerning the presence
of a doctype form at a location analogous to that of the textdecl. yet one is
"standard" and the other is "nonsense". not to a niggling parser writer. from
the stream content, the permitted case (almost) appears (by analogy to the
remarks below) as one xml document within another. the other thing which is
disconcerting is that the standard goes to great length to, on one hand,
specify that the presence of an xml document may be introduced by a form with
the (not)PI keyword 'xml' (all lower case only) but on the other hand engenders
lexical ambiguity where it does not introduce a distinct keyword for the
distinctly different purpose and context of specifying the encoding of the
external dtd subset. why?
Per-Ake Ling wrote:
> > From jjc@jclark.com Mon Dec 15 11:59:21 1997
...
> > It is a requirement that the external subset *not* begin with a document
> > type declaration.
> >
> If it were permitted, it would mean that there is a doctype declaration
> within a doctype declaration, which is clearly nonsense. It is a common
> misunderstanding that DTD means "document type declaration" instead of
> "document type definition".
>
> Per-Åke
> --
(as an aside, i didn't - and still don't - see that as, in itself, a sufficient
explanation, since the case would comprise two instances of a "document type
declaration": one in the xml document and the other in the prolog of the
external portion of the "document type definition", which was referred to from
the first, but is not contained in the first, and which serves to constrain the
root element if so desired.)
another example is the MDC (']]>') exclusion in CharData which means that one
needs a state machine to scan character data. why?
another example is that of [24], in itself, where the npw believes his point (in
a previous posting) was misunderstood, and can only repeat the question
why is a PI-close specified to be '?>' and not '>', which would be
easier, or ('?>' | '>'), which would be robuster and observes (wrt to 'XML'
itself) that the standard, cf #6 with irony, engenders an encoding where of the
four obvious humanly legible encodings (that is, neglecting 'xMl' et.al.:
('' | '>')) only one is legitimized. why?
if the precision of an encoding depends so much on uniqueness, then why does one
start out with such a level of lexical complexity in the first place, only to
then exclude much of it as 'malformed'? all you need is <, >, ', & and / (if
you allow element recursion) - and even the distinction between < and > is more
for the eye than anything else.
Ingo Macherius wrote:
> ...
> > how about
> >
>
> This is wrong, too. "xml" must be lower-case.
>
> > i've yet to understand why, but isn't that the way it needs to be?
>
> Why ? Productions [24] and [25] in section 2.8 !
>
> [24] XMLDecl ::= '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
> [25] VersionInfo ::= S 'version' Eq
> ('"VersionNum"' | "'VersionNum'")
>
> So the minimal correct PI is:
>
> ++im
> --
From ak117 at freenet.carleton.ca Tue Dec 16 19:48:29 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <3496D6AB.E55D721@mixx.de>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
<3496D6AB.E55D721@mixx.de>
Message-ID: <199712161946.OAA05109@unready.microstar.com>
james anderson writes:
> my problem is, whenever i come to a point in the proposed
> recommendation at which a parser is required to report an error and
> "must not continue normal processing" even though the result which
> the stream would denote would be sufficiently unambiguous if
> allowed, then i feel compelled to ask, "why does one have to
> exclude this"?
[...]
> more than likely, when i've followed discussions of similar
> questions, the design goal #3 gets hoisted like a commandment: "XML
> shall be compatible with SGML".
No, it's not SGML's fault, at least not this time. Conforming SGML
parsers are allowed to continue processing if they want to, and are
even allowed not to report errors at all (as long as they don't claim
to be "validating parsers"). XML has gone way beyond any SGML
requirements with this one.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ricko at allette.com.au Tue Dec 16 20:05:37 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:34 2004
Subject: Draconian error handling (was Re: XML syntax )
Message-ID: <199712162006.HAA07141@jawa.chilli.net.au>
From: james anderson
>my problem is, whenever i come to a point in the proposed recommendation at
>which a parser is required to report an error and "must not continue normal
>processing" even though the result which the stream would denote would be
>sufficiently unambiguous if allowed, then i feel compelled to ask, "why does one
>have to exclude this"?
The requirement for "Draconian error handling" actually came from the HTML
side not the SGML people. The reason was to ensure data integrity:
if a document was compromised it should be clearly marked as such when
passed to the application. Under no circumstances should something that
is not well-formed be passed to an application as if it were.
This is because XML is intended for more than just typed-text applications.
It was thought that allowing all sorts of transparent error-recovery
mechanisms would just reintroduce tag minimization in through the back
door. Then people would start to rely on it, or at least write their
XML to suit the error-recovery of particular parsers, and we would
be back in HTML-land, where the effective grammar is too loose to
be reliable.
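The behaviour described above can be observed with any conforming parser. A minimal sketch follows, assuming Python's standard xml.etree.ElementTree module as the parser (a later tool, used here purely for illustration); the helper name parse_strictly is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_strictly(text):
    """Return the root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # The error is reported and no partial tree reaches the application.
        print("rejected:", err)
        return None


good = parse_strictly("<p>A <b>sentence</b>.</p>")
bad = parse_strictly("<p>A <b>sentence.</p>")  # <b> is never closed
```

The second call returns nothing rather than a guessed repair, so an author cannot come to depend on one parser's recovery habits.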
Rick Jelliffe
From crism at ora.com Tue Dec 16 20:26:02 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <3496D6AB.E55D721@mixx.de> (message from james anderson on Tue,
16 Dec 1997 20:31:30 +0100)
Message-ID: <199712162030.PAA25069@geode.ora.com>
[James Anderson]
> my problem is, whenever i come to a point in the proposed
> recommendation at which a parser is required to report an error and
> "must not continue normal processing" even though the result which
> the stream would denote would be sufficiently unambiguous if
> allowed, then i feel compelled to ask, "why does one have to exclude
> this"? which does not mean "in which production does the standard
> exclude or prescribe it", but rather why does the standard exclude
> or prescribe it. what is the useful purpose? particularly when
> excluding it makes the parser more complex and the document encoding
> more exacting.
I am not particularly fond of this rule. However, I can explain its
justification. The WG made this decision at the request of both
Microsoft and Netscape. In the HTML arena, both companies spend a
fair amount of their time reverse engineering the other's error-
recovery behavior, since Web page authors "validate" by seeing if it
looks OK in their browser of choice. By requiring parsers to fail on
non-conformant documents, there is no chance that a user can think
erroneous data is acceptable in a conforming browser; if a browser
accepts the data, its opponent can level the charge that it is non-
conforming.
> more than likely, when i've followed discussions of similar
> questions, the design goal #3 gets hoisted like a commandment: "XML
> shall be compatible with SGML". as a npw i tend to adhere more to
> #'s 1,4, 6, and 9: it should be easy to generate, easy to program,
> and easy to read. SGML processors are already pretty complex, so an
> argument to increase the complexity of XML strictly in order to keep
> SGML processors simpler is difficult to accept on logical terms. (i
> know i'm being naive here, and i'm ignoring the past, but i would
> wager that the future is going to bear me out...)
Rule 3 is critical for two reasons: (a) technologically, it allows
easier application of existing SGML technology to the new problem
space, and (b) politically, it encourages XML's adoption in rigorously
standards-based arenas, like the Military-Industrial Complex.
> the simplest thing would have been a document form which
> distinguished inline definitions, external references (ie XLL
> built-in), content, and (maybe) a declaration (autorecognition of
> encoding being the criteria on the latter). it is true, that that is
> all there, but the standard requires at least twice as many
> syntactic forms as are necessary. so despite having read mr
> murray-rust's note on background to the list itself (re: XML-DEV
> (was Re: YAXPAPI)) which gave me some sense of the effort which has
> gone into the proposed recommendation, the distance between the
> simple form of the denoted data and the complexity of the syntactic
> form often leads me to ask "why?"
Many people have had discussions of the form "a markup language might
...", in which a clean, new theoretical language is designed. These
discussions are useful and interesting, but completely outside of the
scope of XML, whose charter was to enable the transfer of SGML over
the Web.
If you want to design such a language, and are successful in
encouraging its adoption, many current SGMLheads would be very
grateful. We use SGML because it is the best existing tool, not
because it is the best possible.
> (as an aside, i didn't - and still don't - see that as, in itself, a
> sufficient explanation, since the case would comprise two instances
> of a "document type declaration": one in the xml document and the
> other in the prolog of the external portion of the "document type
> definition", which was referred to from the first, but is not
> contained in the first, and which serves to constrain the root
> element if so desired.)
And indeed, some older SGML software produces documents like this.
This is a purely backwards-compatibility issue, from one point of
view; disambiguation rules could easily be developed, but then that
language would not be SGML. See the XML charter.
> another example is the MDC (']]>') exclusion in CharData which means
> that one needs a state machine to scan character data. why?
This is because floating msc/mdc combos can get you later in a big
way. See _The SGML FAQ Book_, and trust us on this. I'd recommend
avoiding marked sections in the document instance altogether, but if
you don't, *ALWAYS* escape any occurrence of ']]>' in data.
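A small sketch of that advice, assuming text is being written out as element content from Python; the function name escape_char_data is hypothetical, and xml.etree.ElementTree is used only to check that the escaped text round-trips.

```python
import xml.etree.ElementTree as ET


def escape_char_data(text):
    """Escape text for use as element content.

    '&' and '<' must always be escaped; '>' is escaped here only
    where it would otherwise complete the sequence ']]>'.
    """
    text = text.replace("&", "&amp;")
    text = text.replace("<", "&lt;")
    return text.replace("]]>", "]]&gt;")


raw = "if (a[b[0]]>c) return;"
escaped = escape_char_data(raw)
root = ET.fromstring("<code>" + escaped + "</code>")
```

After parsing, root.text holds the original string again, so nothing is lost by escaping the single '>' character.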
> another example is that of [24], in itself, where the npw believes
> his point (in a previous posting) was misunderstood, and can only
> repeat the question why is a PI-close specified to be '?>'
> and not '>', which would be easier, or ('?>' | '>'), which would be
> robuster and observes (wrt to 'XML' itself) that the standard, cf #6
> with irony, engenders an encoding where of the four obvious humanly
> legible encodings (that is, neglecting 'xMl' et.al.: (' '' | '>')) only one is legitimized. why? if the
> precision of an encoding depends so much on uniqueness, then why
> does one start out with such a level of lexical complexity in the
> first place, only to then exclude much of it as 'malformed'? all you
> need is <, >, ', & and / (if you allow element recursion) - and even
> the distinction between < and > is more for the eye than anything
> else.
The pic *was* '>' in SGML. It was explicitly changed to '?>' for two
reasons. One, there is no standardized way of escaping characters in
a PI, so with pic='>' there's no way to put a greater-than in a
processing instruction. '2)>' is illegal. Yes, you
can use application conventions, but are authors going to buy
''? So, since '?>' is much less likely to occur
*within* PIs, it makes a safer delimiter. Secondly, the symmetry is
appealing, especially for new authors. Have you never seen
used as a comment on Web pages? The ... ?> syntax is more
intuitive.
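To make the first reason concrete, here is a hedged sketch of how a scanner might find the end of a processing instruction when the closing delimiter is '?>'. The function name read_pi and the PI target "calc" are hypothetical.

```python
def read_pi(text, start=0):
    """Return (content, end) for the PI that begins at text[start].

    content is everything between '<?' and the first '?>';
    end is the index just past that closing '?>'.
    """
    if not text.startswith("<?", start):
        raise ValueError("no processing instruction at offset %d" % start)
    close = text.find("?>", start + 2)
    if close == -1:
        raise ValueError("unterminated processing instruction")
    return text[start + 2:close], close + 2


source = "<?calc if (1>2) stop?><doc/>"
content, end = read_pi(source)
```

A bare '>' inside the instruction is carried through as data; with '>' as the closing delimiter the same input would have been cut off after "(1".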
Take the time to search the SGML WG archives
(), which go
through July of this year and are open to the public, and the XML SIG
archives (address unknown). Searching them will lead to answers to
many of these questions. See also the XML FAQ at
.
-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487
90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
From peter at ursus.demon.co.uk Tue Dec 16 21:40:23 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: Any XSL tool!
In-Reply-To: <3496458B.4C47@bd748.pku.edu.cn>
Message-ID: <3.0.1.16.19971216223321.0fe7201a@pop3.demon.co.uk>
At 17:10 16/12/97 +0800, Chang Ming wrote:
Many thanks Chang Ming.
>I think XSL is not off-topic in this list.
It is absolutely appropriate. However:
- XSL is at a very early stage. It's likely to undergo extensive changes
- XSL is being discussed in the W3C process at present. Unfortunately for
you the discussion cannot be made public except by the WG.
- There is a discussion group for DSSSL (forget URL - at Mulberry? -
someone will post this I'm sure). So that *may* be useful as well.
>
>I would like to know if there is any work done on XSL, something like an
>interpreter.
>The only known tool seems to be the converter from XSL to DSSSL.
This is the primary (and for many people the only) motivation for XSL (i.e.
the precise and flexible rendering of XML documents in 2D format).
This is a very good question. I cannot answer for the WG, of course. All I
can say is that my applications are not always textual and that I would
love to have transformation facilities in an XSL-like language. So, always
referring to the public spec of course, I would argue for the inclusion of
additional ELEMENTs that could provide this. I'll probably experiment in
JUMBO - (JUMBO doesn't do much formatting as elephants can't do joined up
writing.)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Dec 16 21:47:57 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <3496D6AB.E55D721@mixx.de>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
Message-ID: <3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk>
At 20:31 16/12/97 +0100, james anderson wrote:
[... lots of "why?" appeals about XML ...]
james - and others. i have enormous sympathy with your position. i have a
somewhat unique role being an SGML-near-illiterate and yet being part of
the SIG (was WG) process. i can't divulge any of the last 4 months material
- it's confidential; the earlier stuff is archived.
however i think it's allowable to say that enormous care has gone into this
process. for example the case-sensitivity involved a huge amount of
discussion with expert knowledge of many non-anglophone countries.
similarly the DTD-stuff has had a huge amount of discussion. my own naive
questioning about whitespace generated a large amount of material.
what i have come to accept from a year on the SIG (was WG) is the precision
of the process and the need for discipline. i - as do many SIG members -
raise things they don't feel happy about, but when they are decided agree
to try to make them work.
my own personal concerns are littered publicly on XML-DEV :-). like you i
find the different syntaxes very tedious because JUMBO has to read and
parse both. of course i really enjoy writing parsers especially past
midnight, and the best bit is tracking down the bugs, but others are
different. so i sigh, and hack it. fwiw i translate all the non-XML syntax
into XML internally because XML is superb to work with. (if anyone hasn't
discovered that yet, it's because they don't have a full xml system.) xml
is incredible. i can do things with JUMBO in a few hours that would have
taken months before.
it is very tough to have to ask you to take this on trust - i understand.
at least i have had my say - or shout - and accept that i *have* shouted
where necessary. *everything* has been listened to - not a sparrow chirps
without the WG taking it on board (or some other poetic phrase - i probably
misquote).
it's important to realise that xml is part of a historical process. it was
by no means certain that by 1997q4 we should have xml hyped throughout the
world. it wouldn't have happened without a *huge* effort from the sgml
community and we have them to thank. if, as a result, we have
sgml-compatibility in xml that is an acceptable price for me.
what i *hope* is that as a community we make the job of writing parsers as
easy as possible. to do this we need APIs, communal libraries, test data,
etc. so james(a) should be able to borrow a DTD-parser *off the shelf* in
which case it's no big deal.
p.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From norbert at datachannel.com Tue Dec 16 21:48:46 1997
From: norbert at datachannel.com (Norbert Mikula)
Date: Mon Jun 7 16:59:34 2004
Subject: Any XSL tool!
References: <3.0.1.16.19971216223321.0fe7201a@pop3.demon.co.uk>
Message-ID: <3496F6FD.CE9FA249@datachannel.com>
Peter Murray-Rust wrote:
> - There is a discussion group for DSSSL (forget URL - at Mulberry? -
> someone will post this I'm sure). So that *may* be useful as well.
http://www.mulberrytech.com/dsssl/dssslist
it is.
--
Norbert H. Mikula
Sr. Online Information Architect
Norbert@DataChannel.com
DataChannel, 155 108th Avenue NE Ste 400, Bellevue, WA 98004
Phone: 425.462.1999 Fax: 425.637.1192 http://www.datachannel.com
From kvale at phy.ucsf.EDU Tue Dec 16 21:56:41 1997
From: kvale at phy.ucsf.EDU (Mark Kvale)
Date: Mon Jun 7 16:59:34 2004
Subject: Two typos in and a suggestion for the XML Proposal
Message-ID: <199712162156.NAA09886@phy.ucsf.EDU>
In updating my parser to the XML Proposal of 8 December, I find that
there seem to be two typos in the EBNF production rules:
1) The encoding declaration
[81] EncodingDecl ::= S 'encoding' Eq '"' EncName '"' | "'" EncName "'"
should have parentheses around the quoted names:
[81'] EncodingDecl ::= S 'encoding' Eq ('"'EncName '"' | "'" EncName "'")
2) The version info production
[25] VersionInfo ::= S 'version'
Eq ('"VersionNum"' | "'VersionNum'")
Here VersionNum is a nonterminal, not a literal string, and I think
what was meant was
[25'] VersionInfo ::= S 'version' Eq
('"' VersionNum '"' | "'" VersionNum "'")
I also have one suggestion for improvement of the proposal. The notation
type production is
[58] NotationType ::= 'NOTATION' S '(' S? Name
(S? '|' Name)* S? ')'
It allows for space before the alternation '|' but not after. It
would be more symmetric to have
[58'] NotationType ::= 'NOTATION' S '(' S? Name
(S? '|' S? Name)* S? ')'
as in the enumeration production. Comments?
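The difference between [58] and [58'] can be checked mechanically. The following is only a sketch, assuming Python regular expressions as a stand-in for the EBNF; NAME is a simplified, ASCII-only approximation of the Name production, and the variable names are hypothetical.

```python
import re

S = r"[ \t\r\n]+"       # production S: one or more white space characters
S_OPT = r"[ \t\r\n]*"   # S?
NAME = r"[A-Za-z_:][A-Za-z0-9._:-]*"  # simplified stand-in for Name

# [58] as printed: no white space is allowed after '|'.
NOTATION_58 = re.compile(
    "NOTATION" + S + r"\(" + S_OPT + NAME
    + "(?:" + S_OPT + r"\|" + NAME + ")*" + S_OPT + r"\)"
)

# [58'] as suggested: optional white space on both sides of '|'.
NOTATION_58_PRIME = re.compile(
    "NOTATION" + S + r"\(" + S_OPT + NAME
    + "(?:" + S_OPT + r"\|" + S_OPT + NAME + ")*" + S_OPT + r"\)"
)

sample = "NOTATION (gif | jpeg)"
print(bool(NOTATION_58.fullmatch(sample)))        # False
print(bool(NOTATION_58_PRIME.fullmatch(sample)))  # True
```

Under the printed rule an author who types a space after the bar gets an error, while a space before it is accepted; the symmetric form removes that surprise.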
-Mark
From peter at ursus.demon.co.uk Tue Dec 16 21:58:43 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: external subset syntax
In-Reply-To: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
Message-ID: <3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk>
At 17:30 15/12/97 +0100, Per-Ake Ling wrote:
[...]
>Not only that, it is an underexploited feature in SGML that this is the
>case. The only indication of real use of this feature in SGML comes from
>Eliot Kimber, but I believe that it would be even more valuable in XML.
>
I agree. I have only just discovered a week ago that something like:
Hello world!
could be an allowable use of SGML. If I had realised this earlier I could
have saved weeks of work in my CML DTDs.
I have to say that the SGML community is *not* good at marketing the
language - I don't *think* it deliberately keeps it opaque. It has proved
extremely difficult to get hold of good newbie information on (say)
architectural forms, HyTime, etc. Pleased to see some postings on XML-DEV
about it but I don't appreciate things normally till I see a piece of
software doing useful work :-) [No criticism to James Clark and those who
have implemented everything, but there aren't many household applications
yet.]
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Dec 16 22:03:05 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: General comments on parsers
In-Reply-To: <199712151602.LAA13427@geode.ora.com>
References: <3.0.1.16.19971211024212.2d87bafc@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971216221636.5477ed8a@pop3.demon.co.uk>
At 11:02 15/12/97 -0500, Chris Maden wrote:
>[Peter Murray-Rust]
>
>Not the Chris you were looking for, but the DOM is standardizing
>access to XML DTDs, according to Lauren Wood's presentation at
>SGML/XML '97.
>
Chris - and Lauren - this is excellent news. As always, I shall
defer/convert to the official way of doing things when it comes. Any
formally published timescale for this (or any summary of the SGML/XML 97?)
On that last point - some of us weren't able to get to the mtg - any feedback
on this list would be very much appreciated.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Dec 16 22:40:19 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: LISTRIVIA (was Re: XML application)
In-Reply-To: <34965B31.4F97F345@mixx.de>
References: <8625656E.005D4668.00@Corpnotes.JCI.Com>
<199712151804.TAA03231@sinfonix.rz.tu-clausthal.de>
Message-ID: <3.0.1.16.19971216231631.0fe7f528@pop3.demon.co.uk>
At 11:44 16/12/97 +0100, [... someone ...] wrote:
[... stuff clipped to avoid identification ...]
AND
an unnecessary mail attachment which appeared to duplicate the posting and
for which I have to pay personally.
>
>Attachment Converted: "c:\eudora\attach\ReXMLapp.htm"
PLEASE can you avoid mail attachments. I have received private mail in
support of this view and I shall be very boring in pursuing this. It's not
difficult to avoid, and for most people it's a waste of time and money.
P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Tue Dec 16 22:55:39 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: Common event-based parser API
In-Reply-To: <199712161212.HAA00516@unready.microstar.com>
Message-ID: <3.0.1.16.19971216231808.54779b84@pop3.demon.co.uk>
At 07:12 16/12/97 -0500, David Megginson wrote:
>Tim and I have taken some of the gritty details of our discussion
>offline, and we have not yet managed to agree on how to return
Wonderful! I wish you both well and the strength to persevere till it's
finally caught and bottled.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Wed Dec 17 01:16:04 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: external subset syntax
In-Reply-To: <3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
<3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk>
Message-ID: <199712170113.UAA00333@unready.microstar.com>
Peter Murray-Rust writes:
> At 17:30 15/12/97 +0100, Per-Ake Ling wrote:
> >Not only that, it is an underexploited feature in SGML that this is the
> >case. The only indication of real use of this feature in SGML comes from
> >Eliot Kimber, but I believe that it would be even more valuable in XML.
> >
> I agree. I have only just discovered a week ago that something like:
>
>
> Hello world!
>
>
> could be an allowable use of SGML. If I had realised this earlier I could
> have saved weeks of work in my CML DTDs.
Actually, this is by no means an underexploited technique in SGML; on
the contrary, it's standard practice in larger projects. Some
industry-standard DTDs like DocBook even repeat inclusion exceptions
on many different element types (book, chapter, section, glossary,
etc) so that any one of them can be used as the document element with
identical results.
Of course, each application (in the SGML sense) has its own rules.
For example,
Microstar
is valid SGML, but it is not correct HTML.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From papresco at technologist.com Wed Dec 17 01:18:46 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> <3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk>
Message-ID: <34971A9E.8BA8ED5F@technologist.com>
Peter Murray-Rust wrote:
>
> my own personal concerns are littered publicly on XML-DEV :-). like you i
> find the different syntaxes very tedious because JUMBO has to read and
> parse both. of course i really enjoy writing parsers especially past
> midnight, and the best bit is tracking down the bugs, but others are
> different. so i sigh, and hack it. fwiw i translate all the non-XML syntax
> into XML internally because XML is superb to work with.
I'm not sure what you mean. Do you really take (e.g.) an ELEMENT
declaration and map it to a textual string ? Or do you mean
that internally you represent it using the same data structure that you
use to represent XML elements.
If the latter, then you have just re-discovered the concept of a grove,
and have also discovered why you can standardize processing software and
data models without necessarily standardizing notation.
Paul Prescod
From papresco at technologist.com Wed Dec 17 01:19:11 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:34 2004
Subject: CharData
References: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <34971C6A.CAE42A7C@technologist.com>
Michael Kay wrote:
> I don't think you're being particularly user-friendly here. The most
> likely reason for a floating "]]>" is that the software-writer was
> lazy and forgot to escape it.
>
> If we assume that most XML will be software-generated,
Sure, if we make that assumption then we can make lots of
"simplifications" to XML to make it harder to type and easier to
generate. Then it can be as popular to end users as TeX, PDF or
PostScript instead of as popular as HTML.
Personally, I am not willing to make that assumption and I'm glad that
the ERB did not. SGML would be just another forgotten technology if it
had made that assumption.
Once we reject that assumption, the restriction on MDC is reasonable.
Paul Prescod
From ak117 at freenet.carleton.ca Wed Dec 17 02:41:37 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <199712170238.VAA00829@unready.microstar.com>
After careful thought, I am fairly certain that I would be willing to
accept the following simple event-driven API for Ælfred. It meets
most of Tim Bray's concerns (I'd love to hear from Norbert Mikula and
from Chris Lovett), and it requires me to use only one extra class
file (the new XmlProcessor interface -- see below):
XmlApplication.java:
====================8<====================8<====================
import java.net.URL;
import java.util.Dictionary;
public interface XmlApplication {
public void
startDocument (XmlProcessor processor, String pubid, URL sysid);
public void
endDocument (XmlProcessor processor);
public void
startProlog (XmlProcessor processor);
public void
endProlog (XmlProcessor processor);
public void
startElement (XmlProcessor processor, String elname,
Dictionary attributes);
public void
endElement (XmlProcessor processor, String elname);
public void
characters (XmlProcessor processor, char ch[], int start, int length);
public void
processingInstruction (XmlProcessor processor, String target, String data);
public void
error (XmlProcessor processor, String message, URL url, int line);
}
// end of XmlApplication.java
====================8<====================8<====================
The processor itself could implement the following interface (very
Thread-oriented and Bean-like):
XmlProcessor.java:
====================8<====================8<====================
import java.lang.Runnable;
import java.net.URL;
public interface XmlProcessor extends Runnable {
public void setPublicId (String publicId);
public String getPublicId ();
public void setSystemId (URL systemId);
public URL getSystemId ();
public void setUserData (Object data);
public Object getUserData ();
public void addApplication (XmlApplication application);
public void removeApplication (XmlApplication application);
public void run();
}
// end of XmlProcessor.java
====================8<====================8<====================
I would lose Ælfred's resolveEntity() callback, the isSpecified
boolean for attributes and the simple String argument for character
data. Tim would lose the ability to return a boolean to stop the
parse (the user would have to throw an exception), and would have to
rename more of his callbacks.
On the positive side, this interface would let you hang more than one
application off the same parse, which could be very interesting. The
userData property also gives users a chance to pass extra information
to the processor easily, if they wish.
This new XmlProcessor interface (actually a parser, but I'm using the
XML spec's terminology here) does not preclude additional
functionality -- I'll keep all of Ælfred's DTD-query methods -- but
neither does it standardise that functionality.
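A minimal sketch of what a client of the proposed callbacks might look like. It assumes a trimmed-down copy of XmlApplication (three of the nine callbacks) so that it compiles on its own; ElementCounter and its fields are hypothetical names, not part of the proposal.

```java
import java.util.Dictionary;

// Placeholder for the proposed processor interface; its methods are not
// needed for this sketch.
interface XmlProcessor {}

// Trimmed-down copy of the proposed callback interface (assumption: only
// the element and character-data callbacks are shown).
interface XmlApplication {
    void startElement(XmlProcessor processor, String elname,
                      Dictionary<String, String> attributes);
    void endElement(XmlProcessor processor, String elname);
    void characters(XmlProcessor processor, char[] ch, int start, int length);
}

// Hypothetical application: counts elements, tracks the deepest nesting
// level, and collects all character data.
class ElementCounter implements XmlApplication {
    int elements = 0;
    int maxDepth = 0;
    final StringBuilder text = new StringBuilder();
    private int depth = 0;

    public void startElement(XmlProcessor processor, String elname,
                             Dictionary<String, String> attributes) {
        elements++;
        depth++;
        if (depth > maxDepth) {
            maxDepth = depth;
        }
    }

    public void endElement(XmlProcessor processor, String elname) {
        depth--;
    }

    public void characters(XmlProcessor processor, char[] ch, int start, int length) {
        // Only the slice [start, start + length) belongs to this event.
        text.append(ch, start, length);
    }
}
```

The char[]/start/length form of characters() lets a processor hand over a slice of its own buffer without building a String for every run of text.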
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From antony at n-space.com.au Wed Dec 17 03:14:53 1997
From: antony at n-space.com.au (Antony Blakey)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <34974325.6314AD85@n-space.com.au>
Skipped content of type multipart/mixed
From donpark at quake.net Wed Dec 17 04:48:14 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:34 2004
Subject: Simple XML Event-Based API for Java
Message-ID: <000b01bd0aa6$753cce10$0100007f@localhost>
David,
Looks good in general. I have only a few comments and a couple of
questions.
I would rename XmlApplication and XmlProcessor to XmlConsumer and
XmlProducer. It is just a matter of current Java API tradition.
Additionally, I would write a helper class XmlFilter. The
Producer/Filter/Consumer arrangement is a well-known design pattern and it
would be confusing to rename it.
I would rename startProlog, endProlog, and processingInstruction to
something more friendly. Most beginner XML programmers wouldn't
know what a PI is, nor would they care. I would group all "abnormal" tags
(with the exception of comments) as special elements and have a separate
pair of start/end for them. I would add a separate method for comments
text. Renaming characters() to content() might make it more clear to
programmers about what the method does.
I would also rename xetPublicId and xetSystemId to xetPublicID and
xetSystemID. I usually change acronyms when they are used as a prefix (XML to
Xml) but not when they are used as a suffix. It tends to look more
legible.
Would entities be resolved by XmlProcessor er, XmlProducers?
Don
From jjc at jclark.com Wed Dec 17 05:20:07 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <34975FEE.1A5406F4@jclark.com>
David Megginson wrote:
> After careful thought, I am fairly certain that I would be willing to
> accept the following simple event-driven API for Ælfred.
I don't see the point of the XmlProcessor first argument. What's wrong
with having the implementation of XmlApplication store the XmlProcessor
in the member variable? (This is what SP typically does.)
> public void
> startDocument (XmlProcessor processor, String pubid, URL sysid);
What do the pubid and sysid arguments represent? The document entity?
> public void
> startProlog (XmlProcessor processor);
>
> public void
> endProlog (XmlProcessor processor);
Why do you need startProlog() and endProlog()?
> public void
> startElement (XmlProcessor processor, String elname,
> Dictionary attributes);
>
> public void
> endElement (XmlProcessor processor, String elname);
>
> public void
> characters (XmlProcessor processor, char ch[], int start, int length);
>
> public void
> processingInstruction (XmlProcessor processor, String target, String data);
The one major omission I see here is absense of information about the
location (URL, byte offset, line number etc) of the events. It would be
very nice to be able to implement validation as just as an
XmlApplication (that wraps around another XmlApp). In others to to run
without validation you would use:
processor.run(new MyXmlApplication());
and to run with validation you would use
processor.run (new ValidateXmlApplication(new MyXmlApplication()));
In order to make this work the application needs to be able to get
information about the location of start/end tags and of data. This is
also useful for all kinds of application-specific validation.
This could be done by having the app ask the processor for the location
of the last event in some non-standardized way, but that's kind of
kludgy. On the other hand, maybe this is just too fancy for a
"simple" API.
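The wrapping idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the callback interface is cut down to two methods without the processor argument, the "validation" is a mere check that end tags match start tags, and CheckingXmlApplication and RecordingXmlApplication are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Cut-down callback interface (assumption: element events only).
interface XmlApplication {
    void startElement(String elname);
    void endElement(String elname);
}

// Hypothetical wrapper: checks that end tags match the open element,
// then forwards every event unchanged to the wrapped application.
class CheckingXmlApplication implements XmlApplication {
    private final XmlApplication next;
    private final Deque<String> open = new ArrayDeque<>();

    CheckingXmlApplication(XmlApplication next) {
        this.next = next;
    }

    public void startElement(String elname) {
        open.push(elname);
        next.startElement(elname);
    }

    public void endElement(String elname) {
        if (open.isEmpty() || !open.peek().equals(elname)) {
            // Without location information this is all that can be reported.
            throw new IllegalStateException("unexpected end tag: " + elname);
        }
        open.pop();
        next.endElement(elname);
    }
}

// Hypothetical inner application that records the events it receives.
class RecordingXmlApplication implements XmlApplication {
    final StringBuilder log = new StringBuilder();

    public void startElement(String elname) {
        log.append('<').append(elname).append('>');
    }

    public void endElement(String elname) {
        log.append("</").append(elname).append('>');
    }
}
```

The error message shows why location data matters: the wrapper can say what went wrong but not where.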
> public void
> error (XmlProcessor processor, String message, URL url, int line);
I don't think having simply "String message" is going to
internationalize well. It's also desirable to know exactly what
character number/column number the error occurred at. Also XML
distinguishes fatal errors (which the parser must not continue
processing after) from other errors. On the whole I would be inclined
to handle fatal errors as an exception, and not try to deal with
non-fatal errors at all in this simple interface.
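One way this might look, as a sketch only: a checked exception that carries a message key and a position instead of finished English text. XmlFatalException and all of its members are hypothetical; nothing like it appears in the proposal.

```java
// Hypothetical exception for fatal errors. A message key (rather than a
// ready-made sentence) leaves the wording, and thus the language, to the
// application; line and column pin down where the error occurred.
class XmlFatalException extends Exception {
    private final String messageKey;
    private final String systemId;
    private final int line;
    private final int column;

    XmlFatalException(String messageKey, String systemId, int line, int column) {
        // The default message is only a fallback for logs and stack traces.
        super(messageKey + " at " + systemId + ":" + line + ":" + column);
        this.messageKey = messageKey;
        this.systemId = systemId;
        this.line = line;
        this.column = column;
    }

    String getMessageKey() { return messageKey; }
    String getSystemId() { return systemId; }
    int getLine() { return line; }
    int getColumn() { return column; }
}
```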
> On the positive side, this interface would let you hang more than one
> application off the same parse, which could be very interesting.
I don't think this is a good idea. It adds complexity and it's likely
to impose a performance cost, but it doesn't buy you anything, because
you can achieve that functionality with a MultipleXmlApplication class
that implements the XmlApplication interface, and provides
addApplication and removeApplication methods, and then forwards each
event to the applications that have been added to it.
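A rough sketch of such a forwarding class, assuming a cut-down XmlApplication with only the two element callbacks and no processor argument; CountingXmlApplication is a hypothetical stand-in for real applications.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down callback interface (assumption: element events only).
interface XmlApplication {
    void startElement(String elname);
    void endElement(String elname);
}

// Fans each event out to every registered application, in registration order.
class MultipleXmlApplication implements XmlApplication {
    private final List<XmlApplication> applications = new ArrayList<>();

    void addApplication(XmlApplication application) {
        applications.add(application);
    }

    void removeApplication(XmlApplication application) {
        applications.remove(application);
    }

    public void startElement(String elname) {
        for (XmlApplication application : applications) {
            application.startElement(elname);
        }
    }

    public void endElement(String elname) {
        for (XmlApplication application : applications) {
            application.endElement(elname);
        }
    }
}

// Hypothetical application used to observe the forwarding.
class CountingXmlApplication implements XmlApplication {
    int starts = 0;
    int ends = 0;

    public void startElement(String elname) { starts++; }
    public void endElement(String elname) { ends++; }
}
```

With this arrangement the processor only ever holds one XmlApplication, so parses with a single application pay nothing for the multiplexing.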
> The
> userData property also gives users a chance to pass extra information
> to the processor easily, if they wish.
Surely there are cleaner ways to do this sort of thing.
James
From peter at ursus.demon.co.uk Wed Dec 17 07:09:48 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <34971A9E.8BA8ED5F@technologist.com>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
<3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971217080357.18b7149e@pop3.demon.co.uk>
At 19:19 16/12/97 -0500, Paul Prescod wrote:
>Peter Murray-Rust wrote:
>>
>> my own personal concerns are littered publicly on XML-DEV :-). like you i
>> find the different syntaxes very tedious because JUMBO has to read and
>> parse both. of course i really enjoy writing parsers especially past
>> midnight, and the best bit is tracking down the bugs, but others are
>> different. so i sigh, and hack it. fwiw i translate all the non-XML syntax
>> into XML internally because XML is superb to work with.
>
>I'm not sure what you mean. Do you really take (e.g.) an ELEMENT
>declaration and map it to a textual string ? Or do you mean
Just once - i.e.
which I use in the "DTD" for the DTD. But this is a unique case.
>that internally you represent it using the same data structure that you
>use to represent XML elements.
Yes! Yes!!
>
>If the latter, then you have just re-discovered the concept of a grove,
>and have also discovered why you can standardize processing software and
>data models without necessarily standardizing notation.
Wow! This is a glorious day! I have been told I am using (very simple)
groves *and* (very simple) architectural forms without realising! "Good
Heavens! For more than [two years] I have been speaking [grove] without
knowing it".
I am clearly on a lifetime voyage to re-invent HyTime in my own fashion :-)
Many thanks for this enlightenment. All I have to do is work out how to
implement it sufficiently generically in JUMBO.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 07:10:34 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: LISTRIVIA (was Re: RFC: Simple XML Event-Based API for Java)
In-Reply-To: <34974325.6314AD85@n-space.com.au>
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk>
At 13:42 17/12/97 +1030, [... a first-time poster on XML-DEV...] wrote:
[... some useful stuff...]
and then spoiled it by attaching 5 Kbytes (sic) of non-ASCII files that I,
the majordomo software, the hypermail system and the rest of the world
don't want and sometimes get high blood pressure about.
I expect you think - what a boring old person I am to keep on about
this. After all what's 5 Kbytes?
I had a colleague in Bratislava who some years ago was charged ONE US
DOLLAR (yes, real grey green greasy money) for ONE KILOBYTE by his ISP.
The price may have changed, but I expect it still costs more than he can
afford.
I was privileged to hear about scientific computing recently in the
recently independent ex-USSR states. Some of these countries have a SINGLE
64KB LINE FOR THE WHOLE COUNTRY. I imagine that in most African countries
it's even worse. Thoughtless attachments and quoting are a serious
disadvantage to people who are really struggling.
The XML community has taken great pains to try to make the language
accessible to every country in the world. Let's not send them junk content.
>Attachment Converted: "c:\eudora\attach\vcard39.vcf"
>
>Attachment Converted: "c:\eudora\attach\smime1.p7s"
>
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 07:11:52 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>
At 21:38 16/12/97 -0500, David Megginson wrote:
[... a really simple and understandable interface ...]
If it helps the deliberations of the closeted experts, this looks exactly
the sort of level of interface I would like and can work with. I assume
that somewhere will be all the calls to the "DTD" stuff.
Keep at it!
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 07:29:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: LISTRIVIA (was Re: Any XSL tool!)
Message-ID: <3.0.1.16.19971217082735.2dc7e69e@pop3.demon.co.uk>
The following (private) e-mail has just arrived and confirms exactly what I
have just posted about attachments. The quoted mail comes from someone in a
country *much* poorer than the one I live in.
>
>Somebody mailed me a attachment with extension :-vcf.
>How do i open it?
>regards
Dear [name omitted for privacy]
A VCF is a "vcard" usually with personal details of the sender (such as
address, e-mail, title, etc.) I think it's in ASCII.
I can't help you on how to read it, since it depends on the mailer that
you have. If this is not a recent mailer, you may not be able to access it
at all. You will probably be able to save it to disk. My Eudora mailer (on
Windows 3.1) automatically saves these to a directory C:\eudora\attach. It
then gives them memorable names like vcard39.vcf. They can be opened as an
ASCII file.
I have about 100K of accumulated attachments, many from XML-DEV.
In my opinion it is unnecessary to attach any files, including *.vcf, to
postings to XML-DEV and I have asked the posters if they would take the
trouble not to. I'm sure they will take note of this.
Best wishes with XML :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From papresco at technologist.com Wed Dec 17 08:17:33 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
References: <000b01bd0aa6$753cce10$0100007f@localhost>
Message-ID: <349780FD.D468CAEB@technologist.com>
Don Park wrote:
>
> I would rename XmlApplication and XmlProcessor to XmlConsumer and
> XmlProducer.
I would interpret those as classes that create and consume XML (text
strings). Perhaps they should be called EventProducer and EventConsumer
or XMLEventProducer and XMLEventConsumer. The former would depend on the
package mechanism to avoid clashes with other kinds of Event systems.
Paul Prescod
From donpark at quake.net Wed Dec 17 09:06:48 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
Message-ID: <003a01bd0aca$9ddf4a90$0100007f@localhost>
>I would interpret those as classes that create and consume XML (text
>strings). Perhaps they should be called EventProducer and EventConsumer
>or XMLEventProducer and XMLEventConsumer. The former would depend on the
>package mechanism to avoid clashes with other kinds of Event systems.
XMLEventBlahBlah implies something that has to do with XMLEvent objects,
which do not exist. The fact that the API being worked on is said to be
event-based does not imply that the central products of the API are events. It
could just as well have been described as a callback-based XML parser.
Furthermore, I do not see how XmlConsumer and XmlProducer imply that they
work with XML text strings. Those names imply only that they are interfaces
for classes that consume and produce XML data.
As far as reducing dependency on package mechanism, there is a point of
balance where class names are unique enough without requiring package
specification in most situations. I do not see how XmlEventBlah is
significantly better than XmlBlah. If there is any confusion, it is cleared
up by import statements or prefixing package names. org.w3c.xml.XmlConsumer
is not very long and is needed only in the instantiation call.
BTW, some attention should be paid to JavaBeans method signatures if you are
planning on having a simple XML event-based parser packaged as beans.
My comments are just comments, pure and simple. Any effort on the XML
parser is a movement in the right direction no matter whether I have a bone
to pick with its design. I sure do appreciate the effort you guys are
putting in.
Sincerely,
Don
From donpark at quake.net Wed Dec 17 09:28:27 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <003f01bd0acd$a3266da0$0100007f@localhost>
>I don't see the point of the XmlProcessor first argument. What's wrong
>with having the implementation of XmlApplication store the XmlProcessor
>in the member variable? (This is what SP typically does.)
XmlApplication can not store the XmlProcessor in the member variable because
it is an interface. I am very happy to see that XmlProcessor and
XmlApplication are interfaces rather than classes. Of course, it would help
to have some sort of Factory or Manager.
>The one major omission I see here is the absence of information about the
>location (URL, byte offset, line number etc) of the events. It would be
>very nice to be able to implement validation just as an
>XmlApplication (that wraps around another XmlApp). In other words, to run
>without validation you would use:
This is exactly why I proposed XmlFilter. XmlValidator derived from
XmlFilter can be used to add validation at runtime. Each class and
interface should have a clearly intended role. Stringing XmlApplications
along like some kind of Unix app is not something I would like to see people
do. I would rather see folks developing XmlFilters to be intentionally used
as converters or by-product producers.
>> On the positive side, this interface would let you hang more than one
>> application off the same parse, which could be very interesting.
>
>I don't think this is a good idea. It adds complexity and it's likely
>to impose a performance cost, but it doesn't buy you anything, because
>you can achieve that functionality with a MultipleXmlApplication class
>that implements the XmlApplication interface, and provides
>addApplication and removeApplication methods, and then forwards each
>event to the applications that have been added to it.
Support of multiple event listeners is the norm in the Java world. As they
say "When in Texas, wear cowboy boots". I have no concern about performance
cost since Java loops are not very expensive compared to method invocations
and object instantiations. If we were really concerned about performance, I
would recommend giving up the use of String. A pool of markers/cursors into a
string buffer will improve performance by a factor.
>> userData property also gives users a chance to pass extra information
>> to the processor easily, if they wish.
>Surely there are cleaner ways to do this sort of thing.
I do not think so. Just as every Mac developer loved having RefCon to hang
things onto, I like userData. Could I have get/setStudData methods?;-)
Don
From donpark at quake.net Wed Dec 17 09:28:38 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <004001bd0acd$a405aa10$0100007f@localhost>
>If it helps the deliberations of the closeted experts, this looks exactly
>the sort of level of interface I would like and can work with. I assume
>that somewhere will be all the calls to the "DTD" stuff.
Perhaps we should have DtdConsumer interface and add/removeDtdConsumer
methods in XmlProcessor? I would advise keeping it empty for now as a
placeholder and keep moving.
> Keep at it!
David is a workaholic. You can't pull him off it. I am a spectaholic ;-p
Don
From anders at rcs.urz.tu-dresden.de Wed Dec 17 10:19:07 1997
From: anders at rcs.urz.tu-dresden.de (Andrea Anders)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
Message-ID:
I am an amateur in XML and hope someone can help me.
I am trying to transform an SGML DTD into XML (I use the MSXML parser).
My questions are:
1) Neither SGML inclusions nor exclusions are allowed in XML!? How can I
express this in XML?
my sgml-dtd:
...
2) I tried to work around it with named groups, but it failed. Named groups are
not allowed either.
Are there any ideas?
Thanks.
____________________________________________________________
Andrea Anders
-------------
eMail: anders@rcs.urz.tu-dresden.de
WWW: http://rcswww.urz.tu-dresden.de/~anders
From tms at ansa.co.uk Wed Dec 17 10:49:28 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: Antony Blakey's message of "Wed, 17 Dec 1997 13:42:37 +1030"
References: <199712170238.VAA00829@unready.microstar.com> <34974325.6314AD85@n-space.com.au>
Message-ID:
Antony> Antony Blakey
> In article <34974325.6314AD85@n-space.com.au>, Antony wrote:
Antony> [1 ]
Antony>
Antony> [1.1 ]
Antony> David Megginson wrote:
>> I would lose Ælfred's resolveEntity() callback
Antony> One of the major pains we have had using the available XML
Antony> tools is the lack of a resolveEntity() callback. Originally
Antony> we wanted to use PUBLIC identifiers and resolve them using a
Antony> catalog, but now we use SYSTEM urls and have a dedicated http
Antony> host to resolve resources. Unfortunately we need to ship tools
Antony> to customers who may not be able to resolve the URL. It is
Antony> not feasible to change the SYSTEM identifiers. What we need
Antony> to do is change the URL on the fly (ie redirect through a
Antony> proxy or a lookup), or actually provide the input stream from
Antony> within the program ie. the entity is stored as a string, or
Antony> accessed through ClassLoader.getResourceAsStream(). This is
Antony> also necessary if you want to store resources in a versioned
Antony> object base and have the version number implicit in the
Antony> processing, rather than explicitly mentioned in the URL
Antony> (although we have in fact done exactly this :)
ISTM that there's no difficulty in bolting on an XmlEntityResolver
interface to the design, and a method in XmlProcessor to register it
(just one small interface, David!). An XmlApplication could implement
the resolver interface, so it doesn't necessarily imply a proliferation
of classes.
However, I say it should be kept as simple as possible (but no simpler)
to start with, and goodies like the resolver can be added once there
are some implementations. Perhaps we'll want a "Level 2" API that
extends the interfaces in the Level 1 API?
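A sketch of how small such a resolver interface could be, covering the case of an entity held as a string inside the program. The method signature, the null-means-default convention and InMemoryEntityResolver are all assumptions made for illustration; only the name XmlEntityResolver comes from the suggestion above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical shape for the resolver interface.
interface XmlEntityResolver {
    // Returns a stream for the entity, or null to let the processor
    // fetch the system identifier itself (assumed convention).
    InputStream resolveEntity(String publicId, String systemId) throws IOException;
}

// Hypothetical resolver that serves entities stored as strings in memory,
// keyed by system identifier.
class InMemoryEntityResolver implements XmlEntityResolver {
    private final Map<String, String> entities = new HashMap<>();

    void register(String systemId, String text) {
        entities.put(systemId, text);
    }

    public InputStream resolveEntity(String publicId, String systemId) {
        String text = entities.get(systemId);
        if (text == null) {
            return null; // not ours: fall back to the processor's default
        }
        return new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

A resolver that rewrites URLs or reads from ClassLoader.getResourceAsStream() would fit the same interface.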
[8-line sig snipped]
Antony> [1.2 Card for Antony Blakey ]
Antony>
Antony> [2 S/MIME Cryptographic Signature ]
I agree with PMR's comments on this lot (why can't people just include
an URL to their personal information, like my X-Author-Info header?)
--
From jjc at jclark.com Wed Dec 17 10:51:48 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <003f01bd0acd$a3266da0$0100007f@localhost>
Message-ID: <3497ADBF.27C4F668@jclark.com>
Don Park wrote:
> >I don't see the point of the XmlProcessor first argument. What's wrong
> >with having the implementation of XmlApplication store the XmlProcessor
> >in the member variable? (This is what SP typically does.)
>
> XmlApplication can not store the XmlProcessor in the member variable because
> it is an interface. I am very happy to see that XmlProcessor and
> XmlApplication are interfaces rather than classes.
I didn't suggest XmlApplication should store XmlProcessor in a
member variable. I suggested that implementations of XmlApplication
could (if they needed to make callbacks to XmlProcessor) store
XmlProcessor in a member variable.
> >I don't think this is a good idea. It adds complexity and it's likely
> >to impose a performance cost, but it doesn't buy you anything, because
> >you can achieve that functionality with a MultipleXmlApplication class
> >that implements the XmlApplication interface, and provides
> >addApplication and removeApplication methods, and then forwards each
> >event to the applications that have been added to it.
>
> Support of multiple event listeners is the norm in the Java world. As they
> say "When in Texas, wear cowboy boots".
I don't think it's appropriate to carry over patterns from GUI events
and apply them to XML events just because we happen to use the word
"event" to describe them both. I believe performance is important for
XML processing, and an interface shouldn't impose an unnecessary
performance cost.
The real merit of this interface is that it's simple; unless there's a
really compelling need for a feature, I think it should be left out.
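The forwarding class described in the quoted text above can be sketched roughly as follows. The names MultipleXmlApplication, addApplication and removeApplication come from the message; the two callbacks on XmlApplication and the RecordingApplication helper are assumptions chosen to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down stand-in for the proposed XmlApplication interface; the two
// callbacks shown here are assumptions, not the proposed method set.
interface XmlApplication {
    void startElement(String name);
    void endElement(String name);
}

// Fans each event out to every registered application, so the processor
// itself only ever has to know about a single application.
class MultipleXmlApplication implements XmlApplication {
    private final List<XmlApplication> applications = new ArrayList<>();

    void addApplication(XmlApplication app) {
        applications.add(app);
    }

    void removeApplication(XmlApplication app) {
        applications.remove(app);
    }

    @Override
    public void startElement(String name) {
        for (XmlApplication app : applications) {
            app.startElement(name);
        }
    }

    @Override
    public void endElement(String name) {
        for (XmlApplication app : applications) {
            app.endElement(name);
        }
    }
}

// Hypothetical application used to show the fan-out: it records what it sees.
class RecordingApplication implements XmlApplication {
    final List<String> seen = new ArrayList<>();

    @Override
    public void startElement(String name) {
        seen.add("start " + name);
    }

    @Override
    public void endElement(String name) {
        seen.add("end " + name);
    }
}
```

With this arrangement the cost of multiple listeners is paid only by users who actually register more than one application.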
> If we were really concerned about performance, I
> would recommend giving up the use of String.
It's (rightly in my view) done that already for character data. It's not
a problem for element type names, because an
implementation can maintain a hash table of names and thus only allocate
a String for each distinct element type.
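One way such a name table might work is sketched below. The NameTable class and its intern method are hypothetical; the sketch assumes the parser holds names as a run of characters in a buffer and uses a fixed number of buckets for simplicity.

```java
// Hypothetical name table: maps a run of characters in the parser's buffer
// to a shared String, allocating only the first time a name is seen.
class NameTable {
    private static final class Entry {
        final String name;
        final int hash;
        final Entry next;

        Entry(String name, int hash, Entry next) {
            this.name = name;
            this.hash = hash;
            this.next = next;
        }
    }

    // Fixed bucket count keeps the sketch short; a real table would grow.
    private final Entry[] buckets = new Entry[256];
    private int distinctNames = 0;

    String intern(char[] buf, int offset, int length) {
        int hash = 0;
        for (int i = 0; i < length; i++) {
            hash = 31 * hash + buf[offset + i];
        }
        int index = (hash & 0x7fffffff) % buckets.length;
        for (Entry e = buckets[index]; e != null; e = e.next) {
            if (e.hash == hash && matches(e.name, buf, offset, length)) {
                return e.name; // seen before: no new String is allocated
            }
        }
        String name = new String(buf, offset, length);
        buckets[index] = new Entry(name, hash, buckets[index]);
        distinctNames++;
        return name;
    }

    int size() {
        return distinctNames;
    }

    private static boolean matches(String name, char[] buf, int offset, int length) {
        if (name.length() != length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (name.charAt(i) != buf[offset + i]) {
                return false;
            }
        }
        return true;
    }
}
```

A document with thousands of S elements would then hand the application the same String object for every one of them.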
> >> userData property also gives users a chance to pass extra information
> >> to the processor easily, if they wish.
>
> >Surely there are cleaner ways to do this sort of thing.
>
> I do not think so. Just as every Mac developer loved having RefCon to hang
> things onto, I like userData.
Could you explain a typical case where you need this?
Are there any standard Java classes that do this?
It feels very wrong to me; it's the sort of thing I would try hard to
avoid in my own programming, but maybe this is my strongly-typed C++
prejudices showing through. To me it seems like a feature that one can
easily manage without.
James
From peter at ursus.demon.co.uk Wed Dec 17 11:09:35 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
In-Reply-To:
Message-ID: <3.0.1.16.19971217120821.30af6000@pop3.demon.co.uk>
At 10:20 17/12/97 +0100, Andrea Anders wrote:
>I am an amateur in XML and hope someone can help me.
You are very welcome, Andrea, and this is exactly the sort of question that
needs addressing. I can't help you myself, but I know that it has been
addressed before - it would be nice if someone has posted guidelines. [I'm
not sure whether there is a general approach - my suspicion is that you can
end up with quite a complex XML-DTD sometimes.]
Best of luck.
P.
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From arjan.loeffen at let.ruu.nl Wed Dec 17 11:19:35 1997
From: arjan.loeffen at let.ruu.nl (Arjan Loeffen)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
References:
Message-ID: <3497B4CA.79710B80@let.ruu.nl>
Andrea Anders wrote:
> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?
Inclusions and exclusions cannot be expressed by model group constructs (except
in a very few cases). Model groups describe and affect an element's content,
and are therefore a DTD-based concept; exceptions describe and affect
the complete element subtree, and are therefore a document-instance-based
concept.
The best you can do is to merge inclusions into the model groups of all the
elements they are intended to affect (typically by defining parameter
entities). This has to extend over all elements occurring in the model of the
element you intended the inclusion to work on (and elements in the model of
those elements, etc.).
Altering the model groups for exclusions requires you to rethink the complete set
of parameter entities used in the original DTD; you have to make certain that the
element you want excluded does not occur in any model after entities are
resolved.
Not supporting exceptions is the toll we pay for allowing standard parser
generators to be used to build XML systems.
Arjan.
From ak117 at freenet.carleton.ca Wed Dec 17 11:54:17 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <34975FEE.1A5406F4@jclark.com>
References: <199712170238.VAA00829@unready.microstar.com>
<34975FEE.1A5406F4@jclark.com>
Message-ID: <199712171151.GAA00342@unready.microstar.com>
James Clark writes:
> I don't see the point of the XmlProcessor first argument. What's wrong
> with having the implementation of XmlApplication store the XmlProcessor
> in the member variable? (This is what SP typically does.)
The advantage is that the same XmlApplication object can work with
more than one XmlProcessor at the same time (though it is not required
to be able to do so).
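A minimal sketch of that advantage, under stated assumptions: getSystemId() is mentioned in this thread (it returns a String here for brevity), while the startElement callback and the ElementCounter class are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-ins for the proposed interfaces. Only getSystemId() is named in the
// thread; everything else here is assumed for illustration.
interface XmlProcessor {
    String getSystemId();
}

interface XmlApplication {
    void startElement(XmlProcessor processor, String name);
}

// One application object shared by several processors: because each callback
// says which processor it came from, state can be kept per document.
class ElementCounter implements XmlApplication {
    private final Map<XmlProcessor, Integer> counts = new HashMap<>();

    @Override
    public void startElement(XmlProcessor processor, String name) {
        counts.merge(processor, 1, Integer::sum);
    }

    int countFor(XmlProcessor processor) {
        return counts.getOrDefault(processor, 0);
    }
}
```

Without the processor argument, the application would need one instance per processor, each holding its own reference, which is the alternative James describes.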
> > public void
> > startDocument (XmlProcessor processor, String pubid, URL sysid);
>
> What do the pubid and sysid arguments represent? The document entity?
Yes. I suppose that they are redundant, given
XmlProcessor.getPublicId() and XmlProcessor.getSystemId(), so they
could go if the XmlProcessor argument stayed.
> > public void
> > startProlog (XmlProcessor processor);
> >
> > public void
> > endProlog (XmlProcessor processor);
>
> Why do you need startProlog() and endProlog()?
Convenience only: users could infer the end of the prolog from the
start of the document element. The end of the prolog (or at least, of
the document type declaration) is important for Ælfred, because that
is the first point when Ælfred's DTD query routines will return useful
results.
> The one major omission I see here is absence of information about the
> location (URL, byte offset, line number etc) of the events. It would be
> very nice to be able to implement validation just as an
> XmlApplication (that wraps around another XmlApp). In other words, to run
> without validation you would use:
>
> processor.run(new MyXmlApplication());
>
> and to run with validation you would use
>
> processor.run (new ValidateXmlApplication(new MyXmlApplication()));
>
> In order to make this work the application needs to be able to get
> information about the location of start/end tags and of data. This is
> also useful for all kinds of application-specific validation.
>
> This could be done by having the app ask the processor for the location
> of the last event in some non-standardized way, but that's kind of
> kludgy. On the other hand, maybe this is just too fancy for a
> "simple" API.
I think that it probably is too fancy.
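For reference, the wrapping idea quoted above might be sketched like this. The class name ValidateXmlApplication comes from the quoted message; the single startElement callback and the use of a set of declared element names in place of a real DTD are assumptions made to keep the example small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Cut-down stand-in for the proposed interface (assumed callback).
interface XmlApplication {
    void startElement(String name);
}

// Wraps another application: checks each event, then passes it on unchanged.
// A set of declared element names stands in for a real DTD here.
class ValidateXmlApplication implements XmlApplication {
    private final XmlApplication wrapped;
    private final Set<String> declared;
    private final List<String> problems = new ArrayList<>();

    ValidateXmlApplication(XmlApplication wrapped, Set<String> declared) {
        this.wrapped = wrapped;
        this.declared = declared;
    }

    @Override
    public void startElement(String name) {
        if (!declared.contains(name)) {
            problems.add("undeclared element: " + name);
        }
        wrapped.startElement(name);
    }

    List<String> problems() {
        return problems;
    }
}
```

The sketch can say what is wrong but not where, which is exactly the gap that location information in the events would fill.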
> > public void
> > error (XmlProcessor processor, String message, URL url, int line);
>
> I don't think having simply "String message" is going to
> internationalize well. It's also desirable to know exactly what
> character number/column number the error occurred at. Also XML
> distinguishes fatal errors (which the parser must not continue
> processing after) from other errors. On the whole I would be inclined
> to handle fatal errors as an exception, and not try to deal with
> non-fatal errors at all in this simple interface.
>
> > On the positive side, this interface would let you hang more than one
> > application off the same parse, which could be very interesting.
>
> I don't think this is a good idea. It adds complexity and it's likely
> to impose a performance cost, but it doesn't buy you anything, because
> you can achieve that functionality with a MultipleXmlApplication class
> that implements the XmlApplication interface, and provides
> addApplication and removeApplication methods, and then forwards each
> event to the applications that have been added to it.
A wise suggestion.
> > The
> > userData property also gives users a chance to pass extra information
> > to the processor easily, if they wish.
>
> Surely there are cleaner ways to do this sort of thing.
Perhaps -- it would be most useful, again, when an XmlApplication was
being used with more than one XmlProcessor.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Dec 17 11:56:43 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>
References: <199712170238.VAA00829@unready.microstar.com>
<3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>
Message-ID: <199712171154.GAA00352@unready.microstar.com>
Peter Murray-Rust writes:
> At 21:38 16/12/97 -0500, David Megginson wrote:
> [... a really simple and understandable interface ...]
>
> If it helps the deliberations of the closeted experts, this looks exactly
> the sort of level of interface I would like and can work with. I assume
> that somewhere will be all the calls to the "DTD" stuff.
Yes, but we are not looking at standardising these right now. They
will still be available in Ælfred, but outside of the interface.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Wed Dec 17 12:40:26 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3497ADBF.27C4F668@jclark.com>
References: <003f01bd0acd$a3266da0$0100007f@localhost>
Message-ID: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
At 17:47 17/12/97 +0700, James Clark wrote:
>
>The real merit of this interface is that it's simple; unless there's a
>really compelling need for a feature, I think it should be left out.
Yes. Let's please get this bus into the air. If it needs tweaking or
junking later, it's not the end of the world :-). I couldn't bear it if we
go down the same road as we have done 2-3 times before, drawing out the
process and finally running out of steam.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From mecom-gmbh at mixx.de Wed Dec 17 13:06:03 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:35 2004
Subject: EntityDef v/s PEDef
Message-ID: <3497CF51.3FE0F8F9@mixx.de>
greetings,
today's question from out of the blue:
do i follow PR-XML-19971208 correctly, that the only difference between
a general entity definition and a parameter entity definition
(syntactically modulo the '%') is that the general entity definition
permits a notation?
[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | ExternalDef
[74] PEDef ::= EntityValue | ExternalID
[75] ExternalDef ::= ExternalID NDataDecl?
[76] ExternalID ::= 'SYSTEM' S SystemLiteral
| 'PUBLIC' S PubidLiteral S SystemLiteral
[77] NDataDecl ::= S 'NDATA' S Name
(nb. spurious (?) '|' removed from [72])
what is the significance of ExternalDef? i found it referenced nowhere
else in the document.
wouldn't
[73'] EntityDef ::= EntityValue | ExternalID NDataDecl?
[74'] PEDef ::= EntityValue | ExternalID
[75x]
make the similarity clearer?
From papresco at technologist.com Wed Dec 17 14:38:56 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
References: <003a01bd0aca$9ddf4a90$0100007f@localhost>
Message-ID: <3497E3DD.C3980FB4@technologist.com>
Don Park wrote:
>
> XMLEventBlahBlah implies something that has to do with XMLEvent objects
> which do not exist. The fact that the API being worked on is said to be
> event-based does not imply that the central products of the API are events.
I think that the concept of events is implicit in the interfaces that
are being defined and may well be explicit in the documentation for it.
The only reason that we don't call them startElementEvent,
endElementEvent, endPrologEvent etc. is because it would be redundant.
ON THE OTHER HAND -- should we actually be using Event Objects as SP does?
The nice thing about event objects is that they can be subclassed to add
more information. An example would be James' request for line number
information. That means that an XAPI "level 2" producer could easily
produce data for a "level 1" consumer without a problem.
They can also be "lazy" in the sense that they don't have to construct
(e.g.) a dictionary object for attributes unless the start-element ASKS
for attributes. Is Java object construction too slow for us to use real
objects?
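To make the two points concrete, here is a small sketch of event objects that can be subclassed and that build their attribute dictionary lazily. All class and method names below are hypothetical; nothing here is taken from SP or from the proposed interface.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "level 1" event: attributes are kept in raw form and only
// turned into a dictionary if a consumer asks for them.
class StartElementEvent {
    private final String name;
    private final String[] rawAttributes; // name, value, name, value, ...
    private Map<String, String> attributes; // built on first request

    StartElementEvent(String name, String... rawAttributes) {
        this.name = name;
        this.rawAttributes = rawAttributes;
    }

    String getName() {
        return name;
    }

    Map<String, String> getAttributes() {
        if (attributes == null) {
            attributes = new HashMap<>();
            for (int i = 0; i + 1 < rawAttributes.length; i += 2) {
                attributes.put(rawAttributes[i], rawAttributes[i + 1]);
            }
        }
        return attributes;
    }

    // Exposed only so the laziness can be observed in this sketch.
    boolean attributesBuilt() {
        return attributes != null;
    }
}

// Hypothetical "level 2" event: adds a line number by subclassing, so a
// level 1 consumer can still accept it as a plain StartElementEvent.
class LocatedStartElementEvent extends StartElementEvent {
    private final int line;

    LocatedStartElementEvent(int line, String name, String... rawAttributes) {
        super(name, rawAttributes);
        this.line = line;
    }

    int getLine() {
        return line;
    }
}
```

The price is one object allocation per event, which is the performance question raised at the end of the paragraph above.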
> It
> could have just as well been described as callback-based XML parser.
> Furthermore, I do not see how XmlConsumer and XmlProducer imply that they
> work with XML text string. Those names imply only that they are interfaces
> for classes that consume and produce XML data.
I'm not religious on this issue, but the only definition of "XML Data" I
know of is PR-xml-971208.
"This specification describes the required behavior of an XML processor
in terms of how it must read XML data and the information it must
provide to the application."
In other words, XML Data is angle-bracketed text that conforms to
PR-xml-971208.
Paul Prescod
From papresco at technologist.com Wed Dec 17 14:39:13 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com>
<34975FEE.1A5406F4@jclark.com> <199712171151.GAA00342@unready.microstar.com>
Message-ID: <3497DE8B.7B6F9C28@technologist.com>
David Megginson wrote:
> The advantage is that the same XmlApplication object can work with
> more than one XmlProcessor at the same time (though it is not required
> to be able to do so).
If you use the name "Application" then it makes sense to require a
single application to support multiple processors. Jade is an example of
an application that supports multiple processors. If we use the word
***Consumer, then it makes sense that there should be a single consumer
per Producer.
> > This could be done by having the app ask the processor for the location
> > of the last event in some non-standardized way, but that's kind of
> > kludgy. On the other hand, maybe this is just too fancy for a
> > "simple" API.
>
> I think that it probably is too fancy.
Maybe, but it also seems very important. A processor that can't tell you
where your errors are is very frustrating. Perhaps there should
immediately be a "level 2" that supports this.
Paul Prescod
From ak117 at freenet.carleton.ca Wed Dec 17 14:44:44 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
References: <003f01bd0acd$a3266da0$0100007f@localhost>
<3497ADBF.27C4F668@jclark.com>
<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
Message-ID: <199712171442.JAA00570@unready.microstar.com>
Peter Murray-Rust writes:
> Yes. Let's please get this bus into the air. If it needs tweaking
> or junking later, it's not the end of the world :-). I couldn't
> bear it if we go down the same road as we have done 2-3 times
> before, drawing out the process and finally running out of steam.
Any project should have measurable failure criteria. Here are my
suggestions.
The Simple XML Event-Based API initiative will have failed if either
of the following is true:
1) By Monday 12 January 1998, at least three Java parser writers have
not agreed to support a specific set of common interfaces.
2) By Monday 12 January 1998, at least three Java applet or
application authors have not agreed to use the same set of common
interfaces that the parser writers have agreed to support.
In other words, we need at least one other parser writer on board
besides Tim and me (a duopoly is almost as bad as a monopoly), and at
least two other applet/application writers besides Peter. If we don't
have that agreement, and a working beta interface, by 12 January, I
won't want to spend any more of my time on this issue (I have other
projects that I'd like to pursue).
DOM
---
Another interesting question is the DOM. I have not taken the time
yet to see if this interface provides enough information to construct
the most basic DOM nodes -- if it does (or at least, can), then we
could have a single DOM module maintained separately (using the common
event interface) instead of requiring each parser writer to create a
separate one. A separate DOM module with its own maintainer would be
much more likely to stay up to date and robust.
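As a rough indication that a tree can be built from events alone, the sketch below assembles a very small node tree from three callbacks. The callbacks (startElement, charData, endElement) and the TreeNode and TreeBuilder classes are assumptions; TreeNode is a stand-in and not the DOM's node interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Cut-down stand-in for the common event interface (assumed callbacks).
interface XmlApplication {
    void startElement(String name);
    void charData(String text);
    void endElement(String name);
}

// Very small stand-in for a DOM node.
class TreeNode {
    final String name; // element name, or null for a text node
    final String text; // character data, or null for an element
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name, String text) {
        this.name = name;
        this.text = text;
    }
}

// Builds the tree by keeping a stack of the elements that are still open.
class TreeBuilder implements XmlApplication {
    private final Deque<TreeNode> open = new ArrayDeque<>();
    private TreeNode root;

    @Override
    public void startElement(String name) {
        TreeNode node = new TreeNode(name, null);
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    @Override
    public void charData(String text) {
        if (!open.isEmpty()) {
            open.peek().children.add(new TreeNode(null, text));
        }
    }

    @Override
    public void endElement(String name) {
        open.pop();
    }

    TreeNode getRoot() {
        return root;
    }
}
```

Because the builder depends only on the event interface, it could be maintained separately from any particular parser, as suggested above.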
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Dec 17 14:51:42 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3497DE8B.7B6F9C28@technologist.com>
References: <199712170238.VAA00829@unready.microstar.com>
<34975FEE.1A5406F4@jclark.com>
<199712171151.GAA00342@unready.microstar.com>
<3497DE8B.7B6F9C28@technologist.com>
Message-ID: <199712171449.JAA00603@unready.microstar.com>
Paul Prescod writes:
> > The advantage is that the same XmlApplication object can work with
> > more than one XmlProcessor at the same time (though it is not required
> > to be able to do so).
>
> If you use the name "Application" then it makes sense to require a
> single application to support multiple processors. Jade is an example of
> an application that supports multiple processors. If we use the word
> ***Consumer, then it makes sense that there should be a single consumer
> per Producer.
I'm using the XML terminology, where "processor" actually means
"parser" (ick).
> > > This could be done by having the app ask the processor for the location
> > > of the last event in some non-standardized way, but that's kind of
> > > kludgy. On the other hand, maybe this is just too fancy for a
> > > "simple" API.
> >
> > I think that it probably is too fancy.
>
> Maybe, but it also seems very important. A processor that can't tell you
> where your errors are is very frustrating. Perhaps there should
> immediately be a "level 2" that supports this.
These are two separate things. Adding a "col" argument to the error()
callback is not so tricky, but providing the exact location of every
start and end tag or data chunk is too complicated.
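The smaller of the two changes might look like the sketch below. The first four parameters follow the error() signature quoted earlier in the thread; the trailing col parameter is the addition being discussed, and ErrorCollector with its report format is purely illustrative.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

// Placeholder for the processor interface; its methods are not needed here.
interface XmlProcessor {
}

interface XmlApplication {
    // Same shape as the quoted error() callback, plus an assumed "col".
    void error(XmlProcessor processor, String message, URL url, int line, int col);
}

// Hypothetical application that turns each error into a one-line report.
class ErrorCollector implements XmlApplication {
    final List<String> reports = new ArrayList<>();

    @Override
    public void error(XmlProcessor processor, String message, URL url, int line, int col) {
        reports.add(url + ":" + line + ":" + col + ": " + message);
    }
}
```

Only the processor's error path has to track a column here, which is why this is cheaper than attaching a location to every start tag, end tag and data chunk.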
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Wed Dec 17 15:22:12 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712171442.JAA00570@unready.microstar.com>
References: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
<003f01bd0acd$a3266da0$0100007f@localhost>
<3497ADBF.27C4F668@jclark.com>
<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971217161409.5657bb1e@pop3.demon.co.uk>
At 09:42 17/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
[...]
>
>Any project should have measurable failure criteria. Here are my
>suggestions.
>
>The Simple XML Event-Based API initiative will have failed if either
>of the following is true:
>
>1) By Monday 12 January 1998, at least three Java parser writers have
> not agreed to support a specific set of common interfaces.
>
>2) By Monday 12 January 1998, at least three Java applet or
> application authors have not agreed to use the same set of common
> interfaces that the parser writers have agreed to support.
Yes - I think this is very appropriate. I will commit at this stage to do
what I can for JUMBO. Given that the API will look fairly like what I'm
used to from David and Tim that seems fine (the Xapi-J was a level above me).
So barring the possibility that there are bits I may not *understand* it
shouldn't be too horrendous. I would be *very grateful* for a working
harness like Driver.java (Lark) or the equiv in Ælfred. It's then trivial to
make sure I've got it right.
So - one more parser writer, and two more applications. The applications
needn't be browsers - they could be transformers, search engines, whatever.
And they needn't exercise the whole API (just as JUMBO won't). It simply
has to show that the approach is understandable by at least three humans
not connected with the other three humans. [Actually robots can volunteer
if they want, as well].
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From d94-dwi at nada.kth.se Wed Dec 17 15:25:38 1997
From: d94-dwi at nada.kth.se (Douglas Wikström)
Date: Mon Jun 7 16:59:35 2004
Subject: unsubscribe
In-Reply-To: <3.0.1.16.19971217161409.5657bb1e@pop3.demon.co.uk>
Message-ID:
unsubscribe
From tbray at textuality.com Wed Dec 17 15:52:03 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971217075000.00aa2634@pop.intergate.bc.ca>
At 02:22 PM 17/12/97 GMT, Gavin Nicol wrote:
>XAPI-J, or whatever this becomes, should be sufficient to build a DOM
>representation.
No no no. You are missing the point - this is the SIMPLE interface for
RDF-heads and SMIL-folks and all the other people who think that XML
should just be elements and attributes and have none of that SGML
apparatus. From the end-user programmer's point of view, it should be.
If you turn your assertion around, then it's correct: you should
be able to build SAX on top of the DOM. -Tim
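Turning the assertion around can be sketched as a tree walk that fires events. The example uses the org.w3c.dom package found in current Java as a stand-in for the DOM under discussion; the XmlApplication callbacks and the DomWalker class are assumptions.

```java
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Cut-down stand-in for the simple event interface (assumed callbacks).
interface XmlApplication {
    void startElement(String name);
    void charData(String text);
    void endElement(String name);
}

// Walks an existing tree in document order and reports it as events.
class DomWalker {
    static void walk(Node node, XmlApplication app) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE:
                app.startElement(node.getNodeName());
                walkChildren(node, app);
                app.endElement(node.getNodeName());
                break;
            case Node.TEXT_NODE:
            case Node.CDATA_SECTION_NODE:
                app.charData(node.getNodeValue());
                break;
            case Node.DOCUMENT_NODE:
                walkChildren(node, app);
                break;
            default:
                break; // comments, PIs and the like are ignored in this sketch
        }
    }

    private static void walkChildren(Node node, XmlApplication app) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), app);
        }
    }
}
```

The walker passes on only elements and character data, which matches the "just elements and attributes" view of the simple interface, although attributes themselves are left out here for brevity.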
From richard at cogsci.ed.ac.uk Wed Dec 17 16:36:11 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:59:36 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: David Megginson's message of Tue, 16 Dec 1997 14:46:22 -0500
Message-ID: <199712171635.QAA21002@stevenson.cogsci.ed.ac.uk>
> No, it's not SGML's fault, at least not this time. Conforming SGML
> parsers are allowed to continue processing if they want to, and are
> even allowed not to report errors at all (as long as they don't claim
> to be "validating parsers"). XML has gone way beyond any SGML
> requirements with this one.
Always remember that your software doesn't have to be a conforming
XML processor unless you want it to be. There are several applications
where you certainly *don't* want to be a conforming processor, such as
an XML editor.
-- Richard
From papresco at technologist.com Wed Dec 17 16:41:29 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
References: <003f01bd0acd$a3266da0$0100007f@localhost>
<3497ADBF.27C4F668@jclark.com>
<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk> <199712171442.JAA00570@unready.microstar.com>
Message-ID: <3497F877.977948B9@technologist.com>
David Megginson wrote:
> 1) By Monday 12 January 1998, at least three Java parser writers have
> not agreed to support a specific set of common interfaces.
What about a Python parser writer? We are, after all, on the brink of
the 21st century. It would be really nice to stop the cycle of
"crowning" one language the be-all and end-all of programming languages.
Could we specify the interfaces in terms of IDL instead of Java (or
perhaps agree to make an IDL version soon after the Java one)? The only
extra work I see is that we must explicitly define the interfaces for
URL and Dictionary so that other languages can implement Java-compatible
versions.
Paul Prescod
From fussellm at alumni.caltech.edu Wed Dec 17 18:14:25 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712171442.JAA00570@unready.microstar.com>
Message-ID:
On Wed, 17 Dec 1997, David Megginson wrote:
> In other words, we need at least one other parser writer on board
> besides Tim and me (a duopoly is almost as bad as a monopoly), and at
> least two other applet/application writers besides Peter.
On the application side (or meta-application), I will commit to having
MONDO and mindo on the API within a couple days of when you release it.
As a semi-application, this includes a DOM builder that I will be
releasing early tomorrow. Is this acceptable as an application?
> DOM
> ---
> Another interesting question is the DOM. I have not taken the time
> yet to see if this interface provides enough information to construct
> the most basic DOM nodes -- if it does (or at least, can), then we
> could have a single DOM module maintained separately (using the common
> event interface) instead of requiring each parser writer to create a
> separate one. A separate DOM module with its own maintainer would be
> much more likely to stay up to date and robust.
Well, I have an architecture, APIs and code that handle building arbitrary
object models, which includes both the type and content of a DOM Document.
This loosely couples the XML events to the DOM object construction, so the
DOM model can be maintained independently of the parser. You could also
have multiple DOM implementation models if you want (and I suspect people
will).
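A minimal sketch of what such loose coupling could look like, assuming nothing about the actual MONDO or mindo code: the builder only sees element events and delegates object creation to a factory, so the object model can change without touching the parser. Every name here (ElementEvents, ModelFactory, ModelBuilder, BracketModel) is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical event callbacks; the names are assumptions, not the API under discussion. */
interface ElementEvents {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

/** Hypothetical factory that decides which object model gets built. */
interface ModelFactory<T> {
    T element(String name, List<T> children);
    T text(String text);
}

/** Turns a stream of events into whatever objects the factory produces. */
final class ModelBuilder<T> implements ElementEvents {
    private final ModelFactory<T> factory;
    // One list of finished children per open element; the bottom list collects the root.
    private final Deque<List<T>> open = new ArrayDeque<>();

    ModelBuilder(ModelFactory<T> factory) {
        this.factory = factory;
        open.push(new ArrayList<>());
    }

    public void startElement(String name) {
        open.push(new ArrayList<>());
    }

    public void characters(String text) {
        open.peek().add(factory.text(text));
    }

    public void endElement(String name) {
        List<T> children = open.pop();
        open.peek().add(factory.element(name, children));
    }

    /** The root object; valid once every start event has been matched by an end event. */
    T result() {
        return open.peek().get(0);
    }
}

/** One possible model: a bracketed string. A DOM-producing factory would plug in the same way. */
final class BracketModel implements ModelFactory<String> {
    public String element(String name, List<String> children) {
        return "(" + name + (children.isEmpty() ? "" : " " + String.join(" ", children)) + ")";
    }

    public String text(String text) {
        return "\"" + text + "\"";
    }
}
```

Feeding it the events for a P element containing one S element with the text "A sentence." yields the string (P (S "A sentence.")); swapping BracketModel for another factory changes the result type without changing the builder.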
When I move the mindo release up I will let people know so they can
look at it and try it out. (This mindo release is much, much smaller than
the MONDO-J release although it is based on the same concepts and code
base).
--Mark
mark.fussell@chimu.com
From tbray at textuality.com Wed Dec 17 18:32:35 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
At 09:38 PM 16/12/97 -0500, David Megginson wrote:
>After careful thought, I am fairly certain that I would be willing to
>accept the following simple event-driven API for Ælfred.
I'd be willing to commit to signing up to do this for Lark, given
the following changes:
> public void
> startDocument (XmlProcessor processor, String pubid, URL sysid);
Question: what if there's no ...
> public void
> startProlog (XmlProcessor processor);
> public void
> endProlog (XmlProcessor processor);
Lose these; they have no place in this API. You want this kind of stuff,
use Lark or AElfred or whatever.
> public void
> processingInstruction (XmlProcessor processor, String target, String data);
Lose this.
> public void
> error (XmlProcessor processor, String message, URL url, int line);
>}
Have to add the entity ID as an argument. No point giving the line
number if you don't know what it's in.
>The processor itself could implement the following interface (very
>Thread-oriented and Bean-like):
And one last thing: if you use URL, then you have to do a new URL()
which does (I think) at least some syntax checking... is this appropriate?
Why not just pass it as a string? -Tim
From tbray at textuality.com Wed Dec 17 18:35:09 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
At 09:38 PM 16/12/97 -0500, David Megginson wrote:
>After careful thought, I am fairly certain that I would be willing to
>accept the following simple event-driven API for Ælfred.
I'd be willing to commit to signing up to do this for Lark, given
the following changes:
Oops; and I forgot the IMPORTANT one: I don't see any point in doing
this if there isn't also an ultra-simple tree interface supporting
only Element, Attribute, and Text classes. Because this is what most
people will use, especially given that a high proportion of XML
transmissions will be small flattish documents; why should everyone
have to build their own tree. -Tim
From ak117 at freenet.carleton.ca Wed Dec 17 19:05:45 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
Message-ID: <199712171903.OAA04014@unready.microstar.com>
Tim Bray writes:
> I'd be willing to commit to signing up to do this for Lark, given
> the following changes:
>
> > public void
> > startDocument (XmlProcessor processor, String pubid, URL sysid);
>
> Question: what if there's no well throw in the root doctype.
Agreed. We can take it out, since the same information is available
using getPublicId() and getSystemId() in the XmlProcessor interface.
> > public void
> > startProlog (XmlProcessor processor);
> > public void
> > endProlog (XmlProcessor processor);
>
> Lose these; they have no place in this API. You want this kind of stuff,
> use Lark or AElfred or whatever.
Agreed.
> > public void
> > processingInstruction (XmlProcessor processor, String target, String data);
I disagree -- processing instructions are an essential part of a
document (especially for architectural forms).
> > public void
> > error (XmlProcessor processor, String message, URL url, int line);
> >}
>
> Have to add the entity ID as an argument. No point giving the line
> number if you don't know what it's in.
The URL argument will show you where it is.
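To make the state of the discussion concrete, here is a rough sketch of the callbacks as they would stand after the points above: the prolog callbacks dropped, processing instructions and errors kept. It is reconstructed only from the fragments quoted in this thread; the return types of getPublicId() and getSystemId() and the LoggingApplication class are assumptions for illustration.

```java
import java.net.URL;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reduction of the draft: prolog callbacks removed, PIs and errors kept. */
interface XmlApplication {
    void processingInstruction(XmlProcessor processor, String target, String data);
    void error(XmlProcessor processor, String message, URL url, int line);
}

/** Only the two accessors mentioned above; their return types are assumed. */
interface XmlProcessor {
    String getPublicId();
    URL getSystemId();
}

/** Example application that simply records what it is told. */
final class LoggingApplication implements XmlApplication {
    final List<String> log = new ArrayList<>();

    public void processingInstruction(XmlProcessor processor, String target, String data) {
        log.add("PI " + target + ": " + data);
    }

    public void error(XmlProcessor processor, String message, URL url, int line) {
        // The URL names the entity, so the line number is unambiguous.
        log.add(url + ":" + line + ": " + message);
    }
}
```

With this shape an error message reads like "http://example.org/inc-s.xml:3: unexpected end tag", which is the sense in which the URL argument identifies the entity the line number belongs to.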
> And one last thing: if you use URL, then you have to do a new URL()
> which does (I think) at least some syntax checking... is this appropriate?
> Why not just pass it as a string? -Tim
For starting Ælfred, I found using a string awkward, since I needed a
base URL to resolve relative URLs (like file names). Since XML
mandates URIs anyway, and Java supports them pretty transparently, I
thought that it made sense to use them directly instead of using a lot
of Url.toString() and new URL(String) calls (it will also allow the
use of '==' with system identifiers).
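As an illustration of the base-URL point, a small sketch using the two-argument java.net.URL constructor, which resolves a relative specification against a context URL. The SystemIds helper class and the example.org addresses are made up for the example.

```java
import java.net.MalformedURLException;
import java.net.URL;

final class SystemIds {
    /** Resolves a possibly relative system identifier against the URL of the referring entity. */
    @SuppressWarnings("deprecation") // URL constructors are deprecated in recent JDKs but still available
    static URL resolve(URL base, String systemId) {
        try {
            return new URL(base, systemId);
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("bad system identifier: " + systemId, e);
        }
    }

    /** Parses an absolute identifier; a null context means the spec must be absolute. */
    static URL parse(String spec) {
        return resolve(null, spec);
    }
}
```

Resolving "chapter1.xml" against http://example.org/docs/book.xml gives http://example.org/docs/chapter1.xml. One caveat on '==': in Java it compares object identity, so it only helps if the processor hands out the same URL instance for the same identifier; comparing the external forms as strings is a safer fallback.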
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Wed Dec 17 19:09:03 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
References: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
Message-ID: <199712171905.OAA04027@unready.microstar.com>
Tim Bray writes:
> At 09:38 PM 16/12/97 -0500, David Megginson wrote:
> >After careful thought, I am fairly certain that I would be willing to
> >accept the following simple event-driven API for Ælfred.
>
> I'd be willing to commit to signing up to do this for Lark, given
> the following changes:
>
> Oops; and I forgot the IMPORTANT one: I don't see any point in doing
> this if there isn't also an ultra-simple tree interface supporting
> only Element, Attribute, and Text classes. Because this is what most
> people will use, especially given that a high proportion of XML
> transmissions will be small flattish documents; why should everyone
> have to build their own tree. -Tim
I see no reason not to use the DOM for this. The Node, Document,
Element, AttributeList, Attribute, and Text classes look easy enough
to use, and people can simply ignore what they do not need.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Wed Dec 17 19:56:14 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
In-Reply-To: <3497F877.977948B9@technologist.com>
References: <003f01bd0acd$a3266da0$0100007f@localhost>
<3497ADBF.27C4F668@jclark.com>
<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
<199712171442.JAA00570@unready.microstar.com>
Message-ID: <3.0.1.16.19971217192211.406f8f60@pop3.demon.co.uk>
At 11:06 17/12/97 -0500, Paul Prescod wrote:
>David Megginson wrote:
>> 1) By Monday 12 January 1998, at least three Java parser writers have
>> not agreed to support a specific set of common interfaces.
>
>What about a Python parser writer? We are, after all, on the brink of
>the 21st century. It would be really nice to stop the cycle of
>"crowning" one language the be-all and end-all of programming languages.
>
>Could we specify the interfaces in terms of IDL instead of Java (or
>perhaps agree to make an IDL version soon after the Java one)? The only
>extra work I see is that we must explicitly define the interfaces for
>URL and Dictionary so that other languages can implement Java-compatible
>versions.
Please can I very gently suggest that we stick precisely to what David has
suggested. It has the merit that we all understand it. [Strange as it may
seem I have never seen any Python or IDL, so it would make my job a lot
harder.]
The interface has to be simple enough for people like me to understand and
to tell my friends what it's about. I would prefer to limit the Consumers,
Factories and the rest to as few as possible.
One of the main goals is to show that we can actually accomplish something
communally. That in itself will be a big achievement, because after that it
should get simpler. We chose Java because it's one of the main languages
of the WWW, it's free and the majority of the programs reported here are in
Java.
26 days and counting. In some countries some of the people will be on
holiday for some of the time. There are three more bodies to recruit. We
need people to hack code.
P.
>
> Paul Prescod
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 20:25:23 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To:
References: <199712171442.JAA00570@unready.microstar.com>
Message-ID: <3.0.1.16.19971217211309.468fb3b0@pop3.demon.co.uk>
At 10:12 17/12/97 -0800, Mark L. Fussell wrote:
[... offer of MONDO on top of API...]
>
>On the application side (or meta-application), I will commit to having
>MONDO and mindo on the API within a couple days of when you release it.
>As a semi-application, this includes a DOM builder that I will be
>releasing early tomorrow. Is this acceptable as an application?
sounds great to me :-) I'll leave the others to comment on the DOM stuff.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 20:28:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971217211949.574fed64@pop3.demon.co.uk>
At 10:35 17/12/97 -0800, Tim Bray wrote:
>
>Oops; and I forgot the IMPORTANT one: I don't see any point in doing
>this if there isn't also an ultra-simple tree interface supporting
>only Element, Attribute, and Text classes. Because this is what most
>people will use, especially given that a high proportion of XML
>transmissions will be small flattish documents; why should everyone
>have to build their own tree. -Tim
Yes - this is really important, because it fixes the terminology. We also
know whether we have a Vector of children or some other model.
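For the sake of fixing terminology, a sketch of how small such a tree interface could be, with the children held in an ordered list. The class shapes below are assumptions, not an agreed design; only the class names Element, Attribute and Text come from Tim's message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical common supertype so that elements and text can share one child list. */
abstract class Node { }

final class Text extends Node {
    final String data;

    Text(String data) {
        this.data = data;
    }
}

final class Attribute {
    final String name;
    final String value;

    Attribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
}

final class Element extends Node {
    final String name;
    final List<Attribute> attributes = new ArrayList<>();
    final List<Node> children = new ArrayList<>(); // the "Vector of children" choice

    Element(String name) {
        this.name = name;
    }

    /** Appends a child and returns this element so calls can be chained. */
    Element add(Node child) {
        children.add(child);
        return this;
    }

    /** Concatenated text of this element and its descendants, in document order. */
    String text() {
        StringBuilder sb = new StringBuilder();
        for (Node n : children) {
            sb.append(n instanceof Text ? ((Text) n).data : ((Element) n).text());
        }
        return sb.toString();
    }
}
```

A small flattish document then becomes a handful of chained add() calls, and text() is about the only traversal most simple applications would need.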
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From papresco at technologist.com Wed Dec 17 20:35:45 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
References: <003f01bd0acd$a3266da0$0100007f@localhost>
<3497ADBF.27C4F668@jclark.com>
<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
<199712171442.JAA00570@unready.microstar.com> <3.0.1.16.19971217192211.406f8f60@pop3.demon.co.uk>
Message-ID: <3498363A.5143B885@technologist.com>
Peter Murray-Rust wrote:
>
> The interface has to be simple enough for people like me to understand and
> to tell my friends what it's about. I would prefer to limit the Consumers,
> Factories and the rest to as few as possible.
An IDL interface implies no extra complication in the Java interface. It
merely describes the Java interface in terms that are more universal
than Java itself -- it is like a DTD for interfaces. So far nobody has
yet proposed anything that would make an IDL description impossible. All
I ask is that:
a) nobody do so later (e.g. require runtime lookup of Java class objects
or do something similarly brain-dead) and
b) implementations in other languages be considered "successes" in terms
of the success/failure of this project.
I don't think that either of these constraints endanger the success of
the Java-specific part of the project or make the Java-specific part
more difficult.
Paul Prescod
From peter at ursus.demon.co.uk Wed Dec 17 20:44:42 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <199712171903.OAA04014@unready.microstar.com>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
<3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971217211714.468f7f82@pop3.demon.co.uk>
At 14:03 17/12/97 -0500, David Megginson wrote:
[...]
>
> > > public void
> > > processingInstruction (XmlProcessor processor, String target, String data);
>
>I disagree -- processing instructions are an essential part of a
>document (especially for architectural forms).
>
I'd tend to agree on keeping PIs in as well. Both Lark and Ælfred do them at
present. They are used in namespaces (which JUMBO is able to do something
with) and there are also other local uses.
It's going great :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From neil at bradley.co.uk Wed Dec 17 21:22:22 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:59:36 2004
Subject: inclusions/exclusions/named groups
Message-ID: <199712172122.VAA29298@andromeda.ndirect.co.uk>
> I am a amateur in xml and hope anyone can help me.
>
> I try to transform a SGML-DTD into XML (I use MSXML-parser).
> My questions are:
>
> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?
>
> my sgml-dtd:
>
>
>
>
> ...
>
First, let's simplify your SGML DTD, which has too many brackets in
it:
...
Putting the PCDATA in the right place for XML, and removing the
minimization tokens, we get:
...
So you want f, g and h to be accessible in a, b, c, d and e, but also
in l, m and n, but only f and g in i, j and k. Of course, you may not
want any of these directly in LE, c and/or e, though inclusions
automatically allow this. Only you can decide. Assuming that you do want them...
...
In one sense you are lucky in this example, because you do not have
the same element having different content depending on its context.
Suppose the following:
Here, the para element may have an xref, but only if it appears
inside a section element. To do this in XML requires the definition
of a new element, perhaps called sect_para
Neil.
-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk
From simeons at allaire.com Wed Dec 17 21:32:53 1997
From: simeons at allaire.com (Simeon Simeonov)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <01bd0b33$fd605b30$4a15b5cd@sim.allaire.com>
A few comments related to a few posts:
I. Multiplicity of Application - Processor relationship
The "one app, multiple processors argument" is not convincing in my opinion:
(a) I don't think this use of the simple API would be common, and (b) it is
trivial to implement a solution that does this outside the API. I feel the
same about the argument "one processor, multiple applications".
If we make the multiplicity of the relationship between XmlApplication and
XmlProcessor 1 to 1 we can eliminate the XmlProcessor arguments to
XmlApplication methods AND the get/set methods for user data in
XmlProcessor. Additionally, we won't need both addApplication() and
removeApplication().
I see the removal of at least three methods in XmlProcessor and the removal
of XmlProcessor as an argument to XmlApplication methods a substantial gain
for the simple API. Further, I'll get immense personal satisfaction from
seeing the handling of arbitrary user data removed from XmlProcessor.
II. Positional information
I'm somewhat surprised that parser writers claim it is difficult to extract
information about the positions of elements in an XML document. Can someone
explain why this is the case? In my work with markup languages I've always
represented the position of elements with a pair of (offset in data stream,
line number, column number) triplets. Providing this information will
certainly result in slightly lower performance, but the functionality it
enables for editing, good error reporting and validation is significant.
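To show how little bookkeeping this needs, a sketch of the representation described above: a start/end pair of (offset, line, column) triplets, maintained by counting characters as the scanner consumes them. The class names are hypothetical, and only '\n' is treated as a line break here.

```java
/** One point in the input: absolute offset plus 1-based line and column. */
final class Position {
    final int offset;
    final int line;
    final int column;

    Position(int offset, int line, int column) {
        this.offset = offset;
        this.line = line;
        this.column = column;
    }

    @Override
    public String toString() {
        return line + ":" + column + "@" + offset;
    }
}

/** Start and end position of one construct. */
final class Span {
    final Position start;
    final Position end;

    Span(Position start, Position end) {
        this.start = start;
        this.end = end;
    }
}

/** Tracks the current position while a scanner consumes characters. */
final class PositionTracker {
    private int offset = 0;
    private int line = 1;
    private int column = 1;

    /** Call once per consumed character. */
    void advance(char c) {
        offset++;
        if (c == '\n') {
            line++;
            column = 1;
        } else {
            column++;
        }
    }

    Position here() {
        return new Position(offset, line, column);
    }

    /** Example helper: span of the first occurrence of needle in text, or null if absent. */
    static Span locate(String text, String needle) {
        int at = text.indexOf(needle);
        if (at < 0) {
            return null;
        }
        PositionTracker t = new PositionTracker();
        for (int i = 0; i < at; i++) {
            t.advance(text.charAt(i));
        }
        Position start = t.here();
        for (int i = at; i < at + needle.length(); i++) {
            t.advance(text.charAt(i));
        }
        return new Span(start, t.here());
    }
}
```

The cost per character is one increment and one comparison, which is the "slightly lower performance" side of the trade-off.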
III. Exceptions
I am uncertain about the implications of exceptions leaving either the
XmlProcessor or the XmlApplication objects. In particular, I am wondering
what would happen if the XmlProcessor and XmlApplication are used as beans.
I know that in the COM/CORBA world this is very undesirable.
In general, I think it leads to a more complicated programming mechanism.
Someone mentioned that stopping the parse is difficult with top-down parsers.
While this is true in principle, I think there are some very simple mechanisms for
stopping a top-down parse. I'd be happy to discuss these with whoever is
interested.
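One such mechanism, sketched here purely as an illustration and not necessarily the one meant above: let the callback return false to request a stop, record that in a flag, and have every level of the recursive descent check the flag before going on, so the recursion unwinds by ordinary returns rather than by an exception. The toy parser and its token format are invented for the example.

```java
import java.util.List;

/** Hypothetical callback: returning false asks the parser to stop. */
interface StoppableHandler {
    boolean startElement(String name);
}

/** Toy recursive-descent parser over pre-split tokens such as "<a>", "text", "</a>". */
final class ToyParser {
    private final List<String> tokens;
    private final StoppableHandler handler;
    private int pos = 0;
    private boolean stopped = false;

    ToyParser(List<String> tokens, StoppableHandler handler) {
        this.tokens = tokens;
        this.handler = handler;
    }

    /** Returns true if the whole input was parsed, false if the handler stopped it. */
    boolean parse() {
        while (!stopped && pos < tokens.size()) {
            parseNode();
        }
        return !stopped;
    }

    private void parseNode() {
        String t = tokens.get(pos++);
        if (!t.startsWith("<") || t.startsWith("</")) {
            return; // text or a stray end tag: nothing to descend into
        }
        String name = t.substring(1, t.length() - 1);
        if (!handler.startElement(name)) {
            stopped = true;
            return;
        }
        // Descend until the matching end tag; each level re-checks the flag.
        while (!stopped && pos < tokens.size() && !tokens.get(pos).equals("</" + name + ">")) {
            parseNode();
        }
        if (!stopped) {
            pos++; // consume the end tag
        }
    }
}
```

Because nothing is thrown, no exception ever leaves the processor or the application, which matters if both are packaged as beans or exposed through COM/CORBA.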
IV. IDL
I did try a number of times to bring up the issue of a language independent
API with little success. I do see the benefit of something being done with
Java right now, so I'll just wait for the Java API to stabilize before
looking at ways to express it in IDL.
Regards,
Simeon Simeonov
Allaire
From mrc at allette.com.au Wed Dec 17 21:46:39 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:36 2004
Subject: inclusions/exclusions/named groups
References:
Message-ID: <3498481C.18CD56D3@allette.com.au>
Andrea Anders wrote:
> I am a amateur in xml...
As are we all...
> I try to transform a SGML-DTD into XML (I use MSXML-parser). My questions are:
>
> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?
Inclusions and (to a lesser extent) exclusions have never really been a great
idea in SGML because of the potential for them to behave incorrectly when parsing
from somewhere other than the top level of the DTD. Depending on how widely
they've been used and how big your data set is, I'd be inclined to process all of
your documents and generate a report of the ancestor elements of the inclusions.
This will give you some perspective about how they've been used - you can then
make informed decisions about their handling and requirements. Exclusions can be
overcome by remodelling the content models, but this could be a substantial
amount of work if your DTD is large and/or complex.
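A rough sketch of the report step, assuming some SGML parser has already produced, for every element occurrence, the list of element names from the root down to that element. How those paths are obtained is outside the sketch, and the InclusionReport class is hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class InclusionReport {
    /**
     * @param paths      one entry per element occurrence: element names from the root
     *                   down to the element itself
     * @param inclusions names of the elements the SGML DTD admits through inclusions
     * @return "ancestor/chain -> element" mapped to the number of occurrences found
     */
    static Map<String, Integer> ancestorsOf(List<List<String>> paths, Set<String> inclusions) {
        Map<String, Integer> report = new TreeMap<>();
        for (List<String> path : paths) {
            if (path.isEmpty()) {
                continue;
            }
            String leaf = path.get(path.size() - 1);
            if (!inclusions.contains(leaf)) {
                continue;
            }
            String ancestors = String.join("/", path.subList(0, path.size() - 1));
            report.merge(ancestors + " -> " + leaf, 1, Integer::sum);
        }
        return report;
    }
}
```

The sorted map shows at a glance which contexts the included elements really occur in, which is the information needed before deciding where to add them to the XML content models.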
That's the way I wouldn't do it. I would maintain the data as SGML and call it
XML as required. Does it need to be valid, or can it just be well formed? Be
careful about white-space around the inclusions and exclusions if you use this
approach - no matter how you slice it, they're bad news.
--
Regards
Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From donpark at quake.net Wed Dec 17 22:42:20 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <00d101bd0b3c$88f2aab0$0100007f@localhost>
>I didn't suggest XmlApplication should store XmlProcessor in a
>member variable. I suggested that implementations of XmlApplication
>could (if they needed to make callbacks to XmlProcessor) store
>XmlProcessor in a member variable.
OOPS. Point taken.
>I don't think it's appropriate to carry over patterns from GUI events
>and apply them to XML events just because we happen to use the word
>"event" to describe them both. I believe performance is important for
>XML processing, and an interface shouldn't impose an unnecessary
>performance cost.
>
>The real merit of this interface is that it's simple; unless there's a
>really compelling need for a feature, I think it should be left out.
While David suggested that add/removeApplication methods allow
implementation of XmlProcessors which support multiple XmlApplications, it
is completely up to the implementations to support multiple XmlApplication
or only one at a time. As JavaBeans spec suggests,
TooManyListenersException should be thrown if XmlProcessor supports only one
XmlApplication for performance and simplicity sake.
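As a sketch of that unicast convention, assuming a processor that supports exactly one application: java.util.TooManyListenersException is the standard class, while UnicastProcessor and the empty XmlApplication placeholder are made up for the example.

```java
import java.util.TooManyListenersException;

/** Placeholder for the application callback interface discussed in this thread. */
interface XmlApplication { }

/** Hypothetical processor that supports exactly one application at a time. */
final class UnicastProcessor {
    private XmlApplication application;

    void addApplication(XmlApplication app) throws TooManyListenersException {
        if (application != null && application != app) {
            throw new TooManyListenersException("this processor supports a single XmlApplication");
        }
        application = app;
    }

    void removeApplication(XmlApplication app) {
        if (application == app) {
            application = null;
        }
    }
}
```

A multicast processor would keep a list instead and never throw; callers written against the add/remove pair work with either kind.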
>> I do not think so. Just as every Mac developer loved having RefCon to hang
>> things onto, I like userData.
>
>Could you explain a typical case where you need this?
>
>Are there any standard Java classes that do this?
userData is a cheap way to associate extra info with the XmlProcessor. For
example, I can store the source URL in the userData. There are other ways
to have XmlProcessors provide the URL info (e.g. the JavaBeans Activation
Framework has URLDataSource for this) but they are fairly expensive and would
unnecessarily taint the API with URL-related stuff. It should be possible
to use XmlProcessor with a File, and building a URL out of a File is not
reliable on all platforms.
Don
From donpark at quake.net Wed Dec 17 22:42:24 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
Message-ID: <00d301bd0b3c$8a818810$0100007f@localhost>
>In other words, we need at least one other parser writer on board
>besides Tim and me (a duopoly is almost as bad as a monopoly), and at
>least two other applet/application writers besides Peter. If we don't
>have that agreement, and a working beta interface, by 12 January, I
>won't want to spend any more of my time on this issue (I have other
>projects that I'd like to pursue).
If Chris does not object or respond, I can step up and provide the
implementation for MSXML by 12 January. There is nothing in the license
that prohibits me from implementing the simple API over MSXML.
As far as James' concern over having a simple DOM, I think one of us can
implement a XmlApplication that produces W3C DOM objects so programmers can
just deal with DOM. Any takers?
Don
From peter at ursus.demon.co.uk Thu Dec 18 00:05:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <00d301bd0b3c$8a818810$0100007f@localhost>
Message-ID: <3.0.1.16.19971218005503.0f5f8ba8@pop3.demon.co.uk>
At 14:36 17/12/97 -0800, Don Park wrote:
[...]
>If Chris does not object or respond, I can step up and provide the
>implementation for MSXML by 12 January. There is nothing in the license
>that prohibits me from implementing the simple API over MSXML.
>
Great - I would really love to have that. I assume that it is fairly
stable now (1.8?) and that the various queries on this list have been
resolved...
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From reast at esri.com Thu Dec 18 02:54:08 1997
From: reast at esri.com (Russell East)
Date: Mon Jun 7 16:59:36 2004
Subject: a DTD as a JAR file resource [was Re: RFC: Simple XML Event-Based API for Java]
Message-ID: <34988F5C.16273230@esri.com>
Antony Blakey wrote:
> .... What we need to do is ... provide the
> input stream from within the program ie. the entity is stored as a string, or accessed
> through ClassLoader.getResourceAsStream()...
Yes! I would like to be able to store one or more DTDs as
resources within a JAR file. Within a DOCTYPE declaration
I'd like to be able to refer to that DTD, rather than referring
to some server-side DTD. But I don't think we can do this now, because
we can't specify a URL for a JAR resource - well, we can't do it in a
platform independent manner anyway, because JavaSoft states, at
http://java.sun.com/products/jdk/1.1/docs/guide/misc/resources.html :
"The method getResource() returns a URL for the resource.
The URL (and its representation) is implementation-specific
and may vary depending on the implementation details (it may
also change between JDK1.1 and JDK1.1.1). Its protocol is
(usually) specific to the ClassLoader loading the resource.
If the resource does not exist, a null will be returned."
It's hard to test this: firstly, Netscape doesn't yet seem to support
ClassLoader.getResource(), and secondly, IE4 doesn't seem to support JARs as
containers for resources. For instance, I have a sample applet which
is placed into a JAR along with a resource named test.dtd. Within
JDK 1.1.4 appletviewer, getResource() returns the URL of this
resource as: appletresource:/file:/D:/Ims/z//+/test.dtd
or : appletresource://gumnut/http://gumnut/ims/z//+/test.dtd
depending on whether I access the HTML thru my webserver or not.
It would be good to be able to specify one of these URLs in SYSTEM,
and have it work in all cases - not just appletviewer.
Do the XML parser developers have any suggestions on how to achieve this?
Does it make sense to have a special API for the parser through which
you can not only specify an xml document, but also a separate dtd ?
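
[Editorial sketch] One way the idea above could look in Java, written against the later org.xml.sax.EntityResolver interface (which did not exist when this message was posted). The resource path /dtds/example.dtd, the class name and the sample DTD are all assumptions made for illustration. The resolver first tries ClassLoader-style resource lookup and otherwise falls back to a DTD held as a string, so the sketch runs even without a JAR:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

class BundledDtdDemo {
    // Hypothetical location of the DTD inside the JAR.
    static final String DTD_RESOURCE = "/dtds/example.dtd";

    // Fallback copy of the DTD, "stored as a string" inside the program.
    static final String DTD_TEXT =
        "<!ELEMENT EXAMPLE (#PCDATA)>\n"
        + "<!ATTLIST EXAMPLE source CDATA \"bundled-dtd\">\n";

    // Answers requests for example.dtd locally; everything else is left to the parser.
    static final EntityResolver RESOLVER = (publicId, systemId) -> {
        if (systemId == null || !systemId.endsWith("example.dtd")) {
            return null;
        }
        InputStream in = BundledDtdDemo.class.getResourceAsStream(DTD_RESOURCE);
        if (in == null) {
            in = new ByteArrayInputStream(DTD_TEXT.getBytes(StandardCharsets.UTF_8));
        }
        InputSource source = new InputSource(in);
        source.setSystemId(systemId);
        return source;
    };

    // Parses the document and returns the "source" attribute, which only the DTD supplies.
    static String parseAndReadDefault(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.setEntityResolver(RESOLVER);
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getAttribute("source");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?>\n"
            + "<!DOCTYPE EXAMPLE SYSTEM \"example.dtd\">\n"
            + "<EXAMPLE>A sentence.</EXAMPLE>";
        System.out.println(parseAndReadDefault(xml));
    }
}
```

The document keeps an ordinary SYSTEM identifier; only the program decides where the bytes come from, which sidesteps the implementation-specific URL that getResource() returns.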
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Russell East mailto:reast@esri.com
_|_| Programmer phn: +1 (909) 793 2853
_|_| ESRI, 380 New York St fax: +1 (909) 307 3067
Redlands CA 92373-8100 http://maps.esri.com/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
From donpark at quake.net Thu Dec 18 03:22:49 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: An interesting news: JDK 1.2 Beta is now public available
Message-ID: <000b01bd0b63$b8af3580$0100007f@localhost>
Since JavaSoft is notoriously late updating its web pages, I thought some of
you might be interested to know that JDK 1.2 Public Beta is finally out at:
http://developer.javasoft.com/developer/earlyAccess/jdk12/
Please do not reply to this message cause I don't want to receive another
LISTRIVIA from Peter :-p
Don "JStud" Park
Master Consultant
donpark@quake.net
Come visit my XML Example Catalog at
http://www.quake.net/~donpark/xmlcat.html
From tyler at infinet.com Thu Dec 18 08:42:48 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <00d101bd0b3c$88f2aab0$0100007f@localhost>
Message-ID: <341CEE96.DFD97141@infinet.com>
Don Park wrote:
> >I didn't suggest XmlApplication should store XmlProcessor in a
> >member variable. I suggested that implementations of XmlApplication
> >could (if they needed to make callbacks to XmlProcessor) store
> >XmlProcessor in a member variable.
>
> OOPS. Point taken.
>
> >I don't think it's appropriate to carry over patterns from GUI events
> >and apply them to XML events just because we happen to use the word
> >"event" to describe them both. I believe performance is important for
> >XML processing, and an interface shouldn't impose an unnecessary
> >performance cost.
>
> >
> >The real merit of this interface is that it's simple; unless there's a
> >really compelling need for a feature, I think it should be left out.
>
> While David suggested that add/removeApplication methods allow
> implementation of XmlProcessors which support multiple XmlApplications, it
> is completely up to the implementations to support multiple XmlApplication
> or only one at a time. As JavaBeans spec suggests,
> TooManyListenersException should be thrown if XmlProcessor supports only one
> XmlApplication for performance and simplicity sake.
>
> >> I do not think so. Just as every Mac developer loved having RefCon to
> >> hang thing onto, I like userData.
> >
> >Could you explain a typical case where you need this?
> >
> >Are there any standard Java classes that do this?
>
> userData is a cheap way to associate extra info with the XmlProcessor. For
> example, I can store the source URL in the userData. There are other ways
> to have XmlProcessors provide the URL info (e.g. the JavaBeans Activation Framework has
> URLDataSource for this) but they are fairly expensive and would
> unnecessarily taint the API with URL related stuff. It should be possible
> to use XmlProcessor with a File and building URL out of File is not reliable
> in all platforms.
>
> Don
>
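
[Editorial sketch] The two points in the quoted text, a processor that accepts only one XmlApplication and a userData slot, might be sketched as follows. XmlApplication and XmlProcessor are names from this thread, but the method signatures, the class UnicastXmlProcessor and the fireStartElement helper are assumptions for illustration, not the proposed API:

```java
import java.util.TooManyListenersException;

// Hypothetical callback interface; the real proposal's methods are not reproduced here.
interface XmlApplication {
    void startElement(String name);
}

// A processor that supports exactly one application at a time.
class UnicastXmlProcessor {
    private XmlApplication application;
    private Object userData;

    void addApplication(XmlApplication app) throws TooManyListenersException {
        if (application != null) {
            throw new TooManyListenersException("only one XmlApplication is supported");
        }
        application = app;
    }

    void removeApplication(XmlApplication app) {
        if (application == app) {
            application = null;
        }
    }

    // userData: an opaque slot the caller can hang extra information on.
    void setUserData(Object data) { userData = data; }
    Object getUserData() { return userData; }

    // Stands in for real parsing: delivers one event to the registered application.
    void fireStartElement(String name) {
        if (application != null) {
            application.startElement(name);
        }
    }
}

class UnicastDemo {
    // Returns true if a second registration is refused.
    static boolean secondRegistrationRejected() {
        UnicastXmlProcessor processor = new UnicastXmlProcessor();
        try {
            processor.addApplication(name -> { });
            processor.addApplication(name -> { });
            return false;
        } catch (TooManyListenersException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UnicastXmlProcessor processor = new UnicastXmlProcessor();
        processor.setUserData("file:mol.xml"); // e.g. remember where the input came from
        System.out.println(processor.getUserData());
        System.out.println(secondRegistrationRejected());
    }
}
```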
I am not sure if this is at all relevant to this discussion, but I got some info
via email from the JDC newsletter that gives an interesting tip on how to
efficiently build tree structures without sucking up too much RAM. I figure
that an efficient way of storing the parsed data would be of some help to
XML parser writers. Anyway, here is the tip.
PERFORMANCE -- using Object to represent disparate types. This tip is a
little tricky, but it recently came up in an actual application, and
illustrates how Java language features are used to efficiently represent a
large data structure.
The application is one where a very large tree structure, consuming
millions of bytes, is built up. Some of the nodes in the tree reference
child nodes (non-terminals), while others are leaf nodes (terminals) and
have no children, but contain String information. The application involves
parsing a large Java program and representing it internally via a tree.
One simple approach to this problem is to define a Node class such as the
following:
public class Node {
    private int type;
    private Node child[];
    private String info;
}
If the node is a leaf node, then info is used. Otherwise, child refers to
the children of the node, and child.length to the number of children.
This approach works pretty well, but uses a lot of memory. Only one of
child and info are used at any one time, meaning that the other field is
wasted. Child is an array, with attendant overhead, for example, in
storing the dimensions of the array for subscript checking. For certain
large inputs, the parser program runs out of memory.
The first refinement of this approach is to collapse child and info:
public class Node {
    private int type;
    private Object info;
}
In this scheme, info can refer to either a String, for a leaf node, or to a
child node array. Object is the root of the Java class hierarchy, so that
for example, the following:
class A {}
implicitly means:
class A extends Object {}
An instance of a subclass of Object, such as String, can be assigned to an
Object reference. An array of Nodes can likewise be assigned to an Object.
The instanceof operator can be used to determine the actual type of an
Object reference.
In the parser application, using Object to represent both data types is not
good enough because it still takes up too much memory. So a further change
has been implemented. After doing some research, it was found that the
child array consisted of a single Node element about 95 percent of the
time. So it's possible to represent one-child cases directly using an
Object reference to the child node, rather than a reference to a one-long
array of child nodes.
This representation is complicated, and it's useful to define a method for
encapsulating the abstraction as in the following example:
public class Node {
    private int type;
    private Object info;

    // constructors, other methods here ...

    // gets the i-th child reference
    public Node getChild(int i)
    {
        if (info instanceof String)
            return null;
        else if (info instanceof Node && i == 0)
            return (Node)info;
        else
            return ((Node[])info)[i];
    }
}
getChild returns the i-th child, or null for leaf nodes. If there is
exactly one child, then info is of type Node, referencing that child. If
there is more than one child, info is of type Node[], and a cast to Node[]
is done, followed by a retrieval and return of the child reference.
In the parser application, this change is enough to tip the scales, so that
the application would not run out of memory. The internal representation
in this example is tricky, but it can be hidden via methods such as
getChild. In general, it's wise to avoid tricky coding, but useful to know
how to do it when the need arises.
The example also illustrates the utility of using one Object reference to
represent several different data types. In C/C++ similar techniques would
use void* pointers or unions.
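
[Editorial sketch] A self-contained variant of the tip, with the constructors filled in so it can be run. The class name CompactNode, the constructors, childCount() and text() are additions for illustration and are not part of the newsletter code. Unlike the tip's getChild, this version returns null for any out-of-range index instead of failing on the cast:

```java
class CompactNode {
    private final int type;
    // Holds a String (leaf), a CompactNode (exactly one child),
    // or a CompactNode[] (zero or several children).
    private final Object info;

    // Leaf node carrying text.
    CompactNode(int type, String text) {
        this.type = type;
        this.info = text;
    }

    // Non-terminal node; a single child is stored directly, without an array.
    CompactNode(int type, CompactNode... children) {
        this.type = type;
        this.info = (children.length == 1) ? children[0] : children.clone();
    }

    int type() { return type; }

    String text() {
        return (info instanceof String) ? (String) info : null;
    }

    int childCount() {
        if (info instanceof String) return 0;
        if (info instanceof CompactNode) return 1;
        return ((CompactNode[]) info).length;
    }

    // Returns the i-th child, or null for leaves and out-of-range indexes.
    CompactNode getChild(int i) {
        if (i < 0 || i >= childCount()) return null;
        if (info instanceof CompactNode) return (CompactNode) info;
        return ((CompactNode[]) info)[i];
    }

    public static void main(String[] args) {
        CompactNode a = new CompactNode(1, "A sentence.");
        CompactNode b = new CompactNode(1, "An another.");
        CompactNode single = new CompactNode(2, a);   // one child, no array allocated
        CompactNode pair = new CompactNode(2, a, b);  // two children, array allocated
        System.out.println(single.childCount() + " " + single.getChild(0).text());
        System.out.println(pair.childCount() + " " + pair.getChild(1).text());
    }
}
```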
From papresco at technologist.com Thu Dec 18 09:17:08 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com>
Message-ID: <3498E043.5F764F28@technologist.com>
David Megginson wrote:
>
> > And one last thing: if you use URL, then you have to do a new URL()
> > which does (I think) at least some syntax checking... is this appropriate?
> > Why not just pass it as a string? -Tim
>
> For starting Ælfred, I found using a string awkward, since I needed a
> base URL to resolve relative URLs (like file names).
XML attributes will probably have relative URLs in them and the XML
Application will have to know how to resolve them. Tim is right that
URLs are syntactically checked when they are created and can throw
an exception if there is a mistake. I would rather leave that up to the
application writer.
> Since XML
> mandates URIs anyway, and Java supports them pretty transparently,
XML mandates URIs, but Java supports URLs. I don't think that all Java
environments will allow new URL types to be installed. But if we are
just passing around strings then the application can recognize URNs and
Do The Right Thing.
> I
> thought that it made sense to use them directly instead of using a lot
> of Url.toString() and new URL(String) calls
I think that all we are doing is shifting the "new URL(String)" calls
from the processor to the application.
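
[Editorial sketch] The URI-versus-URL point can be seen in a few lines. This uses java.net.URI, which was added to Java well after this message, so it is an anachronism offered only for illustration; the identifiers and the helper names are made up. A stock JVM has no handler for the urn: scheme, so java.net.URL rejects it, while a plain string (or a URI) can carry it through to the application:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

class UriVersusUrl {
    // True if java.net.URL accepts the identifier (syntax check plus a known protocol).
    // Note: the URL(String) constructor is deprecated in recent JDKs but still available.
    static boolean isUsableUrl(String id) {
        try {
            new URL(id);
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }

    // Resolves a relative reference against a base, working on strings only.
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        System.out.println(isUsableUrl("http://example.org/docs/book.xml"));
        System.out.println(isUsableUrl("urn:isbn:0451450523"));
        System.out.println(URI.create("urn:isbn:0451450523").getScheme());
        System.out.println(resolve("http://example.org/docs/book.xml", "chapter2.xml"));
    }
}
```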
Paul Prescod
From peter at ursus.demon.co.uk Thu Dec 18 09:41:30 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: a DTD as a JAR file resource [was Re: RFC: Simple XML
Event-Based API for Java]
In-Reply-To: <34988F5C.16273230@esri.com>
Message-ID: <3.0.1.16.19971218103417.2ddf1544@pop3.demon.co.uk>
At 18:50 17/12/97 -0800, Russell East wrote:
>
>Yes! I would like to be able to store one or more DTDs as
>resources within a JAR file. Within a
>I'd like to be able to refer to that DTD, rather than, refering
>to some server-side DTD. But, I don't think we can do this now, because,
I think this is a tremendously important subject, Russell - thanks. One of
the exciting aspects of SGML/XML over the WWW is that it makes it possible
to distribute a whole environment. Like you I would want to be able to
"cache" some or all of these resources "client-side". One obvious reason is
slow lines, another is that people are often not connected to the WWW. For
example JUMBO - when used for molecular, statistical and other non-core XML
operations can be over 500Kb in classes.
>
>It would be good to be able to specify one of these URLs in SYSTEM,
>and have it work in all cases - not just appletviewer.
Personally I have enormous trouble with URLs under Java. There are the
following orthogonal problems:
- file: versus http:
- different syntaxes for files ('/' versus '\')
- different compilers (jvc vs javac)
- different JVMs (appletviewer, java, jview, NS (+versions), MS
(+versions), hotjava).
- different platforms (UNIces, Mac, Windows).
Altogether there are at least 20 actual variants.
For example, I contributed a JUMBO snapshot for Henry's latest CDROM on
chemical publishing [1]. Henry already has to test his CDROM for operation
with HTML and JavaScript (sorry ECMAScript). The CDROM has to run anywhere
and for people who have no knowledge of:
HTML
JavaScript
Who made the machine that they are viewing the CDROM on.
Adding:
Java
XML
is yet another dimension.
The ability to publish packaged systems under Java/XML is tremendously
exciting. I've done this in a limited way earlier this year and it seemed
to work. Henry's CDROM is going out with an issue of a paper Journal from
the Royal Soc of Chemistry but I don't expect a lot of feedback about JUMBO
- I suspect that most people won't get that far through the distribution
(the main rationale is *content* - organometallic chemistry.)
A bizarre problem has just arisen. Please help me :-). The JUMBO snapshot
is arranged to run under a browser as well as a standalone interpreter. So
I have packaged it as this directory structure (not horizontal as hypermail
won't render it :-(
demos
    mol.xml
    mol.html
    jumbo
        sgml
            SGMLTree.class
        cml
            MOL.class
        etc.
This runs OK with:
java jumbo.sgml.SGMLTree mol.xml
or
java jumbo.sgml.SGMLTree file:/C:/mydir/demos/mol.xml
or (I think)
java jumbo.sgml.SGMLTree file:mol.xml
and even
java jumbo.sgml.SGMLTree mol.xml PARSER=AElfred
mol.html contains:
When mol.html was loaded this used to work fine, launching JUMBO and
reading the file. Henry tells me that it still works for him under Netscape
4.04. BUT on my own PC with NS4.02 it now throws a SecurityException when
it comes to read file:/C:/cdrom/demos/mol.xml saying it isn't allowed to
read a local file.
So it seems to be a PMR-environment-specific problem. Help would be really
appreciated. Are there any browser switches, config files etc. that I might
have corrupted? Or is everyone else benefitting from a laxer implementation of
Applet Security?
[...]
>
>Do the XML parser developers have any suggestions on how to achieve this?
I don't think it's just for parser developers - anyone can play.
>
>Does it make sense to have a special API for the parser through which
>you can not only specify an xml document, but also a separate dtd ?
I think this is part of the namespace activity. JUMBO implements
namespaces experimentally (all namespace stuff is experimental!) and it
involves a lot of subsidiary files (JUMBO has one for most ELEMENTs, schema
files and much more). JUMBO can also use 3 parsers and will - by Jan 12
;-) be able to use 5. As we've seen, these parsers provide additional
features so that it makes sense to distribute them (authors permitting of
course) with the JUMBO distribution. It's also possible - as you suggest -
that different DTDs (or, I suspect, namespaces) might be distributed as
well. For example, it could make sense to have a variety of support files
for HTML4.0/XML. The reader could then choose between these at browse time.
This requires something with the functionality of a JAR file. I take the
concern that we shouldn't become Java-only, but I think the *experience*
with JAR files for early XML adopters will be essential. So - not for Jan
12 - some communal activity here on distribution, manifests, installation,
etc would be extraordinarily helpful to the success of XML. If we can
reliably distribute our XML applications without worrying about what's at
the other end it would be marvellous. It's a very different sort of task
from writing a parser :-)
P.
[1]
Some people may not know who Henry Rzepa is. Henry is the world's leading
exponent of the use of the Internet and related technologies for chemical
information and publishing. [He also does mainstream research in
computational chemistry.] He has run 3 major electronic conferences on
chemistry (content-driven) and published these with the Royal Society of
Chemistry through an E-lib project, CLIC. This project, including
Cambridge, Leeds and IC is committed to the use of SGML/XML as a publishing
tool. This explains some of his and my enthusiasm for seeing XML succeed.
Our primary concern is to see the link between author and reader as direct
as possible without information loss or corruption.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Thu Dec 18 10:06:26 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3498E043.5F764F28@technologist.com>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
<199712171903.OAA04014@unready.microstar.com>
Message-ID: <3.0.1.16.19971218110059.08df9fdc@pop3.demon.co.uk>
At 03:35 18/12/97 -0500, Paul Prescod wrote:
>David Megginson wrote:
>>
[...]
Thanks Paul,
>> I
>> thought that it made sense to use them directly instead of using a lot
>> of Url.toString() and new URL(String) calls
>
>I think that all we are doing is shifting the "new URL(String)" calls
>from the processor to the application.
I think this is right - the "application" is going to have to do a lot of
additional testing for semantic validity. XLL is full of this problem. So I
think it will be very valuable to have *generic* modules that can be used
for this sort of thing. I see some of these as coming in a post-parser
(i.e. post-processor) and pre-application area. For example, it's
reasonable that an application shouldn't get passed:
My XML file
This is a WF element, but contains a number of semantic errors (at least if
the application wishes to validate it against the XLL spec :-).
java.net.URL would catch one of them :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Thu Dec 18 10:16:36 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <341CEE96.DFD97141@infinet.com>
References: <00d101bd0b3c$88f2aab0$0100007f@localhost>
Message-ID: <3.0.1.16.19971218105102.09271c36@pop3.demon.co.uk>
At 04:15 15/09/97 -0400, Tyler Baker wrote:
[...]
>
>I am not sure if this is at all relevant to this discussion, but I got
>some info
Well *I* found it extremely valuable :-). This is exactly the sort of thing
that novices will find a variety of ways of tackling. If your suggestion
gets support from those who know more than me, it may be worth considering
for the API.
FWIW I think that the presentation of Trees in the API is the area where
guidance is most valuable. It affects a lot of the downstream part of the
application. Moreover, if people return Objects from a Tree, their nature
has to be very carefully agreed. An Element or a PI is much more obvious by
comparison.
[...]
>In the parser application, using Object to represent both data types is not
>good enough because it still takes up too much memory. So a further change
>has been implemented. After doing some research, it was found that the
>child array consisted of a single Node element about 95 percent of the
Is this figure just for one application, or is it likely to have a
Zipf-like distribution (i.e. "most" XML applications will have only a
single non-terminal child at "most" of the nodes)?
>time. So it's possible to represent one-child cases directly using an
>Object reference to the child node, rather than a reference to a one-long
>array of child nodes.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From M.H.Kay at eng.icl.co.uk Thu Dec 18 11:36:15 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>
I like the way this discussion is going. I don't want to be on anyone's
critical path, but I'll be trying out these interfaces (as an "application
writer") if I can find the time.
I've written a very simple application using AElfred: a converter from an
XML-based encoding of genealogical data back to the "standard"
GEDCOM encoding. (The converter the other way was in Visual Basic,
I will probably rewrite it in Java now I'm getting the hang of it.) It is
beautifully concise, just 17 lines of code apart from the boilerplate
which was copied straight from one of the AElfred sample apps,
and will be even simpler with the proposed revisions to the
interface.
To do anything more interesting with the data (i.e. anything that is not
a single-pass operation) I need a tree representation. Yes, I don't
want to build my own. The DOM seems to be the right solution for
this. The idea of having a choice of parsers with the same event
interface, and a choice of tree-builders that build the same DOM
interface using any of the parsers, is very appealing.
(What I haven't really worked out yet, and would appreciate advice
on, is how to turn the XML objects into a set of genealogical
objects, with methods like getFather(), getMother(), getSpouses(). Do
I need to build a separate tree with the data organised differently,
or should I write methods/functions that operate on the nodes in
the XML tree? I guess the chemists must have similar problems.)
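
[Editorial sketch] The second option, methods that operate on the nodes of the XML tree, could look like the wrapper below. It uses the org.w3c.dom interfaces as they were later standardised. The <family>/<person> vocabulary, its id/father/mother attributes and the sample names are invented for illustration; they are not the poster's GEDCOM encoding:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// A thin genealogical view over an element; no second tree is built.
class Person {
    private final Element element;

    Person(Element element) { this.element = element; }

    String getName() { return element.getAttribute("name"); }
    Person getFather() { return find(element.getOwnerDocument(), element.getAttribute("father")); }
    Person getMother() { return find(element.getOwnerDocument(), element.getAttribute("mother")); }

    // Linear scan by id attribute; returns null if the id is empty or unknown.
    static Person find(Document doc, String id) {
        if (id == null || id.isEmpty()) return null;
        NodeList people = doc.getElementsByTagName("person");
        for (int i = 0; i < people.getLength(); i++) {
            Element candidate = (Element) people.item(i);
            if (id.equals(candidate.getAttribute("id"))) return new Person(candidate);
        }
        return null;
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}

class GenealogyDemo {
    static final String SAMPLE =
        "<family>"
        + "<person id=\"p1\" name=\"Bob\"/>"
        + "<person id=\"p2\" name=\"Ann\"/>"
        + "<person id=\"p3\" name=\"Cy\" father=\"p1\" mother=\"p2\"/>"
        + "</family>";

    public static void main(String[] args) {
        Person child = Person.find(Person.parse(SAMPLE), "p3");
        System.out.println(child.getName() + ": father " + child.getFather().getName()
            + ", mother " + child.getMother().getName());
    }
}
```

The trade-off is the usual one: the wrapper stays in step with the document for free, while a separate object tree would allow faster lookups at the cost of a second copy of the data.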
The other thing I need, which has not really been fully addressed,
is access to the DTD. (Not for this application, which I am doing
just as a learning exercise, but for my real job.) I think we need
some kind of extension to the DOM to provide this.
Regards and thanks for all the good work,
Mike Kay, ICL
From ak117 at freenet.carleton.ca Thu Dec 18 12:34:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: Goals: XML Event Interface
Message-ID: <199712181232.HAA00429@unready.microstar.com>
I think that the time has come to deal with a question that we have
postponed so far: the goal of a simple XML event-driven interface.
Right now, there are two completely different ideas:
1. The interface will provide standardised low-level, pre-DOM
functionality for parsers to implement, for programmers who do not
want to incur the overhead of using the DOM; perhaps a DOM tree
could be built using only these interfaces.
2. The interface will provide standardised high-level, post-DOM
functionality for parsers to implement, for programmers who do not
want to take the time to learn the XML concepts in the DOM; perhaps
the events could be generated from a DOM tree.
These two are actually quite incompatible: the first is an attempt to
create a less abstract user model, while the second is an attempt to
create a more abstract user model. It's only a (happy) co-incidence
that we have managed a broad agreement so far.
LOW-LEVEL INTERFACE
-------------------
If we decided on (1), then I would consider making the interface the
core interface for Ælfred, and I would probably want to expand it
slightly to include enough functionality to build a basic level-1 DOM
tree, by adding some or all of the following information:
- an event for the doctype declaration
- an isSpecified flag for attributes
- ignorable whitespace (Ælfred should return this anyway)
- comments (yech -- _WHY_ is that in the DOM???)
This interface could use only JDK 1.0.2 features, since I have no
intention of making Ælfred incompatible with existing browsers.
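
[Editorial sketch] To make option (1) concrete, here is what an event interface carrying the four extra items might look like. Every name and signature below is hypothetical; this is not the interface under discussion, and it uses generics and other features far newer than JDK 1.0.2. The events are fed by hand in place of a parser:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical low-level event interface with the extra items needed for a DOM-like tree.
interface LowLevelEvents {
    void doctypeDecl(String name, String publicId, String systemId);
    void startElement(String name);
    void attribute(String name, String value, boolean isSpecified);
    void endElement(String name);
    void charData(String text);
    void ignorableWhitespace(String text);
    void comment(String text);
}

// Records each event as a line of text; a tree builder would create nodes instead.
class EventLog implements LowLevelEvents {
    final List<String> events = new ArrayList<>();

    public void doctypeDecl(String name, String publicId, String systemId) {
        events.add("doctype " + name + " SYSTEM " + systemId);
    }
    public void startElement(String name) { events.add("start " + name); }
    public void attribute(String name, String value, boolean isSpecified) {
        events.add("attr " + name + "=" + value + (isSpecified ? "" : " (defaulted)"));
    }
    public void endElement(String name) { events.add("end " + name); }
    public void charData(String text) { events.add("text " + text); }
    public void ignorableWhitespace(String text) { events.add("ignorable-ws " + text.length()); }
    public void comment(String text) { events.add("comment " + text); }
}

class LowLevelDemo {
    // Hand-fed events standing in for what a parser would report.
    static List<String> replay() {
        EventLog log = new EventLog();
        log.doctypeDecl("EXAMPLE", null, "example.dtd");
        log.startElement("EXAMPLE");
        log.attribute("source", "dtd", false); // value came from the DTD default
        log.ignorableWhitespace("\r\n");
        log.comment(" here ");
        log.startElement("P");
        log.charData("A sentence.");
        log.endElement("P");
        log.endElement("EXAMPLE");
        return log.events;
    }

    public static void main(String[] args) {
        for (String event : replay()) {
            System.out.println(event);
        }
    }
}
```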
HIGH-LEVEL INTERFACE
--------------------
If we decided on (2), then I would simply produce an optional add-on
for Ælfred, outside of its core interfaces (and probably in a separate
package). I would probably make a pass-through class implementing
(the new) XmlProcessor instead of having Ælfred implement it directly,
so that the core Ælfred could still consist of only two class files.
In this case, the simple interface would be slightly less efficient,
and would include only very minimal functionality (as Tim suggests);
for anything more, you would have to use each parser's native
interface. You could not build a DOM tree using this interface.
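
[Editorial sketch] The pass-through arrangement for option (2) might be pictured as an adapter: the parser keeps its richer native callbacks, and a separate class forwards only the minimal subset. NativeHandler, SimpleApplication and PassThrough are invented names, not Ælfred's actual interfaces:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser-native callbacks (richer than the simple interface).
interface NativeHandler {
    void startElement(String name);
    void endElement(String name);
    void charData(String text);
    void comment(String text);
    void ignorableWhitespace(String text);
}

// Hypothetical minimal interface seen by the application writer.
interface SimpleApplication {
    void startElement(String name);
    void endElement(String name);
    void charData(String text);
}

// Forwards the three simple events and silently drops the rest.
class PassThrough implements NativeHandler {
    private final SimpleApplication app;

    PassThrough(SimpleApplication app) { this.app = app; }

    public void startElement(String name) { app.startElement(name); }
    public void endElement(String name) { app.endElement(name); }
    public void charData(String text) { app.charData(text); }
    public void comment(String text) { /* not part of the simple interface */ }
    public void ignorableWhitespace(String text) { /* not part of the simple interface */ }
}

class PassThroughDemo {
    // Feeds five native events through the adapter and returns what the application saw.
    static List<String> run() {
        final List<String> seen = new ArrayList<>();
        NativeHandler handler = new PassThrough(new SimpleApplication() {
            public void startElement(String name) { seen.add("start " + name); }
            public void endElement(String name) { seen.add("end " + name); }
            public void charData(String text) { seen.add("text " + text); }
        });
        handler.startElement("P");
        handler.comment(" dropped ");
        handler.charData("A sentence.");
        handler.ignorableWhitespace("\r\n");
        handler.endElement("P");
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

The extra method call per event is the "slightly less efficient" cost mentioned above, and the dropped events are why a DOM tree could not be built through such an interface.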
The question would remain open whether the simple interface could use
JDK 1.1 or JDK 1.2 features.
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From fussellm at alumni.caltech.edu Thu Dec 18 14:14:53 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:36 2004
Subject: Example DOM ObjectBuilder
Message-ID:
I released a version of mindo-j and an example DOM ObjectBuilder to:
http://www.chimu.com/projects/mondo/release/
mindo-j is a minimal subset of MONDO suitable for accomplishing some
particular tasks. The version above is focused on supporting DOM
document building, but it can easily expand into much more functionality
and has a more general perspective than might be expected. The example
includes a version of the DOM interfaces and a skeleton implementation.
This is very preliminary for the DOM code, but I am about to fly off for
the holidays so I thought it would be good to release it before then.
The current release is based on Aelfred but it was slightly modified to
support InputStreams and so is included under a different package name.
I will migrate mindo/MONDO to support the standard Java XML API when it
is finalized.
--Mark
mark.fussell@chimu.com
i ChiMu Corporation Architectures for Information
h M info@chimu.com Object-Oriented Information Systems
C u www.chimu.com Architecture, Frameworks, and Mentoring
From peter at ursus.demon.co.uk Thu Dec 18 14:29:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19971218145207.344fc87a@pop3.demon.co.uk>
At 11:35 18/12/97 -0000, Michael Kay wrote:
>I like the way this discussion is going. I don't want to be on anyone's
>critical path, but I'll be trying out these interfaces (as an "application
>writer") if I can find the time.
This is really great Michael.
My vision of (at least one role of) the interface is precisely what you
describe. An intelligent, but ignorant Java/XML application programmer.
A Random Walk in Science
I don't know what to assume...
You may assume infinite ignorance and unlimited intelligence
>I've written a very simple application using AElfred: a converter from an
>XML-based encoding of genealogical data back to the "standard"
>GEDCOM encoding.
This counts :-)
>(The converter the other way was in Visual Basic,
>I will probably rewrite it in Java now I'm getting the hang of it.) It is
>beautifully concise, just 17 lines of code apart from the boilerplate
>which was copied straight from one of the AElfred sample apps,
>and will be even simpler with the proposed revisions to the
>interface.
This is exactly what we are after. The idea that we can develop an
application in a few lines is one of the beauties of XML. After all, we are
likely to get a lot more converts if they can write their app in half a
page. The boilerplate lends itself to GUI tools (e.g. presenting the
programmer with a dozen boxes to fill in for doEntity, etc.)
>
>To do anything more interesting with the data (i.e. anything that is not
>a single-pass operation) I need a tree representation. Yes, I don't
>want to build my own. The DOM seems to be the right solution for
>this. The idea of having a choice of parsers with the same event
>interface, and a choice of tree-builders that build the same DOM
>interface using any of the parsers, is very appealing.
Absolutely. JUMBO is essentially a tree-based tool and I expect it to
either implement the DOM or to simply hand over a large chunk of its current
code to better-written stuff for tree management.
As you've probably seen, the Java SwingSet has a Tree tool, which comes
with an example. Most of the time goes on simply finding one's way around
the documentation. I would have liked to use it for JUMBO and hacked a
simple example, but I need quite a lot of functionality for each displayed
node and I haven't yet found out how to do that (basically I need a
miniPanel for each node).
>
>(What I haven't really worked out yet, and would appreciate advice
>on, is how to turn the XML objects into a set of genealogical
>objects, with methods like getFather(), getMother(), getSpouses(). Do
>I need to build a separate tree with the data organised differently,
>or should I write methods/functions that operate on the nodes in
>the XML tree? I guess the chemists must have similar problems.)
I nearly replied to your earlier posting, but was too busy.
Any pure tree is extremely easy to represent in XML. So, if you simply want
to trace an ancestor tree (i.e. two parents, 4 grand parents, etc.) this is
trivial. If some happen to be identical you can use entities to normalise
the data or hyperlinks. E.g. your father's father's father could be your
mother's mother's father in most countries (cousin marriage). I can display
an animal taxonomy using nothing more than XML and standard JUMBO.
The difficulty comes when the graph has cycles. I am not an expert
genealogist, but most 'family trees' seem to me to be Directed Acyclic
Graphs (DAGs) where the arcs are isParentOf() and are directional. DAGs
are common in areas like multiple inheritance graphs (C++), multiple
ontological views, etc. I would hope that some standard ways of
representing DAGs might come out of XML and that there would be standard
viewing tools.
Note that the use of ID/IDREF may introduce additional complexity.
Personally I am not clear on the value of IDREF over XLL - it's not trivial
to support in a browser and I doubt that JUMBO will do it.
If we include marriage or other descriptions of human liaison we have a
different type of link. This results in a complex structure, which I would
use XLL to represent. I'd value views on this, because we shall be
encountering XLL on this list from time to time :-)
One approach is to regard all nodes as disjoint, and to create every
relationship in a separate database. A fictitious family might look like:
Elizabeth
Philip
Charles
Diana
Camilla
To represent this structure prettily, and to navigate it usefully, is
almost certainly application-dependent. There are some nice graph layout
tools but they cannot render every application in a meaningful manner.
(Some might even be molecules :-)
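A minimal Java sketch of the "disjoint nodes, separate relationships" idea, assuming only isParentOf arcs are stored. The class names FamilyGraph and ParentOf and the person labels (me, F, M, FF, MM, G) are made up for illustration; G plays the shared great-grandfather of the cousin-marriage case. Collecting ancestors into a set means a person reached by two paths is reported only once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FamilyGraph {
    /** One directed arc of the graph: parent isParentOf child. */
    static final class ParentOf {
        final String parent;
        final String child;

        ParentOf(String parent, String child) {
            this.parent = parent;
            this.child = child;
        }
    }

    // Relationships are kept here, apart from the people themselves.
    private final List<ParentOf> arcs = new ArrayList<>();

    void addParent(String parent, String child) {
        arcs.add(new ParentOf(parent, child));
    }

    /** All ancestors of a person; a shared ancestor appears once. */
    Set<String> ancestors(String person) {
        Set<String> found = new LinkedHashSet<>();
        collect(person, found);
        return found;
    }

    private void collect(String person, Set<String> found) {
        for (ParentOf arc : arcs) {
            // found.add() is false for an ancestor already visited,
            // so a second path to the same person is not followed again.
            if (arc.child.equals(person) && found.add(arc.parent)) {
                collect(arc.parent, found);
            }
        }
    }

    /** Hypothetical data: G is both father's father's father and mother's mother's father. */
    static FamilyGraph sample() {
        FamilyGraph g = new FamilyGraph();
        g.addParent("F", "me");
        g.addParent("M", "me");
        g.addParent("FF", "F");
        g.addParent("MM", "M");
        g.addParent("G", "FF");
        g.addParent("G", "MM");
        return g;
    }

    public static void main(String[] args) {
        System.out.println(sample().ancestors("me")); // [F, FF, G, M, MM]
    }
}
```

Marriage or other liaisons would need a second arc type alongside ParentOf; how best to lay such a graph out on screen remains application-dependent, as noted.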
>
>The other thing I need, which has not really been fully addressed,
>is access to the DTD. (Not for this application, which I am doing
>just as a learning exercise, but for my real job.) I think we need
>some kind of extension to the DOM to provide this.
I await other comments, but my expectation is that the DOM will actively
deal with this. Experts, please?
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
From peter at ursus.demon.co.uk Thu Dec 18 14:33:51 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: Example DOM ObjectBuilder
In-Reply-To:
Message-ID: <3.0.1.16.19971218152409.08df867e@pop3.demon.co.uk>
At 06:14 18/12/97 -0800, Mark L. Fussell wrote:
[...]
>The current release is based on Aelfred but it was slightly modified to
>support InputStreams and so is included under a different package name.
>I will migrate mindo/MONDO to support the standard Java XML API when it
>is finalized.
Great. This is getting very close to David's 3+3 :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From fussellm at alumni.caltech.edu Thu Dec 18 14:41:10 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM Model RFC: Simple XML Event-Based API for Java
In-Reply-To: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:
Michael Kay wrote:
> (What I haven't really worked out yet, and would appreciate advice
> on, is how to turn the XML objects into a set of genealogical
> objects, with methods like getFather(), getMother(), getSpouses(). Do
> I need to build a separate tree with the data organised differently,
> or should I write methods/functions that operate on the nodes in
> the XML tree? I guess the chemists must have similar problems.)
I would strongly suggest first designing the genealogical object model
from the GEDCOM definitions (and other sources) without considering XML
or DOM at all. You need to first get a good model of the information you
want to represent in a computer (usually called a DomainModel) before
considering technological/application constraints on it. After you have
the model you can consider how that information could be best constructed
from an XML/GEDCOM encoding.
The GEDCOM spec has a very specific model behind it, so you can
decide whether to use that model, a subset of it, or some improvement to
it. There is a lot of stuff in there so it may take a while to get a
good DomainModel out of it and then implement that model in Java. After
that, the XML should be very easy.
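As a rough illustration of "model first, XML later", here is a minimal Java sketch of a domain class that knows nothing about XML or the DOM. The class name Individual, the setParents and marry methods, and the sample names are assumptions made for this example; only getFather(), getMother() and getSpouses() come from the question. It is not derived from the GEDCOM specification.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical domain object; it has no dependency on XML or the DOM. */
class Individual {
    private final String name;
    private Individual father;   // null when not recorded
    private Individual mother;   // null when not recorded
    private final List<Individual> spouses = new ArrayList<>();

    Individual(String name) {
        this.name = name;
    }

    String getName() { return name; }
    Individual getFather() { return father; }
    Individual getMother() { return mother; }

    /** Read-only view, so callers must go through marry(). */
    List<Individual> getSpouses() {
        return Collections.unmodifiableList(spouses);
    }

    void setParents(Individual father, Individual mother) {
        this.father = father;
        this.mother = mother;
    }

    /** Records the link on both sides so the model stays symmetric. */
    void marry(Individual other) {
        if (!spouses.contains(other)) {
            spouses.add(other);
            other.spouses.add(this);
        }
    }

    public static void main(String[] args) {
        Individual ann = new Individual("Ann");
        Individual bob = new Individual("Bob");
        Individual cat = new Individual("Cat");
        bob.marry(ann);
        cat.setParents(bob, ann);
        System.out.println(cat.getFather().getName());               // Bob
        System.out.println(ann.getSpouses().get(0).getName());       // Bob
    }
}
```

A loader that walks an XML tree (or an event stream) and calls setParents and marry can then be written separately, which keeps the encoding question out of the model.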
Last time I checked (maybe a year or two ago), nobody had a
publicly available GEDCOM object model or implementation in Java, but
maybe that has changed. I spent several days starting the process of
building a model but got called off to other tasks [not sure where my
notes are].
If you have not already, you may want to look at Martin Fowler's Analysis
Patterns book or any of the three Amigos' books (Booch, Rumbaugh,
Jacobson). Full references for these books are at:
http://www.chimu.com/projects/mondo/links.html
--Mark
mark.fussell@chimu.com
From peter at ursus.demon.co.uk Thu Dec 18 14:51:48 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
In-Reply-To: <199712181232.HAA00429@unready.microstar.com>
Message-ID: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
At 07:32 18/12/97 -0500, David Megginson wrote:
>I think that the time has come to deal with a question that we have
>postponed so far: the goal of a simple XML event-driven interface.
Good thinking. One of the really great aspects of XML was/is the 10 goals.
>Right now, there are two completely different ideas:
>
>1. The interface will provide standardised low-level, pre-DOM
> functionality for parsers to implement, for programmers who do not
> want to incur the overhead of using the DOM; perhaps a DOM tree
> could be built using only these interfaces.
Yes. This is needed. It will be needed after the DOM is finalised. (It
might then be built on top of the DOM - I don't know). It is needed now (==
Jan 12).
>
>2. The interface will provide standardised high-level, post-DOM
> functionality for parsers to implement, for programmers who do not
> want to take the time to learn the XML concepts in the DOM; perhaps
> the events could be generated from a DOM tree.
I understand and agree with the concept. I am not qualified to comment on
whether it is needed or is different from the API to the DOM.
>
>These two are actually quite incompatible: the first is an attempt to
>create a less abstract user model, while the second is an attempt to
>create a more abstract user model. It's only a (happy) co-incidence
>that we have managed a broad agreement so far.
Yup. In my limited vision it is *possible* that (1) might be a subset of
(2), but not necessarily.
>
>
>LOW-LEVEL INTERFACE
>-------------------
>
>If we decided on (1), then I would consider making the interface the
>core interface for Ælfred, and I would probably want to expand it
>slightly to include enough functionality to build a basic level-1 DOM
>tree, by adding some or all of the following information:
>
>- an event for the doctype declaration
Essential IMO
>- an isSpecified flag for attributes
Not quite clear what this is. I assume it is NOT the value of the Default
in the ATTLIST (i.e. "#IMPLIED"). BUT this concept is required in some XLL
applications.
Is it the question of the return value of a non-existent attribute. IOW
what does
return for
String s = element.getAttval("BAR"); // answer: "baz"
String s = element.getAttval("BLORT");// answer "six spaces"
String s = element.getAttval("XYZZY");// answer ""
String s = element.getAttval("PLUGH");// could be "", or null
String s = element.getAttval("Y2");// could be "", or null
This is an area where I think we MUST spell out in graphic detail what is
returned. If nothing else, this is a prime reason for this API. I have got
this hopelessly muddled throughout JUMBO simply because there was no API. I
didn't want to hardcode in anything until the semantics of all this was
clear. At present JUMBO does not distinguish between a null String and "".
If this is going to be important (and I suspect it might) we need to know
NOW. It will be almost impossible to reprogram an application that gets it
"wrong".
Note for newcomers. If I add the declaration:
and wave it over the document, the value of BLORT changes to
"six spaces"
This is always good for a laugh at XML parties, and you can probably make
money out of carefully placed bets.
>- ignorable whitespace (Ælfred should return this anyway)
>- comments (yech -- _WHY_ is that in the DOM???)
>
>This interface could use only JDK 1.0.2 features, since I have no
>intention of making Ælfred incompatible with existing browsers.
Agreed.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Thu Dec 18 15:01:15 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
In-Reply-To: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
References: <199712181232.HAA00429@unready.microstar.com>
<3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID: <199712181458.JAA00315@unready.microstar.com>
Peter Murray-Rust writes:
> >- an isSpecified flag for attributes
>
> Not quite clear what this is. I assume it is NOT the value of the Default
> in the ATTLIST (i.e. "#IMPLIED").
DTD:
Document instance:
...
The attribute "bar" has the value "hack", and is not specified
(i.e., it is a defaulted value).
...
The attribute "bar" has the value "hack", and is specified.
...
The attribute "bar" has the value "hello", and is specified.
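The three cases can be reproduced with the DOM API that ships with the JDK, where Attr.getSpecified() reports whether a value was written in the instance or supplied from the DTD default. This is only a sketch: the element names doc and foo and the exact declarations are assumptions, chosen so that bar defaults to "hack". It does not use the event interface under discussion.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class SpecifiedDemo {
    // Hypothetical document: "doc" and "foo" are made-up element names.
    static final String XML =
        "<!DOCTYPE doc [\n"
        + "  <!ELEMENT doc (foo*)>\n"
        + "  <!ELEMENT foo EMPTY>\n"
        + "  <!ATTLIST foo bar CDATA \"hack\">\n"
        + "]>\n"
        + "<doc><foo/><foo bar=\"hack\"/><foo bar=\"hello\"/></doc>";

    /** One line per foo element: the value of bar and where it came from. */
    static List<String> report() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            NodeList foos = doc.getElementsByTagName("foo");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < foos.getLength(); i++) {
                Attr bar = ((Element) foos.item(i)).getAttributeNode("bar");
                out.add(bar.getValue() + (bar.getSpecified() ? " specified" : " defaulted"));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Prints: hack defaulted / hack specified / hello specified
        report().forEach(System.out::println);
    }
}
```

Note that the first two elements carry the same value; only the flag tells them apart.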
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From tbray at textuality.com Thu Dec 18 15:28:25 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971218072536.00acf1c0@pop.intergate.bc.ca>
At 04:15 AM 15/09/97 -0400, Tyler Baker wrote:
>I am not sure if this is at all relevant to this discussion, but I got some info
>via email from the JDC newsletter that gives an interesting tip on how to
>efficiently build tree structures without sucking up too much RAM.
Lark does this now; amazing how Java, which "doesn't have pointers because
they're error-prone", does have something that smells just like (void *)...
in fact, one of the problems that bedevilled programmers for a generation
is that lots of useful C programs were written on VAXes, where a pointer
to everything was always the same size, then that wasn't true any more
on 16-bit DOS boxes; looks like from that point of view, Java is back
to the good old days of the VAX. -T.
From peter at ursus.demon.co.uk Thu Dec 18 15:39:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: A bit of fun
Message-ID: <3.0.1.16.19971218163007.0a3f967e@pop3.demon.co.uk>
Since some of us may have time to relax, and since the following is *very*
close to what we are doing with the API, I have forwarded something I have
just received. [There was no useful metadata with the message.]
If anyone feels like translating the spirit into SGML/XML that could be
appropriate at this time of year in some countries.
[... header clipped ...]
-----------------------------------------------------------
Task is to write a program that prints "Hello World" on the
screen...make sure you see the last few attempts (Dilbert).
High School/Jr.High
==================
10 PRINT "HELLO WORLD"
20 END
First year in College
====================
program Hello(input, output)
begin
writeln('Hello World')
end.
Senior year in College
=====================
(defun hello
(print
(cons 'Hello (list 'World))))
New professional
===============
#include <stdio.h>
void main(void)
{
char *message[] = {"Hello ", "World"};
int i;
for(i = 0; i < 2; ++i)
printf("%s", message[i]);
printf("\n");
}
Seasoned professional
====================
#include <iostream.h>
#include <string.h>
class string
{
private:
int size;
char *ptr;
public:
string() : size(0), ptr(new char('\0')) {}
string(const string &s) : size(s.size)
{
ptr = new char[size + 1];
strcpy(ptr, s.ptr);
}
~string()
{
delete [] ptr;
}
friend ostream &operator <<(ostream &, const string &);
string &operator=(const char *);
};
ostream &operator<<(ostream &stream, const string &s)
{
return(stream << s.ptr);
}
string &string::operator=(const char *chrs)
{
if (this != &chrs)
{
delete [] ptr;
size = strlen(chrs);
ptr = new char[size + 1];
strcpy(ptr, chrs);
}
return(*this);
}
int main()
{
string str;
str = "Hello World";
cout << str << endl;
return(0);
}
Master Programmer :-))
================
[
uuid(2573F8F4-CFEE-101A-9A9F-00AA00342820)
]
library LHello
{
// bring in the master library
importlib("actimp.tlb");
importlib("actexp.tlb");
// bring in my interfaces
#include "pshlo.idl"
[
uuid(2573F8F5-CFEE-101A-9A9F-00AA00342820)
]
cotype THello
{
interface IHello;
interface IPersistFile;
};
};
[
exe,
uuid(2573F890-CFEE-101A-9A9F-00AA00342820)
]
module CHelloLib
{
// some code related header files
importheader();
importheader( );
importheader();
importheader("pshlo.h");
importheader("shlo.hxx");
importheader("mycls.hxx");
// needed typelibs
importlib("actimp.tlb");
importlib("actexp.tlb");
importlib("thlo.tlb");
[
uuid(2573F891-CFEE-101A-9A9F-00AA00342820),
aggregatable
]
coclass CHello
{
cotype THello;
};
};
#include "ipfix.hxx"
extern HANDLE hEvent;
class CHello : public CHelloBase
{
public:
IPFIX(CLSID_CHello);
CHello(IUnknown *pUnk);
~CHello();
HRESULT __stdcall PrintSz(LPWSTR pwszString);
private:
static int cObjRef;
};
#include
#include
#include
#include
#include "thlo.h"
#include "pshlo.h"
#include "shlo.hxx"
#include "mycls.hxx"
int CHello::cObjRef = 0;
CHello::CHello(IUnknown *pUnk) : CHelloBase(pUnk)
{
cObjRef++;
return;
}
HRESULT __stdcall CHello::PrintSz(LPWSTR pwszString)
{
printf("%ws\n", pwszString);
return(ResultFromScode(S_OK));
}
CHello::~CHello(void)
{
// when the object count goes to zero, stop the server
cObjRef--;
if( cObjRef == 0 )
PulseEvent(hEvent);
return;
}
#include
#include
#include "pshlo.h"
#include "shlo.hxx"
#include "mycls.hxx"
HANDLE hEvent;
int _cdecl main( int argc, char * argv[])
{
ULONG ulRef;
DWORD dwRegistration;
CHelloCF *pCF = new CHelloCF();
hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);
// Initialize the OLE libraries
CoInitializeEx(NULL, COINIT_MULTITHREADED);
CoRegisterClassObject(CLSID_CHello, pCF, CLSCTX_LOCAL_SERVER,
REGCLS_MULTIPLEUSE, &dwRegistration);
// wait on an event to stop
WaitForSingleObject(hEvent, INFINITE);
// revoke and release the class object
CoRevokeClassObject(dwRegistration);
ulRef = pCF->Release();
// Tell OLE we are going away.
CoUninitialize();
return(0);
}
extern CLSID CLSID_CHello;
extern UUID LIBID_CHelloLib;
CLSID CLSID_CHello = { /* 2573F891-CFEE-101A-9A9F-00AA00342820 */
0x2573F891,
0xCFEE,
0x101A,
{ 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 }
};
UUID LIBID_CHelloLib = { /* 2573F890-CFEE-101A-9A9F-00AA00342820 */
0x2573F890,
0xCFEE,
0x101A,
{ 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 }
};
#include
#include
#include
#include
#include
#include "pshlo.h"
#include "shlo.hxx"
#include "clsid.h"
int _cdecl main( int argc, char * argv[])
{
HRESULT hRslt;
IHello *pHello;
ULONG ulCnt;
IMoniker * pmk;
WCHAR wcsT[_MAX_PATH];
WCHAR wcsPath[2 * _MAX_PATH];
// get object path
wcsPath[0] = '\0';
wcsT[0] = '\0';
if( argc > 1) {
mbstowcs(wcsPath, argv[1], strlen(argv[1]) + 1);
wcsupr(wcsPath);
}
else {
fprintf(stderr, "Object path must be specified\n");
return(1);
}
// get print string
if(argc > 2)
mbstowcs(wcsT, argv[2], strlen(argv[2]) + 1);
else
wcscpy(wcsT, L"Hello World");
printf("Linking to object %ws\n", wcsPath);
printf("Text String %ws\n", wcsT);
// Initialize the OLE libraries
hRslt = CoInitializeEx(NULL, COINIT_MULTITHREADED);
if(SUCCEEDED(hRslt)) {
hRslt = CreateFileMoniker(wcsPath, &pmk);
if(SUCCEEDED(hRslt))
hRslt = BindMoniker(pmk, 0, IID_IHello, (void **)&pHello);
if(SUCCEEDED(hRslt)) {
// print a string out
pHello->PrintSz(wcsT);
Sleep(2000);
ulCnt = pHello->Release();
}
else
printf("Failure to connect, status: %lx", hRslt);
// Tell OLE we are going away.
CoUninitialize();
}
return(0);
}
Apprentice Hacker
==================
#!/usr/local/bin/perl
$msg="Hello, world.\n";
if ($#ARGV >= 0) {
while(defined($arg=shift(@ARGV))) {
$outfilename = $arg;
open(FILE, ">" . $outfilename) || die "Can't write $arg: $!\n";
print (FILE $msg);
close(FILE) || die "Can't close $arg: $!\n";
}
} else {
print ($msg);
}
1;
Experienced Hacker
==================
#include <stdio.h>
#define S "Hello, World\n"
main(){exit(printf(S) == strlen(S) ? 0 : 1);}
Seasoned Hacker
==================
% cc -o a.out ~/src/misc/hw/hw.c
% a.out
Guru Hacker
==================
% cat
Hello, world.
^^D
New Manager
==================
10 PRINT "HELLO WORLD"
20 END
Middle Manager
==================
mail -s "Hello, world." bob@b12
Bob, could you please write me a program that prints "Hello,
world."?
I need it by tomorrow.
^^D
Senior Manager
==================
% zmail jim
I need a "Hello, world." program by this afternoon.
Chief Executive
==================
% letter
letter: Command not found.
% mail
To: ^^X ^^F ^^C
% help mail
help: Command not found.
% damn!
!: Event unrecognized
% logout
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tms at ansa.co.uk Thu Dec 18 16:33:47 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:37 2004
Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML ...)
In-Reply-To: Peter Murray-Rust's message of "Thu, 18 Dec 1997 15:21:22"
References: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID:
Peter> Peter Murray-Rust
> In article <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>,
> Peter wrote:
Peter> Is it the question of the return value of a non-existent
Peter> attribute. IOW what does
Peter>
Peter>
Peter>
Peter>
Peter>
Peter> return for
Peter> String s = element.getAttval("PLUGH");// could be "", or null
David has answered the original question (what is isSpecified() for in the
Java simple API?), but I thought I'd mention that DSSSL's attribute-string
function returns #f for PLUGH; the Java equivalent of this is of course,
null. I think this is the Right Thing to do; it's sometimes important to
tell the difference between an attribute specified with an empty value
and an attribute that is omitted altogether.
The first case is often used to mean a known, empty value; the second
to mean "not known" or "not applicable".
Concrete example: I'm a rock climber, and I keep a record of all my
climbing in XML format. Climbs are defined as
climbs.dtd>
climbs.dtd> grade CDATA ""
climbs.dtd> stars CDATA #IMPLIED
climbs.dtd> style (l|2|al|s|tr|mt) #IMPLIED
climbs.dtd> with CDATA #IMPLIED
climbs.dtd> >
Note the "stars" attribute, which is used for a climb's star rating
(an indication of quality). An instance looks like
climbs.xml> with="&p-hkm;">Difficult Crack
Here, the lack of stars is explicit - it's not a high-quality climb.
Whereas
climbs.xml> style="l">King's Chimney
is a climb in a part of Britain where the star system isn't used, and
so I omitted the attribute - even though it probably deserves a star
or two.
I would not want these two values confused!
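For what it is worth, the distinction can be kept in Java with the DOM API bundled in the JDK: Element.getAttribute() returns "" in both cases, but Element.getAttributeNode() returns null when the attribute is absent. The sketch below assumes an element called climb (the name is a guess made for illustration) and leaves out the DTD, the grade and the with attribute; valueOrNull and starsOfClimb are hypothetical helpers.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ImpliedDemo {
    // Hypothetical instance: the element name "climb" is an assumption.
    static final String XML =
        "<climbs>"
        + "<climb stars=\"\">Difficult Crack</climb>"
        + "<climb style=\"l\">King's Chimney</climb>"
        + "</climbs>";

    /** Returns the attribute value, or null when the attribute is absent. */
    static String valueOrNull(Element element, String name) {
        Attr attr = element.getAttributeNode(name);
        return attr == null ? null : attr.getValue();
    }

    /** Star rating of the climb at the given position in the sample document. */
    static String starsOfClimb(int index) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(XML.getBytes(StandardCharsets.UTF_8)));
            Element climb = (Element) doc.getElementsByTagName("climb").item(index);
            return valueOrNull(climb, "stars");
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Difficult Crack: [" + starsOfClimb(0) + "]"); // empty value
        System.out.println("King's Chimney: [" + starsOfClimb(1) + "]");  // null
    }
}
```

An application that wants "known to have no stars" and "not rated" kept apart can then branch on null rather than on the empty string.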
--
From papresco at technologist.com Thu Dec 18 20:00:48 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
References: <199712181232.HAA00429@unready.microstar.com>
<3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> <199712181458.JAA00315@unready.microstar.com>
Message-ID: <3499639F.CDAF9A3A@technologist.com>
David Megginson wrote:
>
> ...
> The attribute "bar" has the value "hack", and is not specified
> (i.e., it is a defaulted value).
>
> ...
> The attribute "bar" has the value "hack", and is specified.
I don't think nsgmls (for example) makes this distinction and I don't
remember ever wishing it did. When do you need to know this?
As an author, I certainly don't think that my software is going to work
differently if I use the default or specify it. I would be quite
disconcerted if it did.
Paul Prescod
From ak117 at freenet.carleton.ca Thu Dec 18 20:01:05 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
In-Reply-To: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
References: <199712181232.HAA00429@unready.microstar.com>
<3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID: <199712181958.OAA00474@unready.microstar.com>
Perhaps I should clarify my question:
Should a common XML event-based API supply enough information to
build a DOM representation of a document?
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Thu Dec 18 21:02:45 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML
...)
In-Reply-To:
References:
<3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971218190703.0a3f9548@pop3.demon.co.uk>
At 16:32 18/12/97 +0000, Toby Speight wrote:
>Peter> Peter Murray-Rust
>
>> In article <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>,
>> Peter wrote:
>
>Peter> Is it the question of the return value of a non-existent
>Peter> attribute. IOW what does
>Peter>
>Peter>
>Peter>
>Peter>
>Peter>
>Peter> return for
>Peter> String s = element.getAttval("PLUGH");// could be "", or null
>
>David has answered the original question (what is isSpecified() for in the
>Java simple API?), but I thought I'd mention that DSSSL's attribute-string
>function returns #f for PLUGH; the Java equivalent of this is of course,
>null. I think this is the Right Thing to do; it's sometimes important to
>tell the difference between an attribute specified with an empty value
>and an attribute that is omitted altogether.
I agree that it is the Right Thing to do. If everyone else agrees it is the
Right Thing to do I will be very happy. If 10% agrees and the other 90%
don't know what we are on about, we need to make sure they can't Go Wrong :-)
>
>The first case is often used to mean a known, empty value; the second
>to mean "not known" or "not applicable".
>
>Concrete example: I'm a rock climber, and I keep a record of all my
>climbing in XML format. Climbs are defined as
How exciting - I used to be (not a very good one).
[...]
>
>climbs.xml> climbs.xml> with="&p-hkm;">Difficult Crack
>
>Here, the lack of stars is explicit - it's not a high-quality climb.
>Whereas
>
>climbs.xml> climbs.xml> style="l">King's Chimney
>
>is a climb in a part of Britain where the star system isn't used, and
>so I omitted the attribute - even though it probably deserves a star
>or two.
Not according to MacInnes' star system; he gives it zero stars :-).
Seriously, if we adopt this system then we should make every effort to
promote it. One difficult area is in editors - how do you signal the
difference between
"" and null when entering a value in a box? You either have to get them to
input NULL (yukk) or add another button for "IMPLIED" (Ugh).
I'd like some other expert opinion on this. It's a tricky area if we get it
wrong.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tbray at textuality.com Thu Dec 18 21:05:45 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
Message-ID: <3.0.32.19971218130511.00a98cc4@pop.intergate.bc.ca>
At 02:58 PM 18/12/97 -0500, David Megginson wrote:
> Should a common XML event-based API supply enough information to
> build a DOM representation of a document?
Maybe, maybe not, depending what you mean by "common". For the simple
interface we're trying to build, this cannot be remotely a goal. The
only goal should be to give application authors access to the
elements, attributes and character data of a document in the most
transparent possible way. -Tim
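To make "elements, attributes and character data" concrete, here is a deliberately small Java sketch of what such a callback interface might look like. Every name in it (SimpleXmlHandler, TraceHandler, EventDemo and the three methods) is hypothetical and is not the interface being drafted on this list. The replay method merely stands in for a parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical callback interface covering only the three kinds of data. */
interface SimpleXmlHandler {
    void startElement(String name, Map<String, String> attributes);
    void characters(String text);
    void endElement(String name);
}

/** A single-pass consumer: it records each event as one line of text. */
class TraceHandler implements SimpleXmlHandler {
    final List<String> trace = new ArrayList<>();

    @Override
    public void startElement(String name, Map<String, String> attributes) {
        trace.add("start " + name + " " + attributes);
    }

    @Override
    public void characters(String text) {
        trace.add("chars " + text);
    }

    @Override
    public void endElement(String name) {
        trace.add("end " + name);
    }
}

class EventDemo {
    /** Stands in for a parser: replays the events for <FOO BAR="baz">hello</FOO>. */
    static void replay(SimpleXmlHandler handler) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("BAR", "baz");
        handler.startElement("FOO", attributes);
        handler.characters("hello");
        handler.endElement("FOO");
    }

    static List<String> run() {
        TraceHandler handler = new TraceHandler();
        replay(handler);
        return handler.trace;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Nothing here carries the doctype, comments or an isSpecified flag, so a handler written against this interface alone could build an element tree but not a full DOM; that is the trade-off being argued about.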
From peter at ursus.demon.co.uk Thu Dec 18 21:54:57 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: LISTRIVIA: A bit of fun
In-Reply-To: <199712181939.LAA29002@mehitabel.eng.sun.com>
Message-ID: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk>
At 11:39 18/12/97 -0800, Murray Altheim wrote:
>PLEASE can you avoid having any fun. I have received private mail in
>support of this view and I shall be very boring in pursuing this. It's not
>difficult to avoid, and for most people it's a waste of time and money.
My apologies to anyone who was offended, or whose manager was offended.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From jharmon at telecnnct.com Thu Dec 18 23:21:29 1997
From: jharmon at telecnnct.com (Jim Harmon)
Date: Mon Jun 7 16:59:37 2004
Subject: LISTRIVIA: A bit of fun
References: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk>
Message-ID: <3499A963.5656AEC7@telecnnct.com>
Peter Murray-Rust wrote:
>
> At 11:39 18/12/97 -0800, Murray Altheim wrote:
> >PLEASE can you avoid having any fun. I have received private mail in
> >support of this view and I shall be very boring in pursuing this. It's not
> >difficult to avoid, and for most people it's a waste of time and money.
>
> My apologies to anyone who was offended, or whose manager was offended.
I've been lurking on this list for months now.
Peter's post is the first one I've actually copied to friends.
In sight of the holiday(s), I think it's appropriate to lighten up a
little.
Thank you, Peter, for the very entertaining post in a very staid topic
forum.
(And thank you, everyone else, for having this forum. I learn from you
all, every time I scan a message.)
> P.
>
> Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
> net connection
> VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
> http://www.venus.co.uk/vhg
>
--
Jim Harmon The Telephone Connection
jim@telecnnct.com Rockville, Maryland
From ak117 at freenet.carleton.ca Fri Dec 19 01:09:41 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:37 2004
Subject: Jade and isSpecified
In-Reply-To: <349990ED.64CBCE41@technologist.com>
References: <199712181232.HAA00429@unready.microstar.com>
<3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
<199712181458.JAA00315@unready.microstar.com>
<3499639F.CDAF9A3A@technologist.com>
<199712182004.PAA00496@unready.microstar.com>
<349990ED.64CBCE41@technologist.com>
Message-ID: <199712190106.UAA00332@unready.microstar.com>
Paul Prescod writes:
> Could you help me find it? I can see that "implied" boolean
> characteristic on "attributes", but it only seems to mean really
> implied, not defaulted.
Sorry, my mistake -- it's Omnimark, not Jade, that tells you whether
an attribute was specified. With groves, I think that you'd need the
basesds1 module to get that information.
That said, the information _is_ available in SP itself, using
Boolean Attribute::specified()
in SP's native interface (see include/Attribute.h).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From peter at ursus.demon.co.uk Fri Dec 19 08:37:57 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: LISTRIVIA: A bit of fun
In-Reply-To: <3499A963.5656AEC7@telecnnct.com>
References: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971219093411.09d7b57e@pop3.demon.co.uk>
At 17:53 18/12/97 -0500, Jim Harmon wrote:
>Peter Murray-Rust wrote:
>>
>> At 11:39 18/12/97 -0800, Murray Altheim wrote:
>> >PLEASE can you avoid having any fun. I have received private mail in
>> >support of this view and I shall be very boring in pursuing this. It's not
>> >difficult to avoid, and for most people it's a waste of time and money.
>>
>> My apologies to anyone who was offended, or whose manager was offended.
>
>I've been lurking on this list for months now.
>
>Peter's post is the first one I've actually copied to friends.
>
>In sight of the holiday(s), I think it's appropriate to lighten up a
>little.
>
>Thankyou, Peter for the very entertaining post in a very staid topic
>forum.
>
>(And thank you, everyone else, for having this forum. I learn from you
>all, every time I scan a message.)
Let's put people out of their misery! And not let it escalate :-) Murray's
post was intended in the same spirit and contained enough allusions to be
interpreted that way. However, I wasn't *absolutely* sure and couldn't
afford to post a humorous reply if it *were* genuine. So my reply was
deadpan and covered all eventualities. [It is remarkable how easy it is to
get entangled in farcelike situations in the virtual world. One of mine,
which I dare not repeat, arose from a 1:10000 chance and led to an almost
Shakespearean comedy.]
Seriously, many SGML documents *do* look very similar to the middle of the
posting. With catalogs, SGML declarations, entity sets, DTDs, parameter
entities, etc. it is possible to obfuscate SGML documents pretty well. XML
denies itself the first two, but the spec itself gets off to a good
start with the "tricky" entity replacement. XLL adds the ability of
Xpointers and SHOW="EMBED" to do some interesting transclusion. And there
is always Unicode :-) So some limited examples could be educational....
P.
>
>> P.
>>
>> Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
>> net connection
>> VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
>> http://www.venus.co.uk/vhg
>>
>
>--
> Jim Harmon The Telephone Connection
>jim@telecnnct.com Rockville, Maryland
>
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Fri Dec 19 10:08:58 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: XML as a programming tool
Message-ID: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
This message is probably trivial for those with a lsp+ gene, but it may
open new horizons for those like me.
It has come as a revelation to me that XML *with its assorted toolkit* is a
powerful programming aid for many applications. Most (non-textual)
applications of XML will come with a Tree tool including editing, display,
searching (a la TEI Xpointer), and transformation. These facilities are
extremely useful in program development and maintenance. Since JUMBO
implements all of these I have started to use these *in creating JUMBO
itself*, and potentially as library routines for other non-Jumboid
applications.
For example, I have been revising the menu structure in JUMBO under
java.awt. It's easy to make the mistake of hardcoding this, so it needs a
flexible data structure. Moreover the menus may easily be changed at
runtime (e.g. a new DTD or namespace may be loaded). Java menus (presumably
like many other systems) are tree-structured with a number of different
terminals (e.g. addSeparator();).
I have therefore created the data structure as an XML document, which is
built into a tree at startup. This is very easily extensible, both in
structure (e.g. adding new MenuItems or Menus) or adding properties to
individual parts (e.g.
** Because of the Xpointer I don't have to remember the structure of the
tree!! **. I can just search for a
DESCENDANT(ALL,MENUITEM,TITLE,Print)ANCESTOR(1,FILE), for example to get
all instances of the "Print" command in the menu (and, say,
SGMLNodeSet.addSGMLAttribute("ENABLED", "T")).
An amusing byproduct is that the menu itself is available as a tree, and so
can be navigated or edited. It's trivial to attach HELP to the nodes of
this tree. So it's a really efficient re-use of tools.
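For readers without JUMBO to hand, here is a rough sketch of the same idea using the standard Java DOM classes rather than JUMBO's own API. The menu document is invented for illustration: only the names MENUITEM, TITLE, FILE and ENABLED come from the pointer and call above, and the MENUBAR and VIEW elements are assumptions. The search only approximates the TEI pointer: it picks MENUITEM elements with the given TITLE that sit somewhere under a FILE element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MenuTree {
    // Hypothetical menu description, not JUMBO's real menu file.
    static final String MENU_XML =
        "<MENUBAR>"
        + "<FILE><MENUITEM TITLE='Open'/><MENUITEM TITLE='Print'/></FILE>"
        + "<VIEW><MENUITEM TITLE='Print'/></VIEW>"
        + "</MENUBAR>";

    // Build the in-memory tree once, e.g. at startup.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse menu document", e);
        }
    }

    // True if some ancestor element of e has the given name.
    static boolean hasAncestor(Element e, String name) {
        for (Node n = e.getParentNode(); n != null; n = n.getParentNode()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return true;
            }
        }
        return false;
    }

    // Find MENUITEMs by TITLE without knowing where they sit in the tree.
    static List<Element> find(Document doc, String title, String ancestor) {
        List<Element> hits = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("MENUITEM");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (title.equals(item.getAttribute("TITLE")) && hasAncestor(item, ancestor)) {
                hits.add(item);
            }
        }
        return hits;
    }

    // Mark every matching item as enabled and report how many were touched.
    static int enable(Document doc, String title, String ancestor) {
        List<Element> hits = find(doc, title, ancestor);
        for (Element item : hits) {
            item.setAttribute("ENABLED", "T");
        }
        return hits.size();
    }
}
```

Calling MenuTree.enable(MenuTree.parse(MenuTree.MENU_XML), "Print", "FILE") touches only the Print item under FILE and leaves the one under VIEW alone.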
As I may have mentioned before, I am converting all *external* files to XML
so that a JUMBO application can rely on namespace schemas,
mimetypes/helpers, Classloaders, DTDs, Help, semantic validation, etc. all
being manageable through XML technology. The benefits of this (at least
for me) are enormous! Obviously all of this is in Java for JUMBO, but I
assume that people will convert or develop tools for other languages such
as C and UNIX. I hope that we may see
man xmltree
or
man teisearch
on UNIX systems in the near future and that people will be able to use the
treetools that these provide. Obvious extensions to other environments.
I am sure that the original proposers of XML saw and knew all of this, but
it must be very clear that XML has much more to offer than 2D paper
technology :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tyler at infinet.com Fri Dec 19 10:41:05 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:37 2004
Subject: XML as a programming tool
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
Message-ID: <341E5BC5.1830BD4F@infinet.com>
Peter Murray-Rust wrote:
> This message is probably trivial for those with a lsp+ gene, but it may
> open new horizons for those like me.
>
> It has come as a revelation to me that XML *with its assorted toolkit* is a
> powerful programming aid for many applications. Most (non-textual)
> applications of XML will come with a Tree tool including editing, display,
> searching (a la TEI Xpointer), and transformation. These facilities are
> extremely useful in program development and maintenance. Since JUMBO
> implements all of these I have started to use these *in creating JUMBO
> itself*, and potentially as library routines for other non-Jumboid
> applications.
>
In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site
at http://java.sun.com
It now has unsynchronized collection classes which should significantly improve
the performance of any parser which uses these features since most parsers only
use one thread anyway. Hashtable and Vector have everything synchronized, which
slows things down a lot. I just thought this might be of use to anyone
developing XML parsers.
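As a rough illustration, assuming the collection classes as they ended up in java.util (HashMap and ArrayList as the unsynchronized counterparts of Hashtable and Vector), a single-threaded parser might keep its name table like this. The class SymbolTable and its methods are invented for the example and are not taken from any existing parser.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical name table for a parser that runs on one thread only,
// so none of the lookups pay for locking.
final class SymbolTable {
    private final Map<String, Integer> index = new HashMap<>();
    private final List<String> names = new ArrayList<>();

    // Return a small integer id for the name, assigning a new one on first sight.
    int intern(String name) {
        Integer id = index.get(name);
        if (id == null) {
            id = names.size();
            names.add(name);
            index.put(name, id);
        }
        return id;
    }

    // Look the name back up from its id.
    String nameOf(int id) {
        return names.get(id);
    }
}
```

If the table ever had to be shared between threads, the caller would need to add its own synchronization.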
Tyler
From M.H.Kay at eng.icl.co.uk Fri Dec 19 11:50:46 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM model in XML
Message-ID: <01bd0c74$3f1f56c0$1e09e391@mhklaptop.bra01.icl.co.uk>
Mark L. Fussell:
>I would strongly suggest first designing the genealogical object model
>from the GEDCOM definitions (and other sources) without considering XML
>or DOM at all.
Thanks, yes. I agree absolutely. Fortunately my background is in data
modelling so I'm happy with this side of things.
My design problem is whether to implement the genealogical objects as
pointers to XML DOM objects or as copies/conversions of data extracted from
DOM objects. Of course the choice can be hidden behind the interface.
Peter Murray-Rust:
>Any pure tree is extremely easy to represent in XML. So, if you simply want
>to trace an ancestor tree (i.e. two parents, 4 grand parents, etc.) this is
>trivial....
>The difficulty comes when the graph has cycles. I am not an expert
>genealogist, but most 'family trees' seem to me to be Directed Acyclic
>Graphs (DAGs) where the arcs are isParentOf(); and is directional. DAGs
>are common ...
>
Unfortunately the "family tree" is not isomorphic with the XML tree. There is
no hierarchic relationship between a husband and wife. It isn't even a
DAG, (because I can record relationships like "A is-the-godfather-of B"
and "B is-the-executor-of A" ).
>Note that the use of ID/IDREF may introduce additional complexity.
>Personally I am not clear on the value of IDREF over XLL - it's not trivial
>to support in a browser and I doubt that JUMBO will do it.
>
I am currently using ID/IDREF to represent these relationships, because
it maps directly to the current GEDCOM standard. I still feel uncomfortable
that this is unrelated to the XLL linking model. I do recognise that displaying
information in a genealogically-useful way is going to require application
logic, and won't be achieved by general purpose XML tools; though it
would certainly be nice if the general tools made it easy to follow ID/IDREF
relationships.
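In the meantime the application can follow the references itself. The sketch below uses the standard Java DOM classes and builds its own index rather than relying on a DTD to say which attributes are IDs. The XML encoding is invented for illustration: INDI, FAM, HUSB and WIFE are GEDCOM tag names, but writing them as elements and attributes named ID, NAME, HUSB and WIFE is purely an assumption.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class IdIndex {
    // Hypothetical XML rendering of two individuals and one family record.
    static final String SAMPLE =
        "<GEDCOM>"
        + "<INDI ID='I1' NAME='A'/>"
        + "<INDI ID='I2' NAME='B'/>"
        + "<FAM ID='F1' HUSB='I1' WIFE='I2'/>"
        + "</GEDCOM>";

    private final Map<String, Element> byId = new HashMap<>();

    // Walk every element once and remember those carrying an ID attribute.
    IdIndex(Document doc) {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element e = (Element) all.item(i);
            if (e.hasAttribute("ID")) {
                byId.put(e.getAttribute("ID"), e);
            }
        }
    }

    // Follow the reference held in the named attribute; null if it points nowhere.
    Element follow(Element from, String idrefAttribute) {
        return byId.get(from.getAttribute(idrefAttribute));
    }

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse document", e);
        }
    }
}
```

Because the links are plain lookups, cyclic or non-hierarchic relationships cost nothing extra: the element tree stays a tree and the graph lives in the index.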
Mike Kay
From peter at ursus.demon.co.uk Fri Dec 19 12:10:40 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: JDK 1.2 (was Re: XML as a programming tool)
In-Reply-To: <341E5BC5.1830BD4F@infinet.com>
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971219124142.0c4fe670@pop3.demon.co.uk>
At 06:13 16/09/97 -0400, Tyler Baker wrote:
Thanks very much Tyler - this was news to me.
>
>In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site
>at http://java.sun.com
>
JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other
than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the
beta reasonably stable?
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From jjc at jclark.com Fri Dec 19 13:01:45 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com> <3498E043.5F764F28@technologist.com>
Message-ID: <349A6636.7D7E28AF@jclark.com>
Paul Prescod wrote:
> XML attributes will probably have relative URLs in them and the XML
> Application will have to know how to resolve them.
This reminds me of another reason why you need positional information
even in a simple interface.
Suppose you have a document doc.xml that references an external parsed
entity chapters/3.xml and suppose chapters/3.xml contains some element
with an attribute that is a relative URL "4.xml". I would claim that
the appropriate URL to use as the base for resolving that relative URL
is the URL of the resource that contains the URL, so that relative to
the document URL, that relative URL should be interpreted as
chapters/4.xml rather than 4.xml. But unless the parser passes through
positional information, there's no way an application can do this. I
think apps are going to need at least:
startExternalEntity(URL url)
endExternalEntity()
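A minimal sketch of what an application could do with those two callbacks, assuming the parser reports them in properly nested pairs. It keeps a stack of base addresses and resolves each relative reference against the innermost one. The class name BaseUriTracker and the use of java.net.URI (rather than URL, as in the signatures above) are choices made for the example only.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;

final class BaseUriTracker {
    // Innermost entity is on top; the document itself is always at the bottom.
    private final Deque<URI> bases = new ArrayDeque<>();

    BaseUriTracker(URI documentUri) {
        bases.push(documentUri);
    }

    // Called when the parser enters an external entity.
    void startExternalEntity(URI entityUri) {
        bases.push(entityUri);
    }

    // Called when the parser leaves it; the document's own base is never popped.
    void endExternalEntity() {
        if (bases.size() > 1) {
            bases.pop();
        }
    }

    // Resolve a relative reference against the entity currently being parsed.
    URI resolve(String relative) {
        return bases.peek().resolve(relative);
    }
}
```

With the document at http://example.org/doc.xml and the parser inside chapters/3.xml, resolve("4.xml") gives http://example.org/chapters/4.xml; once the entity has ended, the same call gives http://example.org/4.xml.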
James
From tyler at infinet.com Fri Dec 19 13:34:20 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:37 2004
Subject: JDK 1.2 (was Re: XML as a programming tool)
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk> <3.0.1.16.19971219124142.0c4fe670@pop3.demon.co.uk>
Message-ID: <341E8472.4A859008@infinet.com>
Peter Murray-Rust wrote:
> At 06:13 16/09/97 -0400, Tyler Baker wrote:
>
> Thanks very much Tyler - this was news to me.
>
> >
> >In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site
> >at http://java.sun.com
> >
>
> JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other
> than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the
> beta reasonably stable?
>
>
Well yah, most of it is stable. It has no new language features other than weak
references (which are not really a language feature in the way inner classes are), but
a lot of new APIs, including Swing.
Tyler
From fussellm at alumni.caltech.edu Fri Dec 19 14:20:47 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM model in XML
In-Reply-To: <01bd0c74$3f1f56c0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:
On Fri, 19 Dec 1997, Michael Kay wrote:
> Mark L. Fussell:
> >I would strongly suggest first designing the genealogical object model
> >from the GEDCOM definitions (and other sources) without considering XML
> >or DOM at all.
>
> Thanks, yes. I agree absolutely. Fortunately my background is in data
> modelling so I'm happy with this side of things.
>
> My design problem is whether to implement the genealogical objects as
> pointers to XML DOM objects or as copies/conversions of data extracted from
> DOM objects. Of course the choice can be hidden behind the interface.
There is another choice: build your DomainObjects directly from the XML
Event stream. This is what MONDO/mindo supports doing and could also be
done in several other ways.
With that change in focus you then have (at least) three choices: (1)
Provide the DOM interfaces onto existing Domain classes. This would work
if your Domain Model is easily represented as a simple containment
hierarchy and you only have one such view. (2) Generate a DOM specific
view when it is asked for and link the generated objects to the original
domain objects. This allows multiple DOM perspectives on the same
DomainModel and enables some transformation between the classes
(collapsing of associations into simple attributes). (3) Provide one or
more DOM Adapters onto the Domain classes, which provide similar
functionality as (2) but do not maintain a separate "cache" of DOM
specific state. This is basically the same approach as Tim Howard's
DomainAdapter except using document terminology instead of general GUI
terms.
You can also combine these approaches in various ways. Effectively (3)
is the most general since it simply says: you can functionally transform
the Domain into a DOM model. (2) Caches that result [and allows
intermediate transitions]. (1) Says the transform is trivial: 1-1. So
these are just gradations in function and state transforms.
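To make (3) concrete, here is a small sketch in Java. Everything in it is hypothetical: Person stands in for a domain class, TreeNode for a cut-down DOM-style interface (it is not the W3C DOM), and PersonNode for the adapter. The point is only that the adapter computes its view on demand and holds no document state of its own.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical domain class; it knows nothing about documents.
final class Person {
    final String name;
    final List<Person> children = new ArrayList<>();

    Person(String name) {
        this.name = name;
    }
}

// Minimal stand-in for a DOM-like node interface.
interface TreeNode {
    String nodeName();
    String attribute(String name);
    List<TreeNode> childNodes();
}

// Approach (3): a stateless adapter from the domain object to the node view.
final class PersonNode implements TreeNode {
    private final Person person;

    PersonNode(Person person) {
        this.person = person;
    }

    public String nodeName() {
        return "PERSON";
    }

    // Only NAME is exposed; any other attribute is reported as absent.
    public String attribute(String name) {
        return "NAME".equals(name) ? person.name : null;
    }

    // Children are wrapped afresh on every call, so nothing is cached.
    public List<TreeNode> childNodes() {
        List<TreeNode> nodes = new ArrayList<>();
        for (Person child : person.children) {
            nodes.add(new PersonNode(child));
        }
        return nodes;
    }
}
```

Since nothing is cached, a child added to the Person after the adapter was created still shows up in childNodes(). Approach (2) would trade that freshness for not rebuilding the wrappers each time.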
--Mark
mark.fussell@chimu.com
From markb at iosphere.net Fri Dec 19 14:34:20 1997
From: markb at iosphere.net (Mark Baker)
Date: Mon Jun 7 16:59:37 2004
Subject: XML as a programming tool
In-Reply-To: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
Message-ID:
On Fri, 19 Dec 1997, Peter Murray-Rust wrote:
> It has come as a revelation to me that XML *with its assorted toolkit* is a
> powerful programming aid for many applications.
Yes! It's part of a shift away from Turing completeness and towards
declarative programming.
Curiously enough, it's been approached from two different angles by two
different camps.
The Web/Hypertext camp has, to my knowledge, had this vision for ages.
But only recently has the distributed object camp been leaning in this
direction.
There's a project at PARC called "Aspect Oriented Programming", that's
attempting to evolve component software to widen the scope of interface
declarations (even beyond contracts). Basically, the many "aspects" of a
typical program are separated out into a minimal Turing complete core,
plus lots of declarative documents (specifying such information as
concurrency, data flow, compositional structure, etc.). All of this is
run through a "weaver" to produce your end product.
http://www.parc.xerox.com/spl/projects/aop/
You might also be interested in a paper that Adam Rifkin and Rohit Khare
have submitted to WWW7:
http://www.cs.caltech.edu/~adam/papers/www/origin-of-species.html
Since this is a little off-topic, I'd recommend that any followups
be taken off-list. Then again, Peter did start it ... 8-)
MB
--
Mark Baker, Ottawa Ontario CANADA. Java, CORBA, XML, Beans
http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069
Will distribute business objects for food.
From fussellm at alumni.caltech.edu Fri Dec 19 14:52:02 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:37 2004
Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML ...)
In-Reply-To:
Message-ID:
On 18 Dec 1997, Toby Speight wrote:
> David has answered the original question (what is isSpecified() for in the
> Java simple API?), but I thought I'd mention that DSSSL's attribute-string
> function returns #f for PLUGH; the Java equivalent of this is of course,
> null. I think this is the Right Thing to do; it's sometimes important to
> tell the difference between and .
I certainly agree that it is useful to tell the difference between these
two cases, but it does bring up the issue that Peter raised: do all users
understand the issue? Also, null can only be used for 'notSpecified' if
null is not an acceptable value. Frequently it is, so it is better to
have a separate 'notSpecified' marker or attribute.
> The first case is often used to mean a known, empty value; the second
> to mean "not known" or "not applicable".
Standardizing on a particular interpretation is unfortunately much more
difficult. Relational databases have generally failed at this (SQL is
broken because of it) and Codd now uses multiple "marks" in his view of
the Relational model. The problem is that there are many
possible and useful interpretations of "missing information":
(1) Uninitialized
(2) Inapplicable
(3) NotYetKnown
(4) NotEntered
(5) FunctionallyUncomputable
(6) OutOfDomainBounds
and so on... See C.J. Date's writings for good descriptions of the above.
It is always [yes, I believe always] better to be explicit about what is
known (which can include explicitly what is not known) than it is to rely
on a meaning for something that is "missing". So:
<... stars="0" >
<... noStarRating="true" >
are all better than to just leave 'stars' off and imply an application
meaning.
But it can be convenient to not be so "wordy". In which case the
application will have to be very explicit and consistent about what
'notSpecified' means (and, for XML, how that relates to #IMPLIED when
there is a DTD). For MONDO, this can be very consistent because
'notSpecified' and #IMPLIED are both treated as exactly equivalent to the
parameter not existing. But other applications may have difficulty with
this.
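One way to be explicit on the Java side, as a sketch only: carry either a specified string or a named reason for its absence, instead of overloading null. The class AttributeValue and the enum constants are invented for the example (the constants borrow four of the interpretations listed above). This is not how MONDO or any parser API does it.

```java
// Hypothetical value type: either a specified string or an explicit reason
// why no value is present.
final class AttributeValue {
    enum Missing { UNINITIALIZED, INAPPLICABLE, NOT_YET_KNOWN, NOT_ENTERED }

    private final String value;    // non-null only when a value was specified
    private final Missing reason;  // non-null only when the value is missing

    private AttributeValue(String value, Missing reason) {
        this.value = value;
        this.reason = reason;
    }

    // A specified value; the empty string is a perfectly good one.
    static AttributeValue of(String value) {
        if (value == null) {
            throw new NullPointerException("value");
        }
        return new AttributeValue(value, null);
    }

    // No value, with the interpretation spelled out.
    static AttributeValue missing(Missing reason) {
        if (reason == null) {
            throw new NullPointerException("reason");
        }
        return new AttributeValue(null, reason);
    }

    boolean isSpecified() {
        return value != null;
    }

    // Fails loudly rather than handing back a silent null.
    String value() {
        if (value == null) {
            throw new IllegalStateException("missing: " + reason);
        }
        return value;
    }

    // Null when the value was specified.
    Missing reason() {
        return reason;
    }
}
```

The cost is the wordiness already mentioned: every caller has to ask isSpecified() before asking for the value.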
But, in general, defaults seem to be easily understood and anything else
is on the brink of infinite possibilities.
--Mark
mark.fussell@chimu.com
From smith at interlog.com Fri Dec 19 15:05:24 1997
From: smith at interlog.com (Chris Smith)
Date: Mon Jun 7 16:59:37 2004
Subject: XML as a programming tool
In-Reply-To:
Message-ID:
On Fri, 19 Dec 1997, Peter Murray-Rust wrote:
> It has come as a revelation to me that XML *with its assorted toolkit* is a
> powerful programming aid for many applications.
See http://www.cam.org/~pierlou/prototype/ for an app, "Prototype".
I haven't actually tried this yet, but it appears to be exactly this
type of thing.
---------------------------------------------------------------------------
Chris Smith
From peter at ursus.demon.co.uk Fri Dec 19 15:31:24 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <349A6636.7D7E28AF@jclark.com>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
<199712171903.OAA04014@unready.microstar.com>
<3498E043.5F764F28@technologist.com>
Message-ID: <3.0.1.16.19971219152149.54bf9f52@pop3.demon.co.uk>
At 19:19 19/12/97 +0700, James Clark wrote:
>
>Suppose you have a document doc.xml that references an external parsed
>entity chapters/3.xml and suppose chapters/3.xml contains some element
>with an attribute that is a relative URL "4.xml". I would claim that
>the appropriate URL to use as the base for resolving that relative URL
>is the URL of the resource that contains the URL, so that relative to
>the document URL, that relative URL should be interpreted as
>chapters/4.xml rather than 4.xml. But unless the parser passes through
>positional information, there's no way an application can do this.
I would strongly support this interpretation. It's the natural one from
HTML browsers and it is what I have implemented in XLL in JUMBO. I have
found that the best way forward for me is that every WF fragment possesses
a URL, since it may further reference other fragments. This works OK for me
as far as I have got, but I am not a URL specialist. I don't know what
happens when we get XML which is formed 'in vacuo' - e.g. as part of a
serialized object, typed in on the command line, etc. :-)
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From tms at ansa.co.uk Fri Dec 19 15:32:54 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:38 2004
Subject: Unspecified #IMPLIED attributes in Java
In-Reply-To: "Mark L. Fussell"'s message of "Fri, 19 Dec 1997 06:51:24 -0800 (PST)"
References:
Message-ID:
Mark> Mark L. Fussell
> In article , Mark
> wrote:
Mark> On 18 Dec 1997, Toby Speight wrote:
>> ... DSSSL's attribute-string function returns #f for [unspecified
>> #IMPLIED attributes]; the Java equivalent of this is of course, null.
>> I think this is the Right Thing to do; it's sometimes important to
>> tell the difference between and .
Mark> I certainly agree that it is useful to tell the difference
Mark> between these two cases, but it does bring up the issue that
Mark> Peter said: do all users understand the issue?
That's up to the application program. I have no problem with programs
that treat the two examples the same *provided their documentation
says that's what they are doing* (though I'd be more likely to declare
the default value to be the empty string in the DTD). In DSSSL, this
behaviour would be
(let ((val (attribute-string "bargh")))
(if val
val
""))
Mark> Also, null can only be used for 'notSpecified' if null is not an
Mark> acceptable value. Frequently it is, so it is better to have a
Mark> seperate 'notSpecified' marker or attribute.
Are we talking about the same thing here? If the parser returns a
string for each attribute value, then the Java null reference is
distinct from any acceptable (i.e. writable in the XML document)
value. You've confused me with your suggestion that null may be an
acceptable value; would you care to clarify?
>> The first case is often used to mean a known, empty value; the second
>> to mean "not known" or "not applicable".
Mark> Standardizing on a particular interpretation is unfortunately
Mark> much more difficult. ... The problem is that there are many
Mark> possible and useful interpretations of "missing information":
Mark> ...
I realise this; I was merely attempting to describe what #IMPLIED is
used for in practice, with specific application[*] conventions -
that's why I used the word "often" ;-).
[*] using the word "application" in its SGML sense - argh!
Mark> But it can be convenient to not be so "wordy". In which case the
Mark> application will have to be very explicit and consistent about what
Mark> 'notSpecified' means (and, for XML, how that relates to #IMPLIED when
Mark> there is a DTD).
Agreed.
Mark> For MONDO, this can be very consistent because 'notSpecified' and
Mark> #IMPLIED are both treated exactly equivalent to the parameter not
Mark> existing. But other applications may have difficulty with this.
I've been looking at it the other way around - to me, it seemed "obvious"
to return #IMPLIED as null, and then to think about whether the no-DTD
case is equivalent. [I think that that bias springs from the fact that I
haven't written any DTD-less applications and I generally use traditional
SGML tools (SP, Jade, psgml-mode, etc.).]
FWIW, I concur that DTD-less processing ought to be equivalent to
specifying all attributes as #IMPLIED, but for the parser API, there
is a difference: I think that the parser should return null in the
valid-processing case, but in the well-formed DTD-less case, it cannot
know that the attribute has been omitted, and so will return neither
the name nor the value of the attribute.
If I write a {grove, tree} builder, it would be useful to know whether
a DTD was used, so that it can report an error to an application trying
to access an attribute that was not declared (this may be the symptom
of a typo, perhaps). If a DTD was not used for the parse, then the
access should return null (as if the attribute were declared #IMPLIED).
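A rough Java sketch of that accessor, under my own naming (ElementAttributes, get, getOrEmpty are all hypothetical and belong to no existing parser or grove API). The set of declared names is null when no DTD was read. With a DTD, asking for an undeclared attribute is an error. In both modes an unspecified attribute comes back as null, and getOrEmpty mirrors the DSSSL fragment above for programs that want to treat the two cases alike.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class ElementAttributes {
    private final Map<String, String> specified = new HashMap<>();
    private final Set<String> declared;  // null means: parsed without a DTD

    ElementAttributes(Set<String> declaredOrNull) {
        this.declared = declaredOrNull == null ? null : new HashSet<>(declaredOrNull);
    }

    // Record an attribute that actually appeared in the start tag.
    void put(String name, String value) {
        specified.put(name, value);
    }

    // Null for an unspecified attribute; an error for an undeclared one
    // when a DTD was used (most likely a typo in the application).
    String get(String name) {
        if (declared != null && !declared.contains(name)) {
            throw new IllegalArgumentException("attribute not declared: " + name);
        }
        return specified.get(name);
    }

    // Treat "unspecified" and "specified as empty" the same way.
    String getOrEmpty(String name) {
        String value = get(name);
        return value != null ? value : "";
    }
}
```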
--
From SimonStL at classic.msn.com Fri Dec 19 16:17:52 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID:
One of the examples in my book was of state-driven programming. It's not
exactly like writing programs - the XML document specifies states and triggers
for those states. I'd foolishly thought about using it for a remote control
airplane, but the prospect of crashes (more than the computer) was not so
pleasant. Instead, it controls light switches, which are a lot safer most of
the time. It might also be an interesting tool for model railroads - feed a
controller an XML schedule, let the controller run the train. (Just don't let
any real railroad hear about this.)
I guess it's programming like 'programming' a VCR - someone else has written
the program, I just feed it the data that controls its behavior. Still, even
that limited prospect was exciting, and capable of some pretty complex stuff.
Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)
From cskerr at geocities.com Fri Dec 19 16:32:58 1997
From: cskerr at geocities.com (Charles Kerr)
Date: Mon Jun 7 16:59:38 2004
Subject: JDK 1.2 (was Re: XML as a programming tool)
Message-ID: <001801bd0c9b$da184610$375c0f81@plato>
The APIs seem to be mostly stable -- if I were you I'd try jumping to 1.2.
However, every time I try to use MSXML 1.8 with the JDK 1.2 beta, I get
an Exception...
>> JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other
>> than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the
>> beta reasonably stable?
>
>Well yah most of it is stable. It has no new language features other than weak
>references (which really are not a language feature like inner classes), but a
>lot of new APIs including Swing.
From papresco at technologist.com Fri Dec 19 17:00:00 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:38 2004
Subject: Unspecified #IMPLIED attributes in Java
References:
Message-ID: <349A9A89.D145C298@technologist.com>
Toby Speight wrote:
>
> If I write a {grove, tree} builder, it would be useful to know whether
> a DTD was used, so that it can report an error to an application trying
> to access an attribute that was not declared (this may be the symptom
> of a typo, perhaps). If a DTD was not used for the parse, then the
> access should return null (as if the attribute were declared #IMPLIED).
The DSSSL model is that trying to access a random attribute merely
returns #f. Although this could allow a typo to pass, it has the benefit
of making stylesheets a little more robust to DTD variations. For
instance, the same stylesheet can work with various versions of HTML.
Paul Prescod
From digitome at iol.ie Fri Dec 19 18:40:27 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID: <199712191840.SAA30600@mail.iol.ie>
The concept of a DTD has a resonance with data driven programming such as
JSP Jackson Structured Programming and JSD - Jackson System Design.
I have on occasion used DTDs to document time ordered interfaces to objects. It
can be a very powerful technique!
Take a really simple object interface - an object with open, close, read
and write methods.
These have a time ordering which is not captured in this:
int open();
int close();
int read();
int write();
Compare this:-
Sean Mc Grath
sean at digitome dot com
From ak117 at freenet.carleton.ca Fri Dec 19 18:58:12 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:38 2004
Subject: AElfred 1.0beta4 release
Message-ID: <199712191855.NAA03585@unready.microstar.com>
There is a new version of Microstar's Ælfred XML parser available,
incorporating some of the suggestions that have come up in recent
discussions on this list. You can try out the new version online or
download it using the following URL:
http://www.microstar.com/XML/
Ælfred 1.0beta4 contains some major changes to the interface:
1. New callbacks
void startExternalEntity (XmlParser p, URL systemId)
void endExternalEntity (XmlParser p, URL systemId)
void charData (XmlParser p, char ch[], int length)
void ignorableWhitespace (XmlParser p, char ch[], int length)
2. Removed callbacks
void data (XmlParser p, String data)
3. Modified callbacks
void startDocument (XmlParser p)
void attribute (XmlParser p, String aname, String value, boolean isSpecified)
Apologies in advance to those of you who have already integrated
Ælfred into your tools -- I hope that the changes won't cost you more
than 15 minutes or so of modification and testing.
The addition of ignorable whitespace is required by the XML spec
(though Ælfred is non-conforming for error-reporting, I want the
information that it provides to be correct).
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Fri Dec 19 19:59:47 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
Message-ID: <199712191957.OAA04476@unready.microstar.com>
I recently had a request about sample texts to use with Ælfred
(Microstar's XML parser). With Ælfred, or any other URL-enabled XML
parser, you should be able to parse an XML document directly from the
Internet.
For example, when you download aelfred-1.0beta4.zip (from
http://www.microstar.com/XML/), you should be able to just unzip it
and point it at a URL, with no other setup. With the JDK, you change
to the directory where you unzipped Ælfred and type
java EventDemo
With Microsoft's Java VM, you can type
jview EventDemo
(Of course, you can run the command from any directory once Ælfred is
on your classpath).
Here are two URLs that you can use to start playing:
http://www.microstar.com/XML/donne.xml
http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
Type them in, and watch the events roll down your screen -- no manual
downloading required.
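For readers without Ælfred to hand, the same idea can be sketched with Python's standard `xml.sax` module. This is only an illustration of event-based parsing from a URL, not Ælfred's EventDemo; the class and function names are made up for the example.

```python
import xml.sax


class EventCollector(xml.sax.ContentHandler):
    """Records one line per parse event."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append("start " + name)

    def endElement(self, name):
        self.events.append("end " + name)

    def characters(self, content):
        # Skip whitespace-only runs to keep the listing short
        if content.strip():
            self.events.append("chars " + repr(content.strip()))


def events_from_bytes(data):
    handler = EventCollector()
    xml.sax.parseString(data, handler)
    return handler.events


def events_from_url(url):
    # xml.sax.parse accepts a system identifier (file name or URL)
    handler = EventCollector()
    xml.sax.parse(url, handler)
    return handler.events
```

Calling `events_from_url` with one of the addresses above should list the events without any manual download, provided the document is still being served.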
I'd love to see the URLs for more online XML documents that we can all
try out (the XML specification at www.w3.org does not currently work,
because of character-encoding errors in the XML document). I might
put up Beowulf in UTF-8, just to keep the other parser writers busy...
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From Jon.Bosak at eng.Sun.COM Fri Dec 19 22:26:04 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (was Re: RFC: Simple XML Event-Based API for Java)
In-Reply-To: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk> (message from Peter Murray-Rust on Wed, 17 Dec 1997 08:08:14)
Message-ID: <199712192224.OAA01708@boethius.eng.sun.com>
I don't ordinarily send mail just to say "me too," but I want to
publicly support Peter in his campaign against unnecessary quoting and
attachments in mail to public lists. For people like me who archive
their mail and have to get a lot of it over a phone line, such things
are enormously annoying *despite* the fact that my company is picking
up the expense. I can't imagine how frustrating it must be for people
who are paying by the kilobyte.
Jon
From donpark at quake.net Fri Dec 19 22:47:33 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (a proposal)
Message-ID: <000a01bd0ccf$98a9f280$0100007f@localhost>
Here is my "me too" and a proposal. Let's shorten the xml-dev signature
from:
xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)
to:
xml-dev: XML Developer mailing list. For info:
http://ic.ac.uk/xmldev/info.html.
Don
From peter at ursus.demon.co.uk Sat Dec 20 00:47:51 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (a proposal)
In-Reply-To: <000a01bd0ccf$98a9f280$0100007f@localhost>
Message-ID: <3.0.1.16.19971220002313.2d0f2b72@pop3.demon.co.uk>
At 14:43 19/12/97 -0800, Don Park wrote:
>Here is my "me too" and a proposal. Lets shorten the xml-dev signature
I'll let Henry reply to this. He *did* ask me a few days ago about
shortening it, and I suggested not - but maybe we should reconsider. I
suspect Henry does not have resources on the list server itself, so might
have to put it at www.ch.ic.ac.uk.
We have been very lucky in the lack of 'Unsubscribes' on this list and
perhaps there isn't a need for such a long .sig. But it's a difficult
business. If you make it too difficult (after all everyone forgets the
syntax of the list administration) then this simply means that Henry (not
me) gets all the "please help me, I want to get off" mails which none of
the rest of us see.
P.
From peter at ursus.demon.co.uk Sat Dec 20 00:51:11 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (Duplicate postings)
In-Reply-To: <199712192224.OAA01708@boethius.eng.sun.com>
References: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971220014355.37af7b70@pop3.demon.co.uk>
At 14:24 19/12/97 -0800, Jon Bosak wrote:
[... in support of reducing unnecessary bytecount on the list ...]
It's something that comes partially out of the SGML culture. When I first
started posting to comp.text.sgml, I was quickly shown by Erik Naggum -
gently but very firmly - the appropriate way to use quoting. For those who
remember his time on c.t.s., I think Erik is one of the most precise people
I have "met" on the Internet.
There is another matter of style, which I was going to raise at an
appropriate time, but which Jon's contribution has catalysed me to mention.
On XML-SIG there is a very strict policy against duplicate postings.
Penalties (which of course are confidential) are Draconian. I'll explain
the problem...
A duplicate posting occurs when someone (B) replies to the list and
simultaneously to the poster (A). If you do the arithmetic you will see
that the original poster (A) gets two copies of the message, one from the
list (L) and one from (B). Not quite identical because the headers are
different, so they *look* like different messages. It gets quite
disappointing for (A) to find that it's the same old letter again. Again,
if you do the sums you will see that (A) gets about twice as many bytes as
they really want. If you think deeply about the psychology, you'll see that
it often has a similar effect on (A) as unnecessary quoting has.
Now, if you don't *post* to the list, you won't be aware of this. BUT, if
you do, then you'll find that sometimes you get two copies with the same
content. You'll also start to recognise the people who fall into category
(B).
Why do they do it? Not because (B) wants to upset (A), IMO. It works
something like this:
When (B) gets a message posted by (A) to the list, (B) will see two fields
in the header, something like this:
To: xml-dev@ic.ac.uk
From: A
[This is not very attractive markup, and will look much nicer when mailers
represent it as:
A
xml-dev@ic.ac.uk
but a surprisingly large number of people can, in fact, interpret the first
syntax without error. It is normally taken to mean that A sent a message to
xml-dev@ic.ac.uk, and that xml-dev@ic.ac.uk sent it on to all the
participants.
Now, it starts to get a bit complicated. Let's assume that B is a member of
XML-DEV, and wants to reply so that everyone can see what they (B) have
written. Most mailers have a "Reply" option, often on a menu, or by
pressing the "R" key. If you simply Reply to the message, it will go to
(A), because most mailers look in the "From" field and assume that you want
to send to the address represented by the content of the "From:" fields. So
the mailer would generate a reply something like:
To: A
From: B
and the message would go to (A), the original poster.
Rats! This isn't what B wanted. Of course they (B) want (A) to read the
message, but they also want everyone else on XML-DEV to read it. One way to
do it would be to type the words "xml-dev@ic.ac.uk" into the "To:" field,
like this:
To: xml-dev@ic.ac.uk
From: B
and, perhaps surprisingly, this actually works - i.e. it sends a message
from B to the XML-DEV list.
So, what's the problem? Well, typing "xml-dev@ic.ac.uk" is 16 characters
and it's very tedious to type this and check that it's right. So there's a
clever way round this. Many mailers have a "Reply to All" function. This
looks at everyone mentioned in the mail header and sends them all a copy of
the mail. So when (B) replies in this fashion, their outgoing mail header
looks something like this:
To: xml-dev@ic.ac.uk, A
From: B
So everyone on XML-DEV and A gets a copy. This is just what B wants.
Everyone's happy.
Unfortunately not. There's a very subtle point which lots of people quite
naturally miss. A gets sent a message. And everyone on XML-DEV gets a
message. But wait! A is a member of XML-DEV. The majordomo at ic.ac.uk
isn't clever enough to know that B has sent their own personal copy of the
mail to A. So, if you do the arithmetic, you'll see that A gets TWO copies
of the message. And, if you think very carefully, you'll see that they
aren't quite the same. One has a header saying that it has come from
XML-DEV, and the other that it has come from B. But the content of the two
messages is the same.
What can be done about it? Well, those of you who have followed so far will
see that deleting the string "A" from the To: field will solve the problem.
But this is often quite long - it might be something like:
"Peter Murray-Rust" <peter@ursus.demon.co.uk>
which is now *45* characters - a lot of deleting. And easy to miss one out.
But there's a clever trick, which perhaps not everyone knows (and probably
works on most mailers). It needs practice, but most people learn in time.
A. click the cursor just in front of the string you want to remove. You may
see a vertical bar, or block character.
B. Without taking your finger off the mouse, move it slowly to the right.
The background to the letters will go green! [It might be blue on some
machines, but don't worry.] When you've got to the end of the string (the
one you want to delete) take your finger *off* the mouse. The background
will still be green!
C. Now - before you do anything else, find the "Delete" key. It's usually
got "Delete" written on it. Sometimes it says "Del", or sometimes "DEL".
Press it firmly, just once. The green string will disappear, *and* all the
letters in it.
D. *Now* you can press the "Send" button. If you work it out, your To:
field will be simply:
To: xml-dev@ic.ac.uk
just as if you'd typed it in, but so much less effort.
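For those who would rather let the machine do the arithmetic, here is a rough Python sketch of the same procedure. The function names are invented for the illustration, and the addresses are the bare placeholders A and B used above; no real mailer is claimed to work this way.

```python
def reply_all_recipients(to_field, from_field, me):
    """Everyone named in the To: and From: headers, except the replier."""
    names = [a.strip() for a in to_field.split(",")] + [from_field.strip()]
    recipients = []
    for name in names:
        if name and name != me and name not in recipients:
            recipients.append(name)
    return recipients


def drop_list_members(recipients, list_address, members):
    """Remove addresses that will already get a copy through the list."""
    if list_address not in recipients:
        # Not posting to the list, so nobody gets a duplicate
        return list(recipients)
    return [r for r in recipients if r == list_address or r not in members]
```

"Reply to All" on A's posting yields the list address plus A; dropping list members leaves just the list address, which is the To: field shown in step D.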
I realise this has been a long tutorial, and we've not even been able to
cover the Cc: field, or what to do without a mouse. But if you can master
this, you'll probably be able to manage the Cc: field [It stands for "Copy"
and when you "Reply to all" you'll reply to people in that field as well.
If it happens to be (A) you can use the same technique to delete the
characters.]
So, let's see if we can get the duplicate postings down to zero :-) Then I
won't even have to mention things that might otherwise happen...
P.
From papresco at technologist.com Sat Dec 20 01:41:22 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
References: <199712191840.SAA30600@mail.iol.ie>
Message-ID: <349ACE06.BFB3D358@technologist.com>
Sean Mc Grath wrote:
>
> The concept of a DTD has a resonance with data driven programming such as
> JSP Jackson Structured Programming and JSD - Jackson System Design.
>
> I have on occasion used DTDs to document time ordered interfaces to objects. It
> can be a very powerful technique!
We discuss this in a paper we gave at SGML/XML 97. We call this a
"protocol."
"Software Component Interface Description in SGML"
"Additional architectural constraints may be provided which currently
are not enforced by any programming language."
"Examples include protocols and design patterns. Protocols are
permissible sequences of method invocation and attribute access,
possibly with additional temporal constraints. Design patterns are
specifications of a set of roles in a pattern and identification of the
mapping of specific classes and methods in the current definitions onto
these roles."
http://www.cgl.uwaterloo.ca/meta/sgml97/mmccool/index.html
I do see an interesting correlation between the ideas in that paper and
the aspect programming paper someone posted earlier.
ON THE OTHER HAND, protocols should be rare in good software design. You
can usually define an interface so that it doesn't require much explicit
time ordering. For instance you can open file objects automatically when
they are created and close them automatically when they are destroyed.
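A small sketch of what I mean, written in Python purely for illustration (the class name `ManagedFile` is hypothetical). The caller never issues open or close calls, so there is no ordering for the caller to get wrong.

```python
class ManagedFile:
    """Opens the file on creation; closes it when the `with` block ends."""

    def __init__(self, path, mode="r"):
        self._fh = open(path, mode)

    def __enter__(self):
        return self._fh

    def __exit__(self, exc_type, exc, tb):
        self._fh.close()
        return False  # do not swallow exceptions raised inside the block
```

Usage is `with ManagedFile(path) as f: data = f.read()`; once the block exits, the file is closed whether or not an error occurred.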
--
Paul Prescod -- http://itrc.uwaterloo.ca/~papresco
Art is always at peril in universities, where there are so many people,
young and old, who love art less than argument, and dote upon a text
that provides the nutritious pemmican on which scholars love to chew.
-- Robertson Davies in "The Cunning Man"
From peter at ursus.demon.co.uk Sat Dec 20 09:38:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
In-Reply-To: <349ACE06.BFB3D358@technologist.com>
References: <199712191840.SAA30600@mail.iol.ie>
Message-ID: <3.0.1.16.19971220103508.2b3f1292@pop3.demon.co.uk>
At 14:41 19/12/97 -0500, Paul Prescod wrote:
>Sean Mc Grath wrote:
>>
>> The concept of a DTD has a resonance with data driven programming such as
>> JSP Jackson Structured Programming and JSD - Jackson System Design.
>>
>> I have on occasion used DTDs to document time ordered interfaces to objects. It
>> can be a very powerful technique!
>
>We discuss this in a paper we gave at SGML/XML 97. We call this a
>"protocol."
[...]
>"Software Component Interface Description in SGML"
[...]
>http://www.cgl.uwaterloo.ca/meta/sgml97/mmccool/index.html
These look very interesting. AIUI Paul's tool is for generating code and
documentation for software projects, essentially by attaching semantics to
an SGML document. In a sense the document is acting as a series of
instructions. There would seem to be extensions to recipes in general, so
that XML could be used to perform tasks - this is the vision I have for
chemistry, for example (though it could also work for cakes). In a sense
that is what I am doing in my simple case with Java menus. has
the implied semantics of "insert a call to addSeparator() at this point".
requests calls to a hierarchy of new Menu
and new Menuitem calls.
This is another reason, for example, the BEHAVIOR attribute in XLL seems
important. You could use it to do lots of things, "directed" by a core XML
script. I have already suggested we would benefit from some agreed
semantics, so that we can write the code that carries them out. For
example, BEHAVIOR="display" would call the display() routine (this is what
JUMBO does at present), but BEHAVIOR="doit" could call the doit() routine.
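As a rough illustration of such agreed semantics, here is a Python sketch that dispatches on a BEHAVIOR attribute. It is not how JUMBO is implemented; the table `BEHAVIORS` and the two routines are placeholders, and the attribute is read from plain elements rather than from real XLL links.

```python
import xml.etree.ElementTree as ET


def display(element):
    return "display:" + element.tag


def doit(element):
    return "doit:" + element.tag


# Hypothetical table of agreed BEHAVIOR values and the routines they call
BEHAVIORS = {"display": display, "doit": doit}


def run_behaviors(xml_text):
    results = []
    for element in ET.fromstring(xml_text).iter():
        name = element.get("BEHAVIOR")
        if name is None:
            continue
        handler = BEHAVIORS.get(name)
        if handler is None:
            raise ValueError("no agreed semantics for BEHAVIOR=%r" % name)
        results.append(handler(element))
    return results
```

The point of the table is that a document can only request behaviours that have been agreed in advance; anything else is reported rather than silently ignored.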
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From digitome at iol.ie Sat Dec 20 10:46:54 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID: <199712201046.KAA31284@GPO.iol.ie>
[Paul Prescod]
>
>ON THE OTHER HAND, protocols should be rare in good software design. You
>can usually define an interface so that it doesn't require much explicit
>time ordering. For instance you can open file objects automatically when
>they are created and close them automatically when they are destroyed.
>--
Oh, I'd have to disagree with you there!
In many problem domains object abstractions are used to represent things
that have a life history: bank accounts, customers, space flights etc.
Recognising and leveraging the natural time ordering of the events that
occur to these objects can be both powerful and natural. Grady Booch et al
have written about a variety of ways to do it: time lines, flow diagrams
etc. IMHO SGML/XML can gainfully be applied in this field.
I made a stab at it at SGML '96 in Boston when I gave a paper that compared
SGML DTDs with the ideas in the JSP and JSD software development
methodologies.
I would argue that *not* utilising the natural time ordering of events
inherent in many systems is one of the things that can make event driven
programming a real dog.
How many times have you seen this:-
if (event == OPEN) {
    if (ALREADY_OPENED == TRUE)
        barf();
    else {
        ALREADY_OPENED = TRUE;
        /* do something useful */
    }
}
In SGML/XML, very analogous stuff results from loose content models:
start_foo {
    InFoo = TRUE
}
start_a {
    if (InFoo == TRUE)
        .....
}
An interface that allows events to occur in any old order leads to the
introduction of state variables that control what events are valid and
when. The state space gets very large very quickly. For N boolean state
variables a program can be in 2**N possible states!
SGML/XML is a great way to reduce a state space because SGML/XML DTDs can be
usefully thought of as devices for imposing a time ordering on events. Take
something like a simple bank account model:
...
This is both a concise piece of documentation about the goings on of these
BankAccounts and a starting point for the implementation code. As events
occur they are "parsed" prior to the real processing code, thus checking
the desired time ordering and obviating the need for state variables to do
it in the processing code.
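To make the "parse the events first" idea concrete, here is a minimal Python sketch that treats a content model as a regular expression over event names. The event names (open, deposit, withdraw, close) and the model itself are hypothetical, chosen only for the illustration.

```python
import re

# Hypothetical life history: an account is opened once, then sees any
# number of deposits and withdrawals, then is closed once.
ACCOUNT_MODEL = re.compile(r"open (?:(?:deposit|withdraw) )*close ")


def follows_model(events, model=ACCOUNT_MODEL):
    """True when the complete event sequence matches the content model."""
    return model.fullmatch("".join(e + " " for e in events)) is not None
```

The processing code behind this check needs no ALREADY_OPENED style flags: a sequence such as deposit, open, close is rejected before any handler runs.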
In Jackson, simple structure editors are used to create life histories which
quite frankly are within a syntactic asses roar of DTDs. A Jackson structure
editor is a bit like a DTD editor except that processing code can be
attached to all the nodes in the DTD tree structure.
Case in point:
I used to write real-time financial trading systems for the PC in 80286
assembler(!). We used a Jackson Editor to model the whole system and
auto-generate the procedural aspects of the code from our life
histories/data models. Before I left we had rewritten the whole thing for
Sun Workstations in ANSI C. The point? The life histories and data models
did not change, only the implementation language did.
Substitute "implementation language" for "formatting codes" in the above
and it sure sounds like SGML.
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com
From peter at ursus.demon.co.uk Sat Dec 20 11:30:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (Was Re: XML as a programming tool)
In-Reply-To: <199712201046.KAA31284@GPO.iol.ie>
Message-ID: <3.0.1.16.19971220122449.0a5712ec@pop3.demon.co.uk>
A lot of members on this list are new to XML and SGML, and will hope to
"learn as they read". (This can be quite hard as they may have
misconceptions through experience of "broken" HTML.) I think it will be
useful if all sample code is well-formed XML (rather than SGML) unless
explicitly specified. (e.g. if you include SGML rather than XML, write
At 11:16 20/12/97 +0000, Sean Mc Grath wrote:
[... lots of very exciting stuff ...]
>a simple bank account model:
>
>
>
>
>
>
This isn't WF XML for several reasons. A correct version might read:
>...
[...]
>
>within a syntactic asses roar of DTDs. A Jackson structure editor is a bit
We all make syntactic asses of ourselves and I have done so on numerous
occasions, especially on XML-SIG. People have been very patient - "they
know what I mean". But here the readers *don't* know what you mean. So we
can all try to be well-formed asses :-).
Therefore:
(a) try to be very careful about XML examples and related matters. People
will say "this is written by an expert so it must be right - I'll cut and
paste it..."
(b) tactfully and gently correct any errors that *do* get through. It won't
be taken badly - we all make errors. For example I've corrected the
"element" to "ELEMENT" as this is now required by the PR. [At one stage it
wasn't, and it's often easy to work with outdated versions.] If it's
unclear, you might ask "why isn't this "X"?" - the answer may be revealing.
I have *never* seen any flames on these lists when people are corrected for
genuine mistakes.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 12:13:19 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <199712191957.OAA04476@unready.microstar.com>
Message-ID: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
David Megginson writes:
>
>Here are two URLs that you can use to start playing:
>
> http://www.microstar.com/XML/donne.xml
> http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
As a co-editor of an (upcoming) RFC for text/xml and application/xml,
I think that I should point out the correct procedure for encoding
determination. (I have not checked these two Web sites, and Ælfred.)
For those XML documents transmitted by the HTTP protocol, XML parsers
should use the charset parameter of the media type text/xml (BTW,
the default of this parameter is 8859-1). XML parsers should ignore
the encoding declaration within XML documents transmitted by HTTP.
For more about this, see the XML PR and HTTP/1.1.
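A minimal Python sketch of this procedure follows. The function name is hypothetical, and the default of ISO 8859-1 simply follows what is stated above; the final RFC may specify something different, so treat the default as an assumption.

```python
from email.message import Message


def xml_encoding_over_http(content_type, default="iso-8859-1"):
    """Pick the encoding from the Content-Type header alone.

    The encoding declaration inside the XML document is deliberately
    not consulted.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() returns the charset in lower case, or None
    return msg.get_content_charset() or default
```

For a header of `text/xml; charset=UTF-8` this yields utf-8; for a bare `text/xml` it falls back to the assumed default.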
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
From peter at ursus.demon.co.uk Sat Dec 20 12:56:42 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712191957.OAA04476@unready.microstar.com>
Message-ID: <3.0.1.16.19971220134605.2227cfa4@pop3.demon.co.uk>
At 21:10 20/12/97 +0900, MURATA Makoto wrote:
>David Megginson writes:
>>
>>Here are two URLs that you can use to start playing:
>>
>> http://www.microstar.com/XML/donne.xml
>> http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
There are a large number of non-textual XML files under:
http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/cml12/cml/
most of them served from APPLETs, but you can get the *.xml from the HTML
source.
>
>As a co-editor of an (upcoming) RFC for text/xml and application/xml,
>I think that I should point out the correct procedure for encoding
>determination. (I have not checked these two Web sites, and
>Ælfred.)
>
>For those XML documents transmitted by the HTTP protocol, XML parsers
>should use the charset parameter of the media type text/xml (BTW,
>the default of this parameter is 8859-1). XML parsers should ignore
>the encoding declaration within XML documents transmitted by HTTP.
>More about this, see the XML PR and the HTTP/1.1
Thanks for this reminder. For Chemical Markup Language Henry and I had
originally devised our own MIME type (not official) : chemical/x-cml. But,
with the likely introduction of other namespaces (e.g. RDF:*, MathML) in
CML documents, it is clear that there is no need for diversity, since the
namespaces themselves will have means of identifying the XML application.
So CML documents will be "text/xml", unless we should use "application/xml"
instead.
What will differentiate a text/xml document from an application/xml one?
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 13:41:15 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <3.0.1.16.19971220134605.2227cfa4@pop3.demon.co.uk>
Message-ID: <9712201340.AA02986@lute.apsdc.ksp.fujixerox.co.jp>
Peter Murray-Rust writes:
>
>There are a large number of non-textual XML files under:
>
>http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/cml12/cml/
>
>most of them served from APPLETs, but you can get the *.xml from the HTML
>source.
I am pleasantly surprised to see a lot more information than
before. I should send this URL to one of my friends (a Ph. D in
chemistry)!
Peter Murray-Rust writes:
>
>What will differentiate a text/xml document from an application/xml one?
text/* is used for text, and application/* is for binary data. Thus,
text/xml is appropriate for XML documents. (The reason that
application/xml is introduced is only for transmitting XML documents
in UTF-16 or UCS-2 via e-mail.)
text/* has the charset parameter, which specifies the encoding method.
text/* (implicitly) allows code conversion by proxy servers.
application/* does not have the charset parameter (if not explicitly
defined for subtypes). application/* (again, implicitly) disallows
code conversion by proxy servers.
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
From ricko at allette.com.au Sat Dec 20 14:21:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
Message-ID: <199712201423.BAA15123@jawa.chilli.net.au>
> From: Peter Murray-Rust
> What will differentiate a text/xml document from an application/xml one?
When is each appropriate? I think the idea is to use text/xml in the
normal case, and application/xml as a fallback.
I think I first suggested it, but it certainly was not my preferred
option: I would prefer everything to be application/xml, because I do
not like the idea of dumb HTTP/MIME systems fiddling and transcoding data,
which they may do for text/xml. Application/xml is a binary transmission;
no bits are molested en route.
The trouble with text/xml is that XML positively encourages the use
of all ISO 10646 characters, for example all the symbol and publishing
characters. If the data is "transcoded" enroute from a large character
set encoding (e.g. Unicode or an East Asian one) to a small encoding
(e.g. 8859-n) then a dumb transcoder will not translate a non-encoding-
repertoire character into its numeric character reference, but probably
swallow it, or put out something strange.
In practice this means that XML document generators should encode all
characters above 127 as numeric character references rather than
directly. Smart intermediate XML systems should also attempt
to replace characters in data and attributes with numeric character
references. When you devise your own PI notations and comment
conventions, you should also provide an equivalent of numeric character
references, since the standard ones are not recognized there.
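
The advice above can be sketched in Python. The first helper replaces every character above 127 with a numeric character reference; the second imitates a "dumb" transcoder of the kind described, which substitutes a placeholder for anything outside the target repertoire. Both function names are made up for this illustration, and the helpers are meant for character data and attribute values only, not for markup.

```python
def to_ascii_with_char_refs(text):
    """Replace every character above 127 with a numeric character reference."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


def dumb_transcode(text, target="iso-8859-1"):
    """Imitate a transcoder that knows nothing about XML.

    Characters missing from the target repertoire become '?' instead of
    a numeric character reference, so the information is lost.
    """
    return text.encode(target, "replace").decode(target)
```

Data passed through to_ascii_with_char_refs before transmission is pure ASCII, so a later transcoding step has nothing left to damage.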
The unpleasant implication in all this is for native language markup.
If your XML data will be sent to users who use other scripts, do not
use characters in XML names that are not available in their regional
character sets. Numeric character references do not apply, currently,
to names. (I hope this will eventually be changed in SGML and XML,
but I think the facts and the affected users will speak
for themselves in due time.)
This is why you should be conservative in your choice of name characters.
The < 127 characters are OK. The 128-255 range of characters in 8859-1
and ISO 10646 are probably pretty safe too. This problem arises even
within nations, if the nation has a few different repertoires in common
use: in particular, in Japan, Unix systems using EUC have available several
thousand more kanji than older PC (i.e. Shift-JIS) and Macintosh systems:
it is probably prudent for Japanese users to only use those characters
available in Shift-JIS for naming.
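
One way to apply this check mechanically is sketched below in Python: a candidate XML name is tested against each encoding the recipients are expected to use. The function name is hypothetical, and Python's codec tables merely stand in for "regional character sets"; they may not match what a particular platform actually supports.

```python
def unsupported_encodings(name, encodings):
    """Return the encodings in which `name` cannot be represented."""
    missing = []
    for enc in encodings:
        try:
            name.encode(enc)
        except UnicodeEncodeError:
            # At least one character of the name is outside this repertoire.
            missing.append(enc)
    return missing
```

An empty result means the name survives in every listed encoding; any encoding returned is one where the name would have to be changed, because numeric character references cannot rescue it.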
None of these considerations were new for the XML discussion: what was
new was that XML works with a particular operating model that says that
documents must cope with HTTP/MIME systems but also must provide
enough information to create the MIME headers in the first place.
The restriction that numeric character references cannot be used
in markup, just in data and attribute values, comes from the old
character model of SGML. In this model, it made no sense to
allow numeric character references in names, and indeed would be
considered bad, because it created markup that could not be read
in a simple editor.
XML is probably one of the most thoroughly internationalized software
systems around: in particular, this internationalization has been
in place and under discussion from the very beginning, and not
"tacked on". Internationalization (I18n) is one area of XML that
is bound to be difficult for parser writers to get right. But the
benefit is that once they have it right, it makes life much simpler
and richer for users. Which is not to say that XML i18n is perfect,
but it is certainly near state-of-the-art, given the need to fit
in with HTTP/MIME and operating systems. I certainly hope that XML
will not remain "state-of-the-art" for long, and that advances
in various technologies--in particular, for operating system
vendors to agree on a charset/encoding labelling schema that
they all implement in their OS (or the adoption of MIME as a
file format, e.g. .MIM)-- will overtake it.
Rick Jelliffe
From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 15:15:04 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <199712201423.BAA15123@jawa.chilli.net.au>
Message-ID: <9712201514.AA02987@lute.apsdc.ksp.fujixerox.co.jp>
Rick Jelliffe writes:
>When is each appropriate? I think the idea is to use text/xml in the
>normal case, and application/xml as a fallback
I believe that this is the idea of the XML WG and also the idea of W3C.
However, it is still not clearly presented in the XML PR.
Rick Jelliffe writes:
>I think I first suggested it, but it certainly was not my preferred
>option: I would prefer everything to be application/xml, because I do
>not like the idea of dumb HTTP/MIME systems fiddling and transcoding data,
>which they may do for text/xml. Application/xml is a binary transmission;
>no bits are molested en route.
You might want to try this once again in the XML SIG. If everybody agrees
on this, I am more than happy to agree. But I do not want to have
both text/xml and application/xml for HTTP, as this is likely to confuse
people. Is it possible to persuade people *not* to use text/xml?
Makoto
Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
From peter at ursus.demon.co.uk Sat Dec 20 16:20:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <9712201514.AA02987@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712201423.BAA15123@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971220171515.0a5796a0@pop3.demon.co.uk>
At 00:14 21/12/97 +0900, MURATA Makoto wrote:
>Rick Jelliffe writes:
>>When is each appropriate? I think the idea is to use text/xml in the
>>normal case, and application/xml as a fallback
>
>I believe that this is the idea of the XML WG and also the idea of W3C.
>However, it is still not clearly presented in the XML PR.
>
>Rick Jelliffe writes:
>>I think I first suggested it, but it certainly was not my preferred
>>option: I would prefer everything to be application/xml, because I do
>>not like the idea of dumb HTTP/MIME systems fiddling and transcoding data,
>>which they may do for text/xml. Application/xml is a binary transmission;
>>no bits are molested en route.
>
>You might want to try this once again in the XML SIG. If everybody agrees
>on this, I am more than happy to agree. But I do not want to have
>both text/xml and application/xml for HTTP, as this is likely to confuse
>people. Is it possible to persuade people *not* to use text/xml?
There are two conflicting messages here, and I think it's critical that
this is addressed *quickly* :-). Otherwise a large number of servers will
have been set up where people guess the type (probably as text/xml), and
the chance of uniformity will have been missed. Personally I am neutral,
although given the effort that has gone into i18n, the thought of anything
tweaking the bits en route sounds horrid. The application has enough to do
without mending documents that have been tweaked for humans to read.
I would hope that there is only one MIME type for XML as it will be
impossible for most people to work out the difference. Two will simply
confuse people and they (the types) will simply serve as synonyms. From
what Rick says, application seems more logical, but I imagine there are
lots of text/sgml documents out there already and people will go by analogy.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Sat Dec 20 18:32:28 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:38 2004
Subject: text/xml vs. application/xml
In-Reply-To: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712191957.OAA04476@unready.microstar.com>
<9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <199712201829.NAA00608@unready.microstar.com>
MURATA Makoto writes:
> > http://www.microstar.com/XML/donne.xml
> > http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
> As a co-editor of an (upcoming) RFC for text/xml and
> application/xml, I think that I should point out the correct
> procedure for encoding determination. (I have not checked these
> two Web sites, and Ælfred.)
Thank you very much for the information. Currently, both of these web
servers return "application/octet-stream" as the MIME type for *.xml
and *.dtd files: in this case, is it correct for an XML parser to fall
back on other character-encoding detection techniques, as Ælfred does?
> For those XML documents transmitted by the HTTP protocol, XML parsers
> should use the charset parameter of the media type text/xml (BTW,
> the default of this parameter is 8859-1). XML parsers should ignore
> the encoding declaration within XML documents transmitted by HTTP.
> More about this, see the XML PR and the HTTP/1.1
I have two important queries:
1) Are you certain that ignoring the encoding declaration is
conforming behaviour? It seems to me that it would make more sense
to report an error if the charset parameter and the encoding
declaration differ (especially since the PR requires any document
without a BOM or encoding declaration to be in UTF-8).
2) Why pick a default encoding that conforming XML parsers are not
required to support? Ælfred does accept encoding="ISO-8859-1", but
some other parsers do not. It seems to me that either the RFC or
the PR needs to be amended.
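
Query 1 proposes reporting an error when the charset parameter and the encoding declaration disagree. A minimal Python sketch of that alternative follows. The function names are hypothetical, the regular expression only handles declarations written in an ASCII-compatible encoding, and the UTF-8 fallback reflects the PR requirement mentioned in query 1; none of this is taken from any existing parser.

```python
import re

# Matches the encoding pseudo-attribute of an XML declaration at the very
# start of the document (ASCII-compatible encodings only).
_ENCODING_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def declared_encoding(document_start):
    """Return the encoding named in the XML declaration, or None."""
    match = _ENCODING_DECL.match(document_start)
    return match.group(1).decode("ascii").lower() if match else None


def check_encoding_agreement(http_charset, document_start):
    """Return the agreed encoding, or raise ValueError on a conflict."""
    decl = declared_encoding(document_start)
    if http_charset and decl and http_charset.lower() != decl:
        raise ValueError(
            "charset parameter %r conflicts with encoding declaration %r"
            % (http_charset, decl)
        )
    # With neither source present, fall back to UTF-8.
    return (http_charset or decl or "utf-8").lower()
```

The comparison is a plain case-insensitive string match, so aliases of the same encoding (for example "latin1" and "iso-8859-1") would be reported as a conflict; a fuller version would normalize names first.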
I can also anticipate a different problem: few private people (as
opposed to companies or organisations) have any control at all over
what their HTTP servers send out.
Imagine an exchange student at a big American University, who wants to
publish a UTF-8 or UCS-2 Arabic XML text in her personal web space.
She will have a very hard time even finding out who is in charge of
the university's HTTP server (if she knows what an HTTP server is),
and she will probably have graduated before the university's
administration has gotten around to approving letting the web-master
look into reporting the correct encoding for her document.
In the end, it looks like application/xml is a _much_ better choice
than text/xml -- with Ælfred, I have found that I can do a very good
job autodetecting character encoding, and I imagine that other parser
writers will find the same.
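
As an illustration of what such autodetection can look like, here is a short Python sketch based on inspecting the first bytes of the document. It is not Ælfred's actual logic, which is not shown in this thread; the function name is hypothetical, and UCS-4 byte patterns are omitted for brevity.

```python
import codecs
import re

# Encoding pseudo-attribute in an ASCII-compatible XML declaration.
_DECL = re.compile(
    rb'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']'
)


def sniff_encoding(data):
    """Guess the encoding of an XML document from its leading bytes."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    # A byte order mark, or "<?" encoded in two bytes per character.
    if data.startswith(codecs.BOM_UTF16_BE) or data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # ASCII-compatible start: trust the encoding declaration if present.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no declaration: assume UTF-8.
    return "utf-8"
```

Because the function needs only the document bytes, it works the same way whether the server labels the file correctly, labels it application/octet-stream, or sends no useful label at all.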
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From jeremy at allaire.com Sat Dec 20 20:32:08 1997
From: jeremy at allaire.com (Jeremy Allaire)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp>
>I guess it's programming like 'programming' a VCR - someone else has written
>the program, I just feed it the data that controls its behavior. Still, even
>that limited prospect was exciting, and capable of some pretty complex stuff.
I've put a fair amount of thinking into the problem (opportunity) of XML and
Web devices. For experimental purposes, I began work with wrapping an X.10
device automation interface (X.10 is a late 70s standard for very simple
device automation over AC wiring) with a tag wrapper. The proof of concept
actually worked. You can check out the custom tag which enables this at the
following site; search for "X10":
http://www.allaire.com/TagGallery/
X.10 already has a concept of loading device activity profiles (essentially
schedules, in fact very close to CDF in terms of the kind of data required).
A next-generation "over the wire protocol" -- CEBus -- promises to enable
much richer forms of device automation and profiling, and I would surmise
that XML will be a pretty important enabler. I'm betting on it.
Jeremy Allaire
From peter at ursus.demon.co.uk Sun Dec 21 00:53:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: text/xml vs. application/xml
In-Reply-To: <199712201829.NAA00608@unready.microstar.com>
References: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
<199712191957.OAA04476@unready.microstar.com>
<9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>
At 13:29 20/12/97 -0500, David Megginson wrote:
[...]
>I can also anticipate a different problem: few private people (as
>opposed to companies or organisations) have any control at all over
>what their HTTP servers send out.
I am extremely sympathetic to this. XML will revolutionise the 'publishing
process' by providing direct author2reader communications (and much else).
It seems to me essential that authors are allowed to say what their
documents are, and XML gives them this opportunity, whilst - as David says
- with MIME they do not have complete freedom. [I have suffered the same
problem - people mailing me and asking 'can I change the MIME type of my
files?'; answer 'sorry'.]
BTW I have now hacked AElfred beta 4 under JUMBO, and it seems to work
fine. I can extract all the DTD information I want and render it as a tree,
as well as the conventional data. If - as David suggests - the current
AElfred API is close to the planned convergence, then fine. It *did* take
me longer than 15 mins - but I wasn't at my brightest :-). I'd still like
to see the #IMPLIED problem clearly agreed. AElfred AIUI outputs null as
the value for a non-existent attribute whether the attribute is declared or
not
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From rwaldin at pacbell.net Sun Dec 21 03:22:53 1997
From: rwaldin at pacbell.net (Ray Waldin)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp>
Message-ID: <349C8DF2.EC73820F@pacbell.net>
Hi everyone,
I've used XML twice now and in both cases I've ended up with nothing to markup
except more tags and whitespace :). These languages are used to communicate
relationships between external resources, not marked up text. The intent was to
describe these relationships in a flexible but well defined format and XML
offered a simple (and soon to be standard!) way of doing this. So far, so
good.
I've seen other examples of this type of "pure tag language" and noticed that
some of them seem to force content into tags for no reason. My question is,
given the "nothing to markup" scenario, which is more appropriate?, when is each
more appropriate?, and why?:
1234
or
In other words, when should data be contained by elements? Or conversely, when
should data be an attribute of an element instead of contained by that element?
I prefer the latter method, given an attribute's ability to store CDATA without
CDATA section delimiters. OSD and CDF use the former method for:
Solitaire
and I'm not sure why as:
could serve the same purpose and is more in line with the rest of the language.
Any general guidelines?
Thanks!
-Ray
From liamquin at interlog.com Sun Dec 21 09:09:47 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
In-Reply-To: <349C8DF2.EC73820F@pacbell.net>
Message-ID:
On Sat, 20 Dec 1997, Ray Waldin asked:
> when should data be contained by elements? Or conversely, when should
> data be an attribute of an element instead of contained by that element?
There are a number of issues that may help here, depending on how the
information is going to be used...
Some pragmatics first:
* it's often easiest for people writing ad-hoc parsers if you only use
elements; there's only one syntax to handle
* if you will ever need to have more complex structured values with markup
in them, they will need to be in element content, because XML (like SGML)
has a restriction that you can't put element markup inside attributes
* if you want the information to be displayed in XML or HTML or SGML browsers
most or all of the time, use content, as the style sheets are generally
less flexible with attributes.
* it's relatively easy to strip out all attribute values and make a
pared-down instance, if that's useful
* attributes are good for things like interpretations of a text by someone
transcribing it, not part of actual content
A philosophical view:
* attributes may be used for annotating the element tree; in other words,
you could use them to store element properties.
for example,
steam
water
gunk
Unfortunately, a practical example would add units and tolerance to
the temperature, and then you need to use elements or a non-XML sub-
structure:
This is generally unsatisfactory because it's not using XML; so
Kelvin
7
Clearly you could take those items I have left as attributes and turn
them into elements, and in fact any element E with attribute list A and
content model C can be converted into an element E' with content model
E.atts(A), E.content(E)
e.g.
It is therefore possible to think of attributes as syntactic sugar for
a very restricted kind of content model.
Unfortunately, this is not quite correct, because XML attributes support
a set of constraints on their content which is entirely different to
that supported for elements.
If you only ever use CDATA, ID and name group attributes, retain ID
attributes as attributes, and convert name group token lists to
corresponding empty elements, the conversion still applies.
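
The conversion described above can be sketched in Python with the standard ElementTree module. The wrapper names ("E.atts" and "E.content", formed from the element's own name) follow the notation used in this message; the function name and the sample element names in the usage note are hypothetical. The sketch treats every attribute as a plain string, so it ignores the attribute-type caveats just mentioned.

```python
import xml.etree.ElementTree as ET


def attributes_to_elements(elem):
    """Return a copy of `elem` in which attributes have become child elements.

    The result has exactly two children: <E.atts>, holding one element per
    original attribute, and <E.content>, holding the original content.
    """
    new = ET.Element(elem.tag)
    atts = ET.SubElement(new, elem.tag + ".atts")
    for name, value in elem.attrib.items():
        ET.SubElement(atts, name).text = value
    content = ET.SubElement(new, elem.tag + ".content")
    content.text = elem.text
    for child in elem:
        converted = attributes_to_elements(child)
        converted.tail = child.tail  # keep text that followed the child
        content.append(converted)
    return new
```

Applied to an element such as <liquid temperature="373" units="Kelvin">steam</liquid>, the result carries the same information with no attributes at all, which is the sense in which attributes can be seen as shorthand.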
In theory, then, attributes are a useful but limited shorthand in most
cases, but essential for IDREF and other cases that are not supported
in element content.
In practice, they can be used to make an instance more readable, or to
reduce file size, or to distinguish between different sorts of information.
Hope this helps.
Lee (tired at 4 am!)
--
Liam Quin -- the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval
IRC: Learn about XML/SGML/XSL/XLL/DSSSL on irc.dragonnet.org in #xml
email address: l i a m q u i n, at host: i n t e r l o g dot c o m
From peter at ursus.demon.co.uk Sun Dec 21 10:08:21 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
In-Reply-To:
References: <349C8DF2.EC73820F@pacbell.net>
Message-ID: <3.0.1.16.19971221110404.0baf108e@pop3.demon.co.uk>
At 04:09 21/12/97 -0500, Liam Quin wrote:
>On Sat, 20 Dec 1997, Ray Waldin asked:
[... a very common and important question of style ...]
>> when should data be contained by elements? Or conversely, when should
>> data be an attribute of an element instead of contained by that element?
>
>There are a number of issues that may help here, depending on how the
>information is going to be used...
>
>Some pragmatics first:
>
[...]
I have run into exactly this problem with Technical Markup Language. I
wanted to design it with as few ELEMENTs as possible and have evolved this to
120-125
where there are a number of attributes that qualify the value. For reasons
Liam has outlined, and some others (see below) I have come to the
conclusion that ELEMENTs are easier to work with than attributes. So, to
Liam's criteria I'll add:
* X*L tools formally require more support to be given to ELEMENTs than
attributes. For example, if I have a unit of length (metre), but don't know
whether it occurs as kilometre or centimetre [1], I can search in content
with standard XML syntax:
DESCENDANT(ALL,UNITS)STRING(1,"metre",0)
whereas I have no way of searching in attribute values unless I write my
own software.
* When you have to write significant amounts of code to process an
attribute it may be worth reworking it as an ELEMENT. JUMBO includes a lot
of code for automatic conversion between UNITS and so it makes sense to
make this an ELEMENT, because much of that processing can then be done
automatically. Put another way, at present JUMBO has to know which ELEMENTs
might have UNITS attributes and call special code. If UNITS is contained,
the processing is requested just like any other ELEMENT.
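
To make the contrast concrete, here is a small Python sketch using the standard ElementTree module. The first helper corresponds to the content search above (UNITS as an element); the second is the kind of hand-written code needed when the unit is stored in an attribute instead. The attribute name "units" and both function names are assumptions made for this example, and this is not the pointer syntax quoted above.

```python
import xml.etree.ElementTree as ET


def units_elements_containing(root, text):
    """Find UNITS elements whose character content contains `text`."""
    return [el for el in root.iter("UNITS") if text in (el.text or "")]


def elements_with_units_attribute_containing(root, text):
    """Find any element whose 'units' attribute value contains `text`."""
    return [el for el in root.iter() if text in el.get("units", "")]
```

Searching for "metre" with either helper finds kilometre and centimetre alike, but as footnote [1] notes, the variant spelling "meter" needs a second search.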
> Unfortunately, a practical example would add units and tolerance to
> the temperature, and then you need to use elements or a non-XML sub-
> structure:
Yes - and I ran into trouble here. So the example I gave is horrid, and I
am reworking this using and things like this. I would much
rather have a proliferation of ELEMENTs than attributes.
[Part of my worry about multiplying ELEMENTs was that the content models
can get very complex. Since much of my XML will not be validatable, that's
less of a problem now.]
>
> This is generally unsatisfactory because it's not using XML; so
Liam is an illicit helium distiller, I see. :-)
P.
[1] Some countries use the variant "meter" so you will have to do two
searches.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From gfrer at luna.nl Sun Dec 21 11:34:02 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
Message-ID:
> My question is,
>given the "nothing to markup" scenario, which is more appropriate?, when
>is each
>more appropriate?, and why?:
>
>1234
>
>or
>
>
>
>Any general guidelines?
>
In my view:
XML (or any other Tag-language) will be used to express:
- Datamodels of a part of the universe, which handle the relationships
between entities within the Model. It defines the Context of the information.
- with Terminology (a set of Tags) which gives names to the entities
- with Rules to obey.
It is likely that the above examples will be tagged like:
1234 if rules of the model allow it.
or
1234
will be an entity from a Model indicating a context
will be an attribute indicating how things are coded
Attributes will be derived from other Models.
'Nothing to markup' is nothing.
It equals chaos.
Greetings
Gerard Freriks
Gerard Freriks,huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands Telephone: (+31) (0)174-384296/ Fax: -386249
Mobile : (+31) (0)6-54792800
ARS LONGA, VITA BREVIS
From ak117 at freenet.carleton.ca Sun Dec 21 11:42:31 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:39 2004
Subject: Undeclared attributes in Ælfred
In-Reply-To: <3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>
References: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
<199712191957.OAA04476@unready.microstar.com>
<199712201829.NAA00608@unready.microstar.com>
<3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>
Message-ID: <199712210153.UAA01766@unready.microstar.com>
Peter Murray-Rust writes:
> BTW I have now hacked AElfred beta 4 under JUMBO, and it seems to
> work fine. I can extract all the DTD information I want and render
> it as a tree, as well as the conventional data. If - as David
> suggests - the current AElfred API is close to the planned
> convergence, then fine. It *did* take me longer than 15 mins - but
> I wasn't at my brightest :-). I'd still like to see the #IMPLIED
> problem clearly agreed. AElfred AIUI outputs null as the value for
> a non-existent attribute whether the attribute is declared or not
Thank you for taking the time to try out the new release.
For documents without DTDs, this is probably the only option (any
attribute is potentially an #IMPLIED attribute). For documents with
DTDs, I could create a query method like
boolean isDeclaredAttribute (String elname, String aname)
but Ælfred has already grown too large (it's over 25K), so I would
need evidence of a pressing need. In the meantime, you can use the
query method
Enumeration declaredAttributes (String elname, String aname)
to build a hashtable, and then look in the hashtable whenever you need
to know whether an attribute is #IMPLIED or simply undeclared.
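
The hashtable approach can be sketched as follows, in Python rather than Java for brevity. The `declared_attributes` argument is a hypothetical stand-in for the parser's query method and is assumed here to take an element name and return the declared attribute names; the status labels are likewise invented for the example.

```python
def build_declared_table(element_names, declared_attributes):
    """Map each element name to the set of attribute names declared for it.

    `declared_attributes` is a callable standing in for the parser's query
    method (an assumption for this sketch).
    """
    return {name: set(declared_attributes(name)) for name in element_names}


def attribute_status(table, elname, aname, value):
    """Classify an attribute given the value the parser reported (or None)."""
    if value is not None:
        return "specified"
    if aname in table.get(elname, ()):
        # Declared in the DTD but absent from the instance, e.g. #IMPLIED.
        return "declared but unspecified"
    return "undeclared"
```

The table is built once after the DTD has been read; each later lookup is a constant-time set membership test, so the check adds little cost per attribute.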
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Sun Dec 21 11:58:20 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <349C8DF2.EC73820F@pacbell.net>
References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp>
<349C8DF2.EC73820F@pacbell.net>
Message-ID: <199712211155.GAA00316@unready.microstar.com>
Ray Waldin writes:
> In other words, when should data be contained by elements? Or
> conversely, when should data be an attribute of an element instead
> of contained by that element?
Here's a good, general distinction:
* use elements for structurally-significant information; and
* use attributes for meta-data.
One problem, that will become more obvious when more XML tools are
available, is that most WYSIWYMG (M="might" or "may") XML editing
software will likely show character data (element content) on the screen
by default, but will show attributes only on request, possibly in a
pop-up dialog. It makes sense to have the most important information
(the real content) inside elements, then, and to have the meta-data
out of the way in attributes.
(Peter: how do you display attributes in Jumbo?)
Of course, what is and isn't meta-data will vary depending on the
document type, but here are some common examples:
* a unique identifier
* a security level
* a revision or release level
* rendition information (yech)
* configuration information
* the preferred unit of measurement
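As a rough illustration of the distinction, the sketch below parses a small made-up fragment in which the sentence is element content and the identifier, security level and revision are attributes. The element name, attribute names and values are invented for this example, and the standard Java DOM API is used only as a convenient way to show the two kinds of information side by side.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.xml.sax.InputSource;

class ContentVersusMetadata {
    // Hypothetical fragment: the sentence is the real content; id,
    // security and revision are meta-data about it.
    static final String XML =
        "<para id=\"p1\" security=\"public\" revision=\"3\">"
        + "The vessel must be inspected weekly.</para>";

    // Parse a string and return its document element.
    static Element parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        Element para = parse(XML);

        // What an editor would put on the screen by default.
        System.out.println("content:   " + para.getTextContent());

        // What an editor would show only on request.
        NamedNodeMap attrs = para.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            System.out.println("meta-data: " + attrs.item(i).getNodeName()
                + " = " + attrs.item(i).getNodeValue());
        }
    }
}
```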
All the best,
David
--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From digitome at iol.ie Sun Dec 21 13:55:12 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
Message-ID: <199712211355.NAA08564@mail.iol.ie>
[David Megginson]
>
>One problem, that will become more obvious when more XML tools are
>available, is that most WYSIWYMG (M="might" or "may") XML editing
>software will likely show character data (element content) on the screen
>by default, but will show attributes only on request, possibly in a
>pop-up dialog.
I suspect this comment is right on the money. I have never come across a way
of displaying attribute data in an SGML editor that "felt" right.
For one job I was involved in, attribute editing was such a pain that we wrote
"ConvertAttributesToElements" and "ConvertElementsBackToAttributes"
transformations:-
SGML doc -> [ConvertAttributesToElements] -> SGML doc -> [Editing
Environment] -> SGML doc -> [ConvertElementsBackToAttributes] -> SGML doc.
The fact that this is doable in a lossless fashion suggests that the
attribute/element decision is largely a product of taste and a pragmatic
consideration of the tools you intend to use.
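A minimal sketch of such a round trip, written against the standard Java DOM API and not the SGML tooling used on that job, is given below. The "ATTR-" naming convention for the temporary elements is an assumption made for this example; it only works if no real element name starts with that prefix. Attribute order is not preserved, which is harmless because attribute order carries no meaning.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class AttributeRoundTrip {
    // Assumed convention: attribute "id" becomes a child element "ATTR-id".
    static final String PREFIX = "ATTR-";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse: " + xml, e);
        }
    }

    // Snapshot of the child elements, so the tree can be edited while looping.
    static List<Element> childElements(Element e) {
        List<Element> out = new ArrayList<Element>();
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) out.add((Element) n);
        }
        return out;
    }

    // Move every attribute into a leading child element.
    static void attributesToElements(Element e) {
        // Recurse first, so the wrappers created below are never visited.
        for (Element child : childElements(e)) attributesToElements(child);
        NamedNodeMap attrs = e.getAttributes();
        Node firstChild = e.getFirstChild();
        while (attrs.getLength() > 0) {
            Node a = attrs.item(0);
            Element wrapper =
                e.getOwnerDocument().createElement(PREFIX + a.getNodeName());
            wrapper.setTextContent(a.getNodeValue());
            e.insertBefore(wrapper, firstChild);
            e.removeAttribute(a.getNodeName());
        }
    }

    // Turn the wrapper elements back into attributes of their parent.
    static void elementsToAttributes(Element e) {
        for (Element child : childElements(e)) {
            String tag = child.getTagName();
            if (tag.startsWith(PREFIX)) {
                e.setAttribute(tag.substring(PREFIX.length()),
                               child.getTextContent());
                e.removeChild(child);
            } else {
                elementsToAttributes(child);
            }
        }
    }

    public static void main(String[] args) {
        String xml = "<P id=\"p1\"><S lang=\"en\">A sentence.</S></P>";
        Document original = parse(xml);
        Document working = parse(xml);

        attributesToElements(working.getDocumentElement());
        // ... the editing environment would work on the document here ...
        elementsToAttributes(working.getDocumentElement());

        // Structural comparison of the untouched and the round-tripped tree.
        System.out.println(original.getDocumentElement()
            .isEqualNode(working.getDocumentElement()));
    }
}
```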
As for the philosophical difference, I dunno. I suspect that a Bertrand
Russell or a Kurt Gödel or a Daniel Dennett or a Douglas Hofstadter could
always rustle up a counter-example for any hypothesis. My head hurts and I
am heading at full speed past the point where I know what I am talking about
but if we take the data versus meta-data distinction --
Is "SayingSomethingAboutThisData" data or meta-data in this case:-
Sean Mc Grath
sean at digitome dot com
From tadmc at metronet.com Sun Dec 21 15:48:48 1997
From: tadmc at metronet.com (Tad McClellan)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <349C8DF2.EC73820F@pacbell.net> from "Ray Waldin" at Dec 20, 97 07:33:06 pm
Message-ID: <199712211435.IAA00768@magna.flash.net>
A non-text attachment was scrubbed...
Name: not available
Type: text
Size: 1668 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971221/d2dcafde/attachment.bat
From peter at ursus.demon.co.uk Sun Dec 21 16:11:58 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <199712211155.GAA00316@unready.microstar.com>
References: <349C8DF2.EC73820F@pacbell.net>
<01bd0cd4$6ddc5c00$LocalHost@jeremyhp>
<349C8DF2.EC73820F@pacbell.net>
Message-ID: <3.0.1.16.19971221170925.2bf7468e@pop3.demon.co.uk>
At 06:55 21/12/97 -0500, David Megginson wrote:
>Ray Waldin writes:
>
>(Peter: how do you display attributes in Jumbo?)
JUMBO uses 4 sorts of display:
- event stream ("text with embedded tags")
- tree
- X*L-predicated
- specialist (downloadable classes or user-applied)
Event Stream
This is essentially to be rendered as text. JUMBO has the following ways
of dealing with tags:
- recognise them as HTML and produce HTML-compliant rendering. It's not
pretty, and I've only done HTML 2.0 [I didn't set out to produce a browser,
remember :-)]. However it's necessary to have one, because people will start
"embedding XML in HTML" so we have to have a renderer. The most likely
first task is to render XML-LINKs.
- present them as text with tags. Because Java does not have a nice way of
embedding buttons in text, I either have to use paint(), which is very slow
and which has no defined textual semantics (e.g. Ctl-X) OR use TextArea,
where the tags are simple transliterations of the input and have no
clickability (because TextArea 1.02 has no clickability that *I* can see).
In a better situation I would create pretty buttons for the tags and paint
them nice colours according to whether they have attributes
Note that the second model is editable, and is XML-sensitive (e.g. there
are options like "JumpBalancedtags")
Tree
The Nodes have a variety of buttons in paint(). One button is "At" in
cyan. Clicking it reveals a box with attributes in. This can be edited, and
the editor is DTD driven. It deals with #IMPLIED, #REQUIRED, #FIXED, etc. It
does not deal with NOTATION because I don't understand it. It will deal
with XML-LINK when I have written a drag and drop to add the internal
links (isn't Java boring...)
XML-driven.
JUMBO makes a best guess as to what the drafters of the spec expect for
things like xml:link SHOW="EMBED". JUMBO has asked about this a number of
times and will try not to do anything too unexpected. JUMBO has also asked
about xml:space="DEFAULT", but has no default at present.
There are already quite a few hard-coded attributes in X*L and all require
specialist code to be written.
Specialised
This requires bespoke code to be written, e.g.
1 2 3 4 5 6
has (I think) nothing displayed in the lower half.
>
>Of course, what is and isn't meta-data will vary depending on the
>document type, but here are some common examples:
>
>* the preferred unit of measurement
I took this view initially but (see recent posting) have changed my mind
because UNITS are complex objects. As everyone agrees, the distinction is
subjective BUT will be influenced by the tools that we create on this list
and elsewhere. Personally I am against having structure in attributes if it
can be avoided because it requires additional code to be written. I have
been so impressed with the economy of doing everything in XML, that I would
hate to see more 'mini-languages' inside attributes.
P.
BTW I am working hard on a new snapshot of JUMBO. It will be the last 1.02
version, I think. There are quite a lot of new goodies, and I will try to
create the distribution in smaller packets as I know it has been difficult
to download.
P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg
From papresco at technologist.com Sun Dec 21 17:06:50 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
References: <199712211435.IAA00768@magna.flash.net>
Message-ID: <349D4570.7188F25B@technologist.com>
Tad McClellan wrote:
> [ I hope I don't misspeak here. I haven't yet gotten my arms around all
> the differences between XML and SGML. (That's why I am lurking here ;-)
>
> Someone please correct me if I have it wrong in an XML context
> ]
No, you are absolutely right. The original poster was a little confused
about CDATA. I meant to point that out but forgot. Thanks for doing so.
--
Paul Prescod -- http://itrc.uwaterloo.ca/~papresco
Art is always at peril in universities, where there are so many people,
young and old, who love art less than argument, and dote upon a text
that provides the nutritious pemmican on which scholars love to chew.
-- Robertson Davies in "The Cunning Man"
From papresco at technologist.com Sun Dec 21 17:07:15 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp> <349C8DF2.EC73820F@pacbell.net>
Message-ID: <349D4C8A.CE7B9CD5@technologist.com>
Probably the best forum for DTD questions is comp.text.sgml.
After all, XML DTDs are SGML DTDs and people there have been making them
for more than a decade.
In fact, this very topic was covered recently. Use dejanews and look for
the thread (mis!)named "Entities vs. Attributes" from around 1997/06/16
in the comp.text.sgml archive. Many of the points raised here are the
same as there.
This is a recurring question and perhaps deserves a section on the
special topics page [1] of the SGML Web Page, maybe as part of a DTD
design section, if its patron saint is willing. Here is what I am
thinking of:
On DTD Design
There are many heuristics for and opinions on proper DTD design.
A recurring question is when to use attributes or sub-elements.
This was discussed