From eliot at isogen.com  Sun Jun  1 00:40:45 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK
Message-ID: <3.0.32.19970531173735.00c2622c@swbell.net>

At 10:51 PM 5/30/97 GMT, Peter Murray-Rust wrote:
>I am trying to understand how XML-LINK might be used and would be
>grateful for some gentle hints.  

I'll try to offer some guidance.  I have implemented support in
ADEPT*Editor for HyTime that is roughly equivalent to the types of
facilities Peter is asking about for XML Link and JUMBO.  Thus, I think I
can provide some insight to these issues.

Also, trying not to be too pedantic, I've tried to correct Peter's use of
terminology where I think Peter's use may be leading to some of his
confusion.  This is intended to be generally instructive--these misuses are
generally endemic and stem from the Web's singular focus on addressing to
the exclusion of all else.

[NOTE: having written this note, I find I must warn that it is long and
somewhat more theoretical than I had intended.  Peter: There are useful
implementation suggestions in here.  Also, the end of this note includes
what are effectively suggestions to the editors of the XML spec--Tim and
Steve, I've copied you explicitly on this note by way of formal submission
of these comments--I found my explanation of my opinions underlying the
comments to be generally instructive--normally I wouldn't criticize in
public without first conveying the critique directly to the editors.
However, in this case my suggestions are neither indicative of serious
flaws nor is the acceptance of them a necessary condition for my acceptance
of XML Link as a useful spec--it is useful as written (although, like any
such spec, including HyTime, it could use clarification of some of its
intended semantics in places).]

[to continue...]

>A link has ends which are called resources.  My current understanding is
>that these can be thought of as points in the structure of a document, and
>will often coincide with Elements.  I am as yet unclear about the total 
>number of possible topologies of a link, and ask some questions here.

I think it's most useful to think of the resources as nodes in trees
("groves" in the HyTime/DSSSL world) [see terminology discussion below].
This is because before you can resolve an address, you must parse the thing
into memory so you have a literal structure your program can address to
(e.g., nodes in some data structure).  HyTime and DSSSL codify this by
defining all of their functioning in terms of operations on nodes in groves
(DSSSL and HyTime are both closed over groves).  I think it will be helpful
to do the same thing here, although we can, for simplicity, just use the
general notion of "parse trees" and avoid the complication of the grove
formalism.  Note that any kind of data can be parsed into a parse tree
(although the tree may consist of a single node)--this is an important
simplifying generalization.

>Structure and Behaviour.
>
>My understanding is that a hyperdocument can have a link structure which is
>independent of behaviour - it simply represents the structure of the 
>information.  

True.

>           I'm happy with this - what I'm less clear about is whether
>there are *commonly agreed semantics* for this, or whether it's all
>application-dependent.  [If the answer to all my concerns is 'application-
>dependent' then it will be a pity because everyone will write individual
>link processors and there will be no reusability.]  I'm aware that all these
>concerns are catered for by HyTime, but since I am ignorant of HyTime,
>answers which refer to that won't be much use to me - ideally they should
>be in the context of the current spec.

There are two schools of thought on this:

1. The "links are everything" school. This school makes no distinction between
   relationships that are purely structural and relationships that are
   annotative.  In this school, all semantics are, by necessity, application
   dependent, because all relationships are fundamentally annotative
   and are only made structural by labeling them as such.

2. The "structure and annotation are different" school.  In this school,
   a fundamental distinction is made between purely structural relationships
   and annotative relationships.  The semantics of structural relationships
   (inclusion) are well defined and not open to interpretation.  For
   example, in SGML, the markup structure defines structural relationships.
   HyTime augments this by providing a generalized, indirect, structural
   relationship called a "value reference", which lets you use any form
   of address to identify the effective value of something, such as an
   element's content or an attribute's value (as opposed to using direct
   containment via markup or specifying attribute values directly).
   Annotative relationships are created using hyperlinks.

   The rule of thumb for distinguishing hyperlink relationships from other
   relationships is that removing the hyperlinks doesn't change the
   fundamental properties of the data linked (e.g., it doesn't change the
   data's structure, remove required property specifications, etc.).

NOTE: This issue is confused because the same addressing methods (e.g., 
URLs, IDREFs) may be used for both structural and annotative relationships.
In addition, the styles applied to annotative relationships may make them
appear to be structural (e.g., "present this anchor at the point of
occurrence of this other anchor") when they are not.

A good example of this latter case is using hyperlinks to associate notes
with a source document.  Some systems, such as HyBrowse, let me style
hyperlinks in various ways, including presenting one anchor at the point of
occurrence of another anchor.  Using this facility, I can style my
"annotation" links such that it appears that the annotation is part of the
data annotated, even though it isn't: choose another style and you get a
clickable button that takes you to the annotation.  Choose a third and the
annotation is hidden.  Obviously, the annotation is not part of the content
of the source document and styling it as though it were doesn't make it so.

THUS: the only way to know for sure if a given use of addressing is in the
service of structural relationships or annotative relationships
(hyperlinks) is to examine the semantics of the thing making the reference:
you can't tell from the form of address.  It is up to the designers of
document types and architectures to define a method for distinguishing
structural relationships from annotative.  If they fail to do so, they are
requiring the processors (browsers, formatters, style sheet writers) to do
the defining. [HyTime formalizes the distinction between structure and
annotation with the "value reference" facility (nee conloc), which lets you
define the structural semantic associated with particular references.
Value reference defines structural relationships semantically rather than
lexically (as SGML does with markup).]

NOTE: Text entity references in SGML are not semantic, they are lexical,
being a parser-level include.  Data entity references (references to
graphics or subdocuments) are not lexical and may be used for either
structural relationships or annotative relationships.  SGML also makes a
clear distinction between addressing storage objects (entities) and
addressing semantic objects inside storage objects.  The URL mechanism
combines storage object reference and semantic object reference into a
single, inseparable syntax (one of the reasons URLs are so fragile).

>SIMPLE
>The simplest link is XML-LINK="SIMPLE" and is an analogue of HTML's <A>
>or <IMG>.  My view of it is exemplified by this fictitious XML
>document:
>
><P>This is <A HREF="#foo" ID="A">resource A</A> which points to
><FOO ID="foo">the foo bird</FOO> (see picture 
><IMG HREF="foo.gif" TITLE="foo bird" ACTUATE="AUTO" SHOW="EMBED" ID="gif">)
></P>
>
>Here there are two links, both being unidirectional.  

Any hyperlink is inherently bi-directional, in the sense that knowing where
both ends are, you can traverse from one to the other.  Whether traversal
in both directions is *allowed* is a matter of style or the semantics of
a particular link type.  The directionality of hyperlinks is independent of
the directionality of the addressing used to create the link.  Note that
XML Link does (unnecessarily in my opinion) limit simple links to traversal
initiation from the SIMPLE link element.

We tend to think of simple links as being directional because it is
impractical to resolve all links in order to find the other ends in order
to enable traversal from the non-pointing anchor in an unbounded
environment like the Web.  However, in a closed system (such as within an
intranet or a system like Hyper-G) this need not be a problem.

In other words, while all links are inherently bi- or multi-directional,
the practicalities of address resolution in some environments may preclude
making both traversal directions available.  If you are at the element
making the reference, you know it's an end of the link; the reverse is not
always true.

                                                 I understand that the 
>ends of the first link are the 'point' described by 'ID=A', and the point
>described by ID=foo (though this is still being discussed).  If this is true,
>then in a **tree-based** tool like JUMBO the ends of the link correspond
>to nodes in the tree (labelled by ID=A and ID=foo).  The second link is
>harder
>because the resource in foo.gif is not clear (perhaps it is the inode in
>the UNIX system?).  

If we require that all addresses are to nodes in trees, then we have to say
that the address "foo.gif" is implicitly a reference to the node in the
tree created by "parsing" the gif into memory.  If the GIF consists of a
single image, the tree may have a single node, its root, with some
properties, one of which is the image data itself.  If the GIF consists of
multiple images, the tree would have a root and one child for each image.
Once you've built the tree, the result of addressing is well defined
(possibly through some implicit addressing rules defined for the format,
such as "reference to a GIF image is really a reference to the first Image
node in the tree produced by interpreting the GIF"--note that someone has to
define what the rules are for parsing GIFs into trees, but this is probably
part of the GIF spec, either explicitly or implicitly in the way GIF data
is organized).

In HyTime and DSSSL, this concept is generalized through the notion of
property sets and "grove constructors", which are nothing more than
notation-specific processors that understand that notation and the rules
for creating groves from it.  The property set is nothing more than a
formal class schema that defines the classes and properties of the nodes in
the resulting grove.
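
The idea of notation-specific tree builders can be sketched in a few lines
of Python.  This is only an illustration under stated assumptions, not the
HyTime or DSSSL machinery: the Node class, the CONSTRUCTORS registry, the
"toy-images" and "opaque" notations, and the "first Image node" rule are all
hypothetical names invented here, and no real GIF parsing is attempted.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Node:
    """One node in a parse tree: a class name, properties, and children."""
    cls: str
    props: dict = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


# Hypothetical registry of notation-specific tree builders.
CONSTRUCTORS: Dict[str, Callable[[bytes], Node]] = {}


def constructor(notation: str):
    """Register a function that turns raw data in `notation` into a tree."""
    def register(fn):
        CONSTRUCTORS[notation] = fn
        return fn
    return register


@constructor("toy-images")
def build_image_tree(data: bytes) -> Node:
    # Assumed toy format: image payloads separated by b"|".
    # A real GIF parser would take the place of this split.
    images = [Node("Image", {"data": part}) for part in data.split(b"|") if part]
    return Node("ImageFile", children=images)


@constructor("opaque")
def build_single_node(data: bytes) -> Node:
    # Any data can be "parsed" into a one-node tree holding the raw bytes.
    return Node("Blob", {"data": data})


def resolve(notation: str, data: bytes) -> Node:
    """Build the tree, then apply the notation's implicit addressing rule."""
    root = CONSTRUCTORS[notation](data)
    if root.cls == "ImageFile" and root.children:
        # Assumed rule: a bare reference means the first Image node.
        return root.children[0]
    return root
```

Once every notation has such a builder, address resolution never has to
care what kind of data it is looking at: it always ends at a Node.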

>I have (I believe) implemented SIMPLE links in JUMBO.  Each Node has a method
>isLink() which says whether it's the start of a SIMPLE link.  (I may have to
>change this nomenclature when the other links become clearer.).  So, for
>example, when process()ing a Node, JUMBO looks to see if it isLink() and
>if so what does it point at (value of HREF).  It seems to work.

It might be helpful to generalize this slightly from "isLink()" to
"IsEndMember()".  In other words, any node in any document may be a member
of one or more link ends (remember that XML pointers can address multiple
objects).  Simple link elements are also members of at least one link end
[I say "at least one" because they could themselves be linked to].  By
generalizing this question, you don't need to distinguish between simple
links and extended links because simple links are simply special cases of
extended links.

In other words, the core processing semantics for links are the same
regardless of whether the links are "simple" (that is, the link is one of
its own ends) or "extended" (that is, completely "out of line").  The
relationships represented are the same and are independent of both the
syntax of link representation and the addressing methods used to address
the members of the link ends (including the implicit address of being the
link element).
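
One way to picture "simple links are special cases of extended links" is to
read both syntaxes into the same in-memory shape.  The Python sketch below
is a rough illustration, not JUMBO's or ADEPT's code.  It assumes
well-formed input with explicitly closed empty elements, and the role names
"self" and "target" given to the ends of a simple link are invented for the
example (XML Link does not define them).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

# One normalized link: (link id, {role: [addresses]}).
Link = Tuple[str, Dict[str, List[str]]]

SAMPLE = """
<DOC>
  <P>This is <A XML-LINK="SIMPLE" HREF="#foo" ID="A">resource A</A> which
  points to <FOO ID="foo">the foo bird</FOO>.</P>
  <ANNOTATION XML-LINK="EXTENDED" ID="link1">
    <POINTER XML-LINK="LOCATOR" HREF="#W1" ROLE="verb"/>
    <POINTER XML-LINK="LOCATOR" HREF="#W2" ROLE="noun"/>
  </ANNOTATION>
</DOC>
"""


def collect_links(xml_text: str) -> List[Link]:
    """Return every SIMPLE and EXTENDED link in one common representation."""
    links: List[Link] = []
    for el in ET.fromstring(xml_text).iter():
        kind = el.get("XML-LINK")
        if kind == "SIMPLE":
            # The link element is implicitly one of its own ends.
            ends = {
                "self": ["#" + el.get("ID", "")],
                el.get("ROLE", "target"): [el.get("HREF", "")],
            }
            links.append((el.get("ID", ""), ends))
        elif kind == "EXTENDED":
            # Every end is addressed by a LOCATOR subelement.
            ends = {}
            for loc in el:
                if loc.get("XML-LINK") == "LOCATOR":
                    ends.setdefault(loc.get("ROLE", ""), []).append(
                        loc.get("HREF", ""))
            links.append((el.get("ID", ""), ends))
    return links
```

After this step the rest of the processor sees only links with ends; which
syntax a link was written in no longer matters.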

[This is why it's impossible for XML Link (or HTML) to not be HyTime
conformable: links are links are links, regardless of syntax or addressing.
HyTime is now sufficiently general that any syntax of link representation
and any form of addressing can be connected to the linking and addressing
semantics defined by HyTime. &Borg-motto;]

>Note that in this model, the resource which is pointed to (ID=foo, or
>foo.gif) is not required by XML-LINK to know anything about the link.  I
>assume it could be argued both ways that the pointedAt should/should_not
>know what is pointing at it.  [SHOW and ACTUATE are deliberately not
>discussed, although I think they are straightforward (at least compared
>to EXTENDED).]

In fact, in the general case, no object can "know" that it is being pointed
at--only the "link manager" knows for sure.  However, the processing
associated with an object should be able to ask the link manager (e.g.,
JUMBO) "am I being pointed at?", i.e., "am I a member of the ends of any
links you know about?"

>EXTENDED
>
>EXTENDED is a container for an indefinite number of LOCATOR links.  

TERMINOLOGY ALERT: LOCATOR elements are NOT (I repeat ARE NOT) links.
They are addresses, semantically equivalent to the HREF attribute of
SIMPLE.  It is vitally important to maintain a clear distinction between
linking, which is the definition of relationships, and addressing, which is
the mechanics by which the things related are pointed to. 

This is important for at least two reasons:

1. Addressing can be used for purposes other than linking.  If you conflate
   linking with addressing, you will conflate linking with things that are
   not linking (see above).

2. It reminds you that the relationship and its definition is independent
   of the form of address.  If you change an IDREF to a URL, you have 
   changed the form of address but you haven't changed the relationship
   expressed.  [If I move from place to place, my address changes,
   but my relationship to my wife, namely that we are married, does not
   change just because my address has.]

[LOCATOR
>has exactly the same syntax as SIMPLE but has presumably different
>semantics.]  

Not presumably, explicitly.  SIMPLE and EXTENDED have *exactly* the same
semantics (the representation of a relationship).  The difference between
them is the *syntax* of how the things related are addressed. For SIMPLE,
the link end address is an attribute of the link element (the address of
the other end, the SIMPLE element itself, is implicit and thus not
specified). For EXTENDED, the addresses of the link ends are specified by
subelements.

                 EXTENDED does not by itself define a resource and is normally
>remote from the resources.  

If my memory of the last ERB discussion of this is correct, EXTENDED will
be able to be one of its own resources in the next draft of the link standard.

In other words, EXTENDED can be used just as SIMPLE is, differing only in
the syntax by which the other link ends are addressed.

>I can see how a bi-directional link might be constructed from EXTENDED 
>[It's other multiplicities I don't feel so happy with.]  Does this 
>example capture it?  

Yep.

><P> Friends, Romans, Countrymen, <WORD ID="W1">lend</WORD> me your 
><WORD ID="W2">ears</WORD></P>.
>...
><ANNOTATION XML-LINK="EXTENDED" ID="link1">
><POINTER XML-LINK="LOCATOR" HREF="#W1" ROLE="verb">
><POINTER XML-LINK="LOCATOR" HREF="#W2" ROLE="noun">
></ANNOTATION>
>...
>We therefore have a bidirectional link between the verb and the noun, so
>that each of them can locate the other.  

Per the discussion of directionality above, it's more useful to say that
the ANNOTATION link is a "two end" link, rather than "bi-directional", as
the allowed directions of traversal are independent of the number of anchors.

                                                 Therefore, in JUMBO, there
>has to be a pointer which is available to each Node.  My temptation would be
>for each node to carry a hashtable of links to other nodes so that (say)
>when W1 was asked what it linked to it would come up with a list of the
>Nodes at the other end of its links.  W2 would be such a node.  On the other
>hand it might point to the LINK (i.e. link1, and it might be clear from the
>'contents' of link1, what the other end was.  Is this too restricted?

The way I implemented this in my ADEPT code was to build the following
tables in memory as a result of processing all links in all documents
within a bounded document set:

1. For each node, what link ends it is a member of
2. For each link end, what link it is an end of
3. For each link element, what link ends it has (remembering that a link
   end is an abstract object listing the members of that end)
4. For each link end, its defined role (remembering that each link
   end has a defined role [the "anchor role" in HyTime terms]).
5. For each link end, objects that are a member of it.
6. For each link end, the values for the various HyTime-defined
   link end (anchor) properties: link traversal, list traversal, etc.

The key to these tables is the management of links by managing link ends as
virtual objects, from which all other information can be gleaned.

From these tables, I can get from any object that is a member of any link
end to any member of any of the ends of the links it is a member of.  Given
a node, I look it up in the "node-to-link-end" table.  For each link end
the node is a member of, I then look up the link end in the
"link-end-to-link" table and then look up the other link ends
("link-to-link-ends" table) of that link.  For each link end, I look up the
members of those link ends ("link-end-to-members") and thus get a list of
all the nodes the starting node is linked to, classified by link type and
anchor role.
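
The tables and the lookup just described might look roughly like this in
Python.  This is a sketch written for this note, not the ADEPT*Command
code: the class and method names are invented, a link end is identified
simply by the pair (link id, role), and table 6 (the HyTime traversal
properties) is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LinkEnd = Tuple[str, str]  # (link id, role) identifies one virtual link end


class LinkManager:
    def __init__(self) -> None:
        # Table 1: node -> link ends it is a member of.
        self.node_to_ends: Dict[str, Set[LinkEnd]] = defaultdict(set)
        # Table 2: link end -> the link it is an end of.
        self.end_to_link: Dict[LinkEnd, str] = {}
        # Table 3: link -> its link ends.
        self.link_to_ends: Dict[str, List[LinkEnd]] = defaultdict(list)
        # Table 5: link end -> its members.
        self.end_to_members: Dict[LinkEnd, List[str]] = defaultdict(list)
        # Table 4 (role of each end) is carried inside the LinkEnd key.

    def add_link(self, link_id: str, ends: Dict[str, List[str]]) -> None:
        """Register one link given as {role: [node ids]}."""
        for role, members in ends.items():
            end = (link_id, role)
            self.end_to_link[end] = link_id
            self.link_to_ends[link_id].append(end)
            for node in members:
                self.node_to_ends[node].add(end)
                self.end_to_members[end].append(node)

    def is_end_member(self, node: str) -> bool:
        """Answer "am I a member of the ends of any links you know about?"."""
        return node in self.node_to_ends

    def linked_from(self, node: str) -> Dict[LinkEnd, List[str]]:
        """Members of the other ends of every link `node` takes part in."""
        result: Dict[LinkEnd, List[str]] = {}
        for end in self.node_to_ends.get(node, ()):
            link_id = self.end_to_link[end]
            for other in self.link_to_ends[link_id]:
                if other != end:
                    result[other] = list(self.end_to_members[other])
        return result
```

For example, after add_link("link2", {"verb": ["W3"], "noun": ["W4", "W5"]}),
asking linked_from("W3") yields the members of the "noun" end, keyed by
link and role.  Members of the node's own end are not reported in this
sketch; whether they should be is a design choice.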

I build these tables as a start-up process applied to all documents in the
set, but you could also do it only for a single document and then only
enable traversal from those link end members you know about from processing
the links in that document (thus the motivation in XML Link for having a
document that contains nothing but links to be used as a starting point).
As links are traversed to new documents, you can process the links in those
documents, adding to your tables as you go.  

>I am not clear how this extends to 'multidirectional links'.  Here is a
>typical problem.
>
>to <WORD ID="W3">bear</WORD> the <WORD ID="W4"> slings</WORD> and 
><WORD ID="W5">arrows</WORD> of
>...
><ANNOTATION XML-LINK="EXTENDED" ID="link2">
><POINTER XML-LINK="LOCATOR" HREF="#W3" ROLE="verb">
><POINTER XML-LINK="LOCATOR" HREF="#W4" ROLE="noun">
><POINTER XML-LINK="LOCATOR" HREF="#W5" ROLE="noun">
></ANNOTATION>
>...
>Here I want to indicate that the verb 'bear' links to two nouns at the
>same time and that each noun points to 'bear'.  But it isn't obvious that
>this is the case (unless perhaps ROLE is used for that, and that doesn't
>seem general).  

Yes--the use of ROLE is the key: all the members of ends with the same role
are members of the same (virtual) link end.  Thus, the above is a two-ended
link relating the single verb object to the two noun objects. [See
discussion below for more on this issue.]

If there were three roles (noun, verb, subject), there would be three link
ends.

If you're interested in my data structures and algorithms, you can find my
ADEPT*Editor HyTime code at http://www.isogen.com/demos/hylibcmd.html.
ADEPT*Command language is very similar to Perl and C, so anyone familiar
with those languages should be able to figure out what's going on.  I've
tried to comment the code as completely as I could, especially with respect
to the data structures.

I don't claim that my particular implementation is necessarily the best,
but it seems to work so far.  I think I need to augment it to better
capture the stages of indirection used to address individual
nodes--currently I only capture the result of addresses, which limits my
ability to delay address resolution and provide complete error reporting
and debugging facilities (very important in an editor, if not in a browser).

Here is a brief XML-to-HyTime terminology translator (my understanding or
use of XML terms may not be accurate, caveat emptor):

<dl>
<dthd>XML Term</dthd>
<dt>resource</dt>
<dd>No direct mapping, as HyTime (and SGML) distinguish storage objects
from addressable objects within storage objects.  However, resource most
closely maps to "node in grove", as that's what HyTime is always 
ultimately addressing.  When storage objects are the thing named by the
address syntax (e.g., a URL, entity SYSID, etc.), HyTime (or the notation
itself) defines rules for getting a grove from the storage object.
XML sometimes uses resource in the way that HyTime uses "anchor" or "anchor
member", but doesn't make the same formal distinction between anchors and
members of anchors that HyTime does (see below).
</dd>
<dt>linking element</dt>
<dd>In HyTime, any element derived from any of the HyTime hyperlink
forms hylink, clink, agglink, varlink, or ilink.  HyTime distinguishes
hyperlinks from forms of reference used to establish purely structural
relationships ("value reference").  SIMPLE can be derived from hylink
in the same way that clink is itself derived from hylink.  SIMPLE could
also be derived from clink.  EXTENDED can be derived from varlink
(in fact we designed varlink specifically to enable direct derivation
of EXTENDED, see my recent post to the XML WG list).  The only difference
between these forms is the syntax by which the anchors are addressed (and,
in the case of clink and agglink, the fixing of the anchor roles in 
the HyTime standard to reflect common practice).  All HyTime linking
forms are semantically identical.
</dd>
<dt>locator</dt>
<dd>"Location address".  HyTime defines the general notion of 
attributes and content as being potentially "referential", meaning that
they contain what XML calls a "locator".  HyTime defines a specific
element-based syntax for representing indirect location addresses.  HyTime
also lets you use other forms of address by defining them formally as
queries that return nodes in groves.  (Thus, XML's locator syntax can be
defined as a query notation to HyTime by formally defining how XML locators
address nodes in groves--this is done already to a large extent by
reference to the underlying TEI spec, which says that TEI extended pointers
use the SGML property set and HyTime default grove plan for addressing SGML
documents.)  My personal recommendation is that the developers of
HyTime-aware systems implement support for URLs, TEI extended pointers, and
XML pointers as query notations that are integrated out of the box, both
because they are in common use and because they provide a convenient syntax
for addressing when you don't need HyTime's indirection machinery.
Note that the existence of the XML link spec does not preclude the use of
HyTime indirect addressing with XML documents.  Having implemented support
for TEI locators, supporting HyTime's indirection syntax and semantics is
not that much more effort.
</dd>
<dt>label</dt>
<dd>No HyTime analog.  HyTime doesn't define a specific mechanism for
labeling links or anchors as it's not relevant to the level of semantics
HyTime defines and should be left open to specific applications.  XML's
definition of such an attribute and the meaning for it is entirely appropriate
and useful.
</dd>
<dt>traversal</dt>
<dd>HyTime defines the same meaning.  In addition, HyTime defines a default
mechanism for describing the traversal constraints on anchors. However,
this mechanism is probably more than XML link needs and XML Link correctly
avoids it in preference to a simpler mechanism that matches the
expectations of most Web users and browser vendors.
</dd>
<dt>multi-directional-link</dt>
<dd>HyTime doesn't formally define this concept in isolation, although the
HyTime link traversal rules do define a way to express this constraint.
HyTime does make the same distinction between "go back" or "return" and
bi-directionality.
</dd>
<dt>in-line link</dt>
<dd>"Contextual" link.  In HyTime, any link can, potentially, be one or
more of its own anchors.  If that anchor also allows traversal initiation,
then the link is said to be "contextual" in that it presumably occurs in a
context from which it could be used to initiate traversal, as opposed to
being somewhere else (possibly inaccessible to users).
</dd>
<dt>out-of-line link</dt>
<dd>"independent" link, i.e., a link that is not contextual (because either
it is not self anchored at all or it is self anchored but the self anchor
does not allow traversal initiation).
</dd>
</dl>

HyTime also makes a distinction that the current XML link spec appears not
to make between "anchors" of links and the members of those anchors.  In
HyTime, a link anchor is a virtual object consisting of all the objects
addressed as a given anchor role within a single link type for an instance
of that type.  The XML link spec appears to conflate anchors and the
members of anchors into the term "resource" (in that it doesn't distinguish
the objects addressed from their organization within a particular role of a
link).

The current XML Link spec doesn't clearly define the meaning of having
multiple locators with the same role.  I've interpreted it in the only way
that makes sense to me (probably because it's the HyTime way).  My logic is
that choosing the same role name within a link expresses common grouping
under the semantic label of that role, so it follows that the objects
addressed for that role should be grouped together for access.  There
doesn't appear to be much difference between:

resource "W3" role: "verb"
resource "W4" role: "noun"
resource "W5" role: "noun"

And:

Role "verb":
  resource "W3"
Role "noun":
  resource "W4"
  resource "W5"

Note that, barring traversal restrictions, the traversal result (the things
you can traverse to) is the same in both cases.  The only difference is how
the semantic groupings are organized.
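
The step from the first listing to the second is a plain group-by on the
role name.  A small Python sketch, with function names invented for the
illustration:

```python
from typing import Dict, Iterable, List, Tuple


def group_by_role(locators: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Turn (resource, role) pairs into {role: [resources]}, keeping order."""
    anchors: Dict[str, List[str]] = {}
    for resource, role in locators:
        anchors.setdefault(role, []).append(resource)
    return anchors


def flatten(anchors: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Go back from {role: [resources]} to a list of (resource, role) pairs."""
    return [(resource, role)
            for role, members in anchors.items()
            for resource in members]
```

Going one way and then the other returns the same set of pairs, which is
the sense in which the two forms carry the same resources; what the grouped
form adds is an explicit statement that same-role resources belong together.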

The real question is not one of traversal, but one of relationship
representation: can an observer of the link element tell whether the author
meant for the two nouns to be grouped under a common label or was the
presence of two nouns a coincidence? With formal anchors, it must be the
first, because all resources with the same role are, by definition,
semantically grouped under that role.  Without formal anchors, it's up to
the link creator to indicate what they meant.  If your addressing method is
incapable of addressing multiple objects (e.g., normal URLs), then you
can't depend on addressing multiples from a single Locator to indicate the
intended role grouping.  Thus, in my opinion, the only reliable
interpretation is that roles define semantic groups (anchors) independent
of how they are specified syntactically.  FWIW.

Cheers,

E.

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From jtauber at jtauber.com  Sun Jun  1 07:12:19 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:57:53 2004
Subject: Entity replacement
Message-ID: <01BC6E8D.A7E3CEA0@dial126.cygnus.uwa.edu.au>


Am I correct in thinking that one benefit of requiring entity and element 
structure to be synchronized is that a well-formed document is also 
well-formed before general entity replacement; i.e., you can parse the 
document before entity replacement, parse the entities and then just insert 
the parse tree of the latter into the former?

If this is true, then is there some similar constraint that could be 
applied to use of parameter entities? This might already have been done in 
the choice of where to use % in the productions but I can't quite work out 
the pattern. It would be nice if parameter entity replacement could be 
described without recourse to the % notation in the spec's productions. 
Would it be helpful, for example, to increase the number of non-terminal 
symbols in the grammar and then specify which non-terminal symbols can be 
replaced by parameter entity reference?
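
For the general-entity case, the "parse first, splice later" order can be
tried out with a toy experiment in Python.  This is only a sketch under
strong assumptions: the entity table and document are made up, references
are found with a regular expression rather than by a real XML parser, each
entity's replacement text is a sequence of complete elements with no
character data next to the reference, and the "entity-ref" placeholder
element is invented for the purpose.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical entity table and document for the experiment.
ENTITIES = {"body": "<SEC><P>one</P></SEC><SEC><P>two</P></SEC>"}
DOC = "<DOC><TITLE>t</TITLE>&body;<END/></DOC>"

REF = re.compile(r"&(\w+);")


def expand_then_parse(doc: str) -> ET.Element:
    """Conventional order: textual replacement first, then one parse."""
    return ET.fromstring(REF.sub(lambda m: ENTITIES[m.group(1)], doc))


def parse_then_splice(doc: str) -> ET.Element:
    """Parse the document with references kept as placeholder nodes,
    parse each entity on its own, then splice the subtrees in."""
    root = ET.fromstring(REF.sub(r'<entity-ref name="\1"/>', doc))
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag != "entity-ref":
                continue
            # Wrap the replacement text so it parses as a single tree.
            wrapper = ET.fromstring(
                "<w>" + ENTITIES[child.get("name")] + "</w>")
            at = list(parent).index(child)
            parent.remove(child)
            for offset, node in enumerate(list(wrapper)):
                parent.insert(at + offset, node)
    return root
```

Under these assumptions both orders serialize to the same tree.  An entity
whose replacement text opens an element without closing it would fail to
parse on its own, which is exactly the case the synchronization requirement
rules out.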

PS Can people check out http://www.jtauber.com/xml/ and let me know 
(off-list) what they think and what could be added?

Thanks

James
--
James K. Tauber / jtauber@jtauber.com
Perth, Western Australia




From peat at erols.com  Sun Jun  1 18:13:30 1997
From: peat at erols.com (Peat)
Date: Mon Jun  7 16:57:53 2004
Subject: A few thoughts on XML and EDI
Message-ID: <199706011613.MAA10043@smtp2.erols.com>

XML/EDI
Advantages of including Electronic Data Interchange (EDI)
entities with eXtensible Markup Language (XML)


The advantages of including Electronic Data Interchange (EDI) entities with
eXtensible Markup Language (XML) differ for each camp.  

-- For the EDI camp the unification means making application implementation
easier, allowing for quicker reach into vertical markets, reduced message
stores when processing transactions, and most importantly enabling
document-centric tools such as search engines and Internet "push" products
to supplement database mechanisms.  

-- By assuring EDI compatibility, the XML camp gains almost instant use
among thousands of companies.  XML will gain a common extensible data
entity definition which has undergone the test of time. 

The bottom line:  the XML camp gains Fortune 1000 support and the EDI camp
gains a common presentation protocol. 


If the combination can bring this much to the table why hasn't it been done
before now?

The attempt to combine structured presentation with structured data for
transactions is not new.  The last attempt ended a little over a year ago. 
At that time the researchers of the Joint Electronic Document Interchange
(JEDI) project, which was managed through the Division of Learning
Development Research Group at De Montfort University Leicester, Computer
Science at University College London, and the Document Interchange project
at UKERNA, completed their study.  The project's intent was to analyze the
current international and industry de facto standards that are in use for
electronic document creation, transfer and presentation. The project was to
identify the set of common elements that would allow the conversion of both
logical and layout aspects of a document. The documents would then be
viewed using a WWW type browser that was available for common computer
platforms.  The JEDI project concluded that SGML is ideally suited for EDI
as it is text based and is independent of platform and operating system.
The actual results were a little disappointing in that the world was and is
still not ready for an SGML/DSSSL implementation.


What has changed, for us to try again?

It is a year later, and in the Internet timeframe this is plenty for
momentum to shift.  Due for release sometime this summer is an important
specification to WWW browser-based applications - the eXtensible Markup
Language (XML).  The intent is to make the rather rich HTTP protocol even
richer.  It is a scaled-down, simpler version of SGML; in fact, one of
the goals of the specification is to "...be straightforwardly usable over
the Internet."  The key here is "straightforwardly usable."  This flavor in
the design of XML is why the specification will succeed for
transactions where SGML/DSSSL failed.  This is not to say that
SGML/DSSSL wouldn't work, but more a reflection on us accepting change. 
Change sometimes needs to be taken in a series of steps - XML is the next
step.


What about the momentum with XML?

XML, managed by the World Wide Web Consortium (W3C) working group, will no
doubt become the next significant enabling technology for the Web. XML will
provide Web publishers and consumers with unprecedented power, flexibility
and control over the creation of and access to Internet and intranet
content. To date the XML specification is backed by SoftQuad, Adobe, IBM,
HP, Microsoft, Netscape, Lockheed Martin, NCSA, Novell, Sun, Boston
University, Oxford University, and the Universities of Illinois and
Waterloo. In addition to the authors of the specification, about 30
companies already support the Channel Definition Format (CDF), an XML
application which brings various "push" operations to the Internet.
Netscape and Microsoft have already pledged XML support in their future
WWW browser releases.  And many corporations are being added to the list as
they learn of the specification's existence and capability.


What could the EDI entities look like?

The general format of the transaction would be described in HTML.  The EDI
segments and elements could go something like this...
....
DUNS Number:
<![CDATA[<N101>FR</N101><N103>1</N103><N104>123456</N104>]]>


 <or>

DUNS Number:
<![CDATA[<N1>FR*1*123456</N1>]]>


The above items are just a thought.  Hopefully, when both camps view the
above lines, they see only a slight modification to the methods implemented
today.  To include the right hooks, CDATA or other XML entities might have
to include some specific syntax for EDI.  The details, though not many, can
be ironed out by the excellent authors of both camps.
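
As an illustration only, the two forms shown above can be related by a
small expansion step.  The sketch below is not part of any EDI or XML
specification; the function name segment_to_elements is hypothetical, and
it assumes that an omitted element (N102 here) is written as an empty
position between two separators, so the compact data is given as
"FR**1*123456".

```python
from xml.sax.saxutils import escape


def segment_to_elements(segment_id, data, separator="*"):
    """Expand a delimited EDI segment into one tagged element per position.

    Positions are numbered from 01; an empty position is treated as an
    omitted element and produces no markup (an assumption of this sketch).
    """
    parts = []
    for position, value in enumerate(data.split(separator), start=1):
        if value == "":
            continue  # omitted element, e.g. N102 in the example
        tag = f"{segment_id}{position:02d}"
        parts.append(f"<{tag}>{escape(value)}</{tag}>")
    return "".join(parts)


expanded = segment_to_elements("N1", "FR**1*123456")
print(expanded)
```

Under those assumptions the compact segment expands to the same N101,
N103 and N104 elements used in the first form.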


So then XML documents are really just EDI templates, right?

Yes and no.  Yes, the documents can be used as templates.  But in addition
to this application, the XML document can also be a transaction itself.
XML/EDI would allow a structured presentation format to be included in the
transaction in a non-proprietary way.  The combined effort in template
or application form creation and development is estimated in the thousands
of man-years, not hundreds.  Soon there will be a standard with which to
share the work others have done; applications need only access WWW
browser objects.  This object-based approach to applications will make
document transaction exchange even easier.  Bottom line: The EDI camp could
leverage XML to aid in lowering implementation costs.

In addition to templates, and transactions, tools are available today to
store, search, route, narrowcast and maintain information in document-form.
By adding defined data entities, these tools can be enhanced to make EDI
processing and integration much easier.  Database, EDI specific, and
application programming tools were for the longest time the only choices,
the only options for EDI administrators.  XML/EDI will give the EDI
administrator more choices.


If presentation elements are included in the transaction what happens to
our transmission bandwidth? 

The transaction would certainly require more bandwidth than today's EDI
specifications.  The additional strain on a corporation's
infrastructure must be weighed against the advantages gained by the use of
XML/EDI on a case-by-case basis.  It is estimated that XML/EDI-based
transactions would add about 15% to the size of current transactions.
In the cases where this increase is significant, the XML/EDI standard
documents can replace proprietary templates, which would still allow for
use of document-based tools internal to the organization.


Where do we go from here?

- Introduction of the two camps - XML and EDI
- Education of both camps about the other's existence, tools and implementation
methods
- Assure that the proper hooks are in XML to support EDI
- Create an EDI application for the Extensible Markup Language (XML)
- EDI "mappers" must add XML parsing to their front-end logic.


Please reference

Joint Electronic Document Interchange (JEDI)
http://www.sil.org/sgml/gen-apps.html#jedi


EC/EDI References
 
Electronic Commerce Resource Center http://www.ecrc.ctc.com
EC/EDI Jumpstation http://www.premenos.com/Resources/Organization
Overview of EC can be found at http://www.dmx.com
Listing EC sites: http://planetx.bloomu.edu/~jsdutt/EC-urls.html

Mailing list devoted to issues of EDI:  To subscribe to the list, called
EDI-L, send an Email message to  listserv@uccvma.ucop.edu with the line 

subscribe edi-l firstname lastname 

(l for List, not numeric 1) in the message area; To send a message to the
mail list, address messages to edi-l@uccvma.ucop.edu


XML References

XML Press Release (SoftQuad) http://www.sq.com/press/releases/prmar1197.htm
The XML W3C Working Draft is at
http://www.w3.org/pub/WWW/TR/WD-xml-961114.html
eXtensible Markup Language Site http://www.jtauber.com/xml/
Channel Definition Format application for XML
http://www.microsoft.com/standards/cdf.htm

Mailing list devoted to issues of XML: To subscribe to the list, called
XML-DEV, send an Email message to majordomo@ic.ac.uk with the line 

subscribe xml-dev name@address

(where name@address is your actual email address) in the body of the
message. To send a message to the mail list, address messages to
xml-dev@ic.ac.uk


Bruce Peat
peat@erols.com


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Peter at ursus.demon.co.uk  Sun Jun  1 20:04:06 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK
Message-ID: <7417@ursus.demon.co.uk>

Many thanks indeed Eliot,

	This is extremely useful and I think I follow everything you have 
put forward.  [It's taken me several months to get to that stage, so
it's clear that for webhackers, rather than rocket scientists there is a
longish learning curve - at least until real applications become common].

I will come back to some of the detailed points shortly.  Eliot's reply 
confirmed my suspicion that there could be different interpretations of
the role of XML-LINK ('structure' and 'annotation').  If this is not realised
then it would be easy to create software for XML-LINK which was inappropriate 
in the wrong context, and it could be very confusing for newcomers.  
There is also the strong likelihood that *some* XML-LINK processors will be 
tightly bound to particular applications (browsers, database engines, etc.).  
The converse may be that a general XML-LINK engine (covering both approaches 
above) might be described in language sufficiently abstract that newcomers 
to XML might fail to understand its purpose and value.

It would be extremely useful to see where XML-DEV readers see a link-processor
in the XML architecture.  From Eliot's reply I see it as a browser-independent
engine which answers queries about links regardless of what use they are
to be put to.  (It presumably *holds* the traversal information, but simply 
hands it to the querying engine.)  So JUMBO should be independent of the 
link processor.  When JUMBO was acting as a generic browser and a node was 
actuated/processed/arrived_at, etc. JUMBO would query the link processor as 
to whether it had information about this node.  If so, it would decide whether
to act upon it.  However I assume an application could instruct JUMBO that
certain Nodes (or collections of nodes) required information from the link
processor, such as whether they were part of a DAG, linked list or
whatever.  Then it could extract the whole structure independently of 
behaviour (haven't thought this through in detail :-).
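
To make the division of labour concrete, here is a minimal sketch of such
a browser-independent link processor.  It is only an illustration: the
names Link, LinkProcessor, add_link and links_for are hypothetical and are
not taken from JUMBO, XML-LINK or HyTime.  The processor only holds link
information and answers queries; the caller decides what, if anything, to
do with the answer.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    link_id: str
    anchors: tuple  # (role, node_id) pairs


class LinkProcessor:
    """Holds link information and answers queries; performs no traversal."""

    def __init__(self):
        self._by_node = defaultdict(list)

    def add_link(self, link):
        # Index the link under every node it touches.
        for _role, node_id in link.anchors:
            self._by_node[node_id].append(link)

    def links_for(self, node_id):
        # Return a copy so callers cannot alter the index.
        return list(self._by_node.get(node_id, []))


processor = LinkProcessor()
processor.add_link(Link("l1", (("source", "para-3"), ("target", "fig-2"))))

# Browser side: on arriving at a node, ask whether anything is known about it.
for link in processor.links_for("para-3"):
    other_ends = [node for _role, node in link.anchors if node != "para-3"]
    print(link.link_id, other_ends)
```

An application wanting the whole structure (a DAG, a linked list) could
walk links_for() from node to node without involving any display
behaviour.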

So I am extremely wary of starting to code anything more in this area until
its limits become clear.  It's clearly venturing into rocket science
territory.  By comparison XML-LINK=SIMPLE is relatively straightforward.
Therefore I suspect there will be implementers (like myself) who find that
full XML-LINK implementation is too difficult/expensive/undefined, whilst
SIMPLE is useful and doable.

IMO it will be valuable to have an application-independent link processor for
XML-LINK.  Is this likely to happen?  Or is this only really conceivable 
in very large organisations?

	Once again many thanks

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From eliot at isogen.com  Sun Jun  1 22:06:55 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK
Message-ID: <3.0.32.19970601150222.006c3028@swbell.net>

At 09:53 AM 6/1/97 GMT, Peter Murray-Rust wrote:

>So I am extremely wary of starting to code anything more in this area until
>its limits become clear.  It's clearly venturing into rocket science
>territory.  By comparison XML-LINK=SIMPLE is relatively straightforward.
>Therefore I suspect there will be implementers (like myself) who find that
>full XML-LINK implementation is too difficult/expensive/undefined, whilst
>SIMPLE is useful and doable.

Remember that there are two basic "modes" of link processing: 

1. The "I know everything mode", in which you need both to know the
boundaries of the documents you need to know about (i.e., a "bounded object
set") and a general processor capable of doing the processing and holding
the result for some reasonable length of time. This is the
HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are
fairly well defined and you have the infrastructure you need to manage all
the link information more or less persistently.

2. The "I only know about what I have seen or am now seeing" mode.  This is
the normal "Web" mode or the Panorama mode (except that Panorama only
remembers what it knows about the current document, unfortunately).

If you constrain your links to be in line ("contextual"), whether using the
SIMPLE or EXTENDED syntax, then the second mode can always be applied and
any XML browser should be capable of handling it (if you can handle SIMPLE
you can handle inline EXTENDED links and EXTENDED links in the document
you're processing).  This is not an unreasonable constraint to have as it
greatly simplifies processing (the Web is forced to impose this constraint
for the general case, as the Web is so large it is effectively unbounded).

Note that the two modes can be combined, such that you might know
everything about one set of documents but only about pointers to another
set.  The data structures needed to manage knowledge of the links are the
same in both modes; the only question is when do you gather the knowledge?
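
The point that the data structure is the same and only the time of
gathering differs can be sketched as follows.  This is an illustration
under stated assumptions: the LinkTable class, its gather method and the
document names are all made up, and each document is reduced to a list of
(source, target) address pairs.

```python
class LinkTable:
    """One table of link knowledge, usable in either mode."""

    def __init__(self):
        self.by_anchor = {}          # address -> set of linked addresses
        self.seen_documents = set()

    def gather(self, doc_name, links):
        """Record the links found in one document (idempotent per document)."""
        if doc_name in self.seen_documents:
            return
        self.seen_documents.add(doc_name)
        for source, target in links:
            self.by_anchor.setdefault(source, set()).add(target)
            self.by_anchor.setdefault(target, set()).add(source)


# Hypothetical bounded object set: every document and its links are known.
DOCUMENTS = {
    "a.xml": [("a#intro", "b#detail")],
    "b.xml": [("b#detail", "c#note")],
}

# Mode 1: gather everything up front.
bounded = LinkTable()
for name, links in DOCUMENTS.items():
    bounded.gather(name, links)

# Mode 2: gather only from documents as they are seen.
lazy = LinkTable()
lazy.gather("a.xml", DOCUMENTS["a.xml"])

print(sorted(bounded.by_anchor["b#detail"]))
print(sorted(lazy.by_anchor["b#detail"]))
```

The two tables differ only in how much has been gathered so far; a
combined system would simply call gather() eagerly for one set of
documents and on demand for the rest.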

Cheers,

E.


From cbullard at hiwaay.net  Mon Jun  2 01:47:01 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK
References: <3.0.32.19970601150222.006c3028@swbell.net>
Message-ID: <339209C3.3F5C@hiwaay.net>

W. Eliot Kimber wrote:
 
> Remember that there are two basic "modes" of link processing:
> 
> 1. The "I know everything mode", in which you need both to know the
> boundaries of the documents you need to know about (i.e., a "bounded object
> set") and a general processor capable of doing the processing and holding
> the result for some reasonable length of time. This is the
> HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are
> fairly well defined and you have the infrastructure you need to manage all
> the link information more or less persistently.

Isn't this also the mode of WinHelp?  WinHelp uses a restricted target
set (footnotes in page delimited chunks), but the rest of the hyperlinking
is topical based on the project file (defines the objects whose topics
are targets) and the #define files (string and paired ID) for the
software to use for contextual help, and the compiled files.

> 2. The "I only know about what I have seen or am now seeing" mode.  This is
> the normal "Web" mode or the Panorama mode (except that Panorama only
> remembers what it knows about the current document, unfortunately).

How does Panorama store linking information?

> Note that the two modes can be combined, such that you might know
> everything about one set of documents but only about pointers to another
> set.  The data structures needed to manage knowledge of the links are the
> same in both modes; the only question is when do you gather the knowledge?

Well, not just when, but how is it packed if it is in separate files for 
different processors?  Consider the WinHelp model.  I ask because my 
guess is a very high percentage of the legacy hypertext in the world 
right now in need of conversion is WinHelp.  That means taking the 
separate pieces and mapping them to the XML-n models.

len


From eliot at isogen.com  Mon Jun  2 05:57:14 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK
Message-ID: <3.0.32.19970601225027.006aa5e0@swbell.net>

At 06:46 PM 6/1/97 -0500, len bullard wrote:
>> 1. The "I know everything mode", in which you need both to know the
>> boundaries of the documents you need to know about (i.e., a "bounded object
>> set") and a general processor capable of doing the processing and holding
>> the result for some reasonable length of time. This is the
>> HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are
>> fairly well defined and you have the infrastructure you need to manage all
>> the link information more or less persistently.
>
>Isn't this also the mode of WinHelp?  WinHelp uses a restricted target
>set (footnotes in page delimited chunks), but the rest of the hyperlinking
>is topical based on the project file (defines the objects whose topics
>are targets) and the #define files (string and paired ID) for the
>software to use for contextual help, and the compiled files.

Relating parts of a program to entries in help files is really nothing more
than providing a query interface to your browser that lets you find things
(the entries) by property (the program object name or menu hierarchy or
whatever).  Not really hyperlinking in that sense, just plain old access.

You could, however, represent the relationship between the help entries
and the code objects by creating a hyperlink that used queries that were
interdependent, e.g.:

<help-to-code-link xml-link='extended'>
 <locator id="entries" role="help-entry">
 select(entries_with_IDs(help_file($help.filename)))
 <!-- Returns list of help entries in file named by variable
      "$help.filename" -->
 </locator>
 <locator role="code-object">
 for_each(locaddr(has_id("entries")),
   select(code_objects($code.set), code_object_with_id(current_node())))
 <!-- Returns list of code objects whose names match the name
      of help entries returned by the "help-entry" role's
      locator. -->
 </locator>
 <!-- NOTE: in HyTime, you would also indicate that these two anchors
      are "correspondent anchors", meaning that for each member of one,
      there must be exactly one member in the other and that the two
      lists correspond in the order they are addressed.  Here, the
      use of the first anchor's list in generating the second ensures
      that these constraints hold.
  -->
</help-to-code-link>

Because the "code-object" role's query depends on the "help-entry"
role's query (the second iterates over the first), this single link defines
the association between all code objects and all help entries for a single
help/program pairing.  You could also create one such link for each pair,
but we really just need to express the intent, as the implementation will
probably be hard coded.
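
For readers who prefer code to query notation, the dependency between the
two locators can be sketched roughly as below.  Everything here is made up
for illustration (the HELP_FILE and CODE_SET tables, the function names,
the entry IDs); it shows only the idea that the second anchor is computed
by iterating over the first, which is what keeps the two lists
correspondent.

```python
# Hypothetical stand-ins for a help file and a set of code objects,
# both keyed by ID.
HELP_FILE = {"file.open": "Opening a file", "file.save": "Saving a file"}
CODE_SET = {
    "file.open": "OpenCommand",
    "file.save": "SaveCommand",
    "edit.undo": "UndoCommand",
}


def help_entry_anchor(help_file):
    """First locator: the IDs of all help entries, in a fixed order."""
    return sorted(help_file)


def code_object_anchor(entry_ids, code_set):
    """Second locator: one code object per help entry, in the same order."""
    missing = [entry for entry in entry_ids if entry not in code_set]
    if missing:
        # Correspondent anchors need exactly one member for each entry.
        raise LookupError(f"no code object for help entries: {missing}")
    return [code_set[entry] for entry in entry_ids]


entries = help_entry_anchor(HELP_FILE)
code_objects = code_object_anchor(entries, CODE_SET)
pairs = list(zip(entries, code_objects))
print(pairs)
```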

Note that this is very similar to using a DynaText style sheet to associate
hyperlinking style with element types, except that here the relationship is
defined more abstractly and distinct from any particular implementation of
it (apart from interpretation of the query notation itself, which in this
case, I've made up for this example).  For example, given the link above, I
could use it as a specification to guide me in creating the equivalent
DynaText style functions and SDK extensions to make DynaText into a help
system.  The link above defines the relationship semantics, the programmer
of the DynaText customization implements it.

>> 2. The "I only know about what I have seen or am now seeing" mode.  This is
>> the normal "Web" mode or the Panorama mode (except that Panorama only
>> remembers what it knows about the current document, unfortunately).
>
>How does Panorama store linking information?

For documents read in, it just keeps it in memory.  For Webs you create
with it, it creates HyTime documents with the necessary location addresses.
Apart from Webs, it never keeps linking information around from documents
opened prior to opening the current document in the same Panorama session.
Panorama does provide bi-directional traversal for contextual (in-line)
links, but only within a single document--having used a contextual link to
traverse from one document to another, it doesn't remember the links that
started in the first document in order to provide the back links to any
clinks in the first that point to the second.  It does, of course, provide
a "go-back" feature, but that's different, as going back is not traversal
(as both XML Link and HyTime are careful to point out).

>> Note that the two modes can be combined, such that you might know
>> everything about one set of documents but only about pointers to another
>> set.  The data structures needed to manage knowledge of the links are the
>> same in both modes; the only question is when do you gather the knowledge?
>
>Well, not just when, but how is it packed if it is in separate files for 
>different processors?  

Who cares?  We're talking about implementation, not data representation.
If you want to interchange the result of building these data structures,
use the HyTime property set, build HyTime semantic groves, and interchange
those, either as objects using some object interchange standard (e.g.,
CORBA) or spit out the equivalent canonical grove documents.  Or define a
standard relational schema.  Or define a document type that expresses those
tables.

We should expect some standardization of the APIs for communicating about
hyperlinks and their properties, but not standards for the representation
of the internal data structures programs really use (I wouldn't expect most
implementors to use the HyTime property set as their object model
directly--there's too much room for optimization, especially with respect
to tool-specific features). 

Cheers,

E.


From richard at light.demon.co.uk  Mon Jun  2 13:14:57 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:53 2004
Subject: XML-LINK and IDREF
In-Reply-To: <3.0.32.19970601225027.006aa5e0@swbell.net>
Message-ID: <0Zm96CAybpkzEwi6@light.demon.co.uk>

While we're on the subject of linking, I'm intrigued about the status of
the IDREF attribute type in XML.

In the SGML world, simple links within a document are mostly done with
IDREF -> ID, e.g.:

<div1 id="chap.beginnings"><head>New Beginnings</head>
<p> ...
...
<p>As we saw earlier in <ref target="chap.beginnings">Chapter 5</ref>,
...

(where the TARGET attribute has type IDREFS).  Browsers such as Panorama
recognise and support these links natively.

Is the intention in XML that IDREF(S) attributes are only supported "for
compatibility", and that the XML simple link should be used instead,
i.e.:

...
<p>As we saw earlier in <ref HREF="#chap.beginnings">Chapter 5</ref>,
...

(where REF is now defined as having attribute XML-LINK="simple") ?

This obviously has implications for XML processors.

Richard Light.


From Peter at ursus.demon.co.uk  Mon Jun  2 14:22:01 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:54 2004
Subject: XML-LINK and IDREF
Message-ID: <7439@ursus.demon.co.uk>

In message <0Zm96CAybpkzEwi6@light.demon.co.uk> Richard Light writes:
> While we're on the subject of linking, I'm intrigued about the status of
> the IDREF attribute type in XML.

Agreed.  I had thought about this as well.  It's not easy to see how
both might be used fruitfully at the same time without confusion.

Formally, my understanding of ID/IDREF is that it is part of XML-LANG and
must be supported by XML-LANG processors ([50] Validity checks).  The IDREF
can only point to an ID in the same document (at least how I read it).  
Therefore one option is for implementers not to use XML-LINK and to use 
ID/IDREF for whatever purposes they wish (structural, annotation, and
with whatever behaviour.)  Although I'm not an SGML expert I imagine this
is frequently done already.

The advantage/characteristic of ID/IDREF is that checking is at syntactic
level (i.e. parsers are required to analyse it).  [I am not sure what
status ID/IDREF has in WF documents, as a parser will only know which
attributes are IDs if there is an ATTLIST, i.e. a DTD or DTD fragment is
included].
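
A rough sketch of the kind of check involved, for a document with no DTD:
since nothing declares which attributes are IDs, the caller has to say so.
Here that knowledge is passed in as parameters, assuming (as in Richard's
example) that "id" holds IDs and "target" holds IDREFS.  The function name
dangling_idrefs and the second, unresolved reference are inventions for
the illustration.

```python
import xml.etree.ElementTree as ET


def dangling_idrefs(root, id_attr="id", idref_attr="target"):
    """Return IDREF values that match no ID in the same document.

    Without an ATTLIST the parser cannot know which attributes are IDs,
    so the attribute names are supplied by the caller.
    """
    ids = {el.get(id_attr) for el in root.iter() if el.get(id_attr) is not None}
    dangling = []
    for el in root.iter():
        # IDREFS is a whitespace-separated list of references.
        for ref in el.get(idref_attr, "").split():
            if ref not in ids:
                dangling.append(ref)
    return dangling


doc = ET.fromstring(
    '<doc><div1 id="chap.beginnings"><head>New Beginnings</head></div1>'
    '<p>As we saw earlier in <ref target="chap.beginnings">Chapter 5</ref>, '
    'and in <ref target="chap.missing">Chapter 9</ref>.</p></doc>'
)
print(dangling_idrefs(doc))
```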

> 
> In the SGML world, simple links within a document are mostly done with
> IDREF -> ID, e.g.:

It is part of the language.  In XML-LINK we are introducing another part
of the 'language.'

> <div1 id="chap.beginnings"><head>New Beginnings</head>
> <p> ...
> ...
> <p>As we saw earlier in <ref target="chap.beginnings">Chapter 5</ref>,
> ...
> 
> (where the TARGET attribute has type IDREFS).  Browsers such as Panorama
> recognise and support these links natively.

'Support' is presumably application-dependent.  Presumably browsers have
something like:  ACTUATE="USER" SHOW="REPLACE" as default - if an IDREF is 
discovered it is announced to the user who can navigate from there.

> 
> Is the intention in XML that IDREF(S) attributes are only supported "for
> compatibility", and that the XML simple link should be used instead,
> i.e.:
> 
> ...
> <p>As we saw earlier in <ref HREF="#chap.beginnings">Chapter 5</ref>,
> ...

XML-LINK can also point outside the document.

> 
> (where REF is now defined as having attribute XML-LINK="simple") ?

The implementer and author are obviously given more help and guidance for
implementing links with XML-LINK since there are a number of attributes
which will be well documented and where usage will develop.  [There would
be nothing to stop DTD authors including ROLE, ACTUATE, SHOW with IDREF
but it would not be likely to be standard practice, whilst the usage
of these with HREF presumably will have a good commonality of purpose and
implementation.]

> This obviously has implications for XML processors.

Yes!  The full XML-LINK spec is rather daunting for an implementer.  In
principle it could require writing something part of the way towards Hyper-G
or HyTime.  *How* far is what concerns me at present :-)  So my present approach
is to:
	- ignore ID/IDREF for my own DTDs
	- enable JUMBO to locate IDs.  [Note this is not trivial, because
		not all parsers provide this information at present.]
	- not provide special support in JUMBO [the application programmer can
		find the IDREFs and build their own stuff if 
		required.]
	- fully implement XML-LINK=SIMPLE (hopefully more or less on track at
		present.)
	- think furiously about EXTENDED.  From what Eliot has written, I 
		suspect it will be a lazy implementation - i.e. storing links
		as they are 'discovered' in documents and adding this 
		information to nodes.  It will be Web-like and unlikely to be 
		a complete linkset unless it becomes very clear how these are 
		created and used.


	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From richard at light.demon.co.uk  Mon Jun  2 15:06:37 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:54 2004
Subject: A few thoughts on XML and EDI
In-Reply-To: <199706011613.MAA10043@smtp2.erols.com>
Message-ID: <cZp3KFAOypkzEwHO@light.demon.co.uk>

In message <199706011613.MAA10043@smtp2.erols.com>, Peat
<peat@erols.com> writes
>XML/EDI
>Advantages of including Electronic Data Interchange (EDI)
>entities with eXtensible Markup Language (XML)
>
>What could the EDI entities look like?
>
>The general format of the transaction would be described in HTML.  The EDI
>segments and elements could go something like this...
>....
>DUNS Number:
><![CDATA[<N101>FR</N101><N103>1</N103><N104>123456</N104>]]>
>
> <or>
>
>DUNS Number:
><![CDATA[<N1>FR*1*123456</N1>]]>

Bruce,

Another approach (which is compatible with that used for the XML linking
specification) is to have attributes which identify certain elements as
holding EDI information.  That way, the EDI information is explicitly
labelled, and an XML processor can be asked to return it to an
application using standard API calls.

This approach means that the EDI information forms part of the logical
structure of the XML document, rather than being a CDATA 'implant'.  It
also means that users can define their own element types to hold EDI
information, so long as they label them with the agreed attributes.
Furthermore, it allows them to use XML's built-in validation facilities
to check for structurally valid input, e.g.:

in the DTD:
<!ELEMENT DUNS-GROUP (N101, N102?, N103, N104) >
<!ATTLIST DUNS-GROUP EDI-TYPE CDATA #FIXED "ANALYSED DUNS NUMBER">
<!ELEMENT N101 (#PCDATA) >
<!ATTLIST N101 EDI-TYPE CDATA #FIXED "DUNS NUMBER PREFIX">
...

in the document:
....
<DUNS-GROUP>
<N101>FR</N101><N103>1</N103><N104>123456</N104>
</DUNS-GROUP>

Note that the EDI-TYPE information is declared once and once only, in
the DTD, and does not add to the markup overhead in the actual document.
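
As a sketch of how an application might ask for the labelled elements,
the following uses Python's standard ElementTree parser.  It assumes the
declarations sit in the document's internal DTD subset; the expat-based
parser does not validate, but it does supply attribute defaults declared
there, which is all this relies on.  The function name edi_fields is
hypothetical, and only the declarations given above are included.

```python
import xml.etree.ElementTree as ET

DOCUMENT = """<!DOCTYPE DUNS-GROUP [
<!ELEMENT DUNS-GROUP (N101, N102?, N103, N104) >
<!ATTLIST DUNS-GROUP EDI-TYPE CDATA #FIXED "ANALYSED DUNS NUMBER">
<!ELEMENT N101 (#PCDATA) >
<!ATTLIST N101 EDI-TYPE CDATA #FIXED "DUNS NUMBER PREFIX">
]>
<DUNS-GROUP><N101>FR</N101><N103>1</N103><N104>123456</N104></DUNS-GROUP>"""


def edi_fields(root):
    """List (element name, EDI-TYPE, text) for every element labelled as EDI."""
    fields = []
    for el in root.iter():
        edi_type = el.get("EDI-TYPE")  # supplied from the ATTLIST default
        if edi_type is not None:
            fields.append((el.tag, edi_type, (el.text or "").strip()))
    return fields


root = ET.fromstring(DOCUMENT)
for field in edi_fields(root):
    print(field)
```

The document instance carries no EDI-TYPE attributes at all; the labels
come entirely from the declarations.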

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067


From lex at www.copsol.com  Mon Jun  2 18:02:14 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:57:54 2004
Subject: ANNOUNCE: DSSSLTK 1.0 Available
Message-ID: <199706021559.KAA07346@copsol.com>


ANNOUNCEMENT: DSSSLTK Available

The DSSSL Developer's Toolkit (DSSSLTK) from Copernican Solutions Incorporated
is now available for download.  A distribution may be obtained from:

http://www.copsol.com/products/index.html

This is the first of a set of DSSSL technology releases from
Copernican Solutions Incorporated.


What is the DSSSL Developer's Toolkit?
==========================================================================
This toolkit is similar in nature to the applet or servlet architectures
developed by Sun Microsystems/JavaSoft.  This toolkit is a set of abstract
interfaces written in Java to allow application developers to work
with different Java-based DSSSL environments.


What does it do?
===========================================================================
This toolkit serves as an interface between different DSSSL components.  
It represents an architecture for building DSSSL-oriented systems using the
Java programming language.


What is available?
==========================================================================
The DSSSL Developer's Toolkit contains the following:

  * Full source code to the interfaces and classes.
  * Javadoc for the API reference.
  * Configuration and makefile utilities for building the distribution.
  * A prebuilt zip file containing all the classes.


What is the purpose of the DSSSL Developer's Toolkit?
==========================================================================
The DSSSL Developer's Toolkit was developed as part of the Seng DSSSL 
Environment.  One of the design constraints for the Seng engine was a
completely componentized system, such that developers could integrate
their own implementations of components such as parsers, groves, and
processing engines without affecting the other components.

In solving this problem, Copernican Solutions developed the DSSSL
Developer's Toolkit as a set of abstract interfaces for accessing DSSSL
constructs.  These interfaces were developed under the premise that they
should be standardized and include the requirements of more than the
development efforts at Copernican Solutions.

Developers interested in standardizing the DSSSLTK should contact
Alex Milowski at alex@copsol.com.


What are the licensing restrictions for the toolkit?
===========================================================================
All the source is available free of charge and may be integrated into
other systems without licensing.


Is there an implementation of this toolkit?
===========================================================================
Yes, our Seng DSSSL engine implements this toolkit.  Included in Seng is
the Java SGML Parser Interface (JSPI) which builds groves from SGML document
sources using a native library based on James Clark's SP SGML parser.  Both
will be available for download soon.


==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608


From eliot at isogen.com  Tue Jun  3 16:58:07 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:57:54 2004
Subject: XML-LINK and IDREF
Message-ID: <3.0.32.19970603084223.00f4ed80@mail.swbell.net>

At 12:58 PM 6/2/97 GMT, Peter Murray-Rust wrote:
>In message <0Zm96CAybpkzEwi6@light.demon.co.uk> Richard Light writes:
>> While we're on the subject of linking, I'm intrigued about the status of
>> the IDREF attribute type in XML.
>
>Agreed.  I had thought about this as well.  It's not easy to see how
>both might be used fruitfully at the same time without confusion.
>
>Formally, my understanding of ID/IDREF is that it is part of XML-LANG and
>must be supported by XML-LANG processors ([50] Validity checks).  The IDREF
>can only point to an ID in the same document (at least how I read it).  
>Therefore one option is for implementers not to use XML-LINK and to use 
>ID/IDREF for whatever purposes they wish (structural, annotation, and
>with whatever behaviour.)  Although I'm not an SGML expert I imagine this
>is frequently done already.

There are two problems with IDREFs and XML:

1. Without DTDs, it may not be possible to know which attributes are IDs and
   which are references.

2. IDREFs provide no direct way to address elements in other documents.
   Therefore, if you want to enable IDREFs, you have to provide some
   indirection mechanism that can transform an IDREF to an address into
   other documents.  This is what HyTime and the TEI do by providing various
   location address element forms.  If you don't do this, then you require
   documents to have different element types for elements that use IDREFs
   and elements that don't.  This has the effect of necessarily binding
   element types to the forms of address they use, which should not normally
   be necessary (because addressing is distinct from the semantics of
   reference and therefore shouldn't necessarily influence the element type).

Unless I've misunderstood the current spec, XML Link doesn't provide any
ID-based indirection method, so that pretty much eliminates direct ID
reference in the general case.  [However, using the pointer syntax, you can
address elements with IDs, but only through the use of an XML Link URL.]

Indirect addressing certainly complicates the processing--it requires you
to build recursive processes and may impose significant processing overhead.
On the other hand, indirect addressing is very powerful and lets you do
things that are difficult or impossible otherwise, especially in terms of
managing links and addresses automatically, largely because you can isolate
initial references from the details of the addresses of the things
referenced.
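
To make the recursion concrete, here is a minimal Java sketch of the kind
of indirect resolution described above.  It is not HyTime or TEI code: the
class IndirectResolver, its methods, and the document names are all
hypothetical, and a "location address" is reduced to an ID'd element that
simply forwards to an ID in some (possibly different) document.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical model: every name here is illustrative, not HyTime or TEI API.
class IndirectResolver {
    // An element carrying an ID.  If targetDoc is non-null, the element is a
    // location address that forwards to targetDoc#targetId.
    private static final class Entry {
        final String targetDoc;
        final String targetId;

        Entry(String targetDoc, String targetId) {
            this.targetDoc = targetDoc;
            this.targetId = targetId;
        }

        boolean isLocator() {
            return targetDoc != null;
        }
    }

    // document name -> (ID -> entry)
    private final Map<String, Map<String, Entry>> docs = new HashMap<>();

    void addElement(String doc, String id) {
        docs.computeIfAbsent(doc, d -> new HashMap<>()).put(id, new Entry(null, null));
    }

    void addLocator(String doc, String id, String targetDoc, String targetId) {
        docs.computeIfAbsent(doc, d -> new HashMap<>()).put(id, new Entry(targetDoc, targetId));
    }

    // Resolve an IDREF found in `doc` to the "document#id" it finally denotes.
    String resolve(String doc, String idref) {
        return resolve(doc, idref, new HashSet<>());
    }

    private String resolve(String doc, String id, Set<String> seen) {
        String key = doc + "#" + id;
        if (!seen.add(key)) {
            throw new IllegalStateException("address cycle at " + key);
        }
        Map<String, Entry> ids = docs.get(doc);
        Entry entry = (ids == null) ? null : ids.get(id);
        if (entry == null) {
            throw new NoSuchElementException("no element with ID " + key);
        }
        if (!entry.isLocator()) {
            return key; // reached an ordinary element: this is the target
        }
        return resolve(entry.targetDoc, entry.targetId, seen); // follow one hop
    }

    public static void main(String[] args) {
        IndirectResolver r = new IndirectResolver();
        r.addElement("parts.xml", "bolt-7");
        r.addLocator("manual.xml", "loc1", "parts.xml", "bolt-7");
        System.out.println(r.resolve("manual.xml", "loc1"));
    }
}
```

The referencing element only ever holds the local IDREF "loc1"; if the
target moves, only the locator entry changes.  The set of visited keys is
there because a chain of locators can loop back on itself.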

Cheers,

E.
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202.  214.953.0004
www.isogen.com
</Address>


From nmikula at edu.uni-klu.ac.at  Wed Jun  4 17:02:15 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:57:54 2004
Subject: (NXP)/Java/XML parser : Passing Error Information
Message-ID: <Pine.OSF.3.93.970604165310.27204A-100000@edusrv.edu.uni-klu.ac.at>


NXP, as of today, simply prints error messages to Stderr.
This is fine for now, but it is certainly not the best
way to do things.
 
It was suggested to me that NXP throw an exception,
but I think exceptions are not the best solution, as
recovery from them is practically impossible (from
the level of the application program).
 
To my understanding there are several classes
of errors that can be passed along:
 
1.) Warnings
2.) WF violations
3.) Violations with respect to the DTD
4.) In general, those errors that are reportable - if the user wishes
 
Should they be handled differently?

I was thinking in terms of "callback" functions, as
I do right now with the "Esis" interface.
 
How would you, as the user/developer community, envision
handling this?
 
What information would you like to have passed along
to an application?  Error code, textual description (what
about localization...)?

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================



From jtauber at jtauber.com  Wed Jun  4 17:23:45 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:57:54 2004
Subject: Entity replacement
Message-ID: <01BC713E.361E55C0@dial130.cygnus.uwa.edu.au>

> The % notation does, in effect, specify which non-terminal symbols
> can be replaced by p.e. references.

Not really, because you get cases of %(...)

Now admittedly, productions that include this in their RHS could be 
rewritten with an additional non-terminal symbol, so that production [43] 
could be written

choice::='(' S? choicelist S? ')'
choicelist::=cps ('|' cps)+

And this is exactly what I would like to see done because you could then 
simply list (apart from the productions themselves) those non-terminal 
symbols that can be replaced by PEs.
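
As a rough illustration of how that might look from the implementer's
side, here is a minimal recursive-descent sketch in Java of the two
rewritten productions.  Everything in it is an assumption made for the
example: the class name ChoiceParser, the contents of PE_REPLACEABLE, the
simplification of cps to "S? Name S?", and the choice to handle a PE
reference by splicing in its replacement text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChoiceParser {
    // Hypothetical list: the non-terminals a PE reference may stand in for.
    static final Set<String> PE_REPLACEABLE =
            new HashSet<>(Arrays.asList("choicelist", "cps"));

    private final StringBuilder text;
    private final Map<String, String> entities;
    private int pos = 0;

    ChoiceParser(String input, Map<String, String> entities) {
        this.text = new StringBuilder(input);
        this.entities = entities;
    }

    // choice ::= '(' S? choicelist S? ')'
    List<String> parseChoice() {
        expect('(');
        skipSpace();
        List<String> names = parseChoicelist();
        skipSpace();
        expect(')');
        return names;
    }

    // choicelist ::= cps ('|' cps)+
    private List<String> parseChoicelist() {
        expandPE("choicelist");
        List<String> names = new ArrayList<>();
        names.add(parseCps());
        do {
            expect('|');
            names.add(parseCps());
        } while (peek() == '|');
        return names;
    }

    // cps, simplified for this sketch to: S? Name S?
    private String parseCps() {
        skipSpace();
        expandPE("cps");
        skipSpace();
        int start = pos;
        while (pos < text.length() && Character.isLetterOrDigit(text.charAt(pos))) pos++;
        if (start == pos) throw new IllegalStateException("name expected at " + pos);
        String name = text.substring(start, pos);
        skipSpace();
        return name;
    }

    // If a PE reference sits where `nonTerminal` is expected, splice in its text.
    private void expandPE(String nonTerminal) {
        if (peek() != '%') return;
        if (!PE_REPLACEABLE.contains(nonTerminal))
            throw new IllegalStateException("PE reference not allowed for " + nonTerminal);
        int semi = text.indexOf(";", pos);
        if (semi < 0) throw new IllegalStateException("unterminated PE reference at " + pos);
        String replacement = entities.get(text.substring(pos + 1, semi));
        if (replacement == null) throw new IllegalStateException("undeclared PE at " + pos);
        text.replace(pos, semi + 1, replacement);
    }

    private char peek() { return pos < text.length() ? text.charAt(pos) : '\0'; }

    private void expect(char c) {
        if (peek() != c) throw new IllegalStateException("'" + c + "' expected at " + pos);
        pos++;
    }

    private void skipSpace() {
        while (pos < text.length() && Character.isWhitespace(text.charAt(pos))) pos++;
    }

    public static void main(String[] args) {
        Map<String, String> pes = new HashMap<>();
        pes.put("inline", "em | strong");
        System.out.println(new ChoiceParser("( %inline; | code )", pes).parseChoice());
    }
}
```

With one named non-terminal per replaceable position, the PE check lives
in a single helper driven by a single list, instead of being repeated
inside each production.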

Do other developers feel this would make it easier to go from spec to 
implementation?

Now, relating my previous parsing/GE query to PEs:
Is it easy, given the current syntax spec, to build a correct parse tree of 
a DTD before PE replacement?
If not, should it be?

James K. Tauber / jtauber@jtauber.com
Perth, Western Australia



From ddb at criinc.com  Wed Jun  4 22:30:13 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:57:54 2004
Subject: (NXP)/Java/XML parser : Passing Error Information
Message-ID: <3.0.32.19970604132934.00a1a290@mailhost.criinc.com>

At 05:02 PM 6/4/97 +0200, Norbert Mikula wrote:
>To my understanding there are several classes
>of errors that can be passed along
> 
>1.) Warnings  
>2.) WF violations
>3.) Violations with respect to the DTD
>4.) In general these errors that are reportable - if the user wishes 
> 
>Should they be handled differently ?
>
>I was thinking in terms of "callback" functions. Like
>I do it right now with the "Esis" interface.

I would tend to agree that a callback mechanism would be the most useful
way to accomplish error handling.  I strongly believe that it should be
possible to break up error messages into different types, with all the
necessary information for an application to process the error itself (and
build its own error message, open an editor to the correct file &
character in that file, etc).  It should be easy for an application to
ignore certain types of error messages (or, reversing that, it should be
easy to pay attention only to the ones it cares about).

Not having looked at NXP's code, I am not sure how its current ESIS
interface works, but this is also a reasonable application of inheritance.
There might be a general HandleError() method, which (by default) is just
a multiplexor to call separate methods for each type of error.  This is
akin to the AWT 1.0 event model.  Alternatively you could have a more
callback-like (AWT 1.1 like) model, which would be marginally more
difficult to code (for Norbert) but is slightly more elegant.  From my
point of view it is an even call, since both work and neither is
particularly horrid.  (Most of the issues for why the AWT event model was
changed do not apply here since this is a very specific case, not a general
event model.)
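
A minimal Java sketch of the first (inheritance-based) variant might look
like the following.  None of this is NXP code: ErrorKind, ParseError,
ParserErrorHandler, and the per-type method names are all assumptions made
up for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical error categories, following the classes listed in the thread.
enum ErrorKind { WARNING, WELL_FORMEDNESS, VALIDITY }

// Everything an application might need to act on the error itself.
class ParseError {
    final ErrorKind kind;
    final String file;
    final int line;
    final int column;
    final String message;

    ParseError(ErrorKind kind, String file, int line, int column, String message) {
        this.kind = kind;
        this.file = file;
        this.line = line;
        this.column = column;
        this.message = message;
    }
}

// Base class: handleError() is only a multiplexor; the per-type methods do
// nothing by default, so a subclass overrides just the ones it cares about.
class ParserErrorHandler {
    public void handleError(ParseError e) {
        switch (e.kind) {
            case WARNING:         onWarning(e); break;
            case WELL_FORMEDNESS: onWellFormednessError(e); break;
            case VALIDITY:        onValidityError(e); break;
        }
    }

    protected void onWarning(ParseError e) { }
    protected void onWellFormednessError(ParseError e) { }
    protected void onValidityError(ParseError e) { }
}

// An editor-like application: ignores everything except validity errors and
// remembers where to jump to.
class JumpToValidityErrors extends ParserErrorHandler {
    final List<String> locations = new ArrayList<>();

    @Override
    protected void onValidityError(ParseError e) {
        locations.add(e.file + ":" + e.line + ":" + e.column);
    }
}

class MultiplexDemo {
    public static void main(String[] args) {
        JumpToValidityErrors handler = new JumpToValidityErrors();
        handler.handleError(new ParseError(ErrorKind.WARNING, "doc.xml", 3, 1, "unused entity"));
        handler.handleError(new ParseError(ErrorKind.VALIDITY, "doc.xml", 12, 5, "element not allowed here"));
        System.out.println(handler.locations);
    }
}
```

The parser only ever calls handleError(); which categories get attention
is decided entirely by which methods the application overrides.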

My main interest, with regard to what I would like in any parser I would
use, would be a clean mechanism to be able to handle errors in an
application specific way.  This is more than just, 'where do I print the
error message?' and includes the ability to write an editor application
which could use the parser to validate and then jump directly to the
line/character of any errors.  SP goes a ways toward that, but I would
prefer a hierarchy of errors, similar to some of the object oriented event
models that I have seen recently.

-derek
--------------------------------------------------------------
ddb@criinc.com || software-engineer || www/sgml/java/perl/etc.
  "Just go that way, really fast. When something gets 
      in your way, turn."  --  _Better_Off_Dead_


From peat at erols.com  Wed Jun  4 23:20:48 1997
From: peat at erols.com (Peat)
Date: Mon Jun  7 16:57:54 2004
Subject: XML and EDI
Message-ID: <199706042120.RAA12840@smtp2.erols.com>

I wanted to share this with you. It is from David Webber, posted in early
May on the EDI-L mailing list.

- Bruce Peat

 
----------------------------------------------------------------------------

BOO!!! ARE WE ALL HISTORY???

This message is in two parts.  In the first part I argue why DISA is about
to become history, hit by a truck marked 'Microsoft - Internet EC', just
another piece of road kill along the path of computing history.  Therefore
we should just switch tack, pack our bags, and figure out how to go and
build EC based systems.


In the second part I argue the opposite and introduce my vision of a 4 Tier
EDI that can become an active part of EC, and migrate from traditional EDI
to the new EC model and retain a role for DISA as a facilitator of future
commerce.


OK - first let's look at some recent press from the 10th DISA conference:

>>>>>> "EDI Expansion"

  "The EDI industry is currently growing at over 40 percent
annually," says Harvey Seegers, president and chief operating officer
of Rockville, Md.-based GE Information Services.  Seegers is a keynote
speaker at the conference.
  Seegers' message is "EDI has a future -- particularly if it
embraces Internet technology -- to extend its capabilities."  An
important component of a successful future for EDI is Web technologies
such as TCP/IP, which enable data to be moved across diverse computing
environments:  from desktops to mainframes, from Windows to UNIX.
Browser interfaces are another innovation that can breathe new life
into EDI by giving the Internet "a much friendlier interface to EDI,"
Seegers says.

  Other industry leaders are seeing examples of EDI growth as well.
  For instance, Stamford, Conn.-based Frontec is growing at a rate
of 40 percent annually, says Peter Stiles, Frontec's senior vice
president of consulting, whose presentation at EC/EDI `97 is about
using EDI in transportation planning and supply chain management.
  Wilton, Conn.-based APL Group reports "a tremendous" growth spurt
in the past six months, especially in its consulting business.  A
software and services provider, APL specializes in personal computer-
based EDI translation software and is preparing to move into offering
Internet and Web-based EDI systems.  APL is one of the exhibitors at
the DISA show, where it plans to announce a new electronic commerce
initiative with Washington-based telecommunications giant, MCI.
<<<<<<

Wake up and smell the Roses!  This says to me that all the big players
couldn't give a fig about old EDI.  New EC based stuff is going to use
HTML, CGI and, even more importantly, the upcoming XML, an extensible
next generation HTML, along with perhaps some Java objects to exchange
business data, business facts, and processes.

(XML - eXtensible Markup Language :
http://www.gca.org/conf/xml/xml_what.htm)

This will make DISA style traditional EDI obsolete within a year.
Especially as the delivery mechanism between trading partners now
becomes an NT box running a Web Server and a hook to your friendly
local ISP for $20 a month.  Add in Microsoft's 'Normandy' MCIS product
and now you can click, drag-drop out of Access or VB into your Web
Browser and your data is delivered.  Total integration of MS Office
and the backend delivery channels. (Full details at www.microsoft.com,
search on MCIS).

XML provides the glue to allow the receiving system to successfully
interpret the received information and store it in the correct places in
their own databases.

Cross-platform is a snap, since if you need to go to a mainframe, the Web
server gives you that easily.  You can pass the data to a COBOL or CICS
process over on the big iron to do final updates.

Also - you can query in real time.  Your Web page can send an Account # to
a remote Web Server for validation, before continuing the data entry
process off your local Web Server with the end user.  Totally seamless.

Where does traditional EDI fit into this picture?  It DOES NOT!  It just
gets in the way.  It's faster to put a Web page and some HTML together
to capture the information that you need, and get people to interface that
way EVEN if they are using a BATCH process to generate HTML from their
database, instead of an expensive translate into EDI format.

So, OK, there is message overhead this way, but who cares?  Bandwidth
is cheap these days.  When EDI was first done 2400 bps was state of the
art.  Now ISDN costs less than 2400 bps did 10 years ago.  115200 bps can
take a lot of HTML overhead.

Better still even new programmers straight out of college know HTML
inside and out and there are great GUI tools for slapping fields onto
forms and creating the programs to drive them so easily.

Plus, when you send someone your sample HTML form they can bring it up
in their FREE browser and understand it instantly.  (Try that with
your proprietary format EDI Implementation Guideline and EDI message
format that only works with the EDI tool you have purchased).

OK - so let's read some more news - smell some more Roses ->

>>>>>>>>>>" Like many companies, Mobil Corp. was tantalized by the idea
of using electronic data interchange to swap information with business
partners.  But because of the pitfalls--high costs, burdensome software
maintenance and lack of real-time information--Mobil is eschewing
traditional hard-wired EDI and using an intranet for electronic commerce
in what analysts call a "leading-edge approach" to an area of EDI that's
now under intense scrutiny.  After a successful pilot, Mobil this month
began rolling out the new system to its more than 300 lube distributors,
the independent businesspeople who handle Mobil's "heavy
products"--packaged or bulk industrial oils and greases.  The intranet
integrates Mobil's mainframe data with an Oracle Corp. database that
holds product information.
<<<<<<<<<<<

Can I call them or what?  There's more:

>>>>>>>>>>>
  Previously, Mobil had implemented two different EDI systems--one
DOS-based and the other Windows-based--that transmitted business
documents over a VAN (value-added network). Mobil encountered many of
the problems that have stymied the growth of EDI. VAN charges for using
the hard-wired networks topped $100,000 a year. Maintenance was
burdensome: Every time Mobil changed a business rule, new software had
to be sent to each dealer and installed on their desktops. Inventory
information was updated only once a week.
  Because of this, dealers communicated with Mobil through a hodgepodge
of EDI, phone calls and faxes: a system of redundant data input that led
to many time-consuming errors. Mobil began looking for a new system when
its lube group made improving communications with dealers a top
priority. The Internet was an obvious solution.
  An intranet approach for EDI didn't require a hard-wired network, so
VAN charges disappeared. When business rules are altered, Mobil only has
to make changes once, test the new rules and put them on the
server--they become immediately available. "Our customer support people
are excited because distributors can look up the information and make
the transaction electronically, so the support people's phones won't be
ringing off the hook with questions," said Hawkins.
  Mobil's business rules are embedded in the system's Java applets. The
system immediately alerts a distributor if he or she is entering an
incorrect order--say, asking for a product in an unavailable package
size or making an order that is too large for a truck shipment.
<<<<<<<<<<<<<<<

Just what I've been saying.  This last piece is key.  The Java is the
transport layer that links business rules and data handling into the
whole.  OK, but the naysayers can still point to some holes:

>>>>>>>>>>>>>
"Mobil's approach is a leading-edge way to do this kind of
application," agreed Rick Drummond, a consultant in Fort Worth, Texas,
who has been helping develop security standards for EDI over the
Internet. Yet he still has reservations. Said Drummond: "The impact
outside this limited application is pretty low because it's not clear if
it's dealing with the interoperative issues" raised by EDI systems that
are not as closed as Mobil's.
<<<<<<<<<<<<<<<<<

Yep, but guess what, this is just a matter of time, NOT technology. XML
is with us, and it will provide that missing piece, along with use
of Java, to be able to link components of one system to another.

So - I can embed rules in XML or Java into my LOCAL data processing
system, and have Mobil et al send me those when they update them.
Thus allowing this thing to move to the next level in a way that
EDI was never able to.

In fact it gets even worse.  The new XML provides all that missing Object
Orientated transport layer that DISA has been haggling over, built right
into your Web Browser and Web server.  (As an aside, DCOM and/or CORBA?
Who cares?  As an end user, there is no need to worry about such MIDDLEWARE
issues, since the browser companies will always have to provide a layer
that can transport your objects at the Web server end.  DCOM, CORBA?  You
never see this!  Your JAVA or XML toolkit and execution environment handle
all this 'bits and bytes' stuff).

Microsoft and Netscape have had to address these issues for their new V4
browsers, not because of EC or EDI but for distributed programming
and client/server deployment reasons.

Just so happens they nailed the EC and EDI side too.

So where does DISA fit into this model?  It does not!  If I'm Mobil
and I'm implementing the next application, I just create my HTML,
XML, Java and do it.  Then my trading partners collect those components
off my Web server and use them to exchange the information we need.
I can even generate all this stuff straight off the SQL database
definitions I already have loaded up in my CASE tools and database
dictionaries.

DISA, who him??

==========================================================================
The Second Part of this story:  4 Tier EDI to the Rescue!

From DISA's 10th conference I see a lot of haggling over who is right.
(Kind of like Nero fiddling while Rome burns around him?)

Let's roll the video tape: >>>>>>>>>>>>>


Modelling a Better EDI

  Many in the EDI standards community hope the creation of a new
type of EDI standard will open up big new markets for the standardized
technology among the nation's estimated 4 million small and medium-
sized businesses.  The new EDI would have to be simple to implement at
little cost.  An important aspect of that future EDI standard will be
that it must also maintain backwards compatibility with the existing
versions of EDI standards, including the standards of ASC X12 and the
United Nations.
  Standards discussions at the DISA show include one by Klaus-Dieter
Naujok, standards manager at Concord, Calif.-based Premenos
Corp.  His subject is Object Oriented EDI, a new proposal for future
EDI developed under the auspices of the United Nations EDIFACT's
CEFACT.  The proposal, made public for the first time at the
conference, makes use of object-oriented techniques to model business
scenarios into business objects.
  Naujok says object-oriented modeling holds the possibility of
making EDI easier to use and less costly.

More Than One Way

  Dan Codman, co-founder of the Wilton, Conn.-based APL Group and
chairman of X12C, the communications and controls subcommittee of ASC
X12, says he is not confident object-oriented techniques are relevant
to EDI standards.  However, ASC X12's Strategic Implementation Task
Group, formed to represent U.S. positions on EDI to the CEFACT, will
look at modeling techniques to aid the development of a new EDI
standard.  It will have to be able to be used around the world,
cheaply implemented, and understood by anyone who uses it, Codman
says.
  David Files, the leader of ASC X12's Business Information
Modeling Group, says Object Oriented EDI fails to address the needs of
large companies doing high-volume EDI.  Such companies, the majority
of those already EDI-enabled, will find that OOEDI is less efficient
for large volumes, he warns.
  Estimates put EDI usage in the United States at only 5 percent of
the potential.  Reaching the hundreds of thousands of small and mid-
sized companies not yet doing EDI is a crucial factor in future
growth, industry experts say, and a new EDI model is one of the key
linchpins in that endeavor.  And, of course, different alternatives
will benefit different segments of the industry.

<<<<<<<<<<<<<<<<

So we have at least three camps!  Well, what if all three can live
together under the one roof?  And the fourth 'grizzly bear' called
Internet EC can also be made a player too?

Enter 4 Tier EDI.  Here's a picture of this, and here's how it works.

Layer 1 -       Traditional EDI         |
-------------------------------         |
Layer 2 -       Rule Based EDI/EC       |
-------------------------------         |
Layer 3 -       Process Based EDI       |
-------------------------------         |
Layer 4 -       Object Based EDI        |
-------------------------------         V

Now, what this means is that your total EDI message can consist of
some or all of these components AS YOUR BUSINESS NEEDS require.

Layer 2 is absolutely the KEY LAYER.  (Layer 3 and Layer 4 are
in fact implemented and done with the tools in Layer 2).

Layer 2 supports both XML and Java as the means to define
your complete EDI message, including the data.

You can either embed an EDI message itself using the
standard HTML comment token, i.e.:

<BODY>
<P> This is just some text </P>
<BR>
<!-- Here comes my EDI message -->
<!--
ST*323*712990413
V1*7039610*NEW ZEALAND QUEEN*D*104N*SCAC***L
LS*0100
R4*D*D*JAX*JACKSONVILLE FL****
V9*EAD**920819**JACKSONVILLE FL***A26
LE*0100
SE*25*712990413
-->
</BODY>

Or, you can use the newer HTML/XML methods of identifying
data fields and their content within your business forms.
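
To make the first option concrete, here is a minimal Java sketch of how a
receiving program might pull the embedded segments back out.  It assumes
the message is wrapped in a standard <!-- ... --> comment, one segment per
line, with '*' as the element separator as in the sample above.  The class
name EmbeddedEdi and the rule used to recognise a segment line (a two or
three character upper-case tag followed by '*') are assumptions for the
illustration, not part of any EDI standard or toolkit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmbeddedEdi {
    // Matches the body of each HTML comment, across line breaks.
    private static final Pattern COMMENT = Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // Assumed shape of a segment line: short upper-case tag, then '*'.
    private static final Pattern SEGMENT = Pattern.compile("[A-Z][A-Z0-9]{1,2}\\*.*");

    // Returns one String[] per segment; element 0 is the segment tag.
    static List<String[]> extractSegments(String html) {
        List<String[]> segments = new ArrayList<>();
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            for (String line : m.group(1).split("\\r?\\n")) {
                String s = line.trim();
                if (SEGMENT.matcher(s).matches()) {
                    // Limit -1 keeps the empty trailing elements ("****").
                    segments.add(s.split("\\*", -1));
                }
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        String html = "<BODY>\n<P> This is just some text </P>\n"
                + "<!-- Here comes my EDI message -->\n"
                + "<!--\nST*323*712990413\nLS*0100\nSE*25*712990413\n-->\n</BODY>";
        for (String[] seg : extractSegments(html)) {
            System.out.println(seg[0] + " has " + (seg.length - 1) + " element(s)");
        }
    }
}
```

A browser ignores the comment entirely, so the same page serves both the
human reader and the receiving program.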

What is more, HTML already has a convenient 'Process' level mechanism -
the URL to the next, or previous linked form, and the POST/SEND
mechanism as a way of telling you where the message came from, plus
status information.  XML of course allows you to roll your own more
extensive features.  XML also allows you to transport binary
information, and is also fully multi-lingual compliant.

OK - so you get the picture.  The ability to define your own
message sets, structure, rules, objects, whatever grabs your
fancy.  And then send this to your nearest neighbour via
Web Server technology.

Also, LAYERS, so that if an Object Orientated approach is meaningless for
your business needs you DO NOT need to use it, or burden your
messaging with any overhead or complexity associated with OO needs.

By defining Layers 3 and 4 using Layer 2, you allow people to choose to
what level they wish to use each messaging component.

So - how does DISA get into the middle of all this?

Two ways.  First XML is a virgin territory, therefore DISA
can step in and define XML components and standards for
enclosing EDI fields, and mapping Traditional EDI to XML/HTML.

Then DISA can define process components in XML that facilitate EDI, for
example an applet or object called date_format(), or
entity_characteristics(), and so on, that reference the existing EDI
data entity rules that have been lovingly crafted over the last twenty
years.

This means DISA can publish a CDROM of XML/Java components that
describe and reference existing EDI entities.  This will make
everyone's life easier, and speed implementation of EC.

How?  Because right now, programmers are using Sun's Java library as the
next best thing, plus whatever 'canned' code they can grab off the
Java/JavaScript language sites.  For example, I need to check two dates are
valid in my Web form, and so I hook in either an Applet, or JavaScript
module that does this, pass the applet the dates as parameters, and presto,
my dates are OK.  Well, instead, on the CDROM is a nice set of routines
called Java_valid_EDI_date(), and XML_valid_EDI_date(), that do this for
me, but now I know my dates are also fully EDI compliant.  And so on.

Tool vendors can build plug-ins to Web Browsers that automatically
associate these kinds of properties to EDI compliant fields.

Also the Web Servers provide services such as Transaction Logging (a lot
of them can do that now) and of course message routing to different
backend servers based off content and addressing, security, database
interfacing.  In short everything one would expect from a mature
communications server platform.

The second piece is then obvious for DISA.  Having stepped into the
middle of the EC process by extending XML, and migrating EDI standards
over to this transport layer, DISA then has an ongoing role.  Providing
both support for this, and also maintaining products such as the
Universal Entity Dictionary, and also defining new methods, processes,
and objects that are specific to EDI for use with XML/Java.

OK, I hope this is fairly clear. 4 Tier EDI, founded on merging EDI
formatting and transport methods with EC Web based methods, where DISA
provides the lead in this, and then maintains the standards and
certifies vendors' products as being compliant, et al.  A migration to
Web EC based messaging standards as the foundation for future EDI.

Understandably there is a section of the existing DISA membership that
has to be resistant to this, because their entrenched commercial position
is exposed to newer Web vendors and a completely different business
and trading model.

A brief consideration of the alternatives however should bring everyone
into focus.  This is not really a choice!  If Microsoft and Netscape
saw a role for DISA in the future of EC they would already be very
active members of this forum.

As they are NOT, I can only conclude that DISA needs to quickly make
itself part of the mainstream EC agenda, before it vanishes into the
mists of time.

Otherwise I foresee that Microsoft will shortly be hosting its own
Electronic Commerce Symposium for EC Business Partners, and signing
up vendors to support its MMSP (Microsoft Messaging Standards Protocol)
that it built into the Microsoft Web Server engines and Browsers.

Certainly one EDI stalwart has already made the move: two months ago
"EDI World" magazine was renamed "Electronic Commerce International".

=========================================================================

David Webber.

p.s. ----------  I shall also be cross posting this to EDI-L, as the
                 broader issues fall into their bag.

p.p.s. -------

Just went up on the Microsoft Site to verify some details.

Their product is MICS, now (changed from MCIS), and they have a 570K
White Paper you can download.  Microsoft Internet Commerce Server.

One paragraph in it says "but this does not mean the end of traditional
EDI".

Yeah, right, one paragraph in 570k, and no mention of why not, or how to
link the two!  Excuse me for not believing that for one second, and for
believing that really Microsoft is now setting the agenda.

The URL is : http://www.microsoft.com/commerce/whitepaper.htm

David Webber.



From Peter at ursus.demon.co.uk  Wed Jun  4 23:51:09 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:54 2004
Subject: XML and EDI
Message-ID: <7596@ursus.demon.co.uk>

In message <199706042120.RAA12840@smtp2.erols.com> "Peat" writes:
> I wanted to share this with you. It is from David Weber posted early May on
> the EDI-L mailing list.
> 
> - Bruce Peat

Thanks for your posting.  I don't want to sound negative, but pieces like this
need some tailoring before posting to this list, which is aimed at developers.
It's important that general news stories do not get posted to XML-DEV - if
RobinC doesn't pick them up on www.sil.org, then they either go to c.t.s. or 
you make stronger bids for comp.text.xml.

A cursory reading (I find the metaphors tough going :-) suggests that EDI 
(about which I know nothing :-) can provide information objects for direct
realisation in XML (?and Java), and the piece could perhaps have been 
condensed to show this.

	P.
	

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From jimg at digitalthink.com  Thu Jun  5 01:46:11 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun  7 16:57:54 2004
Subject: (NXP)/Java/XML parser : Passing Error Information
In-Reply-To: 
 <Pine.OSF.3.93.970604165310.27204A-100000@edusrv.edu.uni-klu.ac.at>
Message-ID: <l03010d0aafbbad0926f7@[207.171.223.56]>

Norbert,

I am very happy to hear that you are thinking about adding better error
notification to NXP.  I like your suggestion of adding callback methods to
the "Esis" interface for error notification.  However, I would really like
the interfaces defined such that we can throw exceptions from these
notification methods, which would then bubble up to the place where the
parser is invoked.

For example, I suggest methods similar to the following be added to the
"Esis" interface:
	public void onWarning(ErrorInfo pInfo) throws ParseException;

Then modify XMLParser.startParsing() so that it also throws ParseException.

ErrorInfo should contain all relevant information, such as file name, line
number, column number, ...  Maybe it makes sense to have a hierarchy of
ErrorInfo classes for different types of errors.
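
Here is a minimal Java sketch of the control flow I have in mind.  It is
not NXP code: ErrorCallbacks, ToyParser, and the fields of ErrorInfo are
hypothetical stand-ins, and ToyParser.startParsing() does no parsing at
all; it just replays a prepared list of problems so that the path of the
exception is visible.

```java
import java.util.Arrays;
import java.util.List;

// All names below are hypothetical stand-ins, not NXP's real API.
class ErrorInfo {
    final boolean warning;
    final String file;
    final int line;
    final int column;
    final String message;

    ErrorInfo(boolean warning, String file, int line, int column, String message) {
        this.warning = warning;
        this.file = file;
        this.line = line;
        this.column = column;
        this.message = message;
    }
}

class ParseException extends Exception {
    final ErrorInfo info;

    ParseException(ErrorInfo info) {
        super(info.file + ":" + info.line + ":" + info.column + ": " + info.message);
        this.info = info;
    }
}

// Notification methods may throw; the parser does not catch what they throw.
interface ErrorCallbacks {
    void onWarning(ErrorInfo info) throws ParseException;
    void onError(ErrorInfo info) throws ParseException;
}

class ToyParser {
    // Stand-in for XMLParser.startParsing(): reports each prepared problem
    // to the callbacks and returns how many were reported.
    static int startParsing(List<ErrorInfo> problems, ErrorCallbacks cb) throws ParseException {
        int reported = 0;
        for (ErrorInfo p : problems) {
            reported++;
            if (p.warning) cb.onWarning(p); else cb.onError(p);
        }
        return reported;
    }
}

// Application policy: count warnings and carry on, abort on the first error.
class StopOnFirstError implements ErrorCallbacks {
    int warningsSeen = 0;

    public void onWarning(ErrorInfo info) {
        warningsSeen++;
    }

    public void onError(ErrorInfo info) throws ParseException {
        throw new ParseException(info);
    }
}

class BubbleUpDemo {
    public static void main(String[] args) {
        List<ErrorInfo> problems = Arrays.asList(
                new ErrorInfo(true, "doc.xml", 3, 1, "unused entity"),
                new ErrorInfo(false, "doc.xml", 9, 14, "element not declared"));
        try {
            ToyParser.startParsing(problems, new StopOnFirstError());
        } catch (ParseException e) {
            // The exception surfaces where the parser was invoked.
            System.out.println("parse aborted: " + e.getMessage());
        }
    }
}
```

The parser itself never decides whether a problem is fatal.  The callback
does, by either returning or throwing, which also answers the worry about
recovery: an application that wants to continue simply does not throw.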

Just my 2 cents.

Thanks again for making such a wonderful tool available to the XML community.


Jim


>NXP, as of today, simply prints error messages to Stderr.
>This is fine for now, but it is certainly not the best
>way to do things.
>
>There was a suggestion made to me, to throw
>an exception, but I think exceptions are not
>the best solution as recovery from them is practically
>not possible (From the level of the application
>programm)
>
>To my understanding there are several classes
>of errors that can be passed along
>
>1.) Warnings
>2.) WF violations
>3.) Violations with respect to the DTD
>4.) In general these errors that are reportable - if the user wishes
>
>Should they be handled differently ?
>
>I was thinking in terms of "callback" functions. Like
>I do it right now with the "Esis" interface.
>
>How would you, as the user/developer community envision
>handling this.
>
>What information would you like to have passed along
>to an application ? Error code, textual description (what
>about localization ..).
>
>Best regards,
>Norbert H. Mikula
>
>=====================================================
>= SGML, XML, DSSSL, Intra- & Internet, AI, Java
>=====================================================
>= mailto:nmikula@edu.uni-klu.ac.at
>= http://www.edu.uni-klu.ac.at/~nmikula
>=====================================================
>
>
>xml-dev: A list for W3C XML Developers
>Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
>To unsubscribe, send to majordomo@ic.ac.uk the following message;
>unsubscribe xml-dev
>List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From srn at techno.com  Thu Jun  5 17:48:34 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:57:54 2004
Subject: XML and EDI
In-Reply-To: <199706042120.RAA12840@smtp2.erols.com> (peat@erols.com)
Message-ID: <199706051543.LAA02284@bruno.techno.com>

Dear Developers of XML :

XML really, really needs notation data attributes.  Without them, you
can't do object inheritance from architecture (DTD) to architecture
(document, whether it has its own DTD or not).  An inheritable
architecture is, in fact, a notation.  We already support notations in
XML.  What we don't have is the ability to declare the mappings
between the inherited architecture's objects (elements and attributes)
and the document's objects (elements and attributes).  For that, we
need notation data attributes.  It's a small thing, really, but, wow,
what a difference it makes!

The usefulness of inheritance for all kinds of purposes (and not least
for EDI) is too great to ignore; it is one of the most useful and
attractive aspects of SGML.  There is no good reason not to do it in
XML.  So, how about it, ERB?

For a discussion of why architectural inheritability is overwhelmingly
important, you may want to read my (now slightly dated) paper, "SGML
Architectures: Implications and Opportunities for Industry" at
http://www.techno.com/sgmlarchitecture.html.

Best regards,

--Steve

             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA


From jeanpa at microsoft.com  Sat Jun  7 05:50:11 1997
From: jeanpa at microsoft.com (Jean Paoli)
Date: Mon Jun  7 16:57:55 2004
Subject: Microsoft XML Parser in Java is Available
Message-ID: <78DFE33066ABD0118B9200805FD431BA5EC1EB@RED-16-MSG.dns.microsoft.com>

ANNOUNCEMENT: Microsoft XML Parser in Java is Available

I am *really* pleased to announce :

The XML Parser in Java (MSXML) from Microsoft Corporation is now
available for download from:
		http://www.microsoft.com/standards/xml/xmlparse.htm

This is the second piece of XML technology from Microsoft, the first
being the Channel Definition Format support in Internet Explorer 4.0.

The Microsoft XML Parser is a validating XML parser written in Java. 
Once parsed, the XML document is exposed as a tree through a simple set
of Java methods. 
We are actively working with the W3C to standardize an XML API
(see the W3C overview page for the Document Object Model,
http://www.w3.org/MarkUp/DOM/; the DSSSL/grove object model is being
carefully studied by the DOM group).

These methods support reading and/or writing XML structures, such as the
Channel Definition Format (CDF) or other text formats based on XML. 

This version (Alpha 1.0) of the parser implements the W3C working draft
of the XML specification dated March 31, 1997
(http://www.w3.org/TR/WD-xml-961114.html) and will be revised to reflect
future W3C changes to the specifications.

The following components of the XML spec have not yet been implemented
(but will be soon) : 

		*	XML-SPACE (for control over white space handling)
		*	XML encoding declaration (<?XML ENCODING='EUC-JIS' ?>)
		*	Conditional sections in the DTD (INCLUDE & IGNORE keywords)
		*	Required Markup Declaration 'RMD'

Full source code is provided, royalty free, and will be updated
frequently to fix bugs and to reflect future W3C changes to the
specifications (read the Microsoft XML Parser in Java license agreement,
http://www.microsoft.com/standards/xml/xmllic.htm).

Bugs should be sent to Istvan Cseri (istvanc@microsoft.com) or Chris
Lovett (clovett@microsoft.com).

Enjoy, and let us make XML a success story!

-Jean Paoli
jeanpa@microsoft.com


From Peter at ursus.demon.co.uk  Sat Jun  7 07:16:09 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:55 2004
Subject: Microsoft XML Parser in Java is Available
Message-ID: <7729@ursus.demon.co.uk>

Jean

In message <78DFE33066ABD0118B9200805FD431BA5EC1EB@RED-16-MSG.dns.microsoft.com> Jean Paoli writes:
> ANNOUNCEMENT: Microsoft XML Parser in Java is Available
> 
> I am *really* pleased to announce :

I am *really* pleased to read your announcement! (and am replying even
before downloading your parser).  This will be a tremendous boost towards
an API for XML-* modules and their interoperation.  I shan't go back to
bed until I have looked at it!
> 
[...]
> Full source code is provided, royalty free, and will be updated
> frequently to fix bugs 
> and to reflect future W3C changes to the specifications.(read the

This is a very constructive and laudable approach.

> Microsoft XML Parser in Java license agreement
> http://www.microsoft.com/standards/xml/xmllic.htm).
> 
> Bugs should be sent to Istvan Cseri (istvanc@microsoft.com) or Chris
> Lovett (clovett@microsoft.com).
> 
> Enjoy, and let us make XML a success story!

I am very pleased that this has been announced on XML-DEV as it encourages
us all to promote an open approach to software development.

[...]

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From tallen at sonic.net  Sat Jun  7 20:46:57 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:55 2004
Subject: MSXML, WF, and Validity
Message-ID: <199706071847.LAA31370@bolt.sonic.net>


Jean Paoli wrote:
| The Microsoft XML Parser is a validating XML parser written in Java. 
| Once parsed, the XML document is exposed as a tree through a simple set
| of Java methods. 

After playing with it for a while this morning I found myself wondering
about WF and validity; I don't know if the following counts as a bug,
but it would be useful to hear what others think.

My input is:

<?XML version="1.0" encoding="UTF-8" ?>
<!doctype book [
<!element book (title, chapter+)>
<!entity foo "bar">
]>
<book><title>Palmy Days</title>
<chapter><title>One Frond at a Time</title>
<para>It was a dark and stormy night.  The crows clattered
amongst the fronds.  
</para>
<para>&foo;</para>
</chapter>
</book>

I stuck the DTD in the internal subset because I couldn't get the
parser to find an external DTD.  The output of 

  jview msxml -d palmy

is

<?XML VERSION="1.0" ENCODING="UTF-8"?>
<!DOCTYPE BOOK [
    <!ENTITY foo 'bar'>
    <!ELEMENT BOOK (TITLE,(CHAPTER,CHAPTER*))>
]>
<BOOK>
    <TITLE>
        Palmy Days
    </TITLE>
    <CHAPTER>
        <TITLE>
            One Frond at a Time
        </TITLE>
        <PARA>
            It was a dark and stormy night. The crows clattered amongst the fronds.
        </PARA>
        <PARA>
            bar
        </PARA>
    </CHAPTER>
</BOOK>

Now the declarations in the internal subset have been read (and munged),
and the foo:bar entity expansion has been performed.  Yet the instance
does not conform to the "DTD" in the internal subset, although taken
on its own it is well formed.  Is the input file "palmy" a valid
XML document?  The VC comment following [36] indicates not.  Is it
WF?  I can't find a WF comment indicating that the document must
conform to the DTD (which is reasonable, although perhaps this point
should be covered explicitly).  Is MSXML only parsing "palmy" as WF?
If not, is this error recovery?

These (real, not rhetorical) questions are of interest whether or
not this is the intended behavior of MSXML.


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 


From Peter at ursus.demon.co.uk  Sun Jun  8 20:33:05 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:55 2004
Subject: Microsoft XML Parser in Java is Available
Message-ID: <7758@ursus.demon.co.uk>

There are a few minor tweaks required to run or compile MSXML in a Solaris
environment - I have posted these to JeanP.  [The filenames need to be case
sensitive and correspond to the class names; the JDK is stricter on casting; and
it also requires the constants to be declared before use.]  I'd be grateful for
any pointers on Java portability, and it's a good place to re-emphasise the 
value of test data.


I've been porting JUMBO to run under J++, and running into a number of problems
that don't arise in W95 browsers.  These primarily include the use of '/' or 
'\' in addressing files, but I also have a feeling that some static 
initialisation may occur differently.  Any pointers to experience on this
or WWW pages would be valuable.

The '/' problem causes me some confusion.  When addressing a File, I
appear to end up with constructs like:
	URL context;
...
	URL u = new URL(context, "jumbo.gif");
I find I have to replace it with 
	URL u = new URL(context+File.separator+"jumbo.gif");
to get it working under J++.  The question as to when separators are governed
by URL syntax, and when by file syntax is a difficult borderline.
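
[A small sketch of one possible source of the confusion, offered as an
illustration and not as a diagnosis of the J++ behaviour.  Under URL
syntax a relative name replaces the last path segment of the context
unless the context ends in '/', and File.separator ('\' on Windows) is
never a URL separator.  The sketch uses java.net.URI.resolve in place of
the URL constructor shown above; the example.org addresses are made up.]

```java
import java.net.URI;

class UrlResolutionSketch {
    /** Resolves a relative reference against a base using URL syntax, not file syntax. */
    static String resolve(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        // Context ends in '/': the name is appended to the directory.
        System.out.println(resolve("http://example.org/jumbo/images/", "jumbo.gif"));
        // prints http://example.org/jumbo/images/jumbo.gif

        // Context does not end in '/': the last segment ("images") is replaced.
        System.out.println(resolve("http://example.org/jumbo/images", "jumbo.gif"));
        // prints http://example.org/jumbo/jumbo.gif
    }
}
```

If the context is built from a local directory name, making sure it ends
in '/' before resolving against it avoids the need to splice in a
platform-specific separator.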

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From Peter at ursus.demon.co.uk  Sun Jun  8 20:33:14 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:55 2004
Subject: MSXML, WF, and Validity
Message-ID: <7760@ursus.demon.co.uk>

In message <199706071847.LAA31370@bolt.sonic.net> Terry Allen writes:
> 
> Jean Paoli wrote:
> | The Microsoft XML Parser is a validating XML parser written in Java. 
> | Once parsed, the XML document is exposed as a tree through a simple set
> | of Java methods. 
> 
> After playing with it for a while this morning I found myself wondering
> about WF and validity; I don't know if the following counts as a bug,
> but it would be useful to hear what others think.

I have worried about this as well - I may have mentioned it on the XML-WG.
I don't think it's a bug, but rather that the spec does not give a clear 
guideline on *when* validation is expected.  I am sure some ERB members will
see this discussion.

> 
> My input is:
> 
> <?XML version="1.0" encoding="UTF-8" ?>
> <!doctype book [
> <!element book (title, chapter+)>
> <!entity foo "bar">
> ]>
> <book><title>Palmy Days</title>
> <chapter><title>One Frond at a Time</title>
> <para>It was a dark and stormy night.  The crows clattered
> amongst the fronds.  
> </para>
> <para>&foo;</para>
> </chapter>
> </book>

IMO this is a WF document, but not a valid one.

> 
> I stuck the DTD in the internal subset because I couldn't get the
> parser to find an external DTD.  The output of 
> 
>   jview msxml -d palmy
> 
> is
[... normalised expanded prettyprinted output deleted...]

> 
> Now the declarations in the internal subset have been read (and munged),
> and the foo:bar entity expansion has been performed.  Yet the instance
> does not conform to the "DTD" in the internal subset, although taken
> on its own it is well formed.  Is the input file "palmy" a valid
> XML document?  The VC comment following [36] indicates not.  Is it
> WF?  I can't find a WF comment indicating that the document must
  ^^^
It's certainly WF as far as I see it.

> conform to the DTD (which is reasonable, although perhaps this point
> should be covered explicitly).  Is MSXML only parsing "palmy" as WF?
> If not, is this error recovery?
> 
> These (real, not rhetorical) questions are of interest whether or
> not this is the intended behavior of MSXML.
> 
My view is based on Norbert's NXP which has a commandline switch -v
(i.e. require validation).  This is run clientside.  IOW if the document
above had been run through NXP it would have passed it as WF, but failed it
IFF the -v flag was set.
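
[The same distinction can be sketched with a present-day Java parser.
This is an illustration only: it uses the JAXP DocumentBuilderFactory API
that ships with current JDKs, not NXP or MSXML, and the "palmy" text is
adapted to the syntax of the final XML 1.0 Recommendation (lower-case
"xml" in the declaration, upper-case DOCTYPE/ELEMENT/ENTITY keywords).
setValidating plays the role of the -v switch.]

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

class PalmyCheck {
    // "palmy", adapted to final XML 1.0 syntax (an assumption of this sketch).
    static final String PALMY =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE book [\n"
        + "<!ELEMENT book (title, chapter+)>\n"
        + "<!ENTITY foo \"bar\">\n"
        + "]>\n"
        + "<book><title>Palmy Days</title>\n"
        + "<chapter><title>One Frond at a Time</title>\n"
        + "<para>It was a dark and stormy night.  The crows clattered\n"
        + "amongst the fronds.\n"
        + "</para>\n"
        + "<para>&foo;</para>\n"
        + "</chapter>\n"
        + "</book>\n";

    /** Parses PALMY and returns the problems reported; an empty list means none. */
    static List<String> check(boolean validating) {
        List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(validating);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored in this sketch */ }
                public void error(SAXParseException e) {
                    // Validity errors are recoverable: record them and keep parsing.
                    problems.add("invalid: " + e.getMessage());
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    // Well-formedness errors are fatal.
                    throw e;
                }
            });
            builder.parse(new InputSource(new StringReader(PALMY)));
        } catch (Exception e) {
            problems.add("not well-formed or unreadable: " + e.getMessage());
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("validation off: " + check(false));
        System.out.println("validation on:  " + check(true));
    }
}
```

With validation off the document is accepted; with validation on the
parser reports validity errors (for instance for the undeclared title,
chapter and para elements) without treating the document as ill-formed.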

There are three possible places to request validation:
	- at author level (i.e. some instruction in the document stating that
		the document is validatable).  The ERB may wish to include this
		as a component in the XMLDecl or RMDecl (or elsewhere).
	- at human client level (e.g. -v in NXP)
	- at software/application level (i.e. this software will ONLY work
		with valid documents)

Note that an internal subset may be present for other reasons than validation
(adding attribute values and types, as required for XML-LINK, for example).
Therefore I do not think the author's intentions can be deduced from the
presence of an internal subset.  Presumably a pointer (SYSTEM) to an
external DTD is likely to refer to a DTD which can be used for validation, but
I'm not sure whether this is explicit.

In summary, I think that MSXML is capable of validation - I'm not clear whether
it *always* tries to validate and, if it can't, simply decides to check for WF.
I think we need guidance on this.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From tallen at sonic.net  Sun Jun  8 22:17:06 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
Message-ID: <199706081956.MAA31488@bolt.sonic.net>


Peter Murray-Rust wrote:
>Note that an internal subset may be present for other reasons than validation
>(adding attribute values and types, as required for XML-LINK, for example).
>Therefore I do not think the author's intentions can be deduced from the
>presence of an internal subset.  Presumably a pointer (SYSTEM) to an
>external DTD is likely to refer to a DTD which can be used for validation, but
>I'm not sure whether this is explicit.

Yes, I think there is a somewhat different information model in XML
than in SGML, and this parser (whether it's doing all the right things
or not) is useful for learning and thinking about the differences.
I, too, think that my "palmy" input document is invalid but WF.  Thus,
if MSXML is parsing to validate, it is (due to a bug or two) doing
error recovery (and should be fixed on this point not to do so).

I can also see some gotchas for early adopters, such as that a WF document 
that makes reference to the wrong DTD is still WF.  And the WF-parser will 
check the WFness of the element declarations (even in the right DTD) even 
if it isn't going to use them, at least in the internal subset.  Also, the 
internal subset is part of the XML document, and, as the spec is 
written, the parser must parse the subset and deliver it as part of 
the output (as MSXML does), even though the same is not true of an 
external subset.  (Right?)

Doesn't it seem as though the reasons for conveying the internal
subset information to the application (such as those you mention) 
are also reasons for extracting the same information from the external 
subset and conveying it to the application, too?  whether the document 
is dealt with as WF or not?

IOW, an SGML parser such as nsgmls combines both subsets
into a DTD and deals with information following as another unit,
the "document instance set" (if I have the terminology right, per
8879 production 2), which is the part of an SGML document entity
*following* the prologue. 

But for an XML parser, the boundaries are shifted, because
it has to deal with an XML document that *includes* the prologue
(XMLlang production 23, where "element" corresponds to the SGML 
"document instance set", I think).  I don't know whether this is a good 
idea or not, just trying to understand it as an early adopter.

(I also notice now that per productions 23 and 27, white space
after the end of the end-tag of the root element is also part
of the document, which is okay by me; but this seems 
not to be dealt with explicitly s.v. 2.8, "White Space Handling." 
I read that section to mean that such white space must be passed
to the application by a WF-parser [the language referring to
"processors which ... read the DTD" or not should be changed,
because, as we see, a WF parser must read at least the internal
subset part of the DTD], whereas a validating parser must not
pass such white space to the application.)


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 


From Peter at ursus.demon.co.uk  Sun Jun  8 23:32:38 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
Message-ID: <7764@ursus.demon.co.uk>

In message <199706081956.MAA31488@bolt.sonic.net> Terry Allen writes:
[...]
> 
> Yes, I think there is a somewhat different information model in XML
> than in SGML, and this parser (whether it's doing all the right things
> or not) is useful for learning and thinking about the differences.

My problem is more basic - I don't think that there are (yet) 'right and
wrong things'.  That is why I have been so keen on implementation, because it's
only when we get to this stage that the problems of the WF/V boundary come out.

> I, too, think that my "palmy" input document is invalid but WF.  Thus,
> if MSXML is parsing to validate, it is (due to a bug or two) doing
> error recovery (and should be fixed on this point not to do so).

I think this is more a question of terminology.  NXP (Norbert Mikula) is a
'validating parser', but the validation can be switched off.  This is a
client-side decision.  So with NXP 'palmy' could be either invalid or WF
according to the reader's wishes
> 
> I can also see some gotchas for early adopters, such as that a WF document 
> that makes reference to the wrong DTD is still WF.  And the WF-parser will 
                              ^^^^^^^^^^^^^^^^^^^^^
I'd agree with this, and I don't necessarily think it's wrong until the ERB
tells us it is.  Its validity is presently decidable from axioms and we are
waiting for the ERB to think about the problem.

> check the WFness of the element declarations (even in the right DTD) even 
> if it isn't going to use them, at least in the internal subset.  Also, the 
> internal subset is part of the XML document, and, as the spec is 
> written, the parser must parse the subset and deliver it as part of 
> the output (as MSXML does), even though the same is not true of an 
> external subset.  (Right?)

I don't think so.  My formal reading of the spec is that no 'output' is
defined.  [After all, processing of an XML document can be done by a human
reader :-)].  I think the ERB has been careful to say nothing about output,
implementation, APIs, etc.  My own view has been that the scope for
confusion has been sufficient (as in the present case) that guidance is 
important.  At present we do not know what documents are validatable, what
the validity criterion can be computed to be, etc.

Note that NXP and Lark do not have 'outputs', they have APIs.  NXP allows
the programmer to subclass at the Esis level, whilst Lark provides a
tree of Elements.  Neither passes any DTD information.  In Lark I suspect this
is discarded - in NXP it requires a bit of digging to extract.  MSXML comes
closer to delivering the whole grove, I think.  (It subclasses PIs and DOCTYPE 
from Element).

> 
> Doesn't it seem as though the reasons for conveying the internal
> subset information to the application (such as those you mention) 
> are also reasons for extracting the same information from the external 
> subset and conveying it to the application, too?  whether the document 
> is dealt with as WF or not?

Again, the spec (and the ERB) are unclear about conveying this information
to the application at all.

> 
> IOW, an SGML parser such as nsgmls combines both subsets
> into a DTD and deals with information following as another unit,
> the "document instance set" (if I have the terminology right, per
> 8879 production 2), which is the part of an SGML document entity
> *following* the prologue. 

nsgmls attempts to validate *every* document it receives.  XML parsers need
not.  It's not clear whether an XML parser can insist on validating every 
document.  [The spec says nothing about *parsers* - again, I have been asking
for more concrete terminology than 'processor'].
> 
> But for an XML parser, the boundaries are shifted, because
> it has to deal with an XML document that *includes* the prologue
> (XMLlang production 23, where "element" corresponds to the SGML 
> "document instance set", I think).  I don't know whether this is a good 
> idea or not, just trying to understand it as an early adopter.
> 
> (I also notice now that per productions 23 and 27, white space
> after the end of the end-tag of the root element is also part
> of the document, which is okay by me; but this seems 
> not to be dealt with explicitly s.v. 2.8, "White Space Handling." 
> I read that section to mean that such white space must be passed
> to the application by a WF-parser [the language referring to
> "processors which ... read the DTD" or not should be changed,
> because, as we see, a WF parser must read at least the internal

I am actually unclear whether a WF-only parser (e.g. Lark) has to read the
internal subset at all, other than skipping to the ']>' at the end.  If it 
*does* read and parse it, what does it do with the information?  For example,
what is the implied structure of the document in:

<!DOCTYPE FOO [
<!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
]>
<FOO HREF="bar"/>

Can we assume that FOO (which has no Element declaration) has an ATTLIST as
given, and that therefore it inherits the SHOW and ACTUATE attributes?
IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
internal subset?
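
[As a point of comparison only: the final XML 1.0 Recommendation requires
even a non-validating processor to supply default attribute values
declared in the internal subset.  The sketch below uses the JAXP
DocumentBuilderFactory API of current JDKs, not any of the 1997 parsers
discussed here, and shows only what the parser itself supplies.  SHOW and
ACTUATE are not declared in the ATTLIST, so supplying them would be the
job of XML-LINK-aware code, not of the parser.]

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class AttlistDefaultSketch {
    // The document from the message above, unchanged.
    static final String DOC =
          "<!DOCTYPE FOO [\n"
        + "<!ATTLIST FOO XML-LINK CDATA #FIXED \"SIMPLE\">\n"
        + "]>\n"
        + "<FOO HREF=\"bar\"/>";

    /** Parses DOC without validation and returns the root element. */
    static Element root() {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // well-formedness checking only
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(DOC)));
            return doc.getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element foo = root();
        // Attribute written in the instance.
        System.out.println("HREF     = " + foo.getAttribute("HREF"));
        // Attribute declared #FIXED in the internal subset, not written in the instance.
        System.out.println("XML-LINK = " + foo.getAttribute("XML-LINK"));
        // Never declared and never written: getAttribute returns the empty string.
        System.out.println("SHOW     = '" + foo.getAttribute("SHOW") + "'");
    }
}
```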


> subset part of the DTD], whereas a validating parser must not
> pass such white space to the application.)

My confusion on this issue is well publicised :-)

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From tallen at sonic.net  Mon Jun  9 01:54:56 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:55 2004
Subject: (correction) Matching is defined
Message-ID: <199706082348.QAA08892@bolt.sonic.net>


"Match" is defined in the Terminology section, 1.3, contrary to what
I wrote.  "A string matches a grammatical production if it
belongs to the language generated by that production."  So if
"the l g by that p" means that you expand all the tokens it
contains, and an XML document is a string, then WFness applies
to the internals of prolog.  Perhaps a clause here to deal
specifically with documents would be a good idea.

Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 


From tallen at sonic.net  Mon Jun  9 01:55:19 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
Message-ID: <199706082339.QAA08654@bolt.sonic.net>


Peter Murray-Rust replying to me to him etc.
[Terry:]
| > Yes, I think there is a somewhat different information model in XML
| > than in SGML, and this parser (whether it's doing all the right things
| > or not) is useful for learning and thinking about the differences.
| 
| My problem is more basic - I don't think that there are (yet) 'right and
| wrong things'.  That is why I have been so keen on implementation, because it's
| only when we get to this stage that the problems of the WF/V boundary come out.

Right.  That's why the IETF assigns such importance to running code.

| > I, too, think that my "palmy" input document is invalid but WF.  Thus,
| > if MSXML is parsing to validate, it is (due to a bug or two) doing
| > error recovery (and should be fixed on this point not to do so).
| 
| I think this is more a question of terminology.  NXP (Norbert Mikula) is a
| 'validating parser', but the validation can be switched off.  This is a
| client-side decision.  So with NXP 'palmy' could be either invalid or WF
| according to the reader's wishes

Agreed, but from the viewpoint of the document preparer, it is both.  MSXML
needs the switch NXP has.  I think the behavior is unintentional, but
I would be alarmed at a processor/parser (they mean the same to me in
this context) that attempted to parse for validity, and if it found
an error, silently switched to WF-parse mode.

| > I can also see some gotchas for early adopters, such as that a WF document 
| > that makes reference to the wrong DTD is still WF.  And the WF-parser will 
|                               ^^^^^^^^^^^^^^^^^^^^^
| I'd agree with this, and I don't necessarily think it's wrong until the ERB
| tells us it is.  Its validity is presently decidable from axioms and we are
| waiting for the ERB to think about the problem.

Agreed, it's just something to watch out for and perhaps to guard against
(by not reusing entity names in different DTDs, etc.)

| > check the WFness of the element declarations (even in the right DTD) even 
| > if it isn't going to use them, at least in the internal subset.  Also, the 
| > internal subset is part of the XML document, and, as the spec is 
| > written, the parser must parse the subset and deliver it as part of 
| > the output (as MSXML does), even though the same is not true of an 
| > external subset.  (Right?)
| 
| I don't think so.  My formal reading of the spec is that no 'output' is
| defined.  [After all, processing of an XML document can be done by a human
| reader :-)].  I think the ERB has been careful to say nothing about output,
| implementation, APIs, etc.  My own view has been that the scope for
| confusion has been sufficient (as in the present case) that guidance is 
| important.  At present we do not know what documents are validatable, what
| the validity criterion can be computed to be, etc.

Point taken; but the spec is not entirely clean on this point.  If the
application requests the processor to process, the processor must
inform the application of certain things.  And it is hard to get
around

"*An XML processor which does not read the DTD must always pass all 
characters in a document that are not markup through to the application.* 
An XML processor which does read the DTD must always pass all characters 
in mixed content that are not markup through to the application. It may 
also choose to pass white space occurring in element content to the 
application; if it does so, it must signal to the application that ..."
		[2.8, truncated para, emphasis added]
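
[One way such signalling can look in practice, sketched with the SAX API
of current JDKs, which postdates this discussion: character data arrives
through characters(), while white space in element content arrives
through a separate ignorableWhitespace() callback.  The small document
and the Counter class are made up for the illustration.]

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WhitespaceSketch {
    // book has element content, so the line breaks and indentation inside it
    // are white space in element content; title has character content.
    static final String DOC =
          "<!DOCTYPE book [\n"
        + "<!ELEMENT book (title)>\n"
        + "<!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<book>\n"
        + "  <title>Palmy Days</title>\n"
        + "</book>";

    /** Records what the parser passes through each of the two callbacks. */
    static class Counter extends DefaultHandler {
        final StringBuilder text = new StringBuilder();
        int ignorable = 0;

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void ignorableWhitespace(char[] ch, int start, int length) {
            ignorable += length;
        }
    }

    /** Parses DOC with a validating SAX parser, which reads the DTD. */
    static Counter parse() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            Counter counter = new Counter();
            factory.newSAXParser().parse(new InputSource(new StringReader(DOC)), counter);
            return counter;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Counter c = parse();
        System.out.println("character data: \"" + c.text + "\"");
        System.out.println("white space flagged as ignorable: " + c.ignorable + " char(s)");
    }
}
```

The application still receives the white space, but it is told which
white space the DTD makes insignificant.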

| Note that NXP and Lark do not have 'outputs', they have APIs.  NXP allows
| the programmer to subclass at the Esis level, whilst Lark provides a
| tree of Elements.  Neither passes any DTD information.  In Lark I suspect this
| is discarded - in NXP it requires a bit of digging to extract.  MSXML comes
| closer to delivering the whole grove, I think.  (It subclasses PIs and DOCTYPE 
| from Element).

Right.  My problem as a document preparer is that I don't know what
an application may request the processor to do, so I must guard against
any kind of failure.

 ...

| > IOW, an SGML parser such as nsgmls combines both subsets
| > into a DTD and deals with information following as another unit,
| > the "document instance set" (if I have the terminology right, per
| > 8879 production 2), which is the part of an SGML document entity
| > *following* the prologue. 
| 
| nsgmls attempts to validate *every* document it receives.  XML parsers need
| not.  It's not clear whether an XML parser can insist on validating every 
| document.  [The spec says nothing about *parsers* - again, I have been asking
| for more concrete terminology than 'processor'].
|
| > But for an XML parser, the boundaries are shifted, because
| > it has to deal with an XML document that *includes* the prologue
| > (XMLlang production 23, where "element" corresponds to the SGML 
| > "document instance set", I think).  I don't know whether this is a good 
| > idea or not, just trying to understand it as an early adopter.
...
| I am actually unclear whether a WF-only parser (e.g. Lark) has to read the
| internal subset at all, other than skipping to the ']>' at the end.  If it 
| *does* read and parse it, what does it do with the information?  For example,

The soft spot here is the first line of 2.2, where "match" is not
defined except that later in that section it "implies" a few things,
which are not apparently meant to be a complete set.  What the
WF document matches is production 23, Prolog element Misc*.  As
the processor attempting to determine WFness must look inside element to 
determine WFness, presumably the same is true of prolog.

 ... unless I determine WFness by *parsing* with a *real parser* which
the processor is not meant to be ...

| what is the implied structure of the document in:
| 
| <!DOCTYPE FOO [
| <!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
| ]>
| <FOO HREF="bar"/>
| 
| Can we assume that FOO (which has no Element declaration) has an ATTLIST as
| given, and that therefore it inherits the SHOW and ACTUATE attributes?
| IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
| internal subset?

No, not per XMLlang alone.  FOO's only declared attribute has as its name
the unreserved string "XML-LINK" although it uses an undeclared attribute
name "HREF".  So it is WF but not valid.

As for whether you can have attlists without element decls, 
the 2nd sentence following production 47 (emended for entity>element)
reads "At user option, an XML processor may issue a warning
if attributes are declared for an [element] type not itself
declared, but this is not an error", so the document is still WF
but not valid per XMLlang alone.

Were the XMLlink spec to contain language such that the processor is 
supposed to go out and fetch the attribute declarations implied 
by the use of the FIXED attribute (implied by the XMLlink spec, that 
is), then the document shown is not only WF but perhaps even valid!
But it doesn't, and barely talks of validity and processing
by a *processor*.

That's my take, anyway.  Maybe the SGML ERB will want to revise
the language about validity in XMLlang, or create new concepts
of validity in XMLlink.  


Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Peter at ursus.demon.co.uk  Mon Jun  9 11:30:07 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
Message-ID: <7771@ursus.demon.co.uk>

In message <199706082339.QAA08654@bolt.sonic.net> Terry Allen writes:
> 
> Peter Murray-Rust replying to me to him etc.
[... and hoping the WG/ERB are reading this ...]
> [Terry:]
[...]
> 
> Right.  That's why the IETF assigns such importance to running code.

Good point.  That is why XML-DEV is important and why we need people to
create prototypes at this stage.  [Most XML-related software and documents
come into this category because the problems we are encountering may have 
implications for the language.]

[...]
> | 
> | I think this is more a question of terminology.  NXP (Norbert Mikula) is a
> | 'validating parser', but the validation can be switched off.  This is a
> | client-side decision.  So with NXP 'palmy' could be either invalid or WF
> | according to the reader's wishes
> 
> Agreed, but from the viewpoint of the document preparer, it is both.  MSXML
> needs the switch NXP has.  I think the behavior is unintentional, but
> I would be alarmed at a processor/parser (they mean the same to me in
> this context) that attempted to parse for validity, and if it found
> an error, silently switched to WF-parse mode.

I'd agree with this analysis, and haven't been silent on the issue.  IMO it 
is more important for the WG/ERB to address *this* problem than some of the 
proposed extensions.  The concept of WFness is NEW!!  It is more subtle than
people realise.  A fundamental problem is that there is no clear internal
flag in the document stating what the validity/WFness of the current document
is, is meant to be, was, etc.  As Terry says, it's particularly likely that
a WF document could (possibly erroneously) mutate into a valid one.  I am
sure that any confusion about MSXML is not intentional and is due to the issue
not being prominent in the spec.  

<PROPOSAL>
All parsers (i.e. tools that take XML documents and apply the criteria in 
XML-LANG only) should state their attitude and behaviour to WFness and validity.
</PROPOSAL>

The possible options include at least:
	- nsgmls-like.  Full validation is the only option.  Any non-valid
		document is flagged and appropriate error messages or error
		action is initiated.  
	- Lark-like (at least V0.88 - I think there is another coming).  No
		validation can be attempted.  Any 'output' can only be WF or
		in error.  NOTE: what does Lark do with the internal subset?
	- NXP-like.  Validation can be switched on or off by the 'client'.
		How this is transmitted to the application is application
		dependent at present.
	- MSXML-like.  Undocumented at present.  Possibly [though Terry and I
		hope not] validating by default, and changing to WF if this
		fails.
> 
[...]
> Point taken; but the spec is not entirely clean on this point.  If the
> application requests the processor to process, the processor must
> inform the application of certain things.  And it is hard to get
> around
> 
> "*An XML processor which does not read the DTD must always pass all 
> characters in a document that are not markup through to the application.* 

Ah!  I had assumed the internal subset was 'markup' - you see it as part
of the document.  We need a ruling on this :-).  Obviously if the DTD appears
***in the processed document***, then it could be interpreted as having been
read and used for validation.

[...]
> 
> | what is the implied structure of the document in:
> | 
> | <!DOCTYPE FOO [
> | <!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
> | ]>
> | <FOO HREF="bar"/>
> | 
> | Can we assume that FOO (which has no Element declaration) has an ATTLIST as
> | given, and that therefore it inherits the SHOW and ACTUATE attributes?
> | IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
> | internal subset?
> 
> No, not per XMLlang alone.  FOO's only declared attribute has as its name

My mistake.  I shouldn't have brought the others in.

> the unreserved string "XML-LINK" although it uses an undeclared attribute
> name "HREF".  So it is WF but not valid.

Agreed.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From michael at textscience.com  Mon Jun  9 12:43:54 1997
From: michael at textscience.com (Michael Leventhal)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
In-Reply-To: <7771@ursus.demon.co.uk>
Message-ID: <3.0.1.32.19970609184342.006aca40@aimnet.com>

At 10:00 AM 6/9/97 GMT, Peter Murray-Rust wrote:
>> I would be alarmed at a processor/parser (they mean the same to me in
>> this context) that attempted to parse for validity, and if it found
>> an error, silently switched to WF-parse mode.
>
>I'd agree with this analysis, and haven't been silent on the issue.  IMO it 
>is more important for the WG/ERB to address *this* problem than some of the 
>proposed extensions.  The concept of WFness is NEW!!  It is more subtle than
>people realise.  A fundamental problem is that there is no clear internal
>flag in the document stating what the validity/WFness of the current document
>is, is meant to be, was, etc.  As Terry says, it's particularly likely that
>a WF document could (possibly erroneously) mutate into a valid one.  I am
>sure that any confusion about MSXML is not intentional and is due to the
>issue not being prominent in the spec.  

But "silently switching" is exactly the behavior that is wanted for most
output oriented operations, e.g., browsing.  WF is only new formally but
informally it has been the default mode of operation for HTML.  I don't
think a flag stating the intention of the author could ever be supposed
to actually represent the wishes of the current user of the document or that
we could expect the majority of users to understand the underlying concept.
It is up to the user of the tool to select the mode they want if a choice
exists.

Validate and switch to well-formed "silently" is a possible mode of operation.
But I agree on requesting that each application formally state its possible
modes of operation.

>
><PROPOSAL>
>All parsers (i.e. tools that take XML documents and apply the criteria in 
>XML-LANG only) should state their attitude and behaviour to WFness and
>validity.
></PROPOSAL>
>
>The possible options include at least:
>	- nsgmls-like.  Full validation is the only option.  Any non-valid
>		document is flagged and appropriate error messages or error
>		action is initiated.  
>	- Lark-like (at least V0.88 - I think there is another coming).  No
>		validation can be attempted.  Any 'output' can only be WF or
>		in error.  NOTE: what does Lark do with the internal subset?
>	- NXP-like.  Validation can be switched on or off by the 'client'.
>		How this is transmitted to the application is application
>		dependent at present.
>	- MSXML-like.  Undocumented at present.  Possibly [though Terry and I
>		hope not] validating by default, and changing to WF if this
>		fails.




From richard at light.demon.co.uk  Mon Jun  9 12:51:24 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:55 2004
Subject: Re WF, V, and MSXML
In-Reply-To: <199706082339.QAA08654@bolt.sonic.net>
Message-ID: <rrW43CA4M9mzEwTG@light.demon.co.uk>

In message <199706082339.QAA08654@bolt.sonic.net>, Terry Allen
<tallen@sonic.net> writes
>|
>| > But for an XML parser, the boundaries are shifted, because
>| > it has to deal with an XML document that *includes* the prologue
>| > (XMLlang production 23, where "element" corresponds to the SGML 
>| > "document instance set", I think).  I don't know whether this is a good 
>| > idea or not, just trying to understand it as an early adopter.

I don't see _any_ difference between SGML and XML on this front.  SGML
parsers also have to deal with the prolog: the formal syntax of an "SGML
document entity" is:

        S
        SGML declaration,
        prolog,
        document instance set,
        'entity end' signal

(so in fact they also have to deal with the SGML declaration as well!)
The fact that the default ESIS output from the parser doesn't include
any DTD-related information shouldn't be taken to mean the parser hasn't
processed this information.

>| I am actually unclear whether a WF-only parser (e.g. Lark) has to read the
>| internal subset at all, other than skipping to the ']>' at the end.  If it 
>| *does* read and parse it, what does it do with the information.  For example,
>
>The soft spot here is the first line of 2.2, where "match" is not
>defined except that later in that section it "implies" a few things,
>which are not apparently meant to be a complete set.  What the
>WF document matches is production 23, Prolog element Misc*.  As
>the processor attempting to determine WFness must look inside element to 
>determine WFness, presumably the same is true of prolog.
>
> ... unless I determine WFness by *parsing* with a *real parser* which
>the processor is not meant to be ...

I would read the existing XML spec in a stricter spirit than you have
done.  To me, "match" means just that, i.e. that _if_ a WF document has
an internal or an external DTD, these should be parsed as though for a
valid XML document.  Any _syntactic_ errors in the DTD should be
flagged, even in 'WF' mode.  (Bear in mind that no-one is forcing WF
documents to have a DTD at all, except for entity declarations.)  If you
try to adopt a 'don't care' mode of parsing for the DTD when dealing
with WF documents, you probably create many more problems than you
solve.

The only difference is the use that is made of the DTD information: in a
WF document only the entity declarations matter to the parser.
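
A minimal sketch of that behaviour, assuming Python's standard-library
ElementTree (which sits on the non-validating expat parser, not on any tool
discussed in this thread).  The entity name "who" and the helper parses()
are made up for illustration.  The internal subset is read, its entity
declaration is used, and a syntactically broken declaration is still
reported even though nothing is being validated:

```python
import xml.etree.ElementTree as ET

# Internal subset declares one general entity; no element declarations at all.
GOOD = '<!DOCTYPE FOO [<!ENTITY who "world">]><FOO>hello &who;</FOO>'

# Same shape, but the ELEMENT declaration is syntactically incomplete.
BAD_DTD = '<!DOCTYPE FOO [<!ELEMENT FOO>]><FOO/>'


def parses(doc):
    """Return True if the non-validating parser accepts doc (hypothetical helper)."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


root = ET.fromstring(GOOD)
print(root.text)        # entity reference replaced using the internal subset
print(parses(BAD_DTD))  # a syntax error inside the DTD is still an error
```

Nothing here checks FOO against a content model; the DTD is consulted only
for its syntax and its entity declarations.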

>| what is the implied structure of the document in:
>| 
>| <!DOCTYPE FOO [
>| <!ATTLIST FOO XML-LINK CDATA #FIXED "SIMPLE">
>| ]>
>| <FOO HREF="bar"/>
>| 
>| Can we assume that FOO (which has no Element declaration) has an ATTLIST as
>| given, and that therefore it inherits the SHOW and ACTUATE attributes?
>| IOW *must* a parser decorate all matching elements with the ATTLISTS in the 
>| internal subset?
>
>No, not per XMLlang alone.  FOO's only declared attribute has as its name
>the unreserved string "XML-LINK" although it uses an undeclared attribute
>name "HREF".  So it is WF but not valid.

.. and since it is only well-formed and not valid, it cannot (in my
view) partake in any operations that require knowledge of <!ELEMENT or
<!ATTLIST declarations.  IOW, XML-LINK is not relevant to WF documents
...?

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From cbullard at hiwaay.net  Mon Jun  9 13:23:41 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML
References: <7771@ursus.demon.co.uk>
Message-ID: <339BE74F.4015@hiwaay.net>

Peter Murray-Rust wrote:
> 
> IMO it
> is more important for the WG/ERB to address *this* problem than some of the
> proposed extensions. 

That is right.  Until the core is worked out and clearer, proposing 
extensions is premature.

> The concept of WFness is NEW!!  

No it is not.  This is the way that IADS and IDE/AS work today 
and have since 1990.  The question is, what does one do with 
the DTD.  In these products, parsing for the instance is 
internal to the product.  DTD-centric parsing is done in 
batch.  That may not be the solution people want, but it is 
one way.  Well-formedness also has precedents in Xerox systems 
of the period.  'Tis new to thee, Miranda.

> <PROPOSAL>
> All parsers (i.e. tools that take XML documents and apply the criteria in
> XML-LANG only) should state their attitude and behaviour to WFness and validity.
> </PROPOSAL>

You mean you REALLY want interoperable tools?  How quaint.
 
> The possible options include at least:
>         - nsgmls-like.  Full validation is the only option.  Any non-valid
>                 document is flagged and appropriate error messages or error
>                 action is initiated.

IOW, always parse using a DTD.  Does the presence of the DOCTYPE indicate
that one exists, and maybe, where to find it?  Is the presence of the 
DOCTYPE enough to tell the system that one ought to exist?  I don't want 
to always send a DTD. I do want to be able to use SGML techniques that 
worked in the past and still work sensibly.

>         - Lark-like (at least V0.88 - I think there is another coming).  No
>                 validation can be attempted.  Any 'output' can only be WF or
>                 in error.  NOTE: what does Lark do with the internal subset?

Nyet.

>         - NXP-like.  Validation can be switched on or off by the 'client'.
>                 How this is transmitted to the application is application
>                 dependent at present.

This is the best approach if the flag is clear to all.

>         - MSXML-like.  Undocumented at present.  Possibly [though Terry and I
>                 hope not] validating by default, and changing to WF if this
>                 fails.

Ok.

len



From tbray at textuality.com  Mon Jun  9 15:14:05 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML
Message-ID: <3.0.32.19970609150843.00b53528@pop.intergate.bc.ca>

The fact that this debate can exist is kind of puzzling to me.  Check
out section 5, "Conformance".  A processor can either be validating
or non-validating.  At no point in the spec does anything say or suggest
that whether or not the processor validates has anything to do with
what is in the document being processed.  I haven't looked at MSXML
closely, but NXP's behavior is obviously correct in this respect - 
it validates or not at user request.

What am I missing? -Tim




From tallen at sonic.net  Mon Jun  9 17:08:23 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML  
Message-ID: <199706091508.IAA30900@bolt.sonic.net>


Peter Murray-Rust wrote re me:
| > Point taken; but the spec is not entirely clean on this point.  If the
| > application requests the processor to process, the processor must
| > inform the application of certain things.  And it is hard to get
| > around
| > 
| > "*An XML processor which does not read the DTD must always pass all 
| > characters in a document that are not markup through to the application.* 
| 
| Ah!  I had assumed the internal subset as 'markup' - you see it as part
| of the document.  We need a ruling on this :-).  Obviously if the DTD appears
| ***in the processed document***, then it could be interpreted as having been
| read and used for validation.

No, I agree it's markup; the quote is meant to establish the point that
the spec does talk about the processor sending stuff (output) to the
application (in response to your statement that the spec was neutral
on this issue).

Tim Bray asked, without specific context:
| 
| The fact that this debate can exist is kind of puzzling to me.  Check
| out section 5, "Conformance".  A processor can either be validating
| or non-validating.  At no point in the spec does anything say or suggest
| that whether or not the processor validates has anything to do with
| what is in the document being processed.  I haven't looked at MSXML
| closely, but NXP's behavior is obviously correct in this respect - 
| it validates or not at user request.
| 
| What am I missing? -Tim

Clarity in writing.  If a processor is nonvalidating, must it examine
the document for WFness?  may it?  may it not?

I understood (part of) what Peter and I were discussing to be whether and 
what the XMLlang spec requires a processor to send to an application, and
under what conditions.

MSXML sends a munged version of the infernal subset, which I first
thought must be required by the spec.  I now see it doesn't.  We
also pondered whether a processor that is nonvalidating must examine 
for WFness (a) the internal subset and/or (b) the external subset.
I am pretty sure that (a) is required, but don't know about (b).
The spec speaks of processors that don't "read the DTD", yet the
internal subset is part of the DTD and apparently must "match" the
prolog production.  

I suggest that all passages mentioning "processors" and "DTDs" be
reviewed for consistency.  
 

Regards,
  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 




From clovett at microsoft.com  Tue Jun 10 04:08:09 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun  7 16:57:56 2004
Subject: WF, V, and MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8A5@RED-17-MSG.dns.microsoft.com>


> I stuck the DTD in the internal subset because I couldn't get the parser 
> to find an external DTD.  

I think there's a problem with resolving relative URLs when the XML file is
local, and this only happens under certain versions of the Java VM.  I never
have a problem if the DTD is specified with a full URL.  A fix will be posted
when one is found. 

> WF versus Validity...

I agree with what seems to be a general consensus that DTD compliance should
be switchable.  Currently the MSXML parser handles internal subsets the same
as external DTDs, and we decided not to try to do any error recovery, so it
is possible that there are also bugs in the MSXML validity code.  These will
be fixed promptly.  

> Outputting the internal subset...

The thinking here is that the MSXML "Document" and
"Element" classes should represent a complete object model for tools and
applications that wish to manipulate XML documents, which means being
able to recreate a complete XML file after being manipulated.  This is
different from the traditional "filter" approach where the XML processor
is a one-way filter.  I think the "object model" approach is a good one
for the encouragement of an XML-based application development
environment.  It just so happens that the command line "msxml" tool that
we shipped with the parser (so people could easily play with the parser)
does a full dump of the XML document - which includes any internal
subset.  In fact, the Document class separates out the DTD from the XML
Data.  If you want to get the XML data only, call Document.getRoot.
Eventually if people want to build tools to manipulate the external DTD,
it should also be possible to re-publish that DTD using the Object Model
API.  Currently the Document.save method doesn't do this, but eventually
we may add that feature.  People have also requested other options on
the Document.save method, like whether to pretty-print or not.  See
http://www.w3.org/MarkUp/DOM/ for more on this topic.  




From cbullard at hiwaay.net  Tue Jun 10 05:50:21 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:57:56 2004
Subject: WF, V, and MSXML
References: <41135C785691CF11B73B00805FD4D2D702A3F8A5@RED-17-MSG.dns.microsoft.com>
Message-ID: <339CCEE1.6CF3@hiwaay.net>

Chris Lovett wrote:

> People have also requested other options on
> the Document.save method, like whether to pretty-print or not.  See
> http://www.w3.org/MarkUp/DOM/ for more on this topic.

I haven't seen this before. It adds a few wrinkles.

What does this mean?

"5.Events will bubble through the structural hierarchy of the document."

len bullard



From davidsch at microsoft.com  Tue Jun 10 17:19:32 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML
Message-ID: <011290D45A8ACF119B8B00805FD471D6033DA955@RED-24-MSG.dns.microsoft.com>

> >
> ><PROPOSAL>
> >All parsers (i.e. tools that take XML documents and apply the criteria in 
> >XML-LANG only) should state their attitude and behaviour to WFness and
> >validity.
> ></PROPOSAL>
> >
> >The possible options include at least:
> >	- nsgmls-like.  Full validation is the only option.  Any non-valid
> >		document is flagged and appropriate error messages or error
> >		action is initiated.  
> >	- Lark-like (at least V0.88 - I think there is another coming).  No
> >		validation can be attempted.  Any 'output' can only be WF or
> >		in error.  NOTE: what does Lark do with the internal subset?
> >	- NXP-like.  Validation can be switched on or off by the 'client'.
> >		How this is transmitted to the application is application
> >		dependent at present.
> >	- MSXML-like.  Undocumented at present.  Possibly [though Terry and I
> >		hope not] validating by default, and changing to WF if this
> >		fails.
> 
> 
[David Schach]  The XML spec seems to address this issue in section 2.20,
Required Markup Declaration:

	In an RMD, the value NONE indicates that an XML processor can parse
	the document correctly without first reading any part of the DTD.
	The value INTERNAL indicates that the XML processor must read and
	process the internal subset of the DTD, if provided, to parse the
	containing document correctly.  The value ALL indicates that the
	XML processor must read and process the declarations in both the
	subsets of the DTD, if provided, to parse the containing document
	correctly.

	...

	If no RMD is provided, an XML processor must behave as though an
	RMD had been provided with the value ALL.

[David Schach] (emphasis added) 
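
Read literally, the passage quoted above amounts to a small lookup from the
RMD value to the DTD subsets a processor is obliged to read.  A rough
sketch, where the names REQUIRED_SUBSETS and subsets_to_read are invented
for illustration and are not taken from the spec or from MSXML:

```python
# Which DTD subsets a processor must read for each RMD value,
# following the wording of section 2.20 as quoted above.
REQUIRED_SUBSETS = {
    "NONE": (),
    "INTERNAL": ("internal",),
    "ALL": ("internal", "external"),
}


def subsets_to_read(rmd=None):
    """Return the DTD subsets that must be read; a missing RMD behaves as ALL."""
    if rmd is None:
        rmd = "ALL"
    return REQUIRED_SUBSETS[rmd]


for value in (None, "NONE", "INTERNAL", "ALL"):
    print(value, subsets_to_read(value))
```

Note that the table only says what must be *read*; it is silent on whether
the processor then validates, which is a separate choice.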



From Peter at ursus.demon.co.uk  Tue Jun 10 23:36:06 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML
Message-ID: <7833@ursus.demon.co.uk>

In message <011290D45A8ACF119B8B00805FD471D6033DA955@RED-24-MSG.dns.microsoft.com> David Schach writes:
[...]
> 	[David Schach]  The XML spec seems to address this issue in
> section 2.20 Required Markup Declaration. 

	My problem is with the equivalence or not of the words 'parse',
'process' and 'validate'.  I hope this isn't being seen as mindless pickiness.

> 
> 		In an RMD, the value NONE indicates that an XML
> processor can parse the document correctly without first reading  any
                ^^^^^
If RMD=NONE then the document cannot be validated.  Therefore "parse"!="validate"

> part of the DTD.  The value INTERNAL indicates that the XML processor
> must read and process the internal subset of the DTD, if provided, to
                ^^^^^^^
Presumably means extract the structure of the DTD for 'processing' the document.

> parse the containing document correctly.  The value ALL indicates that
> the XML processor must read and process the declarations in both the
                                  ^^^^^^^
i.e. interpret the DTD subset(s)

> subsets of the DTD, if provided, to parse the containing document
                                      ^^^^^
> correctly.
> 
> 		...
> 
> 		If no RMD is provided, an XML processor must behave as
> though an RMD had been provided with the value ALL.    [David Schach]
> (emphasis added) 

Here is a possible document

<?XML VERSION="1.0" RMD="INTERNAL"?> <!-- Parser, you have to parse me -->
<!DOCTYPE FOO [                      
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO XYZZY CDATA #FIXED "Y2"> 
]>                  <!-- my internal subset is for adding Attvals -->
<FOO BAR="PLUGH"/>

Now, on the argument above (document is in control) the processor parses the 
document.  It cannot be valid, but does the processor try?  If yes, it fails.
The result is either a null document, *or* error recovery to WF parsing.
If the parser does not try to validate, the result is

<FOO XYZZY="Y2" BAR="PLUGH"/>

However, although the spec [5] mentions validating and non-validating
processors, in other places (e.g. [2.8]) it uses the phrase 'reads the 
DTD'.  This implies that there are (possibly) three classes of processor:

- a validator (which must always read the DTD)
- a busy non-validator (which reads the DTD not for validation, but for 
	extracting DTD-based markup)
- a lazy non-validator (which does not read the DTD).

The lazy non-validator will produce a different output from the busy 
non-validator, i.e.:

<FOO BAR="PLUGH"/>

The lazy non-validator could be in violation of the spec if the RMD requires
it to parse the DTD subset(s).  Maybe it parses them but throws them away
(i.e. 'does not read' == 'reads and forgets').
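
The difference between the busy and the lazy non-validator can be sketched
with a present-day parser.  This assumes Python's standard-library expat
binding, whose specified_attributes flag controls whether attributes derived
from declarations are reported; the helper name attributes_seen is made up,
and the draft-era <?XML ... RMD=...?> declaration is left out because expat
implements the later XML 1.0 syntax.  Strictly, expat still reads the
internal subset in both cases, so the second call models 'reads and forgets'
rather than a parser that skips to ']>':

```python
import xml.parsers.expat

DOC = """<!DOCTYPE FOO [
<!ELEMENT FOO EMPTY>
<!ATTLIST FOO XYZZY CDATA #FIXED "Y2">
]>
<FOO BAR="PLUGH"/>"""


def attributes_seen(doc, specified_only):
    """Parse doc without validating and collect the attributes reported."""
    parser = xml.parsers.expat.ParserCreate()
    # Non-zero: report only attributes written in the instance,
    # not those supplied by ATTLIST declarations.
    parser.specified_attributes = 1 if specified_only else 0
    seen = {}

    def start_element(name, attrs):
        seen.update(attrs)

    parser.StartElementHandler = start_element
    parser.Parse(doc, True)
    return seen


busy = attributes_seen(DOC, specified_only=False)  # includes the #FIXED XYZZY
lazy = attributes_seen(DOC, specified_only=True)   # BAR only
print(busy)
print(lazy)
```

Neither call complains that BAR is undeclared, since nothing is validating;
the two differ only in whether the DTD-supplied XYZZY reaches the
application.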

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jjc at jclark.com  Wed Jun 11 04:34:28 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:57:56 2004
Subject: Re WF, V, and MSXML
Message-ID: <2.2.32.19970611021638.017716d4@jclark.com>

At 10:00 09/06/97 GMT, Peter Murray-Rust wrote:

>The possible options include at least:
>	- nsgmls-like.  Full validation is the only option.  Any non-valid
>		document is flagged and appropriate error messages or error
>		action is initiated.  

The current version of nsgmls (the one in jade 0.8) supports a -wno-valid
option, which disables most validation.  With this option it doesn't complain
about undeclared element types and attributes.  However, 

- if you supply an attribute definition, then it will check that instances
of that attribute conform (it will of course continue parsing even if they
don't)

- if you declare a content model for an element type, then it will check
that the content of the element matches the content model, except that it
will not complain about the occurrence of any element types for which no
content model has been declared (again it will recover from errors of this
sort).

James




From richard at light.demon.co.uk  Wed Jun 11 14:48:28 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:56 2004
Subject: Repeating attribute specifications
Message-ID: <gQj66CA+HpnzEw4o@light.demon.co.uk>

Hi,

Is there anything in the XML spec which corresponds to the SGML
stricture that "there can only be one attribute specification for each
attribute definition", i.e. that you can't have repeated attribute
specifications within a single start-tag?  

If not, XML will allow e.g.

<person ROLE="author" ROLE="designer">

while SGML won't.  Which would be a 'for compatibility' issue.

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From davidsch at microsoft.com  Wed Jun 11 18:04:05 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:56 2004
Subject: Repeating attribute specifications
Message-ID: <011290D45A8ACF119B8B00805FD471D60341356F@RED-24-MSG.dns.microsoft.com>

See Section 3.1 Start and End Tags

	Validity Constraint - Unique Att Spec:

	No attribute may appear more than once in the same start-tag.
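
For comparison, in the XML 1.0 Recommendation as eventually published,
Unique Att Spec is listed as a well-formedness constraint, so a conforming
parser rejects the repeated attribute even when no DTD is present.  A
minimal sketch, assuming Python's standard-library expat binding (the helper
name is_well_formed is made up):

```python
import xml.parsers.expat


def is_well_formed(doc):
    """Return True if expat accepts doc as well-formed XML 1.0."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(doc, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


repeated = '<person ROLE="author" ROLE="designer"></person>'
single = '<person ROLE="author"></person>'

print(is_well_formed(repeated))  # repeated ROLE is rejected
print(is_well_formed(single))
```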

> -----Original Message-----
> From:	Richard Light [SMTP:richard@light.demon.co.uk]
> Sent:	Wednesday, June 11, 1997 4:55 AM
> To:	xml-dev@ic.ac.uk
> Subject:	Repeating attribute specifications
> 
> Hi,
> 
> Is there anything in the XML spec which corresponds to the SGML
> stricture that "there can only be one attribute specification for each
> attribute definition", i.e. that you can't have repeated attribute
> specifications within a single start-tag?  
> 
> If not, XML will allow e.g.
> 
> <person ROLE="author" ROLE="designer">
> 
> while SGML won't.  Which would be a 'for compatibility' issue.
> 
> Richard Light
> SGML and Museum Information Consultancy
> richard@light.demon.co.uk
> 3 Midfields Walk 
> Burgess Hill
> West Sussex RH15 8JA
> U.K.
> tel. (44) 1444 232067
> 



From tbray at textuality.com  Wed Jun 11 23:34:45 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:56 2004
Subject: Repeating attribute specifications
Message-ID: <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>

>Is there anything in the XML spec which corresponds to the SGML
>stricture that "there can only be one attribute specification for each
>attribute definition", i.e. that you can't have repeated attribute
>specifications within a single start-tag?  

No.  This is legal in XML.  And in SGML, with the recent TC. -T.



From clovett at microsoft.com  Thu Jun 12 00:32:47 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun  7 16:57:56 2004
Subject: Event Bubbling...
Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8B3@RED-17-MSG.dns.microsoft.com>

> > People have also requested other options on
> > the Document.save method, like whether to pretty-print or not.  See
> > http://www.w3.org/MarkUp/DOM/ for more on this topic.
> 
> I haven't seen this before. It adds a few wrinkles.
> 
> What does this mean?
> 
> "5.Events will bubble through the structural hierarchy of the
> document."
> 
See http://www.microsoft.com/workshop/prog/inetsdk/docs/inet0505.htm




From bdonoghoe at spin.net.au  Thu Jun 12 01:14:53 1997
From: bdonoghoe at spin.net.au (Bill Donoghoe)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <199706112312.JAA19010@spin.net.au>

Hello,
        I believe the answer to the question appears in Section 3.3 of the
XML syntax spec:

    When more than one AttlistDecl is provided for a given element type,
    the contents of all those provided are merged. When more than one
    definition is provided for the same attribute of a given element
    type, the first declaration is binding and later declarations are
    ignored.

    For interoperability, writers of DTDs may choose to provide at most
    one attribute-list declaration for a given element type, and at most
    one attribute definition for a given attribute name. An XML
    processor may, at user option, issue a warning when more than one
    attribute-list declaration is provided for a given element type, or
    more than one attribute definition for a given attribute, but this
    is not an error.

        Therefore, in the DTD you can have multiple attribute list declarations 
for an element (even multiple declarations of the same attribute).  However, in
XML documents an attribute can only occur once inside a start tag.


Example:

In the DTD the following is valid but the second declaration of the attribute
role will be ignored.

<!ELEMENT person  (name, details) >
<!ATTLIST person  role     CDATA #IMPLIED 
                  location CDATA #REQUIRED >
<!ATTLIST person  phone    CDATA #REQUIRED
                  role     (author | designer | manager | builder) "author" >

The SGML (before the changes from the HyTime T.C. flow through) equivalent is:

<!attlist person role     CDATA #IMPLIED 
                 location CDATA #REQUIRED
                 phone    CDATA #REQUIRED >  
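
The merging rule from Section 3.3 can be modelled in a few lines.  What
follows is only a minimal Python sketch of the rule as quoted above, not
code from any XML processor; the function name merge_attlists and the
(name, type, default) tuple layout are assumptions made up for this
illustration.

```python
def merge_attlists(declarations):
    """Merge the attribute-list declarations of one element type.

    `declarations` holds the attribute lists in the order the processor
    reads them; each one is a list of (name, type, default) tuples
    (a hypothetical layout chosen for this sketch).  The first definition
    of an attribute name is binding and later ones are ignored.
    """
    merged = {}
    ignored = []
    for attlist in declarations:
        for name, att_type, default in attlist:
            if name in merged:
                # A later definition of an already-defined attribute.
                ignored.append((name, att_type, default))
            else:
                merged[name] = (att_type, default)
    return merged, ignored


# The two ATTLIST declarations for "person" from the example above.
person_attlists = [
    [("role", "CDATA", "#IMPLIED"), ("location", "CDATA", "#REQUIRED")],
    [("phone", "CDATA", "#REQUIRED"),
     ("role", "(author | designer | manager | builder)", '"author"')],
]

merged, ignored = merge_attlists(person_attlists)
for name, (att_type, default) in merged.items():
    print(name, att_type, default)
print("ignored:", ignored)
```

The merged result keeps role as CDATA #IMPLIED, which matches the single
SGML declaration shown above; the enumerated definition of role ends up
in the ignored list.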
 
Bill Donoghoe                      email: bdonoghoe@acslink.net.au
Systems Analyst & SGML Consultant   
"Do you want some information or all of the data?" 




From richard at light.demon.co.uk  Thu Jun 12 18:02:32 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
In-Reply-To: <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>
Message-ID: <pY6a6JAPE7nzEw5Q@light.demon.co.uk>

In message <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>, Tim
Bray <tbray@textuality.com> writes
>>Is there anything in the XML spec which corresponds to the SGML
>>stricture that "there can only be one attribute specification for each
>>attribute definition", i.e. that you can't have repeated attribute
>>specifications within a single start-tag?  
>
>No.  This is legal in XML.  And in SGML, with the recent TC. -T.
 
The other answer I got to this question quoted the XML Lang spec
(section 3.1):

"Validity constraint - Unique Att Spec:
No attribute may appear more than once in the same start-tag."

This seemed to deal with the issue pretty conclusively: I had just
failed to look under "start-tags" while thinking about attributes ;-)

Is this all about to change with the 30 June update?

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From davidsch at microsoft.com  Thu Jun 12 19:15:33 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <011290D45A8ACF119B8B00805FD471D60344E247@RED-24-MSG.dns.microsoft.com>

I think Tim misunderstood your question.  In the XML DTD, it is legal to
have multiple AttlistDecls for a given element type (see section 3.3).
This doesn't change the validity constraint of section 3.1.  Attributes
in tags have to be unique.



From Peter at ursus.demon.co.uk  Thu Jun 12 20:01:01 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <7923@ursus.demon.co.uk>

In message <011290D45A8ACF119B8B00805FD471D60344E247@RED-24-MSG.dns.microsoft.com> David Schach writes:
> I think Tim misunderstood your question.  In the XML DTD, it is legal to
> have multiple AttlistDecls for a given element type (see section 3.3).
> This doesn't change the validity constraint of section 3.1.  Attributes
> in tags have to be unique.

I think I have misunderstood the answers as well :-)  I'd be grateful for a 
very simple explanation.

I assumed that the multiple attributes was so that if (say) 

<!ATTLIST FOO BAR CDATA "BAZ">

occurs in the external DTD and

<!ATTLIST FOO BAR CDATA "XYZZY">

occurs in the internal subset
then this is now legal whereas it wasn't before.  But what is now the default
value of BAR? I assumed it was the later declaration ("XYZZY").  Please
disabuse me if this is wrong.  [I assume that 

<FOO BAR="abc" BAR="xyz">

is illegal, still.  If not we have some software to rewrite.]

	P.



-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Thu Jun 12 20:24:02 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <3.0.32.19970612111857.00a54820@pop.intergate.bc.ca>

At 06:53 PM 12/06/97 GMT, Peter Murray-Rust wrote:
>[I assume that 
><FOO BAR="abc" BAR="xyz">
>is illegal, still.  If not we have some software to rewrite.]

Yes, it still is.  Yes, I screwed up.  Sigh.  As Michael pointed
out, we have a spec bug: this should be a WFC (well-formedness
constraint), not a VC (validity constraint).  -T.
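
As an illustration of the start-tag rule, a present-day parser such as
expat (reached here through Python's xml.parsers.expat module) rejects a
repeated attribute outright.  This is a minimal sketch; is_well_formed is
a helper name invented for the example, and it shows expat's behaviour
rather than the wording of the 1997 draft.

```python
import xml.parsers.expat


def is_well_formed(document):
    """Return True if expat parses `document` without raising an error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(document, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return False
    return True


repeated = '<FOO BAR="abc" BAR="xyz"/>'   # BAR appears twice in one start-tag
unique = '<FOO BAR="abc" BAZ="xyz"/>'     # every attribute name is distinct

print(is_well_formed(repeated))
print(is_well_formed(unique))
```

No DTD is involved in either document, so the rejection of the first one
is a well-formedness matter and not a validation result.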



From clovett at microsoft.com  Thu Jun 12 21:18:51 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun  7 16:57:57 2004
Subject: Re WF, V, and MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8B9@RED-17-MSG.dns.microsoft.com>


	Regarding all the discussion about the RMD attribute, switching
validation on and off, error recovery, and so on....

	The reason MSXML doesn't implement RMD yet is because there are
problems with the RMD=IGNORE concept, since ignoring the DTD can result
in different data being given to the application - which generally is a
bad thing.  The spec says it is an error to specify RMD=IGNORE if the
DTD contains any declarations of:

    1) attributes with default values, if elements to which these
       attributes apply appear in the document instance without
       specifying values for these attributes, or
    2) entities (other than the built-in entities), if references to
       those entities appear in the document instance, or
    3) element types with element content, if white space occurs in
       the document instance directly within any instance of those
       types.

	The problem is that if the parser ignores the DTD, how can it
detect #1 above?  Also, the white space handling can be ambiguous.

	So, MSXML currently takes the following approach:

    - The RMD attribute is not implemented yet, so if a DTD is there
      it uses it.
    - If an error is found it stops.  No error recovery is attempted.
    - If you don't want validation, remove the DTD.
    - It is ok to not define some of the elements in the DTD.  This
      simply means that in the same document there is certain data
      that you want to guarantee to be correct, and other data that is
      more unknown in structure (but still well-formed).  This is
      simply a side effect of being able to parse a document without
      a DTD.




From paul at arbortext.com  Thu Jun 12 23:49:50 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <3.0.32.19970612174836.006d473c@pophost.arbortext.com>

At 18:53 1997 06 12 GMT, Peter Murray-Rust wrote:
>I assumed that the multiple attributes was so that if (say) 
>
><!ATTLIST FOO BAR CDATA "BAZ">
>
>occurs in the external DTD and
>
><!ATTLIST FOO BAR CDATA "XYZZY">
>
>occurs in the internal subset
>then this is now legal whereas it wasn't before.  But what is now the default
>value of BAR? I assumed it was the later declaration ("XYZZY").  Please
>disabuse me if this is wrong. 

Your answer seems to be in 3.3 of the XML-lang spec:

When more than one AttlistDecl is provided for a given element type, the 
contents of all those provided are merged. When more than one definition is 
provided for the same attribute of a given element type, the first 
declaration is binding and later declarations are ignored. For 
interoperability, writers of DTDs may choose to provide at most one 
attribute-list declaration for a given element type, and at most one 
attribute definition for a given attribute name. An XML processor may, at 
user option, issue a warning when more than one attribute-list declaration 
is provided for a given element type, or more than one attribute definition 
for a given attribute, but this is not an error. 




From davidsch at microsoft.com  Thu Jun 12 23:57:39 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <011290D45A8ACF119B8B00805FD471D603459316@RED-24-MSG.dns.microsoft.com>

Per section 3.3

	When more than one AttlistDecl is provided for a given element
type, the contents of all those provided are merged.  When more than one
definition is provided for the same attribute of a given element type,
the first declaration is binding and the later declarations are ignored.

In your example, the definition in the internal DTD subset, <!ATTLIST FOO
BAR CDATA "XYZZY">, is processed first, so it takes precedence over the
definition in the external DTD.
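
The "first declaration is binding" behaviour can be observed with expat
through Python's xml.parsers.expat module.  This is a hedged sketch: both
declarations are placed in the internal subset, in the order a processor
would meet them (internal subset first, then the external DTD), because
Python's expat binding does not fetch an external DTD by default.  The
helper name start_tag_attributes is invented for the example.

```python
import xml.parsers.expat

# "XYZZY" stands in for the internal-subset declaration, which is read
# first; "BAZ" stands in for the declaration from the external DTD.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE FOO [
  <!ATTLIST FOO BAR CDATA "XYZZY">
  <!ATTLIST FOO BAR CDATA "BAZ">
]>
<FOO/>"""


def start_tag_attributes(document):
    """Collect (element name, attributes) for every start-tag reported."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = (
        lambda name, attrs: seen.append((name, dict(attrs))))
    parser.Parse(document, True)
    return seen


result = start_tag_attributes(DOCUMENT)
print(result)
```

The FOO element carries no BAR attribute in the instance, so whatever
value the handler reports for it is the default supplied by the binding
declaration.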



From Peter at ursus.demon.co.uk  Fri Jun 13 00:14:38 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:57 2004
Subject: Re WF, V, and MSXML
Message-ID: <7934@ursus.demon.co.uk>

In message <41135C785691CF11B73B00805FD4D2D702A3F8B9@RED-17-MSG.dns.microsoft.com> Chris Lovett writes:
> 
> 	Regarding all the discussion about the RMD attribute and
> switching validation on and off and error recovery and so on....

I think MSXML has taken a reasonable position given the ambiguities...

> 
> 	The reason MSXML doesn't implement RMD yet is because there are
> problems with the RMD=IGNORE concept since ignoring the DTD can result

Agreed.
[I'm working from XML-lang-970331, which doesn't use RMD="IGNORE".   Is this
the same as "NONE"?]

> in different data being given to the application - which generally is a
> bad thing.  The spec says it is an error to specify RMD=IGNORE if the
^^^^^^^^^^^
I would have said it was always a bad thing!

> DTD contains any declarations of:
> 		1) attributes with default values, if elements to which
> these attributes apply appear in the document instance without
> specifying values for these attributes, or
> 		2) entities. (other than the built in entities), if
> references to those entities appear in the document instance, or
> 		3) element types with element content, if white space
> occurs in the document instance directly within any instance of those
> types.
> 
> 	The problem is that if the parser ignores the DTD, how can it
> detect #1 above ?  Also, the white space handling can be ambiguous.

Agreed.  I think the ERB have to consider this.  I cannot see how a parser
(even with RMD="NONE") can avoid reading the DTD.  I think the option is
really related only to #3.

> 
> 	So, MSXML currently takes the following approach:
> 		- RMD attribute is not implemented yet, so if a DTD is
> there it uses it.  
...........^^^^

This is an ambiguous word :-)  It can mean the creation of the proper
document content, validation, or both.

> 		- If an error is found it stops.  No error recovery is
> attempted.

:-)

> 		- If you don't want validation, remove the DTD.  


Ah, but you cannot use entities or default attribute values.

> 		- It is ok to not define some of the elements in the
> DTD.  This simply means that in the same document there is certain data
> that you want to guarantee to be correct, and other data that is more
> unknown in structure (but still well-formed).  This is simply a side
> effect of being able to parse a document without a DTD.

This implies partial validation, which we don't have.  There is no reason
for defining any ELEMENTs if the document is not validated (and the element
content not analysed).

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From davidsch at microsoft.com  Fri Jun 13 04:27:39 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:57 2004
Subject: Entities in Attribute Values
Message-ID: <011290D45A8ACF119B8B00805FD471D603463877@RED-24-MSG.dns.microsoft.com>

Because entity references are allowed inside of attribute values, it is
not possible to store an unmodified URL with data in an attribute.  For
example, the following XML is not well-formed because the '&'s are not
escaped inside of SELF's HREF value.  

	<CHANNEL HREF="http://someserver/comics/">
	<TITLE>Daily Comics</TITLE>

	<!-- HREF's value is invalid in SELF because the &'s aren't escaped as &amp; -->

	<SELF HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics=on&dilbert=on&calvin=on&peanuts=on" />

	</CHANNEL>

This makes it inconvenient to store URLs in XML files.  Would anyone
be interested in changing entity processing to fix this?
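
For anyone generating such files from a program, the escaping can be left
to a library.  The sketch below uses Python's standard xml.sax.saxutils
and xml.etree.ElementTree modules; the helper name parses is invented for
the example, and the URL is the one from the CHANNEL fragment above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

url = ("http://someserver/scripts/oleisapi2.dll/comics.custom.cdf"
       "?comics=on&dilbert=on&calvin=on&peanuts=on")


def parses(text):
    """Return True if `text` is accepted by the parser."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Writing the URL verbatim leaves bare '&' characters in the value.
raw_element = '<SELF HREF="%s" />' % url

# quoteattr() escapes &, < and > and adds the surrounding quotes.
escaped_element = "<SELF HREF=%s />" % quoteattr(url)
print(escaped_element)

# After parsing, the application sees the original, unescaped URL again.
round_tripped = ET.fromstring(escaped_element).get("HREF")
print(parses(raw_element), round_tripped == url)
```

The escaping is purely a property of the serialized file: the value handed
to the application after parsing is the unmodified URL.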



From aray at q2.net  Fri Jun 13 06:51:45 1997
From: aray at q2.net (Arjun Ray)
Date: Mon Jun  7 16:57:57 2004
Subject: Entities in Attribute Values
In-Reply-To: <011290D45A8ACF119B8B00805FD471D603463877@RED-24-MSG.dns.microsoft.com>
Message-ID: <Pine.LNX.3.95.970613005347.24555C-100000@q2.net>


On Thu, 12 Jun 1997, David Schach wrote:

> Because entity references are allowed inside of attribute values, it is
> not possible to store an unmodified URL with data in an attribute.  For
> example, the following XML is not valid because the '&''s are not
> escaped inside of SELF's HREF value.  

This is a problem only if '&' *must* be the field separator. Why not
something else, like ';'?

> 	<SELF HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics=on&dilbert=on&calvin=on&peanuts=on" />
> 
> This makes it inconvenient to store URLs in XML files.  Would anyone
> be interested in changing entity processing to fix this?

IMHO, there's no need for that. Or, at any rate, there shouldn't be. Using
'&' as a field separator in "query URLs" is a historical artefact of lack
of RTFM. The problem was recognized reasonably early too, and a fix was
proposed, but no HTML browser implementor of, ah, consequence ever got a
Round Tuit. 

>From RFC 1866, Section 8.2.1 "The form-urlencoded Media Type":

           NOTE - The URI from a query form submission can be
            used in a normal anchor style hyperlink.
            Unfortunately, the use of the `&' character to
            separate form fields interacts with its use in SGML
            attribute values as an entity reference delimiter.
            For example, the URI `http://host/?x=1&y=2' must be
            written `<a href="http://host/?x=1&#38;y=2"' or `<a
            href="http://host/?x=1&amp;y=2">'.

            HTTP server implementors, and in particular, CGI
            implementors are encouraged to support the use of
            `;' in place of `&' to save users the trouble of
            escaping `&' characters this way.
 
We're not committed to perpetuating mistakes, are we?
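
On the receiving side, accepting ';' as the field separator takes very
little code.  This is only a sketch of the idea in the RFC note, assuming
a Python version whose urllib.parse.parse_qs accepts the separator
argument (3.10 and later, plus some earlier security releases); the
helper name query_fields is invented for the example.

```python
from urllib.parse import parse_qs, urlsplit


def query_fields(url, separator):
    """Split the query part of `url` into fields using `separator`."""
    return parse_qs(urlsplit(url).query, separator=separator)


with_ampersand = "http://host/?x=1&y=2"   # needs &amp; inside an attribute
with_semicolon = "http://host/?x=1;y=2"   # can be stored as written

print(query_fields(with_ampersand, "&"))
print(query_fields(with_semicolon, ";"))
```

Both calls yield the same fields, so a server that accepts ';' lets
authors write the URL into an attribute value without any escaping.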


Arjun




From jtigue at datachannel.com  Fri Jun 13 06:57:47 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:57:57 2004
Subject: CDF DTD
Message-ID: <33A0D377.7FE9707F@datachannel.com>

I've got an incomplete DTD for CDF. I'd like to check some CDF files for
validity. Does anyone know of a complete version?

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

From jjc at jclark.com  Fri Jun 13 14:43:21 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:57:57 2004
Subject: Re WF, V, and MSXML
Message-ID: <199706131242.GAA21549@jclark.com>

>The reason MSXML doesn't implement RMD yet is because there are
>problems with the RMD=IGNORE concept since ignoring the DTD can result
>in different data being given to the application - which generally is a
>bad thing. The spec says it is an error to specify RMD=IGNORE if the
>DTD contains any declarations of:
>1) attributes with default values, if elements to which
>these attributes apply appear in the document instance without
>specifying values for these attributes, or

>The problem is that if the parser ignores the DTD, how can it
>detect #1 above ?

Obviously it can't.  If a parser wants to fully validate an XML document it
has to read the entire DTD.  One of the things it must validate, if the
document has RMD=IGNORE, is that the DTD could be ignored without changing
the data the application received.  A parser that is not validating, on the
other hand, can choose to take advantage of the RMD decl and not parse the
DTD.  Provided that the document has been validated, the non-validating
parser will be guaranteed to get the same results as the validating parser.
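
The point about default attribute values can be made concrete with a small
experiment.  This is a hedged sketch using Python's xml.parsers.expat
module; it does not implement RMD (expat has no such option), it only
compares what an application receives for the same instance with and
without the declarations.  The helper name attributes_seen is invented
for the example.

```python
import xml.parsers.expat

# The same instance, once with a DTD that supplies a default value for
# BAR and once with the DTD removed (i.e. "ignored").
WITH_DTD = """<!DOCTYPE FOO [
  <!ATTLIST FOO BAR CDATA "BAZ">
]>
<FOO/>"""
WITHOUT_DTD = "<FOO/>"


def attributes_seen(document):
    """Return the attributes reported for the first start-tag."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: seen.append(dict(attrs))
    parser.Parse(document, True)
    return seen[0]


with_dtd = attributes_seen(WITH_DTD)
without_dtd = attributes_seen(WITHOUT_DTD)
print(with_dtd, without_dtd)
print("same data either way:", with_dtd == without_dtd)
```

A document like this one is exactly the case where RMD=IGNORE would be an
error under condition 1: the instance omits BAR, so skipping the DTD
changes what the application is given.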

James




From davidsch at microsoft.com  Fri Jun 13 18:21:58 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:57 2004
Subject: Entities in Attribute Values
Message-ID: <011290D45A8ACF119B8B00805FD471D60346D795@RED-24-MSG.dns.microsoft.com>

The use of the & as the CGI separator character is a well established
convention and unlikely to change.  It will continue whether we support
it or not.

> -----Original Message-----
> From:	Arjun Ray [SMTP:aray@q2.net]
> Sent:	Thursday, June 12, 1997 10:03 PM
> To:	xml-dev@ic.ac.uk
> Subject:	Re: Entities in Attribute Values
> 
> 
> 
> On Thu, 12 Jun 1997, David Schach wrote:
> 
> > Because entity references are allowed inside ot attribute values, it
> is
> > not possible to store an unmodified URL with data in an attribute.
> For
> > example, the following XML is not valid because the '&''s are not
> > escaped inside of SELF's HREF value.  
> 
> This is a problem only if '&' *must* be the field separator. Why not
> something else,, like ';' ?
> 
> > 	<SELF
> >
> HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics
> =o
> > n&dilbert=on&calvin=on&peanuts=on" />
> > 
> > This makes it inconvenient to store URL's in XML files.  Would it
> anyone
> > be interested in changing entity processing to fix this?
> 
> IMHO, there's no need for that. Or, at any rate, there shouldn't be.
> Using
> '&' as a field separator in "query URLs" is a historical artefact of
> lack
> of RTFM. The problem was recognized reasonably early too, and a fix
> was
> proposed, but no HTML browser implementor of, ah, consequence ever got
> a
> Round Tuit. 
> 
> From RFC 1866, Section 8.2.1 "The form-urlencoded Media Type":
> 
>            NOTE - The URI from a query form submission can be
>             used in a normal anchor style hyperlink.
>             Unfortunately, the use of the `&' character to
>             separate form fields interacts with its use in SGML
>             attribute values as an entity reference delimiter.
>             For example, the URI `http://host/?x=1&y=2' must be
>             written `<a href="http://host/?x=1&#38;y=2"' or `<a
>             href="http://host/?x=1&amp;y=2">'.
> 
>             HTTP server implementors, and in particular, CGI
>             implementors are encouraged to support the use of
>             `;' in place of `&' to save users the trouble of
>             escaping `&' characters this way.
>  
> We're not committed to perpetuating mistakes, are we?
> 
> 
> Arjun
> 
> 
> 
> xml-dev: A list for W3C XML Developers
> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
> To unsubscribe, send to majordomo@ic.ac.uk the following message;
> unsubscribe xml-dev
> List coordinator, Henry Rzepa (rzepa@ic.ac.uk)
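
The escaping that the RFC 1866 note calls for can be sketched in Python.
This is only an illustration, assuming the standard library modules
xml.sax.saxutils and xml.etree.ElementTree; the variable and function
names are invented for the sketch, and the URL is the one from the
example above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# The query URL from the example, with raw '&' field separators.
url = ("http://someserver/scripts/oleisapi2.dll/comics.custom.cdf"
       "?comics=on&dilbert=on&calvin=on&peanuts=on")

# quoteattr() escapes '&' as '&amp;' and wraps the value in quotes.
element_text = "<SELF HREF=%s />" % quoteattr(url)

# The parser turns '&amp;' back into '&', so the URL round-trips unchanged.
round_tripped = ET.fromstring(element_text).get("HREF")


def parses(text):
    """Return True if the parser accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The unescaped form is rejected: '&dilbert=' is not an entity reference.
unescaped_ok = parses('<SELF HREF="%s" />' % url)
```

The application only ever sees the unescaped URL; the '&amp;' form exists
solely in the serialized file.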

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From richard at light.demon.co.uk  Fri Jun 13 21:06:36 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:57 2004
Subject: Repeating attribute specifications
In-Reply-To: <7923@ursus.demon.co.uk>
Message-ID: <PxStMCAMYQozEw26@light.demon.co.uk>

In message <7923@ursus.demon.co.uk>, Peter Murray-Rust
<Peter@ursus.demon.co.uk> writes
>
>I assumed that the multiple attributes was so that if (say) 
>
><!ATTLIST FOO BAR CDATA "BAZ">
>
>occurs in the external DTD and
>
><!ATTLIST FOO BAR CDATA "XYZZY">
>
>occurs in the internal subset
>then this is now legal whereas it wasn't before.  But what is now the default
>value of BAR? I assumed it was the later declaration ("XYZZY").  Please
>disabuse me if this is wrong.

Yes, you're right, but the reason why should be made clear.  (It is
_not_ because it's "the later declaration"!):

The attribute-list declaration for element type FOO:

<!ATTLIST FOO BAR CDATA "XYZZY">

is read _first_ because it is in the internal DTD subset, which is
processed before the external DTD subset.  So the attribute definition
for the attribute BAR takes precedence over that given in the other
attribute-list declaration for FOO:

<!ATTLIST FOO BAR CDATA "BAZ">

because "the first declaration is binding and later declarations are
ignored".  Note that it is the _whole_ attribute definition:

BAR CDATA "XYZZY"

which is used, not just the default value as you suggest.  (The two
attribute declarations might have specified different attribute types.)
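
The "first declaration is binding" rule can be modelled in a few lines of
Python.  This is a simulation of the rule as described above, not the
behaviour of any particular parser; the function name and the tuple
layout are assumptions made for the sketch.

```python
def effective_attribute_definitions(declarations):
    """Keep the first definition seen for each (element, attribute) pair.

    `declarations` is a list of (element, attribute, type, default) tuples
    in the order in which the processor reads them.
    """
    effective = {}
    for element, attribute, attr_type, default in declarations:
        # setdefault() stores the value only if the key is not present yet,
        # so later declarations for the same attribute are ignored.
        effective.setdefault((element, attribute), (attr_type, default))
    return effective


internal_subset = [("FOO", "BAR", "CDATA", "XYZZY")]
external_subset = [("FOO", "BAR", "CDATA", "BAZ")]

# The internal subset is read first, so its whole definition is binding.
result = effective_attribute_definitions(internal_subset + external_subset)
```

Because the whole tuple is kept, a differing attribute type in the later
declaration would be ignored along with its default value.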

Sorry to have caused confusion in the first place!

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From cbullard at hiwaay.net  Sat Jun 14 03:01:22 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:57:57 2004
Subject: Entities in Attribute Values
References: <011290D45A8ACF119B8B00805FD471D60346D795@RED-24-MSG.dns.microsoft.com>
Message-ID: <33A1E667.49B8@hiwaay.net>

David Schach wrote:
> 
> The use of the & as the CGI separator character is a well established
> convention and unlikely to change.  It will continue whether we support
> it or not.

The use of the & character is a well established convention and 
was before the query URL designers made their mistake.  It will 
continue to be so.  Impasse.

It happens in all cases of failure to RTFM.  The question now 
is what to do about it.  Since, as Arjun has shown, it is a 
documented mistake, now is the time to fix that.

len bullard



From indigo at MIT.EDU  Sat Jun 14 08:01:23 1997
From: indigo at MIT.EDU (Hyung-Jin Kim)
Date: Mon Jun  7 16:57:58 2004
Subject: hi!
Message-ID: <9706140601.AA10887@MIT.MIT.EDU>

I'm new to this list so I apologize if this question has been answered already:

I was wondering if anyone knew of a parser that made well-formed XML files
from HTML files.  I know of a few tools that can DETECT mal-formed tags in
HTML (e.g. weblint) but is there a tool that will do the conversion?
Thanks!  Please reply directly to me.

-jim

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~	
Hyung-Jin Kim						407 Memorial Dr.	
M.I.T.		              ,,,			Cambridge, MA 02139
Cambridge, MA                (o o)			(617)494-9907
~~~~~~~~~~~~~~~~~~~~~~~~~~oOOo(_)oOOo~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~




From Peter at ursus.demon.co.uk  Sat Jun 14 11:05:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:58 2004
Subject: hi!
Message-ID: <7993@ursus.demon.co.uk>


In message <9706140601.AA10887@MIT.MIT.EDU> Hyung-Jin Kim writes:
> I'm new to this list so I apologize if this question has been answered already:

We have not had enough discussion about HTML on this list - and I, for one,
would like version(s) of XMLised DTDs and documents.
> 
> I was wondering if anyone knew of a parser that made well-formed XML files
> from HTML files.  I know of a few tools that can DETECT mal-formed tags in
> HTML (e.g. weblint) but is there a tool that will do the conversion?
> Thanks!  Please reply directly to me.

Mal-formed HTML (i.e. non-conforming SGML) is outside the scope of this list 
:-).  However, converting legal HTML (i.e. conforming SGML) to XML is a valid
activity and it could be useful to get feedback.  It normally requires a 
DTD (for example, <!Element body o o (%body.content;)> means that <BODY> tags
are frequently omitted).  There is also the question of what to do with EMPTY
tags such as <HR>.  Does it matter if they are rendered as
<HR/>
or, say
<HR>
</HR>
?
I convinced myself that it did, in that the first has no child, while the
second could have a PCDATA child of value "\n" - at least in WF documents?
What is its value in <HR></HR>?
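
One way to see the difference is to ask a parser.  The sketch below
assumes Python's standard xml.etree.ElementTree, a non-validating
processor, and only shows what that one library reports; a validating
processor that knows HR is declared EMPTY may behave differently.  The
variable names are invented for the sketch.

```python
import xml.etree.ElementTree as ET

empty_tag = ET.fromstring("<HR/>")
start_end_pair = ET.fromstring("<HR></HR>")
with_newline = ET.fromstring("<HR>\n</HR>")

# .text is the character data before the first child, or None if the
# parser reported no character data at all.
texts = (empty_tag.text, start_end_pair.text, with_newline.text)
```

Here the first two forms are indistinguishable after parsing, while the
third carries a single newline as character data.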

Could someone more authoritative give an overview of the XML-isation of HTML?
I need HT(X)ML to provide the text sections for CML...

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Sat Jun 14 12:50:37 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:58 2004
Subject: hi!
Message-ID: <8005@ursus.demon.co.uk>

Welcome,

In message <v01540b0aafc81699ccd6@[137.111.90.66]> ross@mpce.mq.edu.au (Ross Moore) writes:
[...]
> 
> Currently I'm putting the finishing touches on the latest version of LaTeX2HTML.

This is a noble effort.  
> 
> Later this year I hope to tackle   LaTeXML  for which I would like to be
> able to use existing DTDs as much as possible --- especially for portions
> of  MathML --- rather than having to write my own.

I have always admired the (La)TeX virtual community of volunteers and presumably
they will be keen to learn about XML and how it applies to LaTeX.  In which case
this represents a significant pool of potential XML-friendly hackers :-)

I'm thinking as I write, but it seems as if there should be 'a' LaTeX DTD 
(possibly modular), which interoperates with the MathML DTD.  I think it's 
important to keep them distinct because there are many people who don't use
LaTeX for maths, but as a general authoring tool.  Since MathML specifically
mentions TeX as a NOTATION, and as isomorphic to MathML in some parts, the
clear separation of all components (LaTeXML, MathML, TeX) is critical.

> 
> Having a reliable  HTML --> XML  ought to be an option too.
> 
> Indeed this would probably be the easiest way to go for a first working version,
> given the effort that has already gone into  LaTeX2HTML .

I'd agree.  LaTeX is an excellent tool, but it doesn't have the full structuring
power of XML unless it's specifically thought of at the start.  I speak from
experience as I wrote a complex book in LaTeX, with outputs as *.dvi, *.html,
and several implied conditional sections.  That was before I discovered the 
point of SGML - I spent many midnights writing programs to restructure the
book :-(
> 
> Ultimately a scheme will be needed whereby (partial) DTDs can be
> constructed automatically from any  \newenvironment  commands that the
> user devises for the LaTeX typeset version.

Yes - I think that a current LaTeX user can probably devise structuring
like this that makes the transformation much easier.  Among the things that are
difficult to convert are paragraph/line breaks (when not explicitly marked up).

> 
> 
> I'd love to hear from anyone else interested in:
> 
>   1.  converting existing LaTeX documents into  XML ;

I'd agree that LaTeX->HTML/XML is a useful start.  One discussion would
be whether one had to have a DTD that supported all constructs in the LaTeX
manual, or whether there was a more generic DIV-like container. Another would
be how to support user-defined macros.  Also, would you work on the authored
document, or some later normalised/expanded version?  (I've lost touch with
LaTeX2HTML, but I assume that it works on some normalised version which has
lost the author's macros.)

For scientific technical documents this is a highly desirable goal :-)

> 
>   2.  using LaTeX syntax as a front-end to XML for documents on the Web .

Do you mean transforming XML documents into LaTeX (I tend to think of this
as a back-end) or as a way of authoring XML documents using LaTeX?  The latter
is rather similar to (1).  The former will require a transformation engine
which most people would approach through DSSSL stylesheets, I imagine.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tikvas at agentsoft.com  Sun Jun 15 07:55:17 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun  7 16:57:58 2004
Subject: White spaces in Dtd.
Message-ID: <33A3808F.2841@agentsoft.com>

I'm a developer at AgentSoft Ltd. We create tools for Web
Automation and are now trying to make our intelligent agents work
with XML.
   In trying to create a DTD parser I came across several unclear things,
for example the usage of white space in the DTD. It seems like the
grammar has set rules for where spaces are allowed or needed, and when
they can be replaced by an entity referring to space. I also thought the
DTD was supposed to be easy to parse. I came across something which
is either a mistake or an unclear design. For example, the rule for
elementdecl is  " '<!ELEMENT' S %Name %S %contentspec S? '>'  ";
this looks like an entity reference for the %S following the name would
have to be directly after the name with no space between them. This
makes parsing more difficult for machines and the human eye. Perhaps there
is a mistake in defining the meaning of %a, or perhaps the % shouldn't
appear before the S...

   What should I expect for %S ???

      Tikva Schmidt.

--------------------------------------------------------------------
Tikva Schmidt.
email: tikvas@agentsoft.co.il
corp:  Agentsoft Ltd.          http://www.agentsoft.co.il
Phone: 972-2-6480573
---------------------------------------------------------------------



From Peter at ursus.demon.co.uk  Sun Jun 15 13:56:53 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:58 2004
Subject: White spaces in Dtd.
Message-ID: <8028@ursus.demon.co.uk>

In message <33A3808F.2841@agentsoft.com> "Tikva Schmidt" writes:
> I'm a developer at AgentSoft Ltd. We create tools for Web
> Automation and are now trying to make our intelligent agents work
> with XML.
>    In trying to create a DTD parser I came across several unclear things,
> for example the usage of white space in the DTD. It seems like the
> grammar has set rules for where spaces are allowed or needed, and when
> they can be replaced by an entity referring to space. I also thought the
> DTD was supposed to be easy to parse. I came across something which
.........................^^^^^^^^^^^^^

I sympathise with this :-).  Parameter Entities (PEs) initially gave parser
writers and the ERB/WG a lot of problems.  The current rules (I refer to 970331)
are simpler than initially, but I must admit that I don't find that particular
part of the spec easy to understand.  I think it's fair to say that
**apart from PEs** the DTD is easy to parse.  It may also be that the current
rules for PE substitution can be described in a simple way and I just haven't
picked this up.

I am sure that you will get answers from people more knowledgeable than me,
but also remember that a revision of the spec is due on July 1.  What is in
it is determined by the ERB, but I would be surprised if there were not 
clarifications relating to PEs :-).

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From antheae at wrox.com  Mon Jun 16 17:46:59 1997
From: antheae at wrox.com (Anthea Elston)
Date: Mon Jun  7 16:57:58 2004
Subject: Developing a book on XML
Message-ID: <c=GB%a=_%p=Wrox_Press%l=WROX2-970616154530Z-587@mail.wrox.co.uk>

Hi

I'm a development editor with Wrox Press, based in Birmingham. I head up
a team which tries to produce books on the latest developments in
programming, particularly web based programming. XML looks like being
the next big thing, so if anyone out there is interested in writing a
Programmer's Reference book on XML, please contact me for further
details.

Anthea


Anthea Elston
Wrox Press Ltd, 30 Lincoln Road, Olton, Birmingham
UK Tel: 0121 706 6826
http://www.wrox.com



From Jon.Bosak at Eng.Sun.COM  Tue Jun 17 08:19:35 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:58 2004
Subject: Reminder: XML online
Message-ID: <199706170617.XAA00821@boethius.eng.sun.com>

I announced this about three months ago and am just repeating it for
the benefit of anyone who recently subscribed to the list.

If you are looking for generated XML to test your XML application, you
can find it at our Sun documentation server, docs.sun.com.  This site
(not yet widely publicized) exists primarily to serve out HTML
generated on the fly from our SolBook (DocBook) database of Solaris
manuals.  To see it operating in normal mode, just point your Web
browser at http://docs.sun.com.

In addition to its normal HTML output, our AnswerBook2 team has rigged
docs.sun.com to generate an unsophisticated but copious alternative
XML data stream if you know how to ask for it.

HOW TO GET XML

The AnswerBook2 (ab2) manuals on docs.sun.com are organized into
several large categories (alluser, sysadmin, etc.) with a number of
books in each category.  Thus, the Solaris Advanced User's Guide is
referred to in URLs as /ab2/alluser/ADVOSUG.  Two forms of XML access
are currently supported: TOCs and document chunks.  TOCs are accessed
via the @xmlToc template, and chunks are accessed via the @xmlChunk
template.  The @xmlToc template always shows a table of contents down
to the chapter level, no matter what level it is invoked at.

Some examples:

1. To get a chapter-level TOC of the entire contents of the server:

   http://docs.sun.com/ab2/@xmlToc

2. To get a chapter-level TOC of the manuals in the alluser category:

   http://docs.sun.com/ab2/alluser/@xmlToc

3. To get a chapter-level TOC of the Solaris Advanced User's Guide:

   http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlToc

4. To get a particular chapter from the manual (as listed in the TOC):

   http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlChunk/1120
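
For anyone scripting against these patterns, the four URL forms above
could be assembled along the following lines.  This is a minimal sketch
in Python; the function names are invented here, nothing is fetched, and
it assumes only the URL layout shown in the examples.

```python
BASE_URL = "http://docs.sun.com/ab2"


def toc_url(category=None, book=None):
    """Build an @xmlToc URL for the server, a category, or one book."""
    if book and not category:
        raise ValueError("a book must be given together with its category")
    parts = [BASE_URL]
    if category:
        parts.append(category)
    if book:
        parts.append(book)
    parts.append("@xmlToc")
    return "/".join(parts)


def chunk_url(category, book, chunk_id):
    """Build an @xmlChunk URL for one chunk listed in a book's TOC."""
    return "/".join([BASE_URL, category, book, "@xmlChunk", str(chunk_id)])


examples = [
    toc_url(),
    toc_url("alluser"),
    toc_url("alluser", "ADVOSUG"),
    chunk_url("alluser", "ADVOSUG", 1120),
]
```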


Jon

----------------------------------------------------------------------
 Jon Bosak, Online Information Technology Architect, Sun Microsystems
----------------------------------------------------------------------
     2550 Garcia Ave., MPK17-101, Mountain View, California 94043
 Davenport Group::SGML Open::NCITS V1::ISO/IEC JTC1/SC18/WG8::W3C XML
   If a man look sharply and attentively, he shall see Fortune; for
   though she be blind, yet she is not invisible.  -- Francis Bacon
----------------------------------------------------------------------



From tikvas at agentsoft.com  Tue Jun 17 08:50:07 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun  7 16:57:58 2004
Subject: White spaces in Dtd.
References: <199706161530.KAA91668@tigger.cc.uic.edu>
Message-ID: <33A63054.3AB@agentsoft.com>

C M Sperberg-McQueen wrote:
> 
> On Sun, 15 Jun 1997 08:41:35, "Tikva Schmidt" <tikvas@agentsoft.com>
> wrote:
> 
> >   In trying to create a DTD parser I came across several unclear things,
> >for example the usage of white space in the DTD. It seems like the
> >grammar has set rules for where spaces are allowed or needed, and when
> >they can be replaced by an entity referring to space. I also thought the
> >DTD was supposed to be easy to parse. I came across something which
> >is either a mistake or an unclear design. For example, the rule for
> >elementdecl is  " '<!ELEMENT' S %Name %S %contentspec S? '>'  ";
> >this looks like an entity reference for the %S following the name would
> >have to be directly after the name with no space between them. This
> >makes parsing more difficult for machines and the human eye. Perhaps there
> >is a mistake in defining the meaning of %a, or perhaps the % shouldn't
> >appear before the S...
> >
> >   What should I expect for %S ???
> 
> Thanks for the observation.  If the S following Name is replaced by
> a parameter entity reference, it should *not* be required to come
> immediately after the Name.  Declarations of the form
> 
>   <!ELEMENT foo %xo; (#PCDATA) >
> 
> should be legal.  The intention, in allowing this particular S to be
> parameterized, is to make it possible to parameterize the tag
> omissibility indications needed in most production Full-SGML DTDs
> while being able to use the same DTD source also for XML.  It was
> added very late, and in my haste I made a mistake.  (Tim is wholly
> blameless in this.)
> 
> We should be able to fix this error in the next release of the spec.
> 
> -C. M. Sperberg-McQueen

  Thanks .
 
  This means the following examples are legal as well.
     <!ELEMENT foo%xo; (#PCDATA) >   
     <!ELEMENT %foo;%xo;%cont; >
  Is that going to change?   
  

           Tikva.



From Peter at ursus.demon.co.uk  Tue Jun 17 13:21:52 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:58 2004
Subject: XML, HTML and LaTeX2HTML
Message-ID: <8093@ursus.demon.co.uk>

Just to add my own interest in this (having written a book in LaTeX and used
LaTeX2HTML.  That was 3yrB4XML, so now I would use XML :-)).

LaTeX is an established and powerful authoring tool, especially for scientific,
mathematical and technical documents.  TeX has been unrivalled as a 
typesetting language in these disciplines. LaTeX2HTML (which Ross is looking 
after) is a useful tool for publishing HTML.  TeX provides a widely supported 
output format (*.dvi) for many systems.  So my interests are:

what is the role for LaTeX2XML (sic)?

what is the role for XML2TeX?

and has anyone been developing tools in this area?

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From norm at berkshire.net  Tue Jun 17 15:58:53 1997
From: norm at berkshire.net (Norman Walsh)
Date: Mon Jun  7 16:57:58 2004
Subject: XML, HTML and LaTeX2HTML
In-Reply-To: Peter@ursus.demon.co.uk's message of Tue, 17 Jun 1997 10:42:00 GMT
References: <8093@ursus.demon.co.uk>
Message-ID: <6906-Tue17Jun1997095643-0400-norm@berkshire.net>

> what is the role for LaTeX2XML (sic)?

Conversion of legacy to XML? ;-)  

> what is the role for XML2TeX?
> 
> and has anyone been developing tools in this area?

JadeTeX provides a TeX backend for XML documents.  I wrote a
suite of tools for doing SGML publishing that would do SGML (and
trivially XML) to LaTeX, but I've abandoned them in favor of
Jade.  Faster, more portable, and easier to explain ;-)

                                        Cheers,
                                          norm
-- 
Norman Walsh <nwalsh@arbortext.com> | Whatever you do may seem
Senior Application Analyst          | insignificant, but it is most
ArborText, Inc. (www.arbortext.com) | important that you do it -- Gandhi
413.549.3868 Voice/FAX              | 




From davidsch at microsoft.com  Tue Jun 17 18:33:55 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:58 2004
Subject: XML online and Entities
Message-ID: <011290D45A8ACF119B8B00805FD471D6034D890E@RED-24-MSG.dns.microsoft.com>

I noticed that XMLToc contains unescaped &'s inside of PCData.  This is
legal SGML but prohibited in XML per section 2.4.  I think this example
shows the need to revisit the entity expansion rules in XML.

> -----Original Message-----
> From:	Jon.Bosak@Eng.Sun.COM [SMTP:Jon.Bosak@Eng.Sun.COM]
> Sent:	Monday, June 16, 1997 11:18 PM
> To:	xml-dev@ic.ac.uk
> Subject:	Reminder: XML online
> 
> I announced this about three months ago and am just repeating it for
> the benefit of anyone who recently subscribed to the list.
> 
> If you are looking for generated XML to test your XML application, you
> can find it at our Sun documentation server, docs.sun.com.  This site
> (not yet widely publicized) exists primarily to serve out HTML
> generated on the fly from our SolBook (DocBook) database of Solaris
> manuals.  To see it operating in normal mode, just point your Web
> browser at http://docs.sun.com.
> 
> In addition to its normal HTML output, our AnswerBook2 team has rigged
> docs.sun.com to generate an unsophisticated but copious alternative
> XML data stream if you know how to ask for it.
> 
> HOW TO GET XML
> 
> The AnswerBook2 (ab2) manuals on docs.sun.com are organized into
> several large categories (alluser, sysadmin, etc.) with a number of
> books in each category.  Thus, the Solaris Advanced User's Guide is
> referred to in URLs as /ab2/alluser/ADVOSUG.  Two forms of XML access
> are currently supported: TOCs and document chunks.  TOCs are accessed
> via the @xmlToc template, and chunks are accessed via the @xmlChunk
> template.  The @xmlToc template always shows a table of contents down
> to the chapter level, no matter what level it is invoked at.
> 
> Some examples:
> 
> 1. To get a chapter-level TOC of the entire contents of the server:
> 
>    http://docs.sun.com/ab2/@xmlToc
> 
> 2. To get a chapter-level TOC of the manuals in the alluser category:
> 
>    http://docs.sun.com/ab2/alluser/@xmlToc
> 
> 3. To get a chapter-level TOC of the Solaris Advanced User's Guide:
> 
>    http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlToc
> 
> 4. To get a particular chapter from the manual (as listed in the TOC):
> 
>    http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlChunk/1120
> 
> 
> Jon
> 
> ----------------------------------------------------------------------
>  Jon Bosak, Online Information Technology Architect, Sun Microsystems
> ----------------------------------------------------------------------
>      2550 Garcia Ave., MPK17-101, Mountain View, California 94043
>  Davenport Group::SGML Open::NCITS V1::ISO/IEC JTC1/SC18/WG8::W3C XML
>    If a man look sharply and attentively, he shall see Fortune; for
>    though she be blind, yet she is not invisible.  -- Francis Bacon
> ----------------------------------------------------------------------
> 



From richard at light.demon.co.uk  Tue Jun 17 18:52:29 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:58 2004
Subject: HTML2_X.DTD
Message-ID: <ZRxUZCAO+npzEwP+@light.demon.co.uk>

Hi,

I'm probably not the only person to have done this, but I had a go at
XML-izing the HTML 2.0 DTD.  Most of the job was straightforward
(although a recent exchange suggests that I would have been better
advised to leave the tag omission rules in as parameter entities!).

However, two issues that remain are the use of '&' in the content model
for <HEAD>, and the liberal use of inclusion and exclusion exceptions.

Both are invalid in XML, and neither can be trivially re-mapped to an
XML-compliant equivalent.  Is anyone else interested in this sort of
issue?  Any thoughts on how these problems should be addressed?  

I don't want to waste bandwidth by copying the whole DTD, but if anyone
wants it, I'll happily forward a copy offline.  Here are the relevant
sections:

1) This is the relevant fragment for the first issue (the '&' content
models have not been changed):

<![ %HTML.Recommended; [
        <!ENTITY % head.extra "">
]]>
<!ENTITY % head.extra "& NEXTID?">

<!ENTITY % head.content "TITLE & ISINDEX? & BASE? %head.extra;">

<!ELEMENT HEAD      (%head.content;)>

2) ... and this goes on to show a couple of the exceptions:

<!-- +(META|LINK) exception removed -->

<!-- <HEAD>     Document head   -->

<!ELEMENT TITLE      (#PCDATA)>
<!-- -(META|LINK) exception removed -->

These are the others (all of them, I think):

<!ENTITY % A.content   "(%heading;|%text;)*">

<!ELEMENT A         %A.content;>
<!-- -(A) exception removed -->

...

<!ELEMENT FORM     %body.content;>
<!-- -(FORM) +(INPUT|SELECT|TEXTAREA) exceptions removed -->

...

<!ELEMENT SELECT     (OPTION+)>
<!-- -(INPUT|SELECT|TEXTAREA) exception removed -->

...

<!ELEMENT TEXTAREA     (#PCDATA)>
<!-- -(INPUT|SELECT|TEXTAREA) exception removed. "*" removed from
     content model. -->


Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From lee at sq.com  Tue Jun 17 19:30:25 1997
From: lee at sq.com (lee@sq.com)
Date: Mon Jun  7 16:57:58 2004
Subject: HTML2_X.DTD
Message-ID: <9706171730.AA22656@sqrex.sq.com>

Richard Light <richard@light.demon.co.uk> wrote:

> 1) This is the relevant fragment for the first issue (the '&' content
> models have not been changed):
> 
> <!ELEMENT HEAD      (TITLE & ISINDEX? & BASE?)>
> <!-- +(META|LINK) exception removed -->

Well, I originally wrote the & content model for HTML 2.0.

We had to have a model that reflected the idea that you could have
* zero or one BASE
* zero or one ISINDEX
* exactly one TITLE
* any number of META elements interspersed in any order.

You could try
    META*,
    (
	(ISINDEX, META*, TITLE) | (TITLE, META*, ISINDEX?)
    ),
    META*

but this is ambiguous in SGML and requires lookahead, because if you get
a META after a TITLE, you don't know if there is going to be an ISINDEX
following.

The following might work:

    META*, (
	(ISINDEX, META*, TITLE, META*) |
	(TITLE, META*, (ISINDEX, META*)?)
    )

but this doesn't allow for BASE or NEXTID.
I am not sure how to write a content model for HTML's HEAD in XML
that allows for all the things you might want to put in it.

The trouble is that & isn't a very good way to say "I want one of these
anywhere in this soup", because it can connect any two expressions of
arbitrary complexity.

The best thing to do is probably

    <!ELEMENT HEAD
	(TITLE|META|BASE|ISINDEX|NEXTID|....)*
    >

and require the application to do the additional checking.
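
A minimal sketch of what that additional checking could look like, in
Python.  It assumes the application can already obtain the names of the
HEAD's child elements as a list; the function name check_head and the
HEAD_LIMITS table are hypothetical, and the limits are simply the ones
listed earlier in this message.

```python
from collections import Counter

# Occurrence limits from the list above: (minimum, maximum).
# A maximum of None means "any number".
HEAD_LIMITS = {
    "TITLE": (1, 1),
    "BASE": (0, 1),
    "ISINDEX": (0, 1),
    "META": (0, None),
}


def check_head(child_names):
    """Return a list of problems found among the HEAD children.

    The DTD is assumed to have already checked *which* elements may
    appear; this only checks how often each one occurs.
    """
    counts = Counter(name.upper() for name in child_names)
    problems = []
    for name, (lowest, highest) in HEAD_LIMITS.items():
        found = counts.get(name, 0)
        if found < lowest:
            problems.append(f"{name}: expected at least {lowest}, found {found}")
        if highest is not None and found > highest:
            problems.append(f"{name}: expected at most {highest}, found {found}")
    return problems
```

An empty list means the HEAD satisfies the occurrence rules; order is
deliberately not checked, which is the point of the loose content model.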

Lee




From jenglish at crl.com  Tue Jun 17 21:42:45 1997
From: jenglish at crl.com (Joe English)
Date: Mon Jun  7 16:57:59 2004
Subject: HTML2_X.DTD 
In-Reply-To: <ZRxUZCAO+npzEwP+@light.demon.co.uk> 
References: <ZRxUZCAO+npzEwP+@light.demon.co.uk>  
Message-ID: <199706171927.AA04297@mail.crl.com>


Richard Light <richard@light.demon.co.uk> wrote:

> I'm probably not the only person to have done this, but I had a go at
> XML-izing the HTML 2.0 DTD. [...] 
> 
> However, two issues that remain are the use of '&' in the content model
> for <HEAD>, and the liberal use of inclusion and exclusion exceptions.
> 
> Both are invalid in XML, and neither can be trivially re-mapped to an
> XML-compliant equivalent.  Is anyone else interested in this sort of
> issue?  Any thoughts on how these problems should be addressed?  


For the HEAD content model:

	(TITLE & ISINDEX? & BASE?) +(META|LINK)

you can get rid of the inclusion exceptions by changing this to:

	( (meta|link)*, 
		(   (TITLE, 	(meta|link)*)
		  & (ISINDEX,	(meta|link)*)?
		  & (BASE,	(meta|link)*)?  ) )


then use the standard transformation on AND groups to get:


    <!ENTITY % head.misc "(META|LINK)*" >
    <!ENTITY % title 	"(TITLE, %head.misc;)">
    <!ENTITY % isindex	"(ISINDEX, %head.misc;)">
    <!ENTITY % base 	"(BASE, %head.misc;)">

    <!ELEMENT HEAD
	( %head.misc;,
	  (   (%title;,  (  (%isindex; , (%base;)?)
			  | (%base;    , (%isindex;)?))?)
	    | (%isindex;,(  (%title;   , (%base;)?)
			  | (%base;    , %title;)))
	    | (%base;,   (  (%title;   , (%isindex;)?)
			  | (%isindex; , %title;))) ) )   >
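
One way to gain some confidence in an expansion like this is to test it
mechanically.  The sketch below, in Python, is only an illustration: it
writes each HEAD child as a single letter (T, I, B, and M for either
META or LINK), transcribes the expanded model as a regular expression,
and compares it by brute force with a direct statement of what the AND
group plus inclusions allows at the HEAD level.  All names in it are
hypothetical.

```python
import re
from itertools import product

M = "M*"                      # %head.misc;  (META|LINK)*, both written "M"
T = "T" + M                   # %title;
I = "I" + M                   # %isindex;
B = "B" + M                   # %base;

# Transcription of the expanded HEAD content model above.
EXPANDED = re.compile(
    f"{M}(?:"
    f"{T}(?:{I}(?:{B})?|{B}(?:{I})?)?"
    f"|{I}(?:{T}(?:{B})?|{B}{T})"
    f"|{B}(?:{T}(?:{I})?|{I}{T})"
    f")"
)


def and_group_accepts(seq):
    """(TITLE & ISINDEX? & BASE?) with META/LINK allowed anywhere."""
    return seq.count("T") == 1 and seq.count("I") <= 1 and seq.count("B") <= 1


def models_agree(max_len=6):
    """Compare both models on every sequence up to max_len children."""
    for n in range(max_len + 1):
        for letters in product("TIBM", repeat=n):
            seq = "".join(letters)
            if bool(EXPANDED.fullmatch(seq)) != and_group_accepts(seq):
                return False
    return True
```

Calling models_agree() checks every sequence of up to six children; it
says nothing about longer ones, so it is a sanity check and not a proof.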


(A question of my own: Why does SP complain about e.g., "%base;?"
but not "(%base;)?"  I can't find the reason for this in the Standard.)

Addition of NEXTID, SCRIPT, and STYLE is left as an exercise to 
the reader (GAAAH!).

Or, more sensibly, you can follow Naggum's First Law of AND groups:
If the order doesn't matter, you might as well pick one and stick
with it:

    <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, META*, LINK*) >

In this case the order does matter to some degree, since there 
are metadata schemes which require groups of METAs and LINKs
to appear in a certain order, so this is probably better:

    <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, (META|LINK)*) >


This is stricter than HTML 2, but most HTML will need to be
modified anyway to be XMLized.
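
As an illustration of how small that modification can be for HEAD, here
is a sketch in Python that reorders a list of child element names to fit
(BASE?, TITLE, ISINDEX?, (META|LINK)*).  The function name reorder_head
is hypothetical, and the sketch assumes the children are available as a
plain list of names.

```python
# Position of each singleton element in the fixed-order content model;
# everything else (META, LINK) sorts after them.
ORDER = {"BASE": 0, "TITLE": 1, "ISINDEX": 2}


def reorder_head(child_names):
    """Reorder HEAD children to match (BASE?, TITLE, ISINDEX?, (META|LINK)*).

    sorted() is stable, so META and LINK elements keep their original
    order relative to one another, which matters for metadata schemes
    that depend on it.
    """
    return sorted(child_names, key=lambda name: ORDER.get(name.upper(), 3))
```

For example, ["META", "TITLE", "LINK", "BASE", "META"] comes back as
["BASE", "TITLE", "META", "LINK", "META"].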

Inclusion and exclusion exceptions have to be treated on a
case-by-case basis.  The exclusion exceptions in HTML 2.0 are
used primarily to limit recursion (e.g., to make sure that an
"A" element can't appear inside another "A"), and in some cases
to undo the effects of inclusion exceptions (e.g., on TITLE and
SELECT to undo the inclusions on HEAD and FORM, respectively).

For the FORM elements you should do what HTML 3.2 does: Instead of
making (INPUT|SELECT|TEXTAREA) inclusions on the FORM element and
then excluding them from SELECT and TEXTAREA, just add them to the '%text;'
parameter entity so they can appear anywhere in content.  (That they 
must appear inside a FORM element is still enforced, but as an 
application convention rather than by the DTD).

Once the inclusions are taken care of, all the exclusions can be 
safely removed, since this yields a less restrictive DTD.


--Joe English

  jenglish@crl.com



From lee at sq.com  Tue Jun 17 21:57:00 1997
From: lee at sq.com (lee@sq.com)
Date: Mon Jun  7 16:57:59 2004
Subject: HTML2_X.DTD
Message-ID: <9706171956.AA26341@sqrex.sq.com>

Joe English wrote:
> Or, more sensibly, you can follow Naggum's First Law of AND groups:
> If the order doesn't matter, you might as well pick one and stick
> with it:
> 
>     <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, META*, LINK*) >
> 
> In this case the order does matter to some degree, since there 
> are metadata schemes which require groups of METAs and LINKs
> to appear in a certain order, so this is probably better:
> 
>     <!ELEMENT HEAD  (BASE?, TITLE, ISINDEX?, (META|LINK)*) >

It turns out that some of the widely used HTML authoring tools (no,
not HoTMetaL!) automatically add one or more adverts for their
manufacturers by adding META elements, usually immediately after the
title or right before it or, in at least one case (Microsoft's) on
either side of the title.

I stand by
    <!ELEMENT HEAD  (BASE|TITLE|ISINDEX|META|LINK)* >

Lee



From dlapeyre at mulberrytech.com  Tue Jun 17 23:19:11 1997
From: dlapeyre at mulberrytech.com (Deborah Aleyne Lapeyre)
Date: Mon Jun  7 16:57:59 2004
Subject: Call for Participation SGML/XML'97
Message-ID: <v03020901afccae085520@DialupEudora>

Yet Another CALL FOR PARTICIPATION (YACP)

I beg the indulgence of this list to post this call.  Not all of you are on
the regular SGML mailing lists and I wanted to make sure the word got out.
If you've seen it, this announcement is just like all the others; delete it,
with my apologies.

Otherwise, please pass this announcement along to your staff, your friends,
and any mailing lists you feel might appreciate the news.

Then come to SGML/XML'97 in Washington in December and help make this year
the biggest and most exciting SGML conference ever!

--Debbie Lapeyre

-----------------------------------------------------
****** Call for Participation for SGML/XML'97 *******
-----------------------------------------------------
Soliciting presentations on SGML and XML theory, tools, techniques, and
experience for the annual SGML technical conference.

WHEN:    December 8-11, 1997

WHERE:   Sheraton Washington Hotel, Washington D.C. USA
                 (near the zoo and on Metro's Red line)

SPONSOR: Graphic Communications Association (GCA)

WHAT:    Request for proposals to speak, give a poster,
         present an evening session, or participate
         in the New Technology Nursery

HOW:     Submit proposals via HTML form at
               http://www.mulberrytech.com/sgml97
         or in SGML according to the submission DTD
         and sent via email to:
                sgml97@mulberrytech.com
         Guidelines for Submission and the DTD are
         available by email: sgml97@mulberrytech.com
         or at http://www.mulberrytech.com/sgml97

(If you do not have access to the Web, cannot create
a proposal in SGML, or need to ftp the DTD, contact
Tommie Usdin by phone at +1 301/231-6934, or by fax
at +1 301/231-6935.)

SCHEDULE:
         Proposals Due...............30 JUN, 1997
         Speakers Notified...........30 AUG, 1997
         Preliminary Program.........15 SEPT, 1997
         Full papers due.............17 OCT, 1997
         Poster abstracts due........21 NOV, 1997

QUESTIONS: Email to sgml97@mulberrytech.com or call
           Tommie Usdin +1 301/231-6930

MORE INFORMATION: For participation details and
         current information on the conference, see
              http://www.mulberrytech.com/sgml97
         To receive an Advance Program and Registration
         Information when they are available, send
         email to sgml97@gca.org or call the Graphic
         Communications Association at +1 703/519-8160
         or 1-888-SGMLGCA (1-888/746-5422).

------------  End SGML/XML'97 Announcement -----------

=====================================================================
               SGML/XML'97 Conference Committee
Chair:     B. Tommie Usdin, Mulberry Technologies
Co-Chairs: Deborah A. Lapeyre, Mulberry Technologies
           C. M. Sperberg-McQueen, University of Illinois at Chicago
Email:     sgml97@mulberrytech.com
Phone:     301/231-6930  Fax: 301/231-6935

Registration & Vendor Information:  Marion Elledge, GCA, 703/519-8160
=====================================================================

======================================================================
Deborah A. Lapeyre                     Phone: 301/231-6933
Mulberry Technologies, Inc.            Fax: 301/231-6935
6010 Executive Blvd., Suite 608     E-mail: dalapeyre@mulberrytech.com
Rockville, MD  20852                  WWW: http://www.mulberrytech.com
======================================================================




From Jon.Bosak at Eng.Sun.COM  Wed Jun 18 06:26:46 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:59 2004
Subject: XML online and Entities
In-Reply-To: <011290D45A8ACF119B8B00805FD471D6034D890E@RED-24-MSG.dns.microsoft.com> (message from David Schach on Tue, 17 Jun 1997 09:27:10 -0700)
Message-ID: <199706180425.VAA08044@boethius.eng.sun.com>

[David Schach:]

| I noticed that XMLToc contains unescaped &'s inside of PCData.  This
| is legal SGML but prohibited in XML per section 2.4.

Yup.  This is (as we say in the software business) a known bug in the
process used to compile the DynaText binaries from the DocBook SGML
source.  It will get fixed the next time the books are rebuilt, which
unfortunately may be a little while.

| I think this example shows the need to revisit the entity expansion
| rules in XML.

Maybe.  Or maybe it just means that we have to live with a slightly
buggy test bed for the moment.  The team in charge of this is in the
toils of Solaris 2.6 finalization and probably won't be able to deal
with this for the next month or so.

Jon




From michael at textscience.com  Wed Jun 18 16:05:12 1997
From: michael at textscience.com (Michael Leventhal)
Date: Mon Jun  7 16:57:59 2004
Subject: Comercial XML editor recommendations
In-Reply-To: <7353@ursus.demon.co.uk>
Message-ID: <3.0.1.32.19970618220714.007bd7f0@aimnet.com>

Grif has announced that it will have XML-related extensions in the next
release of our HTML editor Symposia Pro and Symposia Doc+ at the end
of this month.  While Symposia is a commercial-grade product the XML
extensions are primarily designed to give our customers the opportunity
to begin experimenting with XML.  Our hope is that this will help to
give a larger audience a concrete idea of what XML is all about.
We have chosen to introduce XML in an HTML product in the same spirit,
we think, with which Yuri Rubinsky introduced SGML to HTML users in his
book "SGML on the Web".  And in the spirit of the undertaking we
cordially invite your comments on the proposed XML extensions to our
product.

1.  Read, either off the Web or locally, edit, and create well-formed
    XML documents. The DTD, if present, is not read.

    The document should use the ASCII character set and HTML
    character entities.  UTF-8 encodings above 127 will be preserved
    but may not display correctly.

2.  Save a document either as XML or HTML format with respect to
    the syntax of empty tags and other syntactical differences.
    Both types of saved documents will be ASCII with HTML character
    entities except that UTF-8 encodings that were present in the text
    as it was read in will be unchanged.

    Note that nothing prevents the user from mixing HTML and XML,
    as is currently done in many applications already on the Web.
    But the user must decide which one it is when the document
    is saved.

3.  Create new element and attribute definitions. These are
    "definitions" in the simple syntax implied by the concept
    of "well-formedness", not DTD fragments.  These definitions
    may be saved in project folders and used in documents at
    will.

4.  Add new XML elements and attributes, either from a set 
    stored in a project folder or ad-hoc.

5.  Create CSS stylesheets, and CSS definitions for any element, 
    HTML or XML.  Symposia uses CSS as its own stylesheet language
    and will display XML CSS specifications correctly.  In effect,
    it is an XML browser for the Web, albeit without certain
    functionality such as JavaScript interpretation.

We have decided not to offer XML-LINK in this version even though
we have completed an early implementation.

Michael Leventhal
______________________________________________________________________
  Michael Leventhal           Internet  : http://www.grif.fr
  G R I F , S. A.             Email     : Michael.Leventhal@grif.fr
  VP, Technology              Telephone : 510-444-2962
  1800 Lake Shore Ave Ste 14  Fax       : 510-444-1672
  Oakland, California  94606  France    : (011) 33 1 30121430 (fr US)
______________________________________________________________________




From davidsch at microsoft.com  Wed Jun 18 19:16:36 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:59 2004
Subject: XML online and Entities
Message-ID: <011290D45A8ACF119B8B00805FD471D603500B30@RED-24-MSG.dns.microsoft.com>

This kind of bug will be common because SGML allows the & to be used
this way. I think this difference in entity processing unnecessarily
complicates SGML to XML conversion.
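
A minimal sketch of a conversion step for this case, in Python: escape
every "&" that does not begin something that looks like a complete
reference.  The function name is hypothetical, and the sketch makes
simplifying assumptions: it treats only &name;, &#123; and &#x7B; forms
as references, and it does not try to skip comments, CDATA sections or
SGML references written without the closing ";".

```python
import re

# An "&" is left alone only when it starts a complete entity reference
# (&name;) or character reference (&#38; or &#x26;).
BARE_AMPERSAND = re.compile(
    r"&(?!(?:[A-Za-z_:][\w.:-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Replace each bare '&' in character data with '&amp;'."""
    return BARE_AMPERSAND.sub("&amp;", text)
```

Existing references pass through unchanged, so the step can be run more
than once on the same text without double-escaping.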

> -----Original Message-----
> From:	Jon.Bosak@Eng.Sun.COM [SMTP:Jon.Bosak@Eng.Sun.COM]
> Sent:	Tuesday, June 17, 1997 9:25 PM
> To:	xml-dev@ic.ac.uk
> Subject:	Re: XML online and Entities
> 
> [David Schach:]
> 
> | I noticed that XMLToc contains unescaped &'s inside of PCData.  This
> | is legal SGML but prohibited in XML per section 2.4.
> 
> Yup.  This is (as we say in the software business) a known bug in the
> process used to compile the DynaText binaries from the DocBook SGML
> source.  It will get fixed the next time the books are rebuilt, which
> unfortunately may be a little while.
> 
> | I think this example shows the need to revisit the entity expansion
> | rules in XML.
> 
> Maybe.  Or maybe it just means that we have to live with a slightly
> buggy test bed for the moment.  The team in charge of this is in the
> toils of Solaris 2.6 finalization and probably won't be able to deal
> with this for the next month or so.
> 
> Jon



From nmikula at edu.uni-klu.ac.at  Wed Jun 18 19:26:58 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:57:59 2004
Subject: XML online and Entities
References: <011290D45A8ACF119B8B00805FD471D603500B30@RED-24-MSG.dns.microsoft.com>
Message-ID: <33A81A4E.59E2@edu.uni-klu.ac.at>

David Schach wrote:
> This kind of bug will be common because SGML allows the & to be used
> this way. I think this difference in entity processing unnecessarily
> complicates SGML to XML conversion.

Rather this way, than having to deal with context
dependencies in a language :) The ERB did a great 
job when thinking about how to define the language to
ease the construction of lightweight and fast parsers 
(in a contemporary fashion).

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================



From tbray at textuality.com  Wed Jun 18 20:38:24 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:59 2004
Subject: XML online and Entities
Message-ID: <3.0.32.19970618113611.00a4de30@pop.intergate.bc.ca>

At 10:09 AM 18/06/97 -0700, David Schach wrote:
>This kind of bug will be common because SGML allows the & to be used
>this way. I think this difference in entity processing unnecessarily
>complicates SGML to XML conversion.

Yes, this kind of bug will be common.  The chance of changing this
in XML-lang is very small.  One of the bogosities that make SGML parsers 
hard to write is that entity references are nontrivial to recognize.  When
I'm teaching XML, it's *so nice* to be able to say: all markup without
exception starts with '<' or '&', and anything that starts with
'<' or '&', without exception, is markup.  End of story.  Users like it,
programmers like it.

So, what's the solution?  If all else fails, a postprocessor that
takes <a href="/cgi-bin/frobnisticate?a1=mona;a2=lisa;a3=overdrive">
and changes the ;'s to &'s on the way off to the server?
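
A sketch of such a postprocessor in Python, using the standard
urllib.parse module.  The function name is hypothetical, and the sketch
assumes that every ";" in the query part is meant as a separator, which
would not hold for a query value that legitimately contains one.

```python
from urllib.parse import urlsplit, urlunsplit


def semicolons_to_ampersands(url):
    """Rewrite ';' separators in the query part of a URL as '&'.

    Only the query component is touched; path and fragment are kept.
    """
    parts = urlsplit(url)
    new_query = parts.query.replace(";", "&")
    return urlunsplit(parts._replace(query=new_query))
```

Applied to the href above, this gives
/cgi-bin/frobnisticate?a1=mona&a2=lisa&a3=overdrive.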

 -Tim



From jtigue at datachannel.com  Thu Jun 19 03:28:18 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:57:59 2004
Subject: XML Java API Standardization
Message-ID: <33A88B52.9FCC78DC@datachannel.com>

Now that the number of XML processor implementations is increasing
rapidly, I would like to continue the subject of API standardization. I
have written a document which discusses the issue and presents an
informal proposal which continues the discussion of API standardization
for Java.

The document is located at:
http://www.datachannel.com/ChannelWorld/XML/dev

The first goal is to find a lowest common denominator for the current
implementations and abstract that to a set of interfaces such that a
developer could use this new API independent of an underlying
implementation of the XML processor and/or invest in learning the
particular benefits a specific implementation provides.

I hope the site will serve as a convenience to the community and I will
maintain it as a summary of what is going on in this list. Any feedback
would be greatly appreciated. This is a work in progress. The greater
the contributions, the better it will serve its purpose.

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

From tbray at textuality.com  Thu Jun 19 06:54:44 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <3.0.32.19970618215226.00a4dde0@pop.intergate.bc.ca>

At 06:28 PM 18/06/97 -0700, John Tigue wrote:

I would like to say that modulo a few quibbles, it seems that John's
proposal is sensible.  It would send a REALLY STRONG message to the
world if all the XML parsers just happened to interoperate
effortlessly.  So I hereby commit to changing Lark's interface to
be compatible with this, after it's been kicked around here for a while
and any technical gotchas have been aired out.

I'd love to see similar commitments from the other parser builders.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From nmikula at edu.uni-klu.ac.at  Thu Jun 19 09:02:46 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
References: <33A88B52.9FCC78DC@datachannel.com>
Message-ID: <33A95563.71CA@edu.uni-klu.ac.at>

John Tigue wrote:
> 
> Now that the number of XML processor implementations is increasing
> rapidly, I would like to continue the subject of API standardization. I
> have written a document which discusses the issue and presents an
> informal proposal which continues the discussion of API standardization
> for Java.
> 
> The document is located at:
> http://www.datachannel.com/ChannelWorld/XML/dev
<snip/>

Following John's original posting and Tim's "call for 
commitments (CFC)", I first want to say that I applaud 
John for having started this initiative.

I certainly will contribute to this effort as much 
as I can. I also will be happy to modify NXP so that
it follows a *standardized* and well-designed API. 

I hereby also invite all users of NXP to describe
on this list what experience they have had 
with my approach to this issue. You application
developers are the real experts on this. Please 
share your thoughts with us!

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From lex at www.copsol.com  Thu Jun 19 16:00:24 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
In-Reply-To: <33A88B52.9FCC78DC@datachannel.com> from "John Tigue" at Jun 18, 97 06:28:50 pm
Message-ID: <199706191357.IAA14480@copsol.com>

> 
> Now that the number of XML processor implementations is increasing
> rapidly, I would like to continue the subject of API standardization. I
> have written a document which discusses the issue and presents an
> informal proposal which continues the discussion of API standardization
> for Java.
> 
> The document is located at:
> http://www.datachannel.com/ChannelWorld/XML/dev
> 
> The first goal is to find a lowest common denominator for the current
> implementations and abstract that to a set of interfaces such that a
> developer could use this new API independent of an underlying
> implementation of the XML processor and/or invest in learning the
> particular benefits a specific implementation provides.
> 
> I hope the site will serve as a convenience to the community and I will
> maintain it as a summary of what is going on in this list. Any feedback
> would be greatly appreciated. This is a work in progress. The greater
> the contributions, the better it will serve its purpose.

After having read the above document, I'd like to say: "You missed one!"

The DSSSL Developer's Toolkit covers some of what the above document is trying
to address (actually more since it is standardizing DSSSL).  I developed this 
toolkit to be standardized and serve as a standard DSSSL API.  

The dsssl.grove package is intended to provide standardized programmatic access 
to groves--the result of processing an SGML document.  IMHO, it would be ideal 
if XML processors could produce a grove that a DSSSL processor could use.

What is not contained in the current DSSSLTK distribution but will be in the
next is a standardized parser interface.  That is, access to some implementation
that can be told to parse some system identifier and produce a grove.

Also, note that in DSSSLTK there is a construct called a "Grove Constructor".
This interface provides a means for groves to be built on different 
implementation technologies and used by the same parser without changing
the interface.  It is different than the "event handler" model but it shares
some similarities.  

Essentially, the parser is abstracted from grove construction.  Hence, you can
build groves in databases as well as in-memory or whatever technology you 
choose without changing the parser.
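
The separation can be sketched in a few lines of Python.  This is an
illustration of the idea only and not the DSSSLTK interface: the class
and method names below are hypothetical, and Python's bundled expat
parser stands in for "the parser".  The same parse() function feeds two
different constructors, one building an in-memory tree and one
producing flat rows of the kind a database loader might store.

```python
import xml.parsers.expat
from abc import ABC, abstractmethod


class GroveConstructor(ABC):
    """What the parser needs from any grove-building back end."""

    @abstractmethod
    def start_element(self, name, attrs): ...

    @abstractmethod
    def character_data(self, text): ...

    @abstractmethod
    def end_element(self, name): ...


class InMemoryConstructor(GroveConstructor):
    """Builds nested dictionaries in memory."""

    def __init__(self):
        self.root = None
        self._stack = []

    def start_element(self, name, attrs):
        node = {"name": name, "attrs": dict(attrs), "children": []}
        if self._stack:
            self._stack[-1]["children"].append(node)
        else:
            self.root = node
        self._stack.append(node)

    def character_data(self, text):
        self._stack[-1]["children"].append(text)

    def end_element(self, name):
        self._stack.pop()


class RowConstructor(GroveConstructor):
    """Flattens elements into (id, parent_id, name) rows."""

    def __init__(self):
        self.rows = []
        self._stack = []

    def start_element(self, name, attrs):
        node_id = len(self.rows) + 1
        parent_id = self._stack[-1] if self._stack else None
        self.rows.append((node_id, parent_id, name))
        self._stack.append(node_id)

    def character_data(self, text):
        pass  # text is ignored in this simplified back end

    def end_element(self, name):
        self._stack.pop()


def parse(document, constructor):
    """Drive any constructor from the same, unchanged parser."""
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = constructor.start_element
    parser.CharacterDataHandler = constructor.character_data
    parser.EndElementHandler = constructor.end_element
    parser.Parse(document, True)
    return constructor
```

Swapping InMemoryConstructor for RowConstructor changes where the grove
ends up without touching parse() at all.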

Also, all constructs in the DSSSLTK are based on interfaces.  This allows
different inheritance hierarchies to be used within the same distribution or
for different class libraries to be mixed without getting into multiple
inheritance issues.  A node in a grove must implement two interfaces: node and 
its specific class.

For example, an Element node *must* implement both the dsssl.grove.node and
dsssl.grove.Element interfaces.
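
The pattern can be sketched in Python with abstract base classes
standing in for Java interfaces.  The names and methods below are
hypothetical and are not taken from the dsssl.grove package; the point
is only that one concrete class satisfies both the general node
contract and the contract of its specific class.

```python
from abc import ABC, abstractmethod


class Node(ABC):
    """Contract shared by every node in a grove."""

    @abstractmethod
    def parent(self): ...

    @abstractmethod
    def children(self): ...


class Element(ABC):
    """Contract specific to element nodes."""

    @abstractmethod
    def gi(self): ...          # generic identifier (element type name)

    @abstractmethod
    def attributes(self): ...


class SimpleElement(Node, Element):
    """One possible implementation; others may use any class hierarchy."""

    def __init__(self, gi, attributes=None, parent=None):
        self._gi = gi
        self._attributes = dict(attributes or {})
        self._parent = parent
        self._children = []

    def parent(self):
        return self._parent

    def children(self):
        return list(self._children)

    def gi(self):
        return self._gi

    def attributes(self):
        return dict(self._attributes)
```

Code written against Node and Element works with SimpleElement or with
any other implementation that provides the same two contracts.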

Remember, the DSSSL standard *has* a data model for SGML that can be pruned
to provide a "lowest common denominator" data model for XML.

Full source code and javadoc are available in the DSSSLTK distribution
located at:

   http://www.copsol.com/products/

This is a start at standardization for DSSSL from my point of view.  I
put this distribution together to allow others to contribute and create a 
standard API governed by some "higher body" and not Copernican Solutions or
myself.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From Jon.Bosak at Eng.Sun.COM  Thu Jun 19 16:29:59 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:00 2004
Subject: Tcl XML parser
Message-ID: <199706191427.HAA00683@boethius.eng.sun.com>

A Tcl-based package for parsing XML documents and DTDs has just been
made available.  See

   http://tcltk.anu.edu.au/XML/

Jon



From nmikula at edu.uni-klu.ac.at  Thu Jun 19 16:57:46 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
References: <199706191357.IAA14480@copsol.com>
Message-ID: <33A948D6.41C6@edu.uni-klu.ac.at>

Alex Milowski wrote:
> The dsssl.grove package is intended to provide standardized programmatic access
> to groves--the result of processing an SGML document.  IMHO, it would be ideal
> if XML processors could produce a grove that a DSSSL processor could use.

Alex,

I certainly agree that a (complete) grove is probably
the most powerful and complete way of accessing
a document's data.

I am not convinced, however, that it is always necessary
to build a grove.

My view on this is :

-----------------------------------------------
-               application                   -
----------------------------------            -
-   grove/tree builder           -            -
-----------------------------------------------
-  event stream (Esis++)                      -
----------------------------------      NXP   -     
-     core parser                             -
-----------------------------------------------

You can always build a more powerful layer
on top of an event stream. Furthermore we
should also consider the work of the DOM
group. Their results will have a considerable
impact on our work as well.

If we can provide a flexible low level
layer, we can always add more fancy and
specialized post-processors on top
of it.
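
A small Python sketch of this layering, using the expat parser from the
standard library as the core parser.  The function names are
hypothetical.  One application consumes the event stream directly; a
separate, optional tree builder is layered on the very same events for
applications that want a tree.

```python
import xml.parsers.expat


def events(document):
    """Core parser layer: return the document as a flat list of events."""
    out = []
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: out.append(("start", name, attrs))
    parser.EndElementHandler = lambda name: out.append(("end", name))
    parser.CharacterDataHandler = lambda text: out.append(("text", text))
    parser.Parse(document, True)
    return out


def count_elements(event_stream):
    """A simple application that needs nothing beyond the event stream."""
    return sum(1 for event in event_stream if event[0] == "start")


def build_tree(event_stream):
    """Optional layer: fold events into (name, attrs, children) tuples."""
    stack = [("#root", {}, [])]
    for event in event_stream:
        if event[0] == "start":
            node = (event[1], event[2], [])
            stack[-1][2].append(node)
            stack.append(node)
        elif event[0] == "end":
            stack.pop()
        else:
            stack[-1][2].append(event[1])
    return stack[0][2][0]
```

count_elements() never pays for a tree, while build_tree() can be added
later without any change to the parser layer.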

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================



From lex at www.copsol.com  Thu Jun 19 17:57:04 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
In-Reply-To: <33A948D6.41C6@edu.uni-klu.ac.at> from "Norbert Mikula" at Jun 19, 97 04:57:26 pm
Message-ID: <199706191554.KAA14554@copsol.com>

> Alex Milowski wrote:
> > The dsssl.grove package is intended to provide standardized programmatic access
> > to groves--the result of processing an SGML document.  IMHO, it would be ideal
> > if XML processors could produce a grove that a DSSSL processor could use.
> 
> Alex,
> 
> I certainly agree that a (complete) grove is probably
> the most powerful and complete way of accessing
> a document's data.
> 
> I am not convinced, however, that it is always necessary
> to build a grove.

In my experience, the need to have a grove is more often the case than the
need to have an event stream.  I have rarely built SGML applications where
a grove did not simplify the processing.  Event streams are good for
extracting simple information or processing documents in a linear fashion.  I like
to view a document as a data structure that I can manipulate.

I would be open to having a two-tiered API where there was an event oriented
API.  In fact, the GroveConstructor interface could be considered to be
this kind of low level API.

If we don't standardize grove access, we will all have to build our own
grove implementations at some point in time.

In addition, can you imagine the possibilities if simple applets could
turn around to a server, load a grove, and receive structured information
rather than name value pairs?  We need to address issues beyond "quick
browsing/processing" in a standardized API.

So, essentially, I agree.  It is not always necessary to have a grove.  In
a complex application, it is most certainly necessary.  Hence, we should
standardize that as well.

> 
> My view on this is :
> 
> -----------------------------------------------
> -               application                   -
> ----------------------------------            -
> -   grove/tree builder           -            -
> -----------------------------------------------
> -  event stream (Esis++)                      -
> ----------------------------------      NXP   -     
> -     core parser                             -
> -----------------------------------------------

I can envision a similar but more complete structure:

     DSSSL
  Application
----------------
-  DSSSL API   -      Complex Application
------------------------------------------------
-              Standard Grove API              -
------------------------------------------------
-             Grove Implementation             -
-           (Implementation dependent)         -
------------------------------------------------
-                  Grove Builder API           -  Simple Application
---------------------------------------------------------------------
-                            Event Stream API                       -
---------------------------------------------------------------------
-                               Parser API                          -
---------------------------------------------------------------------
-                           Parser Implementation                   -
-                         (Implementation dependent)                -
---------------------------------------------------------------------

The DSSSLTK covers most of the above with the exception of an event-oriented
API.  The GroveConstructor interface is really the Grove Builder API in
the above diagram.

Hence, what should be standardized is:

* Parser API
* Event Stream API
* Grove Builder API
* Standard Grove API
* DSSSL API
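
To make the componentization concrete, here is a rough Java sketch in which
a "simple application" and a stand-in for the grove builder both listen to
the same event stream from one parse.  Every name in it (EventSource,
ElementListener, FanOut, ToyParser and so on) is an assumption made for
illustration, and the toy parser does not read real XML syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical Event Stream API: only element starts are modelled here.
interface ElementListener {
    void startElement(String name);
}

// Hypothetical Parser API: any implementation can sit behind this interface.
interface EventSource {
    void parse(String document, ElementListener listener);
}

// A "simple application" attached directly to the event stream.
class ElementCounter implements ElementListener {
    int count;
    public void startElement(String name) { count++; }
}

// Stand-in for the Grove Builder layer: it only records names in document order.
class NameCollector implements ElementListener {
    final List<String> names = new ArrayList<>();
    public void startElement(String name) { names.add(name); }
}

// Delivers each event to several listeners so both tiers share one parse.
class FanOut implements ElementListener {
    private final List<ElementListener> targets = new ArrayList<>();
    FanOut add(ElementListener listener) { targets.add(listener); return this; }
    public void startElement(String name) {
        for (ElementListener target : targets) target.startElement(name);
    }
}

// Toy implementation: treats the input as whitespace-separated element names.
class ToyParser implements EventSource {
    public void parse(String document, ElementListener listener) {
        for (String name : document.trim().split("\\s+")) {
            if (!name.isEmpty()) listener.startElement(name);
        }
    }
}
```

Because the application code sees only the two interfaces, the parser
implementation underneath can be replaced without touching either consumer.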

> You can always built a more powerful layer
> on top of an event stream. Furthermore we
> should also consider the work of the DOM
> group. Their results will have a considerable
> impact on our work as well.

Yes, but only if we properly componentize the APIs.

I'm not so certain about DOM.  It would be nice if it were a more open
process.  DOM essentially means grove to me.

> If we can provide a flexible low level
> layer, we can always add more fancy and
> specialized post-processors on top
> of it.

Yes, but I want to standardize the "specialized" processors.

For example, consider the situation where as an application developer
you could be assured that a grove implementation is available in a 
browser framework.  In that situation, you could deliver an applet (or
whatever) that loaded a grove but relied on the browser to provide the
infrastructure for "knowing" how to load/store/etc. a grove.

We are developing and standardizing infrastructure as well as APIs.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From ebaatz at barbaresco.East.Sun.COM  Thu Jun 19 17:57:47 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <libSDtMail.9706191156.20772.ebaatz@barbaresco>

Alex Milowski wrote:

> ...it would be ideal if XML processors could produce a grove...

Alex,

Do you mean that the only output of an XML parser would be a grove?

My use of XML is very lightweight and, from my position of minimal
knowledge about groves, seems like I would have to pay some
price in processing time or system resources for an XML parser to
produce a grove for one of my "documents" when some very simple
output would do.




From peter at techno.com  Thu Jun 19 18:24:06 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <199706191620.MAA14915@exocomp.techno.com>

> Date: Thu, 19 Jun 1997 16:57:26 +0200
> From: Norbert Mikula <nmikula@edu.uni-klu.ac.at>
> 
> Alex Milowski wrote:
> > The dsssl.grove package is intended to provide standardized programatic access
> > to groves--the result of processing an SGML document.  IMHO, it would be ideal
> > if XML processors could produce a grove that a DSSSL processor could use.
> 
> I certainly agree, that a (complete) grove is probably
> the most powerful and complete way of accessing
> a documents data.
> 
> I am not convinced however, that it is always necessary
> to built a grove.

  [snip]

> You can always built a more powerful layer
> on top of an event stream. Furthermore we
> should also consider the work of the DOM
> group. Their results will have a considerable
> impact on our work as well.
> 
> If we can provide a flexible low level
> layer, we can always add more fancy and
> specialized post-processors on top
> of it.

I believe it is important not only to design the low-level
interface such that a grove (or other high-level interface) can be
implemented on top of it, but also to design the low-level interface
such that _it_ (at least the relevant portions of it: i.e. the event
stream and associated classes) can be implemented on top of a grove
interface.
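
One way to picture the second direction is a walker that replays an
existing tree as events.  The Java below is only a sketch: EventSink, Elem
and TreeWalker are hypothetical names, and the tree is a bare element tree
rather than a grove.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical low-level interface that a parser would normally drive.
interface EventSink {
    void startElement(String name);
    void endElement(String name);
}

// Minimal stand-in for a node of the high-level (tree) interface.
class Elem {
    final String name;
    final List<Elem> children = new ArrayList<>();
    Elem(String name) { this.name = name; }
    Elem add(Elem child) { children.add(child); return this; }
}

class TreeWalker {
    // Depth-first walk that emits the event sequence a parser would have produced.
    static void replay(Elem node, EventSink sink) {
        sink.startElement(node.name);
        for (Elem child : node.children) {
            replay(child, sink);
        }
        sink.endElement(node.name);
    }
}

// Records events as strings so the replayed order can be inspected.
class EventLog implements EventSink {
    final List<String> events = new ArrayList<>();
    public void startElement(String name) { events.add("start:" + name); }
    public void endElement(String name) { events.add("end:" + name); }
}
```

Code written against EventSink cannot tell whether the events came from a
parser or from a walk over a tree that is already in memory.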

Another concern I have is that the terminology used for the two
interfaces (low and high) be consistent.  A programmer who learns one
interface should not have to learn a different vocabulary in order to
use the other.  This is also true across languages: a person using
an XML parser in Java should not have to learn a different vocabulary
in order to use an XML parser from C++ or Perl.

As the SGML property set has already been published (in DSSSL, and
soon in the HyTime 2nd Edition) and is in use, I suggest that it be
used as a terminology reference for new SGML and XML interface
design.

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
233 Spruce Avenue                       P.O. Box 23795
Rochester, NY 14611-4041 USA            Rochester, New York 14692-3795 USA
+1 716 529 4303 (home)                  +1 716 464 8696 (direct)
+1 716 755 8698 (cell)                  +1 716 271 0796 (main)
+1 716 529 4304 (fax)                   +1 716 271 0129 (fax)
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From ebaatz at barbaresco.East.Sun.COM  Thu Jun 19 18:57:52 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <libSDtMail.9706191256.30200.ebaatz@barbaresco>

John Tigue,

Although it isn't part of the abstract XML parsing issue, I was
struck by your proposal's use of streams for input.  Given the
Unicode-ness of Java and XML's explicit support for multiple
character encodings, the JDK 1.1 Reader classes seem like a
perfect fit for XML parsers.  They are more efficient than byte
streams, and they handle conversion between local encodings
and Unicode.  With streams, it seems like your document also
needs to prescribe how Unicode characters (or other multi-byte
encodings) are encoded in byte streams or how to identify the
encoding of a stream, so I can use the same input for different
parsers.
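
A small Java illustration of the point, assuming a parser entry point that
accepts a java.io.Reader (the readAll and decode helpers are hypothetical
and stand in for a parser).  The caller names the encoding when wrapping
the byte stream, so the character-consuming code is the same whatever the
byte encoding was.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

class ReaderInputDemo {
    // Stand-in for a parser: it consumes characters and never sees raw bytes.
    static String readAll(Reader in) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buffer = new char[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            sb.append(buffer, 0, n);
        }
        return sb.toString();
    }

    // The caller decides how bytes map to characters by choosing the charset.
    static String decode(byte[] bytes, Charset charset) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charset)) {
            return readAll(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this split, the same document stored as UTF-8 or UTF-16 bytes reaches
the parser as identical characters.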

P.S.  Yes, my current XML "documents" include characters outside
of Latin-1, so I have to convert them before passing them through
the parser I've been using, NXP.


Eric Baatz
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207                 (508) 442-0257
Chelmsford, MA 01824                        fax: (508) 250-5067
USA                                    Internet: eric.baatz@east.sun.com




From lex at www.copsol.com  Thu Jun 19 19:21:55 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
In-Reply-To: <libSDtMail.9706191156.20772.ebaatz@barbaresco> from "Eric Baatz - Sun Microsystems Labs BOS" at Jun 19, 97 11:56:09 am
Message-ID: <199706191719.MAA14647@copsol.com>

> Alex Milowski wrote:
> 
> > ...it would be ideal if XML processors could produce a grove...
> 
> Alex,
> 
> Do you mean that the only output of an XML parser would be a grove?

Well, no, not necessarily.  I think that such an API standardization should
also standardize grove production/use.  I would like to be able to 
guarantee that any conformant XML/Java/API environment is able to produce
groves if I *need* them.

> My use of XML is very lightweight and, from my position of minimal
> knowledge about groves, seems like I would have to pay some
> price in processing time or system resources for an XML parser to
> produce a grove for one of my "documents" when some very simple
> output would do.

Yes, you pay *some* price.  But there is a point at which the grove-based
processing paradigm is far more efficient than the event-oriented one for
more complex tasks.  The definition of "more complex" isn't that big of a
leap.  Simply put:  If you want to do *any* non-linear processing of XML,
you are going to find groves *far* easier and, with SDQL (Standard Document
Query Language -- from DSSSL), potentially more efficient than building
ancillary data structures in addition to the events being received.
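
As a concrete case of non-linear processing, consider a cross-reference
whose target appears later in the document.  The Java sketch below resolves
it on an in-memory tree.  It is not SDQL, and the DocNode and CrossRefs
names are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tree node with a parent link, an optional id and an optional reference.
class DocNode {
    final String name;
    final String id;        // may be null
    final String refersTo;  // id of another node, may be null
    DocNode parent;
    final List<DocNode> children = new ArrayList<>();

    DocNode(String name, String id, String refersTo) {
        this.name = name;
        this.id = id;
        this.refersTo = refersTo;
    }

    DocNode add(DocNode child) {
        child.parent = this;
        children.add(child);
        return this;
    }

    // Walks upward through the parent links to build a path such as /doc/sec.
    String path() {
        return (parent == null ? "" : parent.path()) + "/" + name;
    }
}

class CrossRefs {
    // Collects every node that carries an id.
    static void index(DocNode node, Map<String, DocNode> byId) {
        if (node.id != null) byId.put(node.id, node);
        for (DocNode child : node.children) index(child, byId);
    }

    // Resolves a reference whether the target comes before or after it in document order.
    static String targetPath(DocNode root, DocNode reference) {
        Map<String, DocNode> byId = new HashMap<>();
        index(root, byId);
        DocNode target = byId.get(reference.refersTo);
        return target == null ? null : target.path();
    }
}
```

With events alone, the application would have to keep its own element
stack and id table while parsing, which is exactly the kind of ancillary
structure described above.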

In a previous e-mail, I detailed an API architecture that I think would
work.  Essentially, it is DSSSLTK with another couple of APIs on the
bottom of the stack.  In my development, I made the design decision that 
groves were what I needed to standardize since everything in DSSSL is groves.
I'm certainly willing to add to this and standardize everything before that
as well.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From tbray at textuality.com  Thu Jun 19 19:38:56 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca>

If you want a full-featured API that is going to interoperate for
SGML and XML docs as well, the grove is the only way to go, so there
is no need to have this discussion here on that subject.  What we're
trying to do, specifically for the case of Java XML processors, which
evidence would suggest are going to be large in number and relatively
lightweight, is simply to give them some shared machinery as regards
elements and attributes.

For this kind of purpose, I think the grove formalism is massive 
overkill; right now people can whip off XML parsers in a week.  If
we require them to master grove plans and property sets and so on,
we're tripling the amount of time that has to be invested.

At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote:
>As the SGML property set has already been published (in DSSSL, and
>soon in the HyTime 2nd Edition) and is in use, I suggest that it be
>used as a terminology reference for new SGML and XML interface
>design.

This is part of the problem; last time I looked, the SGML property
set was over 75 pages in length, and most of what it contains is
just not interesting for XML parsers.  

If we could just agree, specifically for Java, how to talk to a
few basic things (Element, Attribute, etc), this would be a huge
step forward. -Tim




From lex at www.copsol.com  Thu Jun 19 20:02:55 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
In-Reply-To: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca> from "Tim Bray" at Jun 19, 97 10:36:42 am
Message-ID: <199706191800.NAA14675@copsol.com>

Ok, I'm going to write about the "vision" thing... so you have been warned! ;-)
 
> If you want a full-featured API that is going to interoperate for
> SGML and XML docs as well, the grove is the only way to go, so there
> is no need to have this discussion here on that subject.  What we're
> trying to do is, specifically for the case of Java XML processors, which
> evidence would suggest are going to be large in number and relatively
> lightweight, is simply to give them some shared machinery as regards
> elements and attributes.
> 
> For this kind of purpose, I think the grove formalism is massive 
> overkill; right now people can whip off XML parsers in a week, if
> we require them to master grove plans and property sets and so on,
> we're tripling the amount of time that has to be invested.

Agreed.  My real point is that we have to have a vision for where such
APIs are going.  The absolute *last* thing I want to have happen is to get
a low-level parser/event API and not be able to implement the more basic
grove on top of that.  Hence we need a vision of where such APIs are going
and what they will grow into.

I see a parser and event API as being the foundation of a much larger set of
APIs for XML, SGML, and DSSSL.

In light of this, here are some of my requirements:

1. The API should be componentized such that parser access and configuration
   are separated from event delivery and use.

2. Event APIs should be constructed in a way such that new properties of
   events and new events can be delivered within the same interface.  This
   will allow support of additional grove plans within the same interface.

3. There is a minimal set of grove plans from a DSSSL perspective that we 
   should conform to.  (I have a good idea of what these grove plans are but 
   I don't have the DSSSL spec in front of me.)  These grove plans will help
   define what events to deliver and what properties the events should have.
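
Requirement 2 could be met in several ways.  One possible shape, sketched
in Java below, treats the event type and its properties as data rather
than as fixed method signatures, so a richer grove plan can add events and
properties without changing the interface.  ParseEvent,
ParseEventListener, the "element-start" type and the property names are
all illustrative assumptions, not part of any existing API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical generic event: a type name plus named properties.
final class ParseEvent {
    private final String type;
    private final Map<String, Object> properties;

    ParseEvent(String type, Map<String, Object> properties) {
        this.type = type;
        // Defensive copy so later changes by the producer do not leak in.
        this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }

    String getType() { return type; }
    Object getProperty(String name) { return properties.get(name); }
}

// The single delivery interface stays the same as new events are introduced.
interface ParseEventListener {
    void handle(ParseEvent event);
}

// A listener written against a small plan: it reads "gi" from "element-start"
// events and ignores any event type or property it does not recognise.
class ElementNameListener implements ParseEventListener {
    final List<String> names = new ArrayList<>();

    public void handle(ParseEvent event) {
        if ("element-start".equals(event.getType())) {
            names.add(String.valueOf(event.getProperty("gi")));
        }
    }
}
```

The trade-off is weaker compile-time checking: a misspelled property name
is only discovered at run time.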

Suggestions:

1. Interfaces (sub-typing) are the preferred way to deliver such APIs.  We do
   not want to enforce an inheritance hierarchy.  Also, interfaces can easily
   be made cross-language.

2. We should define the APIs within one or more reference architectures rather
   than just focusing on the communication between a parser and an arbitrary
   application.  By using many common architectures we can understand the
   use-case scenarios for the API.  This is a similar exercise to the
   CRC cards in object-oriented design.
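
To illustrate the first suggestion, the Java sketch below has two
unrelated classes satisfy one interface with no shared base class.  The
MarkupElement interface and its method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared interface; the method names are illustrative only.
interface MarkupElement {
    String getName();
    String getAttribute(String name);  // returns null when the attribute is absent
}

// One implementation keeps its attributes in a map.
class MapBackedElement implements MarkupElement {
    private final String name;
    private final Map<String, String> attributes = new HashMap<>();

    MapBackedElement(String name) { this.name = name; }

    MapBackedElement with(String key, String value) {
        attributes.put(key, value);
        return this;
    }

    public String getName() { return name; }
    public String getAttribute(String key) { return attributes.get(key); }
}

// An unrelated implementation that supports exactly one attribute.
class SingleAttributeElement implements MarkupElement {
    private final String name;
    private final String attrName;
    private final String attrValue;

    SingleAttributeElement(String name, String attrName, String attrValue) {
        this.name = name;
        this.attrName = attrName;
        this.attrValue = attrValue;
    }

    public String getName() { return name; }
    public String getAttribute(String key) {
        return attrName.equals(key) ? attrValue : null;
    }
}

class ElementReport {
    // Application code depends on the interface only.
    static String describe(MarkupElement element) {
        String id = element.getAttribute("id");
        return element.getName() + (id == null ? "" : "#" + id);
    }
}
```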

   
> At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote:
> >As the SGML property set has already been published (in DSSSL, and
> >soon in the HyTime 2nd Edition) and is in use, I suggest that it be
> >used as a terminology reference for new SGML and XML interface
> >design.
> 
> This is part of the problem; last time I looked, the SGML property
> set was over 75 pages in length, and most of what it contains is
> just not interesting for XML parsers.  
> 
> If we could just agree, specifically for Java, how to talk to a
> few basic things (Element, Attribute, etc), this would be a huge
> step forward. -Tim

Yes, but we should start with the DSSSL specification.  Not to mention this 
*yet* again today, but the DSSSLTK implements about five grove plans.  I'll get
the list tomorrow when I have the reference information on hand and post
it here.

If we spend the time working from the DSSSL grove specification, we can 
ensure grove production.

If you want/need a more readable grove specification, try the grove guide
that I built.  The HTML version is at:

http://www.copsol.com/sgmlimpl/standards/gguide.html

and more generally at:

http://www.copsol.com/sgmlimpl/standards/

The grove guide organizes the SGML property set from DSSSL in the
opposite way from the standard.  In the DSSSL standard, each grove
plan is listed and within the grove plan either new classes are defined
or properties are added to previously defined classes.  In the grove
guide, each class is defined and the properties are listed by grove
plan.
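
The reorganization amounts to inverting a two-level index.  The Java
sketch below assumes the property set has already been loaded into nested
maps, which is an assumption of this example and not how the grove guide
was actually produced.  The plan, class and property names a caller passes
in are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PropertySetIndex {
    // Input:  plan  -> class -> properties (the order used in the standard).
    // Output: class -> plan  -> properties (the order used in the guide).
    static Map<String, Map<String, List<String>>> byClass(
            Map<String, Map<String, List<String>>> byPlan) {
        Map<String, Map<String, List<String>>> result = new TreeMap<>();
        for (Map.Entry<String, Map<String, List<String>>> plan : byPlan.entrySet()) {
            for (Map.Entry<String, List<String>> cls : plan.getValue().entrySet()) {
                result.computeIfAbsent(cls.getKey(), key -> new TreeMap<>())
                      .put(plan.getKey(), new ArrayList<>(cls.getValue()));
            }
        }
        return result;
    }
}
```

TreeMap is used only to make the output order predictable.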

What we design and engineer today and label a standard may stick around longer
than we expect.  We shouldn't take too minimalist an approach.  My
compromise is for a *reasonable* solution that has growth of the API
designed into it.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From Peter at ursus.demon.co.uk  Thu Jun 19 23:52:00 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:00 2004
Subject: XML Java API Standardization
Message-ID: <8169@ursus.demon.co.uk>

In message <33A88B52.9FCC78DC@datachannel.com> jtigue@datachannel.com (John Tigue) writes:
[...]
> 
> Now that the number of XML processor implementations is increasing
> rapidly, I would like to continue the subject of API standardization. I
> have written a document which discusses the issue and presents an
> informal proposal which continues the discussion of API standardization
> for Java.
> 
> The document is located at:
> http://www.datachannel.com/ChannelWorld/XML/dev

This is a really first-class approach to the subject and I welcome it.
John has taken the time to summarise 4 parsers including his/datachannel's own
and this is an excellent starting point.  As an 'xmlProcessorConsumer' JUMBO
will adopt this approach as soon as I work it out.

> 
> The first goal is to find a lowest common denominator for the current
> implementations and abstract that to a set of interfaces such that a
> developer could use this new API independent of an underlying
> implementation of the XML processor and/or invest in learning the
> particular benefits a specific implementation provides.
> 
> I hope the site will serve as a convenience to the community and I will
> maintain it as a summary of what is going on in this list. Any feedback
> would be greatly appreciated. This is a work in progress. The greater
> the contributions, the better it will serve its purpose.

This is really great.  I'm in a rush, but at this stage
standard terminology for the XML-related terms (Element, Attribute, etc.) and
standard terminology for the Java-related stuff (Stream, Factory, etc.)
is exactly what is required.

	P.


> 
> --
> John Tigue
> Programmer
> jtigue@datachannel.com
> DataChannel (http://www.datachannel.com)
> 206-462-1999
> 
> 
> 
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From peter at techno.com  Fri Jun 20 00:53:59 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
In-Reply-To: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca> (message
	from Tim Bray on Thu, 19 Jun 1997 10:36:42 -0700)
Message-ID: <199706192250.SAA15086@exocomp.techno.com>

> For this kind of purpose, I think the grove formalism is massive 
> overkill; right now people can whip off XML parsers in a week, if
> we require them to master grove plans and property sets and so on,
> we're tripling the amount of time that has to be invested.

I am not suggesting that writers of parsers learn or implement
anything about groves.  I am suggesting that the writers of the
standard XML parser interface should learn and use the names defined
by the SGML property set for those things (i.e. elements, attributes,
etc.) that XML and SGML have in common.

> At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote:
> >As the SGML property set has already been published (in DSSSL, and
> >soon in the HyTime 2nd Edition) and is in use, I suggest that it be
> >used as a terminology reference for new SGML and XML interface
> >design.
> 
> This is part of the problem; last time I looked, the SGML property
> set was over 75 pages in length, and most of what it contains is
> just not interesting for XML parsers.  

The SGML property set source (the 75+ pages of SGML) is best read by a
machine and formatted for human consumption.  Alex's grove guide is an
example of this.

Also, the parts of the SGML property set that do not apply to XML
parsers are easily pruned.  A grove plan that specifies this pruning
is in the works, but as a start, try ignoring everything after the
first three modules (baseabs, prlgabs0, and instabs).

I've temporarily created a browseable rendition of these modules (and
only these modules) at "http://www.techno.com/~peter/sgml-esis/".  The
HTML generation software I used to do this is not quite done yet, but
I hope these pages will be useful anyway.  The most notable problem is
that I have not yet written the code to generate descriptive pages for
modules.

(If there are other problems, or you have suggestions on how to
improve the format or enhance its usefulness, please tell me; I'll be
using this software to produce browseable renditions of the complete
SGML and HyTime property sets for the upcoming HyTime user's group
site.)

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
233 Spruce Avenue                       P.O. Box 23795
Rochester, NY 14611-4041 USA            Rochester, New York 14692-3795 USA
+1 716 529 4303 (home)                  +1 716 464 8696 (direct)
+1 716 755 8698 (cell)                  +1 716 271 0796 (main)
+1 716 529 4304 (fax)                   +1 716 271 0129 (fax)
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From Peter at ursus.demon.co.uk  Fri Jun 20 01:58:08 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <8223@ursus.demon.co.uk>

In message <199706191800.NAA14675@copsol.com> lex@www.copsol.com (Alex Milowski) writes:
> Ok, I'm going to write about the "vision" thing... so you have been warned! ;-)

[... lots of other contributions and the
    vision thing read, hopefully understood, and snipped...]

> 
> What we design and engineer today and label a standard may stick around longer
> than we expect.  We shouldn't take too minimalist of an approach.  My
> compromise is for a *reasonable* solution that has growth of the API
> designed into it.

We have had a previous discussion on this list (ca. 2+ months ago) and we got
quite close to getting an API and then everyone went off to other things.  It's
even more urgent now, because if we don't close on something, then in 2 more 
months there will be 16 incompatible parsers ... [a tcl one was announced today,
and we can assume there are others which don't come near XML-DEV...]

It seems that at one end of the spectrum of possibilities is a
'golden grove' solution where every possible property set, etc. is included,
and at the other end the reasonable minimum is close to what John has put up.
There is also something 'in the middle' which may be more difficult to hit
precisely.

I do not want to have a say in this (I don't even know what a grove *is* - 
even after having had it explained more than once), but I'll try to work
with whatever comes out.  However, I think we ought to aim for something within
the next few days or we run the risk of losing momentum again.  

I am particularly impressed by the spirit of collaboration in this discussion
and the willingness of current authors to recraft their code (OK they have
to do it again on July 1 anyway :-).  If we can agree where in the spectrum
we wish to end up (there could be more than one place, as suggested), then
it may take a few days to flesh out the details.  Would it be reasonable to aim
for having something roughly concurrent with July 1??

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From eliot at isogen.com  Fri Jun 20 04:18:06 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <3.0.32.19970619210506.00b4cefc@swbell.net>

At 11:49 PM 6/19/97 GMT, Peter Murray-Rust wrote:
>I do not want to have a say in this (I don't even know what a grove *is* - 
>even after having had it explained more than once), 

A grove is nothing more than a directed graph of objects whose classes and
properties are formally defined in a "property set", where a "property set"
is nothing more than an object schema definition defined according to the
(small set of) rules defined in the Property Set Definition Requirements
annex of the (very soon to be released) HyTime standard (Second Edition).

The only thing that distinguishes a grove from any other graph-based object
representation is a few unique object characteristics that happen to make
representing SGML documents a lot easier.

So saying "you should have a grove" is really saying "you should make your
in-memory data structures follow the object schema defined by the SGML
property set."  There's really not that much to it.  There's no reason to
duplicate the person-years of work that have gone into defining the SGML
property set, unless you enjoy the exercise of beating your head against
that wall.

The *only* reason groves are called groves, instead of "the directed graph,
in-memory representation of a parsed SGML document" is that we didn't want
to have to keep saying the latter.

If it helps, substitute "parse tree" for "grove" and you'll be close enough
to the truth that it won't matter for the purpose of discussion.
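
Read literally, the description above suggests something like the
following Java sketch: nodes carry a class name and named properties, and
a schema says which properties each class may have.  MiniPropertySet and
MiniNode are invented for this illustration, the schema format is a
simplification, and none of it reproduces the actual SGML property set.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, much reduced "property set": class name -> allowed property names.
class MiniPropertySet {
    private final Map<String, Set<String>> classes = new HashMap<>();

    MiniPropertySet define(String className, String... propertyNames) {
        classes.put(className, new HashSet<>(Arrays.asList(propertyNames)));
        return this;
    }

    boolean allows(String className, String propertyName) {
        Set<String> properties = classes.get(className);
        return properties != null && properties.contains(propertyName);
    }
}

// A node is an object of some class whose properties are checked against the schema.
// Property values may be plain data or other nodes, which is what makes it a graph.
class MiniNode {
    final String className;
    private final MiniPropertySet schema;
    private final Map<String, Object> values = new HashMap<>();

    MiniNode(MiniPropertySet schema, String className) {
        this.schema = schema;
        this.className = className;
    }

    MiniNode set(String property, Object value) {
        if (!schema.allows(className, property)) {
            throw new IllegalArgumentException(
                "class " + className + " has no property " + property);
        }
        values.put(property, value);
        return this;
    }

    Object get(String property) { return values.get(property); }
}
```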

Cheers,

E.



From akirkpatrick at ims-global.com  Fri Jun 20 14:15:19 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <E0wf2aw-0000yO-00@punch.ic.ac.uk>

Tim,

> For this kind of purpose, I think the grove formalism is massive
> overkill; right now people can whip off XML parsers in a week, if
> we require them to master grove plans and property sets and so on,
> we're tripling the amount of time that has to be invested.

I think it's true to say that soon all SGML/XML applications will be
working on the parse tree (or grove) rather than the raw events.
With this in mind, I would be quite willing to wait two weeks longer
for a good XML grove API (after all, we've been waiting years for
SGML tools :)

If nothing comes up, I will have to write my own "parse tree builder"
and you can bet it won't be compatible with anyone else's, beyond
the simple notions of "element" and "tree".

> This is part of the problem; last time I looked, the SGML property
> set was over 75 pages in length, and most of what it contains is
> just not interesting for XML parsers.

As someone's already said, we need to define the reduced property
set for XML and make it easier to understand.

> If we could just agree, specifically for Java, how to talk to a
> few basic things (Element, Attribute, etc), this would be a huge
> step forward. -Tim

I don't agree. XML isn't just a "Web thing". It has the potential to
change the way applications communicate and store information.
The XML is actually unimportant, it is the structure (represented
in memory by the parse tree) which counts. We need common
APIs to manipulate and query the structure.

An interesting discussion either way...
Alfie.



From ddb at criinc.com  Sat Jun 21 01:35:51 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <3.0.32.19970620163449.00a5bda0@mailhost.criinc.com>

>I've temporarily created a browseable rendition of these modules (and
>only these modules) at "http://www.techno.com/~peter/sgml-esis/".  The
>HTML generation software I used to do this is not quite done yet, but
>I hope these pages will be useful anyway.  The most notable problem is
>that I have not yet written the code to generate descriptive pages for
>modules.

sounds familiar.

-derek

--------------------------------------------------------------
ddb@criinc.com || software-engineer || www/sgml/java/perl/etc.
  "Just go that way, really fast. When something gets 
      in your way, turn."  --  _Better_Off_Dead_



From peat at erols.com  Sat Jun 21 15:10:54 1997
From: peat at erols.com (Peat)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <199706211310.JAA17653@smtp2.erols.com>

If the document is very large, and the parser is required to maintain the
grove, we would then require the parser to also include some type of
defined memory management.  Can this be a problem, where different parsers
implement resource management differently?

I would think if this burden is on the application layer, then knowledge of
the application can be used to optimize resources.

Grove standardization is a good idea.  Any ideas on how the grove
standardization can be implemented up one layer?

- Bruce Peat
  peat@erols.com



From Peter at ursus.demon.co.uk  Sat Jun 21 17:59:05 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <8335@ursus.demon.co.uk>

In message <199706211310.JAA17653@smtp2.erols.com> "Peat" writes:
> If the document is very large, and the parser is required to maintain the
> grove, we would then require the parser to also then include some type of
> defined memory management.  Can this be a problem, where different parsers
> implement resource management differently?

This is an important point and one which I've been conscious of but ignored so
far.  JUMBO is quite large (with all the MOL classes in, there's about half a
megabyte of classes) and I have had out-of-memory failures with large files
(ca. 1 Mbyte of legacy input and its translation into a tree).  I don't know
whether there is a generic solution to this.  I tried running the garbage
collector (JDK 1.0.2) occasionally and this helps, but since parser, browser
and document all have to be in memory, large docs are a problem.

Presumably in an application subtrees can be saved to disk (serialized?)
> 
> I would think if this burden is on the application layer, then knowledge of
> the application can be used to optimize resources.

I would think that if the author uses entities, then knowledge of the entity
structure would help.  In the browser the entities could be treated as 
'pointers' and resolved only when required.
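
A minimal sketch of the "entities as pointers" idea, written in Python only for
brevity.  The class name LazyEntity, the loader callback and the sample entity
text are all hypothetical; nothing here is taken from JUMBO or any existing
parser.  The point is just that the replacement text is fetched on first use
and cached afterwards.

```python
class LazyEntity:
    """Stands in for an external entity until its content is first requested."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader   # zero-argument callable that fetches the text
        self._content = None
        self.loaded = False

    def resolve(self):
        # Fetch the replacement text only on first access, then reuse it.
        if not self.loaded:
            self._content = self._loader()
            self.loaded = True
        return self._content


fetches = []

def load_chapter():
    fetches.append("chapter1")          # record each (simulated) fetch
    return "<CHAPTER>...</CHAPTER>"

entity = LazyEntity("chapter1", load_chapter)
fetches_before = list(fetches)          # still empty: nothing resolved yet
text = entity.resolve()
entity.resolve()                        # second call is served from the cache
```

A browser built this way would only pay the memory cost for the entities the
reader actually opens, at the price of a delay on first access.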
> 
> Grove standardization is a good idea.  Any ideas on how the grove
> standardization can be implemented up one layer?
                                     ^^  ???  ^^^

Again, I reiterate that I'd like to see something concrete in a few days and
not to lose the momentum again.  

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From clloyd at gorge.net  Sat Jun 21 20:01:28 1997
From: clloyd at gorge.net (Chris Lloyd)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API - An Idea(*)
Message-ID: <01BC7E32.22CDABA0@chaosmobile.com.chaos>


-----Original Message-----
From:	Peter Murray-Rust [SMTP:Peter@ursus.demon.co.uk]
Sent:	Saturday, June 21, 1997 11:31 AM
To:	xml-dev@ic.ac.uk
Subject:	Re: XML Java API Standardization

In message <199706211310.JAA17653@smtp2.erols.com> "Peat" writes:
> If the document is very large, and the parser is required to maintain the
> grove, we would then require the parser to also then include some type of
> defined memory management.  Can this be a problem, where different parsers
> implement resource management differently?

Memory management issues shouldn't be an issue in the API standardization. If you are using a parser that cannot serialize the tree, then you are certainly going to be limited by memory. If you are using an object database to implement the grove, then you don't have size limitations but speed may become an issue.

This is an important point and one which I've been conscious of but ignored so
far.  JUMBO is quite large (with all the MOL classes in there's about half a 
megabyte of classes and I have had outOfmem failures with large files (ca.
1 Mbyte legacy input and translation into a tree).  I don't know whether there 
is  a generic solution to this.  I tried to run the garbage collector (JDK1.02)
occasionally and this helps, but since parser and browser and document all have
to be in memory then large docs are a problem.

Presumably in an application subtrees can be saved to disk (serialized?)
> 
> I would think if this burden is on the application layer, then knowledge of
> the application can be used to optimize resources.

I would think that if the author uses entities, then knowledge of the entity
structure would help.  In the browser the entities could be treated as 
'pointers' and resolved only when required.

Yes this is how other groves have been implemented

> 
> Grove standardization is a good idea.  Any ideas on how the grove
> standardization can be implemented up one layer?
                                     ^^  ???  ^^^


I'm just entering this thread so I don't know what solutions have been discussed. There is already an API to draw from in the DSSSL spec and a definition of the SGML property set which gives us a common language to work from. The problem is that an XML API to a grove should be simple with a small interface and should leverage the object-oriented power and syntax of Java.

Personally, when working with groves I find some abstractions very useful in an API. I would rather have an API based on iterators than one based on a set of navigation function calls.  I'm talking about navigating the grove rather than building the grove. An iterator API would be extremely simple, well abstracted and more in line with patterns of C++ and Java programming than the SDQL API found in DSSSL. They could also maintain an adherence to the syntax of the SGML property set.

Here is an example although my naming syntax probably does not correspond to the SGML property set here.

// Assuming we have an object provided by the parser that is a grove, instantiate an iterator and navigate to the first element that is a TITLE tag

// A Factory is an object that defines what SGML/XML constructs the iterator knows how to iterate. It provides the grove iterator with a different node iterator for each property node that it knows how to walk.

ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);

// Test against end() before using the node, so we never dereference end()
for (; XMLIter != XMLIter.end(); ++XMLIter)
{
	XMLBaseProperty Prop = XMLIter.Object(); // in C++ we would use the dereference operator like this XMLBaseProperty Prop = *XMLIter;
	if (Prop.GetClass() == Element.Class) // is this an element?
	{
		Element aElement = Prop; // let's convert the property from a base class object to its concrete class
		// Now we have an element object and can call all its member functions
		if (aElement.GetIdent() == String("TITLE"))
			break;
	}
}

// OK lets instantiate a new iterator to walk back up to the root of the grove
// use the copy constructor to produce a reverse iterator from our x and functions of individual properties in the grove. Hence we can use the SGML property set or another property set with the same code.
6.) Iterators work well in different memory models and garbage collection schemes.
7.) Iterators, Factories, and Algorithms can be combined in very powerful and flexible ways.
8.) Finally, Iterators are fun!!

Chris Lloyd
clloyd@gorge.net

Again, I reiterate that I'd like to see something concrete in a few days and
not to lose the momentum again.  

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jtauber at jtauber.com  Sat Jun 21 20:49:36 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Property Set
Message-ID: <01BC7EB7.10467860.jtauber@jtauber.com>


About a month ago I started making a list of those classes and properties 
from the SGML Property Set that were appropriate to XML. I got through 
about half of the classes the first night and then didn't touch it until 
now.

With all this talk about XML groves, is it worth my finishing off the list?

James K. Tauber




From Peter at ursus.demon.co.uk  Sat Jun 21 21:10:10 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Property Set
Message-ID: <8346@ursus.demon.co.uk>

In message <01BC7EB7.10467860.jtauber@jtauber.com> "James K. Tauber" writes:
> 
> About a month ago I started making a list of those classes and properties 
> from the SGML Property Set that were appropriate to XML. I got through 
> about half of the classes the first night and then didn't touch it until 
> now.
> 
> With all this talk about XML groves, it is worth me finishing off the list?

In my grove-illiterate opinion, yes!  The PropertySet is a sword of Damocles
hanging over these discussions.  It's clear that we can't have all 70+ 
properties.  IF (and I hope it's not a big IF) we can agree on a subset
of the property set then we don't have this problem dissipating the discussion
every time we get close :-)

James Clark came up with a grove subset about 3 months back (have a look in
March xml-dev) in response to one of my typical blunderings for information.  
It looked simple (I can't tell if it was comprehensive) and I imagine it
is fairly close to what we require. Unfortunately no-one seemed to take it
up.  So do JamesC and JamesT converge on a common solution?  If they do, why 
not freeze this as an alpha version of the propertySubSet...??

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jtauber at jtauber.com  Sat Jun 21 22:17:15 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Property Set
Message-ID: <01BC7EC3.70CDF6C0.jtauber@jtauber.com>


On Saturday, June 21, 1997 1:01 PM, Peter Murray-Rust 
[SMTP:Peter@ursus.demon.co.uk] wrote:
> In my grove-illiterate opinion, yes!  The PropertySet is a sword of Damocles
> hanging over these discussions.  It's clear that we can't have all 70+
> properties.  IF (and I hope it's not a big IF) we can agree on a subset
> of the property set then we don't have this problem dissipating the
> discussion every time we get close :-)

It shouldn't be a big IF at all. Deciding what to rip out isn't too difficult. 
The only problem lies in agreeing on how to do the additional classes (like 
XMLDECL) needed and how (or if) the properties should be modularised.

> James Clark came up with a grove subset about 3 months back (have a look in
> March xml-dev) in response to one of my typical blunderings for information.

I'll go back and check that. JamesC would be in a MUCH better position to write 
an XML property set than me!

James 'the other James' Tauber :-)



From clloyd at gorge.net  Sat Jun 21 22:38:32 1997
From: clloyd at gorge.net (Chris Lloyd)
Date: Mon Jun  7 16:58:01 2004
Subject: REPOST: XML Java API-an idea
Message-ID: <01BC7E48.096832A0@chaosmobile.com.chaos>

This is a repost because some of the original post was clipped.

I'm just entering this thread so I don't know what solutions have been discussed. 
There is already an API to draw from in the DSSSL spec and a definition of the 
SGML property set which gives us a common language to work from. 
The problem is that an XML API to a grove should be simple with a 
small interface and should leverage the object-oriented power and syntax of Java.

Personally, when working with groves I find some abstractions very useful in an API. 
I would rather have an API based on iterators than one based on a set of navigation function calls.  I'm talking about navigating the grove rather than building the grove. An iterator API would be extremely simple, well abstracted and more in line with patterns of C++ and Java programming than the SDQL API found in DSSSL. They could also maintain an adherence to the syntax of the SGML property set.

Here is an example although my naming syntax probably does not correspond to 
the SGML property set here.

// Assuming we have an object provided by the parser that is a grove,
// instantiate an iterator and navigate to the first element that is a TITLE tag

// A Factory is an object that defines what SGML/XML constructs the iterator
// knows how to iterate. It provides the grove iterator with a different node
// iterator for each property node that it knows how to walk.

ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);

// Test against end() before using the node, so we never dereference end()
for (; XMLIter != XMLIter.end(); ++XMLIter)
{
	XMLBaseProperty Prop = XMLIter.Object(); // in C++ we would use the dereference operator like this XMLBaseProperty Prop = *XMLIter;
	if (Prop.GetClass() == Element.Class) // is this an element?
	{
		Element aElement = Prop; // let's convert the property from a base class object to its concrete class
		// Now we have an element object and can call all its member functions
		if (aElement.GetIdent() == String("TITLE"))
			break;
	}
}

// OK lets instantiate a new iterator to walk back up to the root of the grove
// use the copy constructor to produce a reverse iterator from our forward iterator
ReverseGroveIterator XMLReverseIter(XMLIter);

for (; XMLReverseIter != XMLReverseIter.end(); ++XMLReverseIter)
{
	// do stuff here
}


The navigation itself is not the same as defined in SDQL but the property set 
could be made to conform to the SGML property set. This might offer a compromise. 

The factory concept is very powerful because extending an iterator is as simple 
as adding a new factory class and a nodeiterator class for each new property 
being added to the grove. If someone wanted to inherit from the XML property set 
and put metadata in their grove, they could easily extend the functionality 
of the base iterators to support their new properties. 
Because the iterator class has a small interface, it's easy to plug and play 
new iterators into existing code. You can read more about iterators and 
factories in Design Patterns, Addison Wesley, Gamma, Helm, Johnson, Vlissides.
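
The iterator-plus-factory arrangement described above can be sketched in a few
lines.  This is only an illustration in Python: Node, ChildrenFactory,
ForwardGroveIterator and ancestors are made-up names, the node classes
("element", "data") are placeholders rather than SGML property set names, and
a real grove would have many more kinds of subnode than plain children.

```python
class Node:
    def __init__(self, cls, name=None, children=()):
        self.cls = cls                  # e.g. "element" or "data"
        self.name = name
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self


class ChildrenFactory:
    """Tells the iterator which subnodes of a node it should walk."""

    def subnodes(self, node):
        return node.children


class ForwardGroveIterator:
    """Depth-first walk in document order; the factory hides how nodes are linked."""

    def __init__(self, start, factory):
        self._stack = [start]
        self._factory = factory

    def __iter__(self):
        return self

    def __next__(self):
        if not self._stack:
            raise StopIteration
        node = self._stack.pop()
        # Push subnodes in reverse so the leftmost one is visited first.
        self._stack.extend(reversed(list(self._factory.subnodes(node))))
        return node


def ancestors(node):
    """Reverse walk: yield each parent in turn, ending at the root."""
    while node.parent is not None:
        node = node.parent
        yield node


grove = Node("element", "DOC", [
    Node("element", "HEAD", [Node("element", "TITLE", [Node("data")])]),
    Node("element", "BODY"),
])

visited = [n.name for n in ForwardGroveIterator(grove, ChildrenFactory())]
title = next(n for n in ForwardGroveIterator(grove, ChildrenFactory())
             if n.cls == "element" and n.name == "TITLE")
path_to_root = [n.name for n in ancestors(title)]
```

Supporting a new kind of property would then mean supplying a different
factory object, with the loop that uses the iterator left untouched.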

Once we have the appropriate iterators then we can create an API of Functions 
and Algorithms, maybe based on SDQL, that can do higher-level operations like this:

// Find the first parent object that is an element
Algorithm::find( ReverseIter, classid<Element>()); // C++ syntax with templates
Algorithm.find( ReverseIter, classid(ELEMENT)); // Java syntax without templates

// Find the first object that is an element and whose name is TITLE
if (Algorithm::find( ReverseIter, AND(classid<Element>(), name("TITLE"))))
{
	Element aElementFound = *ReverseIter; // get the element and use it
}
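
For comparison, here is a rough Python rendering of the same find-with-
predicates idea.  The helper names (classid, name, AND, find) simply echo the
pseudocode above, and the dictionary-based nodes are an assumption made to keep
the example short; none of this is an existing API.

```python
def classid(cls):
    """Predicate: node belongs to the given class."""
    return lambda node: node["class"] == cls

def name(gi):
    """Predicate: node has the given name."""
    return lambda node: node.get("name") == gi

def AND(*predicates):
    """Predicate: every one of the given predicates holds."""
    return lambda node: all(p(node) for p in predicates)

def find(iterator, predicate):
    """Advance the iterator; return the first matching node, or None."""
    for node in iterator:
        if predicate(node):
            return node
    return None


nodes = [
    {"class": "pi", "name": "XML"},
    {"class": "element", "name": "DOC"},
    {"class": "element", "name": "TITLE"},
    {"class": "data"},
]

first_element = find(iter(nodes), classid("element"))
title = find(iter(nodes), AND(classid("element"), name("TITLE")))
missing = find(iter(nodes), name("ABSTRACT"))
```

Because find only asks the iterator for the next node, the same call works
whether the nodes come from memory, a database or a reverse walk.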

Why we need iterators
1.) Iterators hide the details of how a grove is actually linked together, whether in memory or in an object database, etc.
2.) Iterators have the same interface regardless of the types of properties in the grove
3.) Iterators are extensible and can provide read-only functionality as well as read-write functionality
4.) Iterators are a well-known and accepted design pattern and are 



From jtigue at datachannel.com  Sun Jun 22 01:24:38 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:01 2004
Subject: JAX [was: XML Java API Standardization]
Message-ID: <33AC62E1.55731DC9@datachannel.com>

I have updated the site which discusses XML Java API Standardization in
order to reflect the feedback of the last few days.

The site is located at:
http://www.datachannel.com/channelworld/xml/dev/

The most significant change has been the inclusion of event stream
stuff. Event streams being lower level than parse trees, they can't be
ignored. DSSSL grove work is being studied for its relevant influence on
terminology and future work. I'd like to leave the actual grove work for
a later version. So the work has been repositioned as the lowest level
(event streams) plus some of the next level (parse tree but not full
grove).
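
To make the two levels concrete, here is a small sketch of an event stream
with a parse tree layered on top of it.  It is written in Python and uses the
standard library's expat binding purely as a stand-in for any event-emitting
parser; the function names parse_events and build_tree and the nested-list
tree shape are assumptions for illustration, not part of the proposed API.

```python
import xml.parsers.expat


def parse_events(text):
    """Lowest level: return the document as a flat list of (kind, value) events."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True   # deliver each run of character data in one piece
    parser.StartElementHandler = lambda name, attrs: events.append(("start", name))
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    parser.Parse(text, True)
    return events


def build_tree(events):
    """Next level up: fold the event stream into nested [name, children] lists."""
    root = ["#root", []]
    stack = [root]
    for kind, value in events:
        if kind == "start":
            node = [value, []]
            stack[-1][1].append(node)
            stack.append(node)
        elif kind == "end":
            stack.pop()
        else:
            stack[-1][1].append(value)
    return root[1][0]


events = parse_events("<MEMO><TO>Audience</TO><FROM>Steve</FROM></MEMO>")
tree = build_tree(events)
```

An application that only needs to scan a large document can stop at the event
list and never pay for the tree.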

Also it seems the best thing to do would be to target JDK 1.1 because it
has java.io.Reader which makes Unicode and internationalization much
easier. JDK 1.0.2 will also be supported but "deprecated." Every
implementation has been building some sort of UnicodeInputStream and it
seems that Reader is the way to go. I want JAX now but I don't want to
blow the i18n stuff.

I have become tired of typing "XML Java API Standardization" so I
propose we rename it to "Java API for XML" or JAX for short. If anyone
has a better idea I'd like to hear it.


--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcard.vcf
Type: text/x-vcard
Size: 316 bytes
Desc: Card for John Tigue
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970622/4387dca0/vcard.vcf
From cbullard at hiwaay.net  Sun Jun 22 02:27:56 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:58:01 2004
Subject: JAX [was: XML Java API Standardization]
References: <33AC62E1.55731DC9@datachannel.com>
Message-ID: <33AC7165.C@hiwaay.net>

John Tigue wrote:
> 
>
> I have become tired of typing "XML Java API Standardization" so I
> propose we rename it to "Java API for XML" or JAX for short. If anyone
> has a better idea I'd like to hear it.

That's great.  With Java Jumpin' Beans we can now add Java Jumpin' JAX.

i love it.

len



From digitome at iol.ie  Sun Jun 22 12:51:43 1997
From: digitome at iol.ie (Digitome Ltd.)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <199706221051.LAA26889@mail.iol.ie>

(I am not a Java person so I don't know the syntax for doing the following
in Java. I presume it is possible. I think the approach might be useful
though, so here goes.)

The idea is to 

1) have a textual representation of an XML document as a Python program
2) be able to re-create textual representations of XML document structures
as Python programs

The following is a Python representation of a simple XML doc:-

from XMLStructures import *

x = XMLTree (
        XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")),
         (
          XMLElement("BAR",(),())
         )
        )
)

The nice thing about this is that it is both data file and parser rolled
into one.
A simple "import" statement  recreates the in-memory representation of this 
data structure. Having created/manipulated an XMLTree the textual
representation can be created with a single print statement:-

x = XMLTree (
        XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")),
         (
          XMLElement("BAR",(),())
         )
        )
)

# Change x here
# ...
print x

XMLTree (XMLElement ("FOO",(('ATTR1', 'VALUE1'), ('ATTR2',
'VALUE2')),XMLElement ("BAR",(),())))
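
The round trip can be checked directly by feeding the printed text back to the
interpreter.  The sketch below restates the two classes in compact form so
that it runs on its own, and it differs from the example above in one
assumption: children are passed as a real one-element tuple (note the trailing
comma).  eval is applied only to text the program itself produced.

```python
class XMLTree:
    def __init__(self, root):
        self.root = root

    def __repr__(self):
        return "XMLTree (%s)" % (self.root,)


class XMLElement:
    def __init__(self, gi, attlist, children):
        self.GI = gi
        self.XMLAttlist = attlist
        self.Children = children

    def __repr__(self):
        return 'XMLElement ("%s",%s,%s)' % (self.GI, self.XMLAttlist, self.Children)


x = XMLTree(
    XMLElement("FOO", (("ATTR1", "VALUE1"),),
               (XMLElement("BAR", (), ()),))
)

text = repr(x)      # the textual representation, itself a Python expression
y = eval(text)      # re-create the in-memory structure from that text
```

If repr(y) equals text, the textual form has lost nothing that the classes
record.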

1) Such structures give an immediate API in the form of Lispy list
processing stuff.
2) Such structures allow parsers to be compared / checked for correct
interpretation of XML.
3) Such structures give developers something to aim at when developing XML
markup-aware tools.

Just in case anyone is interested, here is the Python code for the classes :-

class XMLTree:
	def __init__ (self,r):
		self.root = r
	def __repr__ (self):
		return "XMLTree (%s)" % (self.root,)

class XMLElement:
	def __init__(self,gi,attlist,children):
		self.GI = gi
		self.XMLAttlist = attlist
		self.Children = children
	def __repr__(self):
		return "XMLElement (\"%s\",%s,%s)" % (self.GI,self.XMLAttlist,self.Children)

class XMLPcdata:
	def __init__(self,dat):
		self.data = dat
	def __repr__(self):
		return self.data

Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From digitome at iol.ie  Sun Jun 22 12:51:49 1997
From: digitome at iol.ie (Digitome Ltd.)
Date: Mon Jun  7 16:58:01 2004
Subject: JAX
Message-ID: <199706221051.LAA26896@mail.iol.ie>

JAX is Irish slang for toilet! :-(

Sean

Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From Peter at ursus.demon.co.uk  Sun Jun 22 15:43:16 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <8361@ursus.demon.co.uk>

In message <199706221051.LAA26889@mail.iol.ie> digitome@iol.ie (Digitome Ltd.) writes:
> (I am not a Java person so I don't know the syntax for doing the following
> in Java.

Just to reassure the membership - XML-DEV is not Java-only - anything goes :-)

[...]
> The following is a Python representation of a simple XML doc:-
> 
> from XMLStructures import *
> 
> x = XMLTree (
>         XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")),
>          (
>           XMLElement("BAR",(),())
>          )
>         )
> )
> 
> The nice thing about this is that it is both data file and parser rolled
> into one.
> 

Presumably this is similar to a serialised object (except that I believe
that Java serialisation will not give a very readable file.)

A possible attraction of serialised XML objects (e.g. at grove level) is that
they would read into memory more rapidly, both because no parsing was
required and presumably because there are tricks for allocating memory.
Obviously different parsers/applications would have different serialisations
but if we had a standard grove it *might* be possible to have agreed
serialisations of it.  Or is this off track?
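
As a rough illustration of the serialise-and-reload idea, the sketch below
uses Python's pickle module on a tree built from plain tuples, dicts and
lists.  The (name, attributes, children) node shape is an assumption made for
the example, and pickle is a Python-specific format, not a candidate for an
agreed grove serialisation; it should also only be used on trusted data.

```python
import pickle

# Each node is (name, attributes, children); text is a plain string child.
tree = ("MEMO", {"REF": "1234"}, [
    ("TO", {}, ["Audience"]),
    ("FROM", {}, ["Steve"]),
    ("MESSAGE", {}, ["This is XML!"]),
])

blob = pickle.dumps(tree)        # bytes that could be written to disk
restored = pickle.loads(blob)    # rebuilt without re-parsing any markup
child_names = [child[0] for child in restored[2]]
```

Whether reloading is really faster than re-parsing would have to be measured
for a given parser and document size.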

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From peat at erols.com  Sun Jun 22 16:14:30 1997
From: peat at erols.com (Peat)
Date: Mon Jun  7 16:58:02 2004
Subject: JAX
Message-ID: <199706221414.KAA15189@smtp2.erols.com>

Oh oh, We can't have that !!!   Here is a suggestion as an alternative..

XAPI-J  pronounced "Zapi-J",  which allows for XAPI-C or XAPI-Prolog, etc.
and therefore extendible for whatever language which comes down the line.

- Bruce Peat

----------
> From: Digitome Ltd. <digitome@iol.ie>
> To: xml-dev@ic.ac.uk
> Subject: JAX
> Date: Sunday, June 22, 1997 6:26 AM
> 
> JAX is Irish slang for toilet! :-(
> 
> Sean
> 
> Sean Mc Grath
> 
> sean@digitome.com
> Digitome Electronic Publishing
> http://www.digitome.com
> 
> 
> 



From jtigue at datachannel.com  Sun Jun 22 19:00:39 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:02 2004
Subject: XAPI-J [was: JAX]
References: <199706221414.KAA15189@smtp2.erols.com>
Message-ID: <33AD5A34.13777A1E@datachannel.com>

For the benefit of XML conversations in Ireland, let's change to XAPI.
Now Extensible Markup has an extensible API name.

I'm focused on XAPI-J; is there any work in other languages that I should
be aware of?


> Oh oh, We can't have that !!!   Here is a suggestion as an
> alternative..
>
> XAPI-J  pronounced "Zapi-J",  which allows for XAPI-C or XAPI-Prolog,
> etc.
> and therefore extendible for whatever language which comes down the
> line.

<snip>

> > JAX is Irish slang for toilet! :-(

<snip>

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

-------------- next part --------------
A non-text attachment was scrubbed...
Name: vcard.vcf
Type: text/x-vcard
Size: 316 bytes
Desc: Card for John Tigue
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970622/1566f95b/vcard.vcf
From tbray at textuality.com  Sun Jun 22 21:08:12 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:02 2004
Subject: MCF again
Message-ID: <3.0.32.19970622120552.00a799d0@pop.intergate.bc.ca>

MCF is Meta Content Framework, an application of XML proposed by
Netscape.  The drafts have been heavily reworked based on early feedback,
check the spec out at:

 http://www.textuality.com/mcf/NOTE-MCF-XML.html

If (like a lot of other people) you found MCF a little daunting first
time around, you might want to check out the new tutorial at:

 http://www.textuality.com/mcf/MCF-tutorial.html

I understand this is now going to migrate over to a just-now-forming
new working group in W3C that is going to try to co-ordinate all the
disparate metadata activities.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From steve at cs.anu.edu.au  Mon Jun 23 01:56:53 1997
From: steve at cs.anu.edu.au (Steven Ball)
Date: Mon Jun  7 16:58:02 2004
Subject: XML Java API Standardization 
In-Reply-To: Your message of "Sun, 22 Jun 1997 11:26:23 +0100."
             <199706221051.LAA26889@mail.iol.ie> 
Message-ID: <199706222356.JAA09014@tcltk.anu.edu.au>

> (I am not a Java person so I don't know the syntax for doing the following
> in Java...)

I'm no Java-phile either ;-)

> The idea is to 
> 
> 1) have a textual representation of an XML document as a Python program
> 2) be able to re-create textual representations of XML document structures
> as Python programs

I've done essentially the same thing for Tcl.  My XML parser emits a
"Hierarchical Tcl List Representation" of an XML document.  For example:

set doc {<?XML VERSION="1.0"?>
<!DOCTYPE MEMO SYSTEM "memo.dtd">
<MEMO REF="1234">
<TO>Audience</TO>
<FROM>Steve</FROM>
<MESSAGE>This is XML!</MESSAGE>
</MEMO>}

XML::parse $doc

returns ==>

parse:pi ?XML {VERSION 1.0} {}
parse:pi !DOCTYPE {SYSTEM memo.dtd} {} 
parse:elem MEMO {REF 1234} {
    parse:elem TO {} {
        parse:text Audience {} {}
    }
    parse:elem FROM {} {
        parse:text Steve {} {}
    }
    parse:elem MESSAGE {} {
        parse:text {This is XML!} {} {}
    }
}

(above has been edited slightly for email-readability)

This representation has two features: it can be easily manipulated
as a list, especially with the dummy arguments to parse:pi and parse:text,
and it can be passed to the `eval' command for execution - the element contents
are themselves scripts.
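
For readers who do not use Tcl, the same "data that is also a script" idea can
be approximated in Python.  In this sketch every node is a tuple of (command,
name, attributes, children), mirroring the uniform argument lists above, and
running the document means calling a handler for each command.  The names run,
on_elem and on_text are invented for the example.

```python
def run(node, handlers, out):
    """Treat (command, name, attrs, children) as a call to handlers[command]."""
    command, name, attrs, children = node
    handlers[command](name, attrs, out)
    for child in children:
        run(child, handlers, out)


doc = ("elem", "MEMO", {"REF": "1234"}, [
    ("elem", "TO", {}, [("text", "Audience", {}, [])]),
    ("elem", "FROM", {}, [("text", "Steve", {}, [])]),
    ("elem", "MESSAGE", {}, [("text", "This is XML!", {}, [])]),
])

def on_elem(name, attrs, out):
    out.append("<%s>" % name)

def on_text(data, attrs, out):
    out.append(data)

# Executing the structure: each node dispatches to its handler.
trace = []
run(doc, {"elem": on_elem, "text": on_text}, trace)

# Manipulating the same structure as a list.
child_names = [child[1] for child in doc[3]]
```

The dummy empty fields on text nodes play the same role as the dummy
arguments to parse:text: every node has the same shape, so one loop handles
them all.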

> 1) Such structures give an immediate API in the form of Lispy list
> processing stuff.
> 2) Such structures allow parsers to be compared / checked for correct
> interpretation of XML.
> 3) Such structures give developers something to aim at when developing XML
> markup aware tools.

Agreed, and the similarity of our (independent) approaches is noteworthy.
My only comment is that (2) is modulo list syntax.

Cheers,
Steve Ball




From Jon.Bosak at Eng.Sun.COM  Mon Jun 23 07:29:24 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:02 2004
Subject: XAPI-J [was: JAX]
In-Reply-To: <33AD5A34.13777A1E@datachannel.com> (jtigue@datachannel.com)
Message-ID: <199706230527.WAA04374@boethius.eng.sun.com>

[John Tigue:]

| For the benefit of XML conversations in Ireland, let's change to XAPI.
| Now Extensible Markup has an extensible API name.

Great.

| I'm focused on XAPI-J; is there any work in other language that I
| should be aware of?

There's already a validating XML parser in Tcl, and versions in other
languages are bound to follow.  Even so (and even trying to compensate
for my bias as a Sun employee), I think that there is, and is going to
be, such a powerful connection between XML and Java on the Web that
the default name for the Java XML API should be simply XAPI ("zappy"),
and all other versions should use the qualified names (Tcl-XAPI or
whatever).

Jon




From jeanpa at microsoft.com  Mon Jun 23 07:38:10 1997
From: jeanpa at microsoft.com (Jean Paoli)
Date: Mon Jun  7 16:58:02 2004
Subject: XML-Data
Message-ID: <78DFE33066ABD0118B9200805FD431BA932987@RED-16-MSG.dns.microsoft.com>

I am pleased to present XML-Data, a Position Paper from Microsoft.
XML-Data is an application of XML for exchanging 
structured data and metadata on the Internet. 
This position paper is sent to multiple working groups
in the W3C dealing with this subject (XML, meta-data)
and we expect this paper to be discussed and improved
by these working groups.
The current proposal needs namespaces and uses the Layman/Bray
proposal.

The URL of this paper (on the Microsoft site) will be posted tomorrow.
-Jean Paoli

----------------
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="Template"
content="C:\MSOffice\Templates\Letters &amp; Faxes\VFPSPEC97.dot">
<meta name="GENERATOR" content="Microsoft FrontPage 2.0">
<title>XML-Data</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE"
vlink="#551A8B" alink="#FF0000">

<p align="right"><font size="4"><b>XML-Data.html</b></font> </p>
<p><font size="4"><b>Position Paper from Microsoft<br>20 June 1997
</b></font>
</p>


<h1 align="center">XML-Data</h1>


<dl>
    <dt>Authors: </dt>
    <dd><a href="mailto:andrewl@microsoft.com">Andrew Layman</a>,
        Microsoft Corporation<br>
        <a href="mailto:jeanpa@microsoft.com">Jean Paoli</a>,
        Microsoft Corporation<br>
        <a href="mailto:sjd@eps.inso.com"><font size="3">Steve De
        Rose</font></a><font size="3">, Inso Corporation</font><br>
        <a href="mailto:ht@cogsci.ed.ac.uk">Henry S. Thompson</a>,
        University of Edinburgh <br>
        </dd>
    <dt>Acknowledgements:</dt>
    <dd><font size="3">We thank </font><a
        href="mailto:paul@arbortext.com"><font size="3">Paul
        Grosso</font></a><font size="3"> (Arbortext), </font><a
        href="mailto:sca@eps.inso.com"><font size="3">Sharon
        Adler</font></a><font size="3"> (Inso Corporation), </font><a
        href="mailto:alb@eps.inso.com"><font size="3">Anders
        Berglund</font></a><font size="3"> (Inso Corporation), </font><a
        href="mailto:fcha@ais.Berger-Levrault.fr">François
        Chahuneau</a> (AIS/Berger-Levrault),<font color="#0000FF"
        size="2" face="Arial"> </font><font size="3">and </font><a
        href="mailto:edwardj@microsoft.com"><font size="3">Edward
        Jung</font></a><font size="3"> (Microsoft) for their help
        and contributions to this proposal.</font></dd>
</dl>

<p>Copyright (c) 1997 Microsoft Corp. <br>
</p>

<hr>

<h2 align="left">Abstract</h2>

<p align="left">This document provides the specification for
exchanging structured and networked data on the Web. This
specification uses XML, the Extensible Markup Language for
describing data as well as data about data. We expect this
specification to be useful for a wide range of applications such
as describing database transfers, digital signatures or
remotely-located web resources.</p>

<h2 align="left">1. Introduction</h2>

<p><font color="#000000" size="3">The Internet holds the
potential to integrate all information in a global network (with
many private but integrated domains). The Internet promises
access to information any time and, with wireless technology,
anywhere. Today, however, the Internet is merely an <i>access
medium </i>to text and pictures. To actualize the Internet's
potential, we need to add intelligent search, data exchange,
adaptive presentation, and personalization. The Internet must go
beyond setting an information <em>access</em> standard, and must
set an information <i>understanding </i>standard, which means: a
standard way of representing data so that software can better
search, move, display, and otherwise manipulate information
currently hidden in contextual obscurity.</font></p>

<p><font color="#000000" size="3">XML is an important step in
this direction. It offers a standard syntax for textual structure
of tagged data, based on extensive industry and theoretical
experience. Its lexical format easily depicts a tree structure. A
tree is a natural format that is richer than a simple flat list,
yet (compared to a generalized graph) also respectful of
cognitive and data processing requirements for economy and
simplicity. </font></p>

<p><font color="#000000" size="3">Looking at this point in more
detail, there are several ways of structuring data. One is a flat
tagging system. In this system, sets of keywords are applied to
data elements. This is a simple form of data structure, but it
does not capture any relationships between the keywords.</font></p>

<p><font color="#000000" size="3">A more advanced means of
structuring information is a tree. A tree allows expression of
subsumption, containment, or any other single (contextual)
relationship such as &quot;manages.&quot; Trees correspond to
object-oriented class hierarchies, file system hierarchies,
organizational hierarchies and so forth. Trees are relatively
easy to understand and to construct. Trees are efficient to
process, and there is a linear (<em>e.g.</em> textual) structure
that a program can parse incrementally, and determine when it is
finished. This makes trees particularly useful as a transmission
format for asynchronous, distributed systems such as the
Internet, and also for display purposes where the single
relationship (usually visual containment) enables incremental
display.</font></p>

<p><font color="#000000" size="3">A still more elaborate
structure is a directed graph. A graph allows expression of
arbitrary binary relationships, that is, many relationships
between two things. A graph can express subsumption, containment,
and any number of other relationships simultaneously. It is
therefore a superset of a tree. This makes graphs very expressive
for real-world semantics, but it also makes them harder to
understand, more difficult to construct, and less efficient to
process than trees. There is no efficient linear (<em>e.g.</em>
textual) structure of a graph that can be incrementally
processed. Therefore, while they are particularly useful for
representing (and instrumenting) the complete semantics of a
system, they are typically not suitable for transmission,
display, or immediate processing.</font></p>

<p><font color="#000000" size="3">The tree structure has proved
broadly implementable and easy to deploy, not just in theory but
also widely in practice. Industrial implementations, in the SGML
community and elsewhere, demonstrate its intrinsic quality and
industrial strength, e.g. aircraft (ATA), automotive (J2008),
banking (OFX), and semiconductors (Pinnacles PCIS).</font></p>

<p><font color="#000000" size="3">This proposal shows how to add
a single convention to XML so that graph arcs are easily added
into a lexical tree structure, without requiring decomposition of
tree format into a &quot;lowest common denominator&quot;
nodes-and-arcs structure. (For a quick look at the difference,
see the </font><a href="#XML-Data-vs-MCF"><font color="#000000"
size="3">XML-Data versus MCF in XML comparison</font></a><font
color="#000000" size="3">.)</font></p>

<p><font color="#000000" size="3">XML-Data consists of a
collection of related technologies. First, it unifies lexical
trees with graph structures. Second, it builds on this to define
a representation for schemata based on XML instance syntax. It
offers a mechanism to organize element types into a hierarchy,
and proposes a small set of basic types. Finally, it adds
facilities for lexical typing and proposes a small collection of
lexical types.</font></p>

<p><font color="#000000" size="3">XML-Data can encode the
content, semantics and schemata for a gamut of cases, from simple
and prosaic to complex and sophisticated:</font></p>

<ul>
    <li><font color="#000000" size="3">An ordinary document</font></li>
    <li><font color="#000000" size="3">A structured record, such
        as an appointment record or purchase order</font></li>
    <li><font color="#000000" size="3">An object, with data and
        methods</font></li>
    <li><font color="#000000" size="3">A data record, such as the
        result set of a query</font></li>
    <li><font color="#000000" size="3">Information in a database
        or a web site (<em>e.g. </em>CDF)</font></li>
    <li><font color="#000000" size="3">Graphical presentation
(<em>e.g.</em>
        an application user interface)</font></li>
    <li><font color="#000000" size="3">Upper ontology (standard
        schema entities and types)</font></li>
    <li><font color="#000000" size="3">UberWeb (all the links
        between information and people on the web)</font></li>
</ul>

<p><font color="#000000" size="3">The resulting flexibility of a
single homogeneous data representation system allows any reader to
uniformly determine the structural semantics of a data element.
Information can then be reused for new purposes and in novel
contexts. For example, a record from a database of restaurants
and a record from a client contact database might be reused in
the context of an appointment, say in setting a lunch date with a
client. The relationships between the restaurant and contact data
do not reside in the schema data described by either database
individually, but are extensions defined by the instance of the
appointment.</font></p>

<p><font color="#000000" size="3">This proposal, building on the
earlier <em>Web Collections in XML </em>proposal, shows how to
use a single syntax for a broad range of data, using that syntax
for data and schemata, permitting the expressiveness of graph
data when such power is required, but retaining the benefits of
lexical trees.</font></p>

<h2 align="left">2. Examples of XML-Data</h2>

<h3><font size="4" face="Times New Roman"><code>Data</code></font></h3>

<p><font size="4" face="Times New Roman"><code>The following
example shows a simple order from a bookstore for several books,
a record, and a cup of coffee.</code></font></p>

<pre><code>&lt;ORDER&gt;
  &lt;SOLD-TO&gt;
    &lt;PERSON&gt;&lt;LASTNAME&gt;<strong>Layman</strong>&lt;/LASTNAME&gt;
            &lt;FIRSTNAME&gt;<strong>Andrew</strong>&lt;/FIRSTNAME&gt;
    &lt;/PERSON&gt;
  &lt;/SOLD-TO&gt;
  &lt;SOLD-ON&gt;<strong>19970317</strong>&lt;/SOLD-ON&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>5.95</strong>&lt;/PRICE&gt;
    &lt;BOOK&gt;
      &lt;TITLE&gt;<strong>Number, the Language of
Science</strong>&lt;/TITLE&gt;
      &lt;AUTHOR&gt;<strong>Dantzig, Tobias</strong>&lt;/AUTHOR&gt;
    &lt;/BOOK&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>12.95</strong>&lt;/PRICE&gt;
    &lt;BOOK&gt;
      &lt;TITLE&gt;<strong>Introduction to Objectivist
Epistemology</strong>&lt;/TITLE&gt;
      &lt;AUTHOR&gt;<strong>Rand, Ayn</strong>&lt;/AUTHOR&gt;
    &lt;/BOOK&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>12.95</strong>&lt;/PRICE&gt;
    &lt;RECORD&gt;
      &lt;TITLE&gt;&lt;COMPOSER&gt;<strong>Tchaikovsky's</strong>&lt;/COMPOSER&gt;<strong> First Piano Concerto</strong>&lt;/TITLE&gt;
      &lt;ARTIST&gt;<strong>Janos</strong>&lt;/ARTIST&gt;
    &lt;/RECORD&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>1.50</strong>&lt;/PRICE&gt;
    &lt;COFFEE&gt;
      &lt;SIZE&gt;<strong>small</strong>&lt;/SIZE&gt;
      &lt;STYLE&gt;<strong>cafe macchiato</strong>&lt;/STYLE&gt;
    &lt;/COFFEE&gt;
  &lt;/ITEM&gt;
&lt;/ORDER&gt;</code></pre>

<p><font size="4" face="Times New Roman"><code>XML-Data is
flexible enough to encode heterogeneous structures, for example
books, records and coffee all within one sales order. These
different kinds of items do not need to all have the same
internal parts. For example, books have titles, coffee generally
doesn't. XML-Data allows values to be expressed as element
content (for example the book titles shown) or with a <em>value</em>
attribute (for example the author and artist elements).
Properties of elements can be expressed as attributes (e.g. size
and style of coffee) or as sub-elements (e.g. author, artist).
XML-Data can appear in separate documents or within other
documents (such as HTML pages).</code></font></p>
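<p>As a rough illustration of how a consumer might read such a heterogeneous order, the following Python sketch parses a shortened copy of the order above with the standard library and reports what each item is and what it costs. The names <code>summarize_order</code> and <code>order_total</code> are illustrative assumptions and are not part of XML-Data.</p>

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Shortened copy of the bookstore order shown above (two of the four items).
ORDER_XML = """
<ORDER>
  <SOLD-ON>19970317</SOLD-ON>
  <ITEM>
    <PRICE>5.95</PRICE>
    <BOOK>
      <TITLE>Number, the Language of Science</TITLE>
      <AUTHOR>Dantzig, Tobias</AUTHOR>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE><SIZE>small</SIZE><STYLE>cafe macchiato</STYLE></COFFEE>
  </ITEM>
</ORDER>
"""


def summarize_order(xml_text):
    """Return a (kind, price) pair for every ITEM in the order."""
    order = ET.fromstring(xml_text)
    lines = []
    for item in order.findall("ITEM"):
        price = item.findtext("PRICE")
        # The thing sold is whichever child of ITEM is not the PRICE.
        goods = [child for child in item if child.tag != "PRICE"]
        kind = goods[0].tag if goods else None
        lines.append((kind, price))
    return lines


def order_total(xml_text):
    """Sum the item prices exactly, using Decimal rather than float."""
    return sum(Decimal(price) for _, price in summarize_order(xml_text))
```

<p>The code never needs to know in advance which kinds of goods exist; a book and a coffee are handled by the same loop even though their internal parts differ.</p>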

<h3><font size="4" face="Times New Roman"><code>Data about Other
Data</code></font></h3>

<p><font size="4" face="Times New Roman"><code>XML-Data is
suitable for complex, self-contained data structures such as the
book order, and also for information such as the </code></font><a
href="http://www.microsoft.com/standards/cdf-f.htm"><code>Channel
Definition Format</code></a><code>, </code><font size="4"
face="Times New Roman"><code>which describes remotely-located web
resources, many of which are themselves data:</code></font></p>

<pre><code>&lt;CHANNEL&gt;
  &lt;ITEM
HREF=&quot;<strong>http://www.zoosports.com/intro.htm</strong>&quot;
level=&quot;<strong>2</strong>&quot;
precache=&quot;<strong>NO</strong>&quot;&gt;
    &lt;A
HREF=&quot;<strong>http://www.zoosports.com/page1.htm</strong>&quot;&gt;
<strong>This is a link to page 1.</strong>&lt;/A&gt;
    &lt;TITLE&gt;<strong>Welcome to ZooSports!</strong>&lt;/TITLE&gt;
    &lt;ABSTRACT&gt;<strong>ZooSports articles, news, and promotional
offers</strong>&lt;/ABSTRACT&gt;
  &lt;/ITEM&gt;
  &lt;SCHEDULE ENDDATE=&quot;<strong>1994-11-05</strong>&quot;&gt;
    &lt;INTERVALTIME DAY=&quot;<strong>1</strong>&quot;/&gt;
    &lt;EARLIESTTIME HOUR=&quot;<strong>12</strong>&quot;/&gt;
    &lt;LATESTTIME HOUR=&quot;<strong>18</strong>&quot;/&gt;
  &lt;/SCHEDULE&gt;
&lt;/CHANNEL&gt;</code></pre>

<h3><font size="4" face="Times New Roman"><code>PICS-NG
Labels</code></font></h3>

<p><font size="4" face="Times New Roman"><code>XML-Data can
express PICS-NG Labels</code></font><font size="5"
face="Times New Roman"><code>:</code></font></p>

<p><font size="4" face="Times New Roman"><code>(This uses the
</code></font><a
href="http://www.w3.org/XML/Group/9705/namespace.htm"><font
size="4" face="Times New Roman"><code>Layman-Bray proposal for
namespaces</code></font></a><font size="4" face="Times New
Roman"><code>.)</code></font></p>

<pre><code>&lt;xml&gt;
  &lt;xml:schema&gt;
    &lt;namespaceDcl
href=&quot;<strong>http://purl.org/Schemas</strong>&quot;
name=&quot;<strong>purl</strong>&quot;/&gt;
    &lt;namespaceDcl
href=&quot;<strong>http://www.foo.com</strong>&quot;
name=&quot;<strong>foo</strong>&quot;/&gt;
  &lt;/xml:schema&gt;
  &lt;xml:data&gt;
    &lt;purl:description1 href=&quot;<strong>http://purl.color.org/document.html</strong>&quot;&gt;
      &lt;title&gt;<strong>Light and Dark: A study of color</strong>&lt;/title&gt;
      &lt;subject&gt;&lt;LCSH&gt;
          &lt;for&gt;<strong>Color and Color Palettes</strong>&lt;/for&gt;&lt;/LCSH&gt; &lt;/subject&gt;
      &lt;author&gt;
        &lt;foo:author&gt;
          &lt;name&gt;<strong>John Smith</strong>&lt;/name&gt;
          &lt;affiliation&gt;<strong>thedarkside</strong>&lt;/affiliation&gt;
          &lt;email&gt;<strong>john@thedarkside</strong>&lt;/email&gt;
        &lt;/foo:author&gt;
        &lt;foo:author&gt;
          &lt;name&gt;<strong>Smith, Jane Q.</strong>&lt;/name&gt;
          &lt;affiliation&gt;<strong>thelightregion</strong>&lt;/affiliation&gt;
          &lt;email&gt;<strong>jane@thelightregion</strong>&lt;/email&gt;
        &lt;/foo:author&gt;
      &lt;/author&gt;
    &lt;/purl:description1&gt;
  &lt;/xml:data&gt;
&lt;/xml&gt;</code></pre>

<h3><font size="4" face="Times New Roman"><code>Digital
Signatures, Security &amp; Authentication</code></font></h3>

<p><font size="4" face="Times New Roman"><code>Returning to the
bookstore example, this is the same order with a digital
signature added. The structured nature of XML-Data makes it easy
to sign whole elements or parts of them.</code></font></p>

<pre><code>&lt;ORDER&gt;
  &lt;dsig:DSIG&gt;
    &lt;MANIFEST&gt;<strong>80183589575795589189518915</strong>&lt;/MANIFEST&gt;
    &lt;SIG href=&quot;<strong>http://XYX/Joe@company.com</strong>&quot;/&gt;
  &lt;/dsig:DSIG&gt;
  &lt;SOLD-TO&gt;
    &lt;PERSON&gt;&lt;LASTNAME&gt;<strong>Layman</strong>&lt;/LASTNAME&gt;
            &lt;FIRSTNAME&gt;<strong>Andrew</strong>&lt;/FIRSTNAME&gt;
    &lt;/PERSON&gt;
  &lt;/SOLD-TO&gt;
  &lt;SOLD-ON&gt;<strong>19970317</strong>&lt;/SOLD-ON&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>5.95</strong>&lt;/PRICE&gt;
    &lt;BOOK&gt;
      &lt;TITLE&gt;<strong>Number, the Language of
Science</strong>&lt;/TITLE&gt;
      &lt;AUTHOR&gt;<strong>Dantzig, Tobias</strong>&lt;/AUTHOR&gt;
    &lt;/BOOK&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>12.95</strong>&lt;/PRICE&gt;
    &lt;BOOK&gt;
      &lt;TITLE&gt;<strong>Introduction to Objectivist
Epistemology</strong>&lt;/TITLE&gt;
      &lt;AUTHOR&gt;<strong>Rand, Ayn</strong>&lt;/AUTHOR&gt;
    &lt;/BOOK&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>12.95</strong>&lt;/PRICE&gt;
    &lt;RECORD&gt;
      &lt;TITLE&gt;&lt;COMPOSER&gt;<strong>Tchaikovsky's</strong>&lt;/COMPOSER&gt;<strong> First Piano Concerto</strong>&lt;/TITLE&gt;
      &lt;ARTIST&gt;<strong>Janos</strong>&lt;/ARTIST&gt;
    &lt;/RECORD&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>1.50</strong>&lt;/PRICE&gt;
    &lt;COFFEE&gt;
      &lt;SIZE&gt;<strong>small</strong>&lt;/SIZE&gt;
      &lt;STYLE&gt;<strong>cafe macchiato</strong>&lt;/STYLE&gt;
    &lt;/COFFEE&gt;
  &lt;/ITEM&gt;
&lt;/ORDER&gt;</code></pre>

<h3><font size="4" face="Times New Roman"><code>Database
Information</code></font></h3>

<p><font size="4" face="Times New Roman"><code>While XML-Data can
represent complex structures, it can also represent simple ones,
for example a simple list of database records:</code></font></p>

<pre><code>&lt;BOOK-MASTER-LIST&gt;
  &lt;BOOK id=&quot;book1&quot;&gt;
    &lt;TITLE&gt;<strong>Number, the Language of Science</strong>&lt;/TITLE&gt;
    &lt;AUTHOR&gt;<strong>Dantzig, Tobias</strong>&lt;/AUTHOR&gt;
  &lt;/BOOK&gt;

  &lt;BOOK id=&quot;book2&quot;&gt;
    &lt;TITLE&gt;<strong>Introduction to Objectivist Epistemology</strong>&lt;/TITLE&gt;
    &lt;AUTHOR&gt;<strong>Rand, Ayn</strong>&lt;/AUTHOR&gt;
  &lt;/BOOK&gt;

  &lt;BOOK id=&quot;book3&quot;&gt;
    &lt;TITLE&gt;<strong>I, The Jury</strong>&lt;/TITLE&gt;
    &lt;AUTHOR&gt;<strong>Spillane, Mickey</strong>&lt;/AUTHOR&gt;
  &lt;/BOOK&gt;

  &lt;BOOK id=&quot;book4&quot;&gt;
    &lt;TITLE&gt;<strong>Half Magic</strong>&lt;/TITLE&gt;
    &lt;AUTHOR&gt;<strong>Eager, Edward</strong>&lt;/AUTHOR&gt;
  &lt;/BOOK&gt;

  &lt;BOOK id=&quot;book5&quot;&gt;
    &lt;TITLE&gt;<strong>QED</strong>&lt;/TITLE&gt;
    &lt;AUTHOR&gt;<strong>Feynman, Richard P.</strong>&lt;/AUTHOR&gt;
  &lt;/BOOK&gt;
&lt;/BOOK-MASTER-LIST&gt;</code></pre>

<h3><font size="4" face="Times New Roman"><code>Graph
Structures</code></font></h3>

<p><font size="4" face="Times New Roman"><code>An XML-Data
element may include links to resources outside the immediate
tree. When it meets application needs, this <em>href</em>
facility can be used to break up a single structure into multiple
parts, with relations among them indicated by Universal Resource
Identifier (URI) links. The references can be local or remote. In
this example, they are inventory records from the database table
we just looked at.</code></font></p>

<pre><code>&lt;ORDER id=&quot;order1&quot;&gt;
  &lt;dsig:DSIG&gt;
    &lt;MANIFEST&gt;<strong>80183589575795589189518915</strong>&lt;/MANIFEST&gt;
    &lt;SIG href=&quot;<strong>http://XYX/Joe@company.com</strong>&quot;/&gt;
  &lt;/dsig:DSIG&gt;
  &lt;SOLD-TO&gt;
    &lt;PERSON&gt;&lt;LASTNAME&gt;<strong>Layman</strong>&lt;/LASTNAME&gt;
            &lt;FIRSTNAME&gt;<strong>Andrew</strong>&lt;/FIRSTNAME&gt;
    &lt;/PERSON&gt;
  &lt;/SOLD-TO&gt;
  &lt;SOLD-ON&gt;<strong>19970317</strong>&lt;/SOLD-ON&gt;
  &lt;ITEM href=&quot;<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book1</strong>&quot;&gt;
    &lt;PRICE&gt;5.95&lt;/PRICE&gt;
  &lt;/ITEM&gt;
  &lt;ITEM href=&quot;<strong>http://bigbookstore.com/data/bookmaster?XML-XPTR=book2</strong>&quot;&gt;
    &lt;PRICE&gt;12.95&lt;/PRICE&gt;
  &lt;/ITEM&gt;
  &lt;ITEM href=&quot;<strong>http://bigbookstore.com/data/musicmaster?XML-XPTR=cd1</strong>&quot;&gt;
    &lt;PRICE&gt;12.95&lt;/PRICE&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;1.50&lt;/PRICE&gt;
    &lt;COFFEE&gt;
      &lt;SIZE&gt;<strong>small</strong>&lt;/SIZE&gt;
      &lt;STYLE&gt;<strong>cafe macchiato</strong>&lt;/STYLE&gt;
    &lt;/COFFEE&gt;
  &lt;/ITEM&gt;
&lt;/ORDER&gt;</code></pre>

<p><font size="4" face="Times New Roman"><code>Notice that each
of the ITEM elements establishes a relationship between the ORDER
and a BOOK, and that the <em>relationship itself</em>
has attributes, in this case the price at which the book was
sold. Relations can have attributes, can contain elements and the
process can be carried to any needed level of detail.</code></font></p>
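<p>To make the href convention concrete, here is a minimal Python sketch of how a reader might resolve each ITEM to the thing it refers to. It assumes the master list has already been fetched and parsed locally, and that the <code>XML-XPTR</code> query value is a bare id as in the order above. The helper names (<code>index_by_id</code>, <code>pointer_id</code>, <code>item_target</code>) are hypothetical.</p>

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse, parse_qs

# Local stand-in for the remote book master list (assumption: already fetched).
MASTER_XML = """<BOOK-MASTER-LIST>
  <BOOK id="book1"><TITLE>Number, the Language of Science</TITLE>
    <AUTHOR>Dantzig, Tobias</AUTHOR></BOOK>
  <BOOK id="book2"><TITLE>Introduction to Objectivist Epistemology</TITLE>
    <AUTHOR>Rand, Ayn</AUTHOR></BOOK>
</BOOK-MASTER-LIST>"""

ORDER_XML = """<ORDER id="order1">
  <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book1">
    <PRICE>5.95</PRICE></ITEM>
  <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book2">
    <PRICE>12.95</PRICE></ITEM>
  <ITEM><PRICE>1.50</PRICE><COFFEE><SIZE>small</SIZE></COFFEE></ITEM>
</ORDER>"""


def index_by_id(root):
    """Map every id attribute in a parsed document to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id")}


def pointer_id(href):
    """Extract the bare id from an href of the form ...?XML-XPTR=<id>."""
    values = parse_qs(urlparse(href).query).get("XML-XPTR", [])
    return values[0] if values else None


def item_target(item, index):
    """Return the linked element if href is present, else the inline content."""
    href = item.get("href")
    if href is not None:
        return index.get(pointer_id(href))
    for child in item:
        if child.tag != "PRICE":
            return child
    return None
```

<p>The same function serves both the linked books and the inline coffee, which is the point of treating a graph arc and tree containment as one convention. A fuller version would also fetch the remote resource named in the href and handle other pointer forms.</p>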

<h3><font size="4" face="Times New Roman"><code>Discontiguous
Information (propertyOf)</code></font></h3>

<p><font size="4" face="Times New Roman"><code>Information about
an element can be contained in the element, but also can sit
outside it. For example, the following applies a digital
signature to a sales order without actually modifying the
order:</code></font></p>

<pre><code>&lt;dsig:DSIG&gt;
  &lt;xml:propertyOf href=&quot;<strong>http://bigbookstore.com/data/orders?XML-XPTR=order1</strong>&quot;/&gt;
  &lt;MANIFEST&gt;<strong>80183589575795589189518915</strong>&lt;/MANIFEST&gt;
  &lt;SIG href=&quot;<strong>http://XYX/Joe@company.com</strong>&quot;/&gt;
&lt;/dsig:DSIG&gt;</code></pre>
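<p>One way a processor might gather such out-of-line properties is sketched below in Python. To keep the example parseable with the standard library, the <code>dsig:</code> and <code>xml:</code> prefixes are dropped and the signature is wrapped in a hypothetical <code>DATA</code> container; both simplifications, and the function name <code>external_properties</code>, are assumptions made for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Simplified copy of the signature above; prefixes dropped, DATA wrapper assumed.
EXTERNAL_XML = """<DATA>
  <DSIG>
    <propertyOf href="http://bigbookstore.com/data/orders?XML-XPTR=order1"/>
    <MANIFEST>80183589575795589189518915</MANIFEST>
    <SIG href="http://XYX/Joe@company.com"/>
  </DSIG>
</DATA>"""


def external_properties(root, target_href):
    """Return elements that declare themselves a property of target_href."""
    return [
        el
        for el in root.iter()
        if any(
            child.tag == "propertyOf" and child.get("href") == target_href
            for child in el
        )
    ]
```

<p>Calling <code>external_properties</code> with the order's address returns the signature element, so the order can be treated as signed even though its own markup was never touched.</p>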

<h3><font size="4" face="Times New
Roman"><code>Schema</code></font></h3>

<p><font size="4" face="Times New Roman"><code>Every data object,
such as a purchase order, contains certain parts, such as
sold-to, sold-on date, items, etc. We can write a formal
description of what these parts are and which are allowed where.
This is called a &quot;schema&quot; and is written using a form
of XML-Data:</code></font></p>

<pre><code>&lt;xml:schema ID=&quot;BookOrderSchema&quot;&gt;
  &lt;!-- This schema is digitally signed. Schemas are a form of data,
       so they, too, can be signed. --&gt;
  &lt;dsig:DSIG&gt;
    &lt;MANIFEST&gt;<strong>*(&amp;#&amp;$&amp;@*$&amp;%*&amp;@*$&amp;$*@</strong>&lt;/MANIFEST&gt;
    &lt;SIG href=&quot;<strong>http://XYX/Jane@company.com</strong>&quot;/&gt;
  &lt;/dsig:DSIG&gt;

  &lt;!-- Here are all the element types, their contents,
       attributes and relations. --&gt;
  &lt;elementType id=&quot;<strong>ORDER</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#SOLD-TO</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#SOLD-ON</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#ITEM</strong>&quot;
occurs=&quot;<strong>STAR</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>SOLD-TO</strong>&quot;&gt;
    &lt;elt href=&quot;<strong>#PERSON</strong>&quot;/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>SOLD-ON</strong>&quot;&gt;  
    &lt;pcdata/&gt;
    &lt;!-- Date is YYYYMMDD --&gt;
    &lt;attribute name=&quot;<strong>lextype</strong>&quot;
default=&quot;<strong>DATE.ISO8061</strong>&quot;
presence=&quot;<strong>fixed</strong>&quot;/&gt;
  &lt;/relationType&gt;
  &lt;elementType id=&quot;<strong>PERSON</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#LASTNAME</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#FIRSTNAME</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>LASTNAME</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>FIRSTNAME</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>PRICE</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>ITEM</strong>&quot;&gt;
    &lt;any/&gt;
    &lt;relation href=&quot;<strong>#PRICE</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>#BOOK</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>#RECORD</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>#COFFEE</strong>&quot;/&gt;
  &lt;/relationType&gt;
  &lt;elementType id=&quot;<strong>BOOK</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#TITLE</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#AUTHOR</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>RECORD</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#TITLE</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#ARTIST</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>SIZE</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>STYLE</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;elementType id=&quot;<strong>COFFEE</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#SIZE</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#STYLE</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>TITLE</strong>&quot;&gt;
    &lt;mixed&gt;&lt;elt
href=&quot;<strong>#COMPOSER</strong>&quot;/&gt;&lt;/mixed&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>AUTHOR</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>ARTIST</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>COMPOSER</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
&lt;/xml:schema&gt;</code></pre>

<h3><font size="4" face="Times New Roman"><code>Type
Extension</code></font></h3>

<p><font size="4" face="Times New Roman"><code>Sometimes some
elements are variants of others, in which case we can organize
the element types into a genus-species hierarchy using the
<em>extends</em>
attribute:</code></font></p>

<pre><code>&lt;xml:schema ID=&quot;<strong>ArtSchema</strong>&quot;&gt;
  &lt;elementType id=&quot;<strong>artistic-work</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#TITLE</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>BOOK</strong>&quot;
extends=&quot;<strong>#artistic-work</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#AUTHOR</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;elementType id=&quot;<strong>RECORD</strong>&quot;
extends=&quot;<strong>#artistic-work</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#ARTIST</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#COMPOSER</strong>&quot;
occurs=&quot;<strong>OPTIONAL</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>AUTHOR</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>COMPOSER</strong>&quot;
extends=&quot;<strong>#AUTHOR</strong>&quot;/&gt;
  &lt;relationType id=&quot;<strong>ARTIST</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
&lt;/xml:schema&gt;</code></pre>

<p><font size="4" face="Times New Roman"><code>Here we see that
books and records are both types of artistic work, and that a
composer is a type of author.</code></font></p>
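<p>The hierarchy can be walked mechanically. The Python sketch below loads a copy of the element types above (with the <code>xml:</code> prefix dropped) and answers two questions: is one type a kind of another, and which relations does a type have once its ancestors are taken into account. Treating <em>extends</em> as inheriting the parent's relations is an assumption of this sketch, as are the function names; only local <code>#id</code> references are followed, and cycles are not detected.</p>

```python
import xml.etree.ElementTree as ET

SCHEMA_XML = """<schema ID="ArtSchema">
  <elementType id="artistic-work"><relation href="#TITLE"/></elementType>
  <elementType id="BOOK" extends="#artistic-work">
    <relation href="#AUTHOR"/>
  </elementType>
  <elementType id="RECORD" extends="#artistic-work">
    <relation href="#ARTIST"/>
    <relation href="#COMPOSER" occurs="OPTIONAL"/>
  </elementType>
</schema>"""


def load_types(xml_text):
    """Index the type declarations of a schema by their id."""
    root = ET.fromstring(xml_text)
    return {
        decl.get("id"): decl
        for decl in root
        if decl.tag in ("elementType", "relationType")
    }


def parent_of(type_id, types):
    """Return the id named by the extends attribute, or None."""
    parent = types[type_id].get("extends")
    return parent.lstrip("#") if parent else None


def is_kind_of(type_id, ancestor_id, types):
    """True if type_id is ancestor_id or extends it, directly or indirectly."""
    while type_id is not None:
        if type_id == ancestor_id:
            return True
        type_id = parent_of(type_id, types)
    return False


def all_relations(type_id, types):
    """Relations of a type: inherited ones first, then its own."""
    parent = parent_of(type_id, types)
    inherited = all_relations(parent, types) if parent else []
    own = [r.get("href").lstrip("#") for r in types[type_id].findall("relation")]
    return inherited + own
```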

<h3><font size="4" face="Times New Roman"><code>Schema
Extension</code></font></h3>

<p><font size="4" face="Times New Roman"><code>We can also
use this ability to customize a schema that has useful features,
but which is too general. In this example, we show a general
schema for orders, then another one that is customized for our
bookstore:</code></font></p>

<pre><code>&lt;xml:schema
ID=&quot;<strong>GenericOrderSchema</strong>&quot;&gt;
  &lt;elementType id=&quot;<strong>ORDER</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#SOLD-TO</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#SOLD-ON</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>SOLD-TO</strong>&quot;&gt;
    &lt;elt href=&quot;<strong>#PERSON</strong>&quot;/&gt;
  &lt;/relationType&gt;
  &lt;elementType id=&quot;<strong>PERSON</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#LASTNAME</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#FIRSTNAME</strong>&quot;/&gt;
  &lt;/elementType&gt;
  &lt;relationType id=&quot;<strong>LASTNAME</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
  &lt;relationType id=&quot;<strong>FIRSTNAME</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;
&lt;/xml:schema&gt;  


&lt;xml:schema id=&quot;BookOrderSchema&quot;&gt;
  &lt;elementType id=&quot;<strong>ORDER</strong>&quot; extends=&quot;<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#ITEM</strong>&quot; occurs=&quot;<strong>STAR</strong>&quot;/&gt;
  &lt;/elementType&gt;

  &lt;relationType id=&quot;<strong>ITEM</strong>&quot;&gt;
    &lt;any/&gt;
    &lt;relation href=&quot;<strong>http://generic.com/genericOrder?XML-XPTR=ID(ORDER)</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>http://art.com/schemata?XML-XPTR=ID(BOOK)</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>http://art.com/schemata?XML-XPTR=ID(RECORD)</strong>&quot;/&gt;
    &lt;range href=&quot;<strong>#COFFEE</strong>&quot;/&gt;
  &lt;/relationType&gt;

  &lt;relationType id=&quot;<strong>SIZE</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;

  &lt;relationType id=&quot;<strong>STYLE</strong>&quot;&gt;
    &lt;pcdata/&gt;
  &lt;/relationType&gt;

  &lt;elementType id=&quot;<strong>COFFEE</strong>&quot;&gt;
    &lt;relation href=&quot;<strong>#SIZE</strong>&quot;/&gt;
    &lt;relation href=&quot;<strong>#STYLE</strong>&quot;/&gt;
  &lt;/elementType&gt;
&lt;/xml:schema&gt;</code></pre>

<h2 align="left">3. XML-Data Schema</h2>

<p align="left">The XML-Data schema language defines element
types, attributes, relations, and which of these can be used in
which combinations with others. It also provides features for
organizing element types into a genus-species hierarchy, a basic
set of element types, and a small set of lexical types. The
schema contains other features from XML Document Type Definition
(DTD) language, such as entity and notation declarations. The
XML-Data schema is powerful enough to express the same structural
information and constraints as XML DTDs. It covers all the
features of XML-DTDs. An XML DTD can be mechanically converted to
an XML-Data schema. </p>

<p>Schemata are composed principally of declarations for: </p>

<ul>
    <li>element types, represented by <i>elementType</i></li>
    <li>attributes of elements, represented by <i>attribute</i></li>
    <li>relations among elements, represented by
<em>relationType</em></li>
    <li>rules governing the valid combinations of the above,
        represented by <em>any</em>, <em>mixed</em> and <em>pcdata</em>; also
        by <em>elt</em>, <em>group</em>, <em>relation</em> and
        <em>range</em></li>
    <li>internal and external entities, represented by
<i>intEntityDecl</i>
        and <i>extEntityDecl</i></li>
    <li>notations, represented by <i>notationDcl</i></li>
</ul>

<p>Comments can be interspersed as usual in XML, and there is
provision for using references to external schemata or schema
fragments.</p>

<h3><b>3.1. The schema document element type: </b><b><i>schema</i></b>
</h3>

<p>All schema elements are contained within a schema element,
like this:</p>

<pre><code>&lt;?XML version='1.0' rmd='all'?&gt;
&lt;!doctype schema SYSTEM
&quot;http://www.w3c.org/pub/sotr/schema.dtd&quot;&gt;
&lt;xml:schema id='ExampleSchema'&gt;
  &lt;!-- schema goes here. --&gt;
&lt;/xml:schema&gt;</code></pre>

<h3><b>3.2. The element type declaration element type:
elementType</b> </h3>

<p><em>Key terms used here:</em> <strong>element, elementType,
empty, any, mixed, pcdata</strong>, <strong>content model.</strong></p>

<p>The heart of an XML-Data schema is the <strong>elementType</strong>
declaration which defines a class of elements, gives them
attributes, establishes a grammar of which other element types
and character data are allowed in their contents and defines
their allowable relationships to elements of other classes. (The
allowable content, including relations, is called &quot;content
model.&quot;)</p>

<pre><code>&lt;elementType id=&quot;example&quot;&gt;  &lt;!-- element example (p*) --&gt;
    &lt;elt href=&quot;#p&quot; occurs=&quot;STAR&quot;/&gt;
&lt;/elementType&gt;
&lt;elementType id=&quot;p&quot;&gt;       &lt;!-- element p ((#PCDATA|p)*) --&gt;
    &lt;mixed&gt;&lt;elt href=&quot;#p&quot;/&gt;&lt;/mixed&gt; 
&lt;/elementType&gt;</code></pre>

<p>The name attribute is optional if id is present, in which case
the id is used as the name.</p>

<p>Within an elementType, <em>elt</em> indicates that instances
are permitted to only have a single element type in their
content. The <em>occurs</em> attribute of <em>elt</em> specifies
whether this content is optional, and gives its cardinality. </p>

<p><em>Empty</em> and <em>any</em> content are expressed using
predefined elements <em>empty</em> and <em>any</em>. (<em>Empty</em>
may be omitted. <em>Any</em> signals that any mixture of elements
and parsed character data is legal.) Parsed character data
content is similarly expressed with a <em>pcdata</em> item.
<em>Mixed</em>
content (a mixture of parsed character data and one or more
element types), is identified by a <em>mixed</em> element, whose
content identifies the element types allowed in addition to
parsed character data (see below). </p>

<pre><code>&lt;elementType id=&quot;ARTIST&quot;&gt;
  &lt;pcdata/&gt;
&lt;/elementType&gt;</code></pre>

<p>More complex content models are created using <em>group</em>:</p>

<pre>&lt;elementType id=&quot;animalFriends&quot; &gt;
  &lt;group groupType=&quot;OR&quot; occurs=&quot;STAR&quot;&gt;
    &lt;group groupType=&quot;OR&quot; occurs=&quot;PLUS&quot;&gt;
      &lt;elt href=&quot;#cat&quot;/&gt;
      &lt;elt href=&quot;#dog&quot;/&gt;
    &lt;/group&gt;
    &lt;elt href=&quot;#bird&quot;/&gt;
    &lt;elt href=&quot;#rabbit&quot;/&gt;
    &lt;elt href=&quot;#pig&quot;/&gt;
    &lt;elt href=&quot;#fish&quot;/&gt;
  &lt;/group&gt;
&lt;/elementType&gt;</pre>
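<p>Because <em>group</em> and <em>elt</em> mirror DTD content models, a declaration like the one above can be turned back into DTD notation mechanically. The following Python sketch does so for the constructs that appear in this paper's examples. The mapping of <code>OR</code> to <code>|</code>, of any other group type to a comma-separated sequence, and of the <em>occurs</em> values to <code>?</code>, <code>*</code> and <code>+</code> is an assumption of the sketch, as is the function name <code>to_dtd</code>.</p>

```python
import xml.etree.ElementTree as ET

GROUP_XML = """<elementType id="animalFriends">
  <group groupType="OR" occurs="STAR">
    <group groupType="OR" occurs="PLUS">
      <elt href="#cat"/>
      <elt href="#dog"/>
    </group>
    <elt href="#bird"/>
    <elt href="#rabbit"/>
    <elt href="#pig"/>
    <elt href="#fish"/>
  </group>
</elementType>"""

# Assumed mapping from occurs values to DTD occurrence indicators.
OCCURS = {"OPTIONAL": "?", "STAR": "*", "PLUS": "+"}


def to_dtd(node):
    """Render an elt or group declaration as a DTD content-model fragment."""
    suffix = OCCURS.get(node.get("occurs"), "")
    if node.tag == "elt":
        return node.get("href").lstrip("#") + suffix
    if node.tag == "group":
        # Assumption: any group type other than OR is treated as a sequence.
        separator = "|" if node.get("groupType") == "OR" else ","
        return "(" + separator.join(to_dtd(child) for child in node) + ")" + suffix
    raise ValueError("unsupported content-model element: " + node.tag)


model = to_dtd(ET.fromstring(GROUP_XML).find("group"))
print(model)  # ((cat|dog)+|bird|rabbit|pig|fish)*
```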

<h3>3.3 Relations</h3>

<p><em>Key terms used here:</em> <strong>relationType, relation,
XML-Link locator, href.</strong></p>

<p><em>Relation</em> element types express a relationship between
one element (usually the relation's parent) and either another
element or an atomic value (such as a simple number, string or
date). Relations use the XML-Link <em>locator</em> without
implying navigation. The target of a relation is the element
referenced by the <em>href</em> attribute if one is present, 
else the element contents. This single convention unifies graphs
and trees.</p>

<p>Including a relation in an elementType makes it an implicit
part of that element's content model, with the default for
<em>occurs</em> being OPTIONAL. In a valid document instance,
relations must occur after any other content. RelationTypes are
elements, and the full content model is as if there were a
sequential group containing first the explicitly provided content
model, then a <em>starred</em> <em>or</em> group with all the
relations as content. </p>

<p>Two element types are used in the schema to effect a relation:
the <em>relationType</em> is a specialized kind of <em>elementType</em>,
while <em>relation</em> has the same function as <em>elt</em>
(but validates that it refers to a relationType). </p>

<p>If a <em>default</em> attribute is specified for a relation,
it becomes the default of the <em>value</em> attribute of the
relation element. The <em>range</em> element, if present, declares
a restriction on the valid target of a relation. Each range element
references one elementType; a target of any of the referenced
types is valid. </p>

<pre><code> &lt;relationType id=&quot;favoriteFood&quot;&gt;&lt;mixed/&gt;&lt;/relationType&gt;
 &lt;relationType id=&quot;chases&quot;&gt;&lt;any/&gt;&lt;/relationType&gt;

 &lt;elementType id=&quot;dog&quot;&gt;
   &lt;any/&gt;
   &lt;attribute name=&quot;name&quot;/&gt;
   &lt;relation href=&quot;#favoriteFood&quot;/&gt;
   &lt;relation href=&quot;#chases&quot;/&gt;
 &lt;/elementType&gt;</code></pre>
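
<p>A sketch of how these declarations might look in a document
instance. The <em>cat</em> element and the id <em>fluffy</em> are
assumptions made for the illustration and are not declared in the
schema fragment above. The first relation carries its target as
element content; the second points at its target with <em>href</em>.
Both come after any other content of <em>dog</em>.</p>

<pre><code>&lt;dog name=&quot;Gromit&quot;&gt;
  &lt;favoriteFood&gt;cheese&lt;/favoriteFood&gt;
  &lt;chases href=&quot;#fluffy&quot;/&gt;
&lt;/dog&gt;

&lt;cat id=&quot;fluffy&quot;/&gt;</code></pre>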

<h3>3.4 Attributes</h3>

<p><em>Key terms used here:</em> <strong>attribute, values,
default. </strong></p>

<p>After the content model, attribute declarations may occur,
which are divided into attributes with enumerated or notation
values, and all other kinds.</p>

<pre><code>&lt;elementType id=&quot;p1&quot;&gt;       &lt;!-- element p1 ((#PCDATA|p1)*) --&gt;
    &lt;mixed&gt;&lt;elt href=&quot;#p1&quot;/&gt;&lt;/mixed&gt;
    &lt;!-- attlist p1 id  ID      #IMPLIED
                    exm (a|b|c) 'c'
                    x   CDATA   #FIXED 'y' --&gt;
    &lt;attribute name='id' type='ID'/&gt;
    &lt;attribute name='exm' type='ENUMERATION' values='a b c' default='c'/&gt;
    &lt;attribute name='x' presence='FIXED' default='y'/&gt;
&lt;/elementType&gt;</code></pre>

<p>An attribute may be given a <em>default</em> value. Whether it
is required or optional is signaled by <i>presence</i>. (Presence
ordinarily defaults to IMPLIED, but if it is omitted and there is
an explicit default, <i>presence</i> is set to SPECIFIED.)</p>

<p>Attributes with enumerated (and notation) values permit a
<em>values</em>
attribute, a space-separated list of legal values. The <em>values</em>
attribute is required when the <em>type</em> is ENUMERATION or
NOTATION; otherwise it is forbidden. In these cases, if a
default is specified it must be one of the specified values.</p>
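
<p>To illustrate, here is how instances of the <em>p1</em> element
type declared above might fare against its <em>exm</em>
declaration. The element content and the id value are made up for
the example.</p>

<pre><code>&lt;!-- valid: exm is one of the listed values a, b, c --&gt;
&lt;p1 id=&quot;intro&quot; exm=&quot;a&quot;&gt;some text&lt;/p1&gt;

&lt;!-- valid: exm is omitted, so its default c applies --&gt;
&lt;p1&gt;some text&lt;/p1&gt;

&lt;!-- not valid: d is not among the values declared for exm --&gt;
&lt;p1 exm=&quot;d&quot;&gt;some text&lt;/p1&gt;</code></pre>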

<p>Similar to the facility of multiple ATTLISTs, we sometimes
need to have <em>attributesDcls</em> declared separately from the
elementType they refer to. We can do this with the <em>propertyOf</em>
element, discussed later.</p>

<h3><b>3.5 The internal and external entity declaration element
type: </b><b><i>intEntityDcl</i></b> and <b><i>extEntityDcl</i></b></h3>

<p><em>Key terms used here:</em> <strong>entity, internal entity,
external entity, notation.</strong></p>

<p>This and the next two declarations cover <em>entities</em> in
general. Entities are a powerful shorthand mechanism, similar to
macros in a programming language.</p>

<pre><code>&lt;intEntityDcl name=&quot;LTG&quot;&gt;
    &lt;entityDef&gt;Language Technology Group&lt;/entityDef&gt;
&lt;/intEntityDcl&gt;</code></pre>

<pre><code>&lt;extEntityDcl name=&quot;dilbert&quot;&gt;
    &lt;notation href=&quot;#gif&quot;/&gt;
    &lt;systemId
href=&quot;http://www.ltg.ed.ac.uk/~ht/dilb.gif&quot;/&gt;
&lt;/extEntityDcl&gt;</code></pre>

<p>Here as elsewhere, following XML, <em>systemId</em> must be a
URL, absolute or relative, and <em>publicId</em>, if present,
must be a Public Identifier as defined in ISO/IEC 9070:1991,
Information technology -- SGML support facilities -- Registration
procedures for public text owner identifiers. If a <em>notation</em>
is given, it must be declared (see below) and the entity will be
treated as binary, i.e., not substituted directly in place of
references.</p>

<pre><code>&lt;notationDcl name=&quot;gif&quot;&gt;
    &lt;systemId href='http://who.knows.where/'/&gt;
&lt;/notationDcl&gt;</code></pre>
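
<p>For comparison, the three declarations in this section
correspond roughly to the following conventional DTD
declarations. This mapping is an illustrative reading of the
examples, not a normative statement.</p>

<pre><code>&lt;!ENTITY LTG &quot;Language Technology Group&quot;&gt;
&lt;!NOTATION gif SYSTEM &quot;http://who.knows.where/&quot;&gt;
&lt;!ENTITY dilbert SYSTEM &quot;http://www.ltg.ed.ac.uk/~ht/dilb.gif&quot; NDATA gif&gt;</code></pre>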

<h3><b>3.6. The external declarations element type:
</b><b><i>extDcls</i></b>
</h3>

<p><em>Key terms used here:</em> <strong>external entity with
declarations.</strong></p>

<p>Although we allow an external entity with declarations to be
included, we recommend a different declaration for schema
modularization. The <em>extDcls</em> declaration gives a clean
mechanism for importing (fragments of) other schemata. It
replaces the common SGML idiom of declaring an external parameter
entity and then immediately referring to it, and has the same
import, namely, that the text referred to by the combination of
<b>systemId</b>
and <b>publicId</b> is included in the schema in place of the
<b>extDcls</b>
element, and that replacement text is then subject to the same
validity constraints and interpretation as the rest of the
schema.</p>
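
<p>A sketch of the two forms side by side. The entity name
<em>pets</em> and the file names <em>pets.dtd</em> and
<em>pets-schema.xml</em> are hypothetical; the attribute name
<em>systemId</em> follows the declaration of <em>extDcls</em> in
Appendix A.</p>

<pre><code>&lt;!-- The SGML idiom: declare a parameter entity, then refer to it at once --&gt;
&lt;!ENTITY % pets SYSTEM &quot;pets.dtd&quot;&gt;
%pets;</code></pre>

<pre><code>&lt;!-- The extDcls form, placed among the other declarations of a schema --&gt;
&lt;extDcls systemId=&quot;pets-schema.xml&quot;/&gt;</code></pre>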

<h3>3.7. Type Extension</h3>

<p><em>Key terms used here:</em> <strong>type (class), typeOf,
extension (inheritance, subclassing), implements, extends, typeOf
(genus).</strong></p>

<p>Schemata of all kinds can benefit from a subtyping mechanism:
indicating that one class of object is a specialization of
another more general class. For example, cat and dog both have
the type <em>pet</em> as their more general category. To make
more effective use of such classes, we introduce one new schema
attribute, <em>extends</em>, which can be used to declare
explicitly that an element type is a subclass of another: </p>

<pre><code>&lt;xml:schema&gt;
  &lt;elementType id=&quot;animalFriends&quot; &gt;
    &lt;elt href=&quot;#pet&quot; occurs=&quot;PLUS&quot; /&gt;
  &lt;/elementType&gt;

  &lt;elementType id=&quot;pet&quot; &gt;
    &lt;any/&gt;
  &lt;/elementType&gt;

  &lt;elementType id=&quot;cat&quot; extends=&quot;#pet&quot;/&gt;

  &lt;elementType id=&quot;dog&quot;  extends=&quot;#pet&quot;/&gt;

&lt;/xml:schema&gt;</code></pre>

<p>This schema says that the <em>animalFriends</em> element class
can contain one or more elements from the <em>pet</em> class,
such as a <em>cat</em> or a <em>dog</em>. It also says that each
cat and dog instance is a pet (<font size="3">that is, any cat is
semantically a pet, and any valid cat is also a valid pet</font>).
So the following data is now valid under this schema: </p>

<pre><code>&lt;animalFriends&gt;
  &lt;cat/&gt;
  &lt;dog/&gt;
  &lt;cat/&gt;
&lt;/animalFriends&gt;</code></pre>

<h4>Type Extension</h4>

<p>It is frequently necessary to <em>add</em> new attributes to a
subclass. This requires no extra machinery, because XML already
permits multiple attribute list declarations, which cumulatively
add attributes to element types. So each subclass may easily add
any new attributes desired, as shown here: </p>

<pre><code>&lt;elementType id=&quot;dog&quot; extends=&quot;#pet&quot;&gt;
  &lt;attribute name=&quot;age&quot;/&gt;
&lt;/elementType&gt;</code></pre>

<p>If the super type has a content model, attributes, and so on,
these are inherited; that is, they are also declared implicitly
for the derived class. In the following example, we give an
<em>owner</em> attribute to <em>pet</em>. It is inherited, so both
<em>cat</em> and <em>dog</em> now also have an <em>owner</em>
attribute.</p>

<pre><code>&lt;xml:schema&gt;
  &lt;elementType id=&quot;animalFriends&quot; &gt;
    &lt;elt href=&quot;#pet&quot; occurs=&quot;PLUS&quot; /&gt;
  &lt;/elementType&gt;

  &lt;elementType id=&quot;pet&quot;&gt;
    &lt;any/&gt;
    &lt;attribute id='name'/&gt;
    &lt;attribute id='owner'/&gt;
  &lt;/elementType&gt;

  &lt;elementType id=&quot;cat&quot; extends=&quot;#pet&quot;&gt;
    &lt;elt href='#kittens'/&gt;
    &lt;attribute id='lives' type='NMTOKEN'/&gt;
  &lt;/elementType&gt;

  &lt;elementType id=&quot;dog&quot; extends=&quot;#pet&quot;&gt;
    &lt;elt href='#puppies'/&gt;
    &lt;attribute id='breed'/&gt;
  &lt;/elementType&gt;
&lt;/xml:schema&gt;</code></pre>

<p>This schema says that the animalFriends element class can
contain one or more <em>pet</em> elements. Because <em>cat</em>
and <em>dog</em> are subtypes of <em>pet</em>, they can occur as
well. So the following instance fragment is now valid under this
schema: </p>

<pre><code>&lt;animalFriends&gt;
  &lt;cat name=&quot;Fluffy&quot; lives='9'/&gt;
  &lt;pet name=&quot;Diego&quot;/&gt;
  &lt;dog name=&quot;Gromit&quot; owner='Wallace' breed='mutt'/&gt;
&lt;/animalFriends&gt;</code></pre>

<p>Additional relations can also be added, but only if the
content model of the super type consists of a single list of
optional, repeatable element types.</p>

<p>When defining a derived element class, one can also override
existing attributes and relations. The following example adds a
<em>height</em>
relation and overrides the <em>favoriteFood</em> relation, giving
it a default value of &quot;Fish.&quot; (We also do something
fancy here: making the overriding relation type itself have the
super type favoriteFood ensures that the derived element is in all
other respects identical.) </p>

<pre><code>&lt;relationType id=&quot;height&quot;&gt;
  &lt;any/&gt;
&lt;/relationType&gt;

&lt;relationType id=&quot;favoriteCatFood&quot; extends=&quot;#favoriteFood&quot;/&gt;

&lt;elementType id=&quot;cat&quot; extends=&quot;#pet&quot;&gt;
  &lt;relation href=&quot;#height&quot;/&gt;
  &lt;relation href=&quot;#favoriteCatFood&quot; default=&quot;Fish&quot;/&gt;
&lt;/elementType&gt;</code></pre>

<h4>Schema Extension</h4>

<p>We can also use subtyping to extend an existing schema without
editing it. Suppose that we cannot edit the schema defining pet,
cat or dog, but want to use elements with those names and
semantics in our document. The following adds the
&quot;eyeColor&quot; property to <em>cat</em>.</p>

<pre><code>&lt;relationType id=&quot;eyeColor&quot;
extends=&quot;http://whereever.org/#eyeColor&quot;&gt;
    &lt;pcdata/&gt;
&lt;/relationType&gt;

&lt;elementType id=&quot;cat&quot; extends=&quot;http://whereever.org/#cat&quot;&gt;
  &lt;relation href=&quot;#eyeColor&quot;/&gt;
&lt;/elementType&gt;</code></pre>

<p>The rules for allowable subtyping must enforce certain
constraints. In principle, a subtype can have additional relations
and attributes, provided this is consistent with the super type's
content model, but never fewer; and it can add restrictions but
never relax them. In practice, this principle leads to rules such
as: a default value can be added if there is none, changed, or
converted to FIXED if it was DEFAULT.</p>

<h4>Implements</h4>

<p>Subtyping as we have described it here is actually a
combination of two effects: First, we assert that an element of
one type is also of another (as in a cat is a pet).</p>

<p>Second, we achieve economies and maintainability in the
declarations to make sure that the first is true. That is, the
derived element class is automatically provided with all the
properties of the super type. Sometimes it is valuable to have
the first effect without the second. (This is equivalent to the
Java <em>implements</em> facility.) We indicate this by using the
<em>implements</em> element, as in </p>

<pre><code>&lt;relationType id=&quot;favoriteFood&quot; &gt;
  &lt;mixed/&gt;
&lt;/relationType&gt;

&lt;relationType id=&quot;weight&quot; &gt;
  &lt;mixed/&gt;
&lt;/relationType&gt;

&lt;elementType id=&quot;cat&quot; &gt;
  &lt;implements href=&quot;http://whereever.org/#pet&quot; /&gt;
  &lt;attribute name=&quot;name&quot;/&gt;
  &lt;relation href=&quot;#favoriteFood&quot; /&gt;
  &lt;relation href=&quot;#weight&quot; /&gt;
&lt;/elementType&gt;</code></pre>

<p><font size="3">This has no effect on the attributes or
relations of instances of cat, but asserts in the schema that
every cat is also a pet (that is, any cat is semantically a pet,
and any valid cat is also a valid pet).</font></p>

<h4>Relation of Type Extension to Parameter Entities</h4>

<p>Sophisticated DTDs often make complex use of <em>parameter
entities</em> in an attempt to consolidate common structures in
one, reusable place. Such parameter entities often represent
implicit classes.</p>

<p>The need is real, but the approach often leads to obscurity,
and reduced maintainability. Further, expansion of entities loses
all connection with their source: once expanded, the fact that
some set of element types was a co-declared set, re-used in
multiple places, is lost. </p>
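
<p>A small illustration of the point, using the pet example from
this section. The parameter entity name <em>pets</em> is
hypothetical.</p>

<pre><code>&lt;!ENTITY % pets &quot;cat | dog&quot;&gt;
&lt;!ELEMENT animalFriends (%pets;)+&gt;
&lt;!-- After expansion the content model reads (cat | dog)+ and the
     grouping of cat and dog under a common name is no longer visible. --&gt;</code></pre>

<p>With <em>extends</em>, by contrast, the relationship of
<em>cat</em> and <em>dog</em> to <em>pet</em> stays explicit in
the schema.</p>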

<h3>3.8 Lexical Data Types</h3>

<p>Information such as dates and numbers is often expressed in a
format that requires some further parsing. For example, the same
date can be written &quot;October 22, 1954&quot; or
&quot;19541022&quot;. (And from what I've seen, about 300 other
ways.) The <em>lextype</em> attribute discriminates formats.
Appearing on instance elements, it describes the format of the
remainder of the element. The value of the lextype attribute is
always by reference to a URI identifying the parsing rules.
XML-Data should define a small number of these. We propose
NUMBER, INTEGER, REAL and DATE.ISO8061.</p>

<pre><code>&lt;birthday lextype=&quot;<strong>DATE.ISO8061</strong>&quot;&gt;<strong>19541022</strong>&lt;/birthday&gt;</code></pre>

<p>These are declared in the schema as follows:</p>

<pre><code>&lt;relationType id=&quot;<strong>birthday</strong>&quot;&gt;
  &lt;attribute name=&quot;<strong>lextype</strong>&quot;
default=&quot;<strong>DATE.ISO8061</strong>&quot;
presence=&quot;<strong>fixed</strong>&quot;/&gt;
&lt;/relationType&gt;</code></pre>

<p>When giving the lexical type of an <em>attribute</em> in the
schema, <em>lextypeIs</em> is used, as in:</p>

<pre><code>&lt;attribute name=&quot;<strong>price</strong>&quot;
presence=&quot;<strong>REQUIRED</strong>&quot;
lextypeIs=&quot;<strong>number</strong>&quot;/&gt;</code></pre>

<p>Some patterns will indicate that several properties or
attributes should be used in combination to arrive at a value.
For example, a custom pattern could indicate a date expressed as
the following: </p>

<pre><code>&lt;relationType id=&quot;<strong>birthday</strong>&quot;&gt;
  &lt;attribute name=&quot;lextype&quot;
default=&quot;<strong>DATE.ATTR-YMD</strong>&quot;
presence=&quot;<strong>specified</strong>&quot;/&gt;
&lt;/relationType&gt;
...
&lt;birthday year=&quot;<strong>1954</strong>&quot; month=&quot;<strong>10</strong>&quot; day=&quot;<strong>22</strong>&quot;/&gt;
</code></pre>

<h3>3.9. Basic Semantic Data Types</h3>

<p>We need to define here a small number of basic types and their
hierarchy, corresponding to simple data types such as Number and
Date. (Dates are a subtype of numbers.) </p>

<p>We also need to define the expression of each of the basic
Java and SQL data types in terms of these basic ones, plus
additional properties giving units, precision, min, max, default
pattern, and other properties. For example, an INTEGER is typically
a number with certain min and max property values. Note that
units should be an element type with possible structure, so that
things like &quot;miles/hours&quot; or &quot;feet/(sec*sec)&quot;
can be represented and used for automatic conversions.</p>

<h2 align="left">4. Standard Vocabulary</h2>

<p align="left">We expect standard libraries of vocabulary to be
developed to capture common semantics used in vertical
applications and particularly in industry and application
domains. Dublin Core and CDF are two examples of such standard
libraries.</p>

<h2 align="left">5. Relations to other proposed standards</h2>

<p align="left">The W3C site at <a
href="http://www.w3.org/PICS/Member/NG/">http://www.w3.org/PICS/Member/NG</a>
contains links to several related papers, including Ora
Lassila's <a
href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html">PICS-NG
document</a>, Renato Ianella's small PICS extension proposal, CDF,
MCF in XML, and the <a
href="http://www.w3.org/pub/WWW/Member/9703/XMLsubmit.html">Web
Collections using XML</a> proposal. Specific notes on some of
these follow:</p>

<h3>5.1 XML-LINK</h3>

<p>All relations use <em>href</em> in a manner consistent with <a
href="http://www.w3.org/pub/WWW/TR/WD-xml-link-970406.html">XML-LINK</a>
working draft dated April 6, 1997 (the most recent as of the time
of this writing). XML-Links are a type of <em>relation</em> (with
extra attributes, elements, and semantics indicating traversal).</p>

<h3>5.2 PICS-NG</h3>

<p><a
href="http://www.w3.org/pub/WWW/Member/9705/WD-pics-ng-metadata-970514.html#intro">PICS-NG
Metadata Model and Label Syntax</a> describes a set of
requirements for structured data to be used on the Internet.
XML-Data is an application of XML concepts to those requirements.</p>

<h3>5.3 CDF</h3>

<p>The <a
href="http://www.microsoft.com/standards/cdf-f.htm">Channel
Definition Format</a> (CDF) is a natural application of XML-Data
and is fully compatible with the syntax and the ideas presented in
this document. Given a proper schema, its format is a validatable
grammar. The existing use of href in CDF is consistent with
XML-LINK and XML-Data usage. CDF defines a number of basic
element types that would be appropriate for a standard library.</p>

<h3>5.4 MCF in XML</h3>

<p><a href="http://www.w3.org/Member/9706/xmlmcf.htm">MCF in XML</a>
has two principal components: The ability to represent a
&quot;directed labeled graph&quot; and also a set of predefined
element types. The first of these is effected by a convention on
use of the <em>href</em> attribute (the same convention used in
XML-Data <em>relations</em>, with the same effect). Of the
second, some element types are genuinely necessary to represent
schemata and a type system (these are also present in XML-Data)
while others would be appropriate for a standard library.</p>

<p>XML-Data has a number of features not in MCF: </p>

<ul>
    <li>Principally, XML-Data permits <strong>tree structures</strong>
        in cases when MCF only permits a graph. (MCF requires
        that the target of all relations must be out-of-line when
        it is an element. XML-Data allows in-line targets.) </li>
    <li>XML-Data hrefs are explicitly <strong>URI</strong>s.
        (Though MCF <em>unit</em>s can be URIs, it is not clear
        from the current document when they are and when they are
        not.)</li>
    <li>Names in XML-Data were chosen for more
        compatibility with <strong>existing XML usage</strong>
        (or at least that is the intention).</li>
    <li>XML-Data schemata can represent all the information in an
        XML <strong>DTD</strong>, while it is not clear that MCF
        can do this. </li>
    <li>XML-Data has additional capabilities for expressing
<strong>relationships
        in the schema</strong> (relation, relationType, extends,
        implements). </li>
    <li>XML-Data proposes <em><strong>lextypes</strong></em> as a
        basic element type, a feature not discussed in MCF. </li>
</ul>

<p>This chart tabulates the MCF &quot;bootstrap&quot; element
types and describes their equivalents in XML-Data:</p>

<dl>
    <dt>Category</dt>
    <dd>&quot;elementType&quot; in XML-Data.</dd>
    <dt>typeOf</dt>
    <dd>&quot;typeOf&quot; relation in XML-Data.
        Also, &quot;extends&quot; and &quot;implements&quot; in
        XML-Data assert the relationship in the schema. </dd>
    <dt>Unit</dt>
    <dd>&quot;href&quot; in XML-Data.</dd>
    <dt>domain</dt>
    <dd>&quot;propertyOf&quot; in XML-Data.</dd>
    <dt>range</dt>
    <dd>&quot;range&quot; in XML-Data. This gives the allowed
        type of the target of a property.</dd>
    <dt>superType</dt>
    <dd>This may correspond to &quot;implements&quot; in XML-Data.
        However, the MCF document is not clear on this point.</dd>
    <dt>Property</dt>
    <dd>This corresponds to the abstract concept of a link class
        expressed in schemata by <em>relation</em> and
        <em>relationType</em>.
    </dd>
    <dt>FunctionalProperty</dt>
    <dd>This appears to be a <em>relation</em> with <em>occurs</em>
        = OPTIONAL or REQUIRED (that is, occurs at most once).</dd>
    <dt>mutuallyDisjoint</dt>
    <dd>This is a relationship asserted among the members of an
        enumeration. XML-Data does not contain a predefined
        propertyType for this. It could be added easily if this
        is useful. </dd>
    <dt>parent</dt>
    <dd>A generic property, whose meaning appears to be
        contextual. XML-Data does not contain a predefined
        elementType for this. It is unneeded because parentage is
        expressed by containment, while when out-of-line,
        specific meanings are conveyed by more precise
        relationship types such as <em>propertyOf</em>.</dd>
    <dt>name</dt>
    <dd>&quot;name&quot; in XML-Data. However, note that like
        parent, the interpretation of name in MCF seems to be
        contextual.</dd>
    <dt>description</dt>
    <dd>XML-Data does not contain a predefined elementType for
        this. We think that this belongs to a standard library
        and not in this specification.</dd>
    <dt>Sequence</dt>
    <dd>This is a special arc type in MCF that expresses the same
        fact as lexical order in XML.</dd>
    <dt>ord</dt>
    <dd>This is an MCF helper element type for Sequence.</dd>
</dl>

<p><a name="XML-Data-vs-MCF">Comparative examples of XML-Data and
MCF in XML</a>: the two columns below represent the same order for
several items. (All persons in this example are assumed to be not
in the document, but elsewhere.) In both models, the <em>id</em>
attribute is on all elements representing real-world objects. In
the MCF model, <em>id</em> also appears on elements needed
artificially for reference. </p>

<table border="0">
    <tr>
        <td><font size="4">MCF in XML</font></td>
        <td><font size="4">XML-Data</font></td>
    </tr>
    <tr>
        <td valign="top"><pre><code>
&lt;ORDER id=&quot;order1&quot;&gt;
  &lt;SOLD-TO
unit=&quot;<strong>http:/people#person1</strong>&quot;/&gt;
  &lt;SOLD-ON value=&quot;<strong>19970317</strong>&quot;/&gt;
  &lt;ITEMS unit=&quot;<strong>sequence1</strong>&quot;/&gt;
&lt;/ORDER&gt;

&lt;BOOK id=&quot;book1&quot;&gt;
  &lt;TITLE value=&quot;<strong>Number, the Language of
Science</strong>&quot;/&gt;
  &lt;AUTHOR unit=&quot;<strong>http:/people#person2</strong>&quot;/&gt;
&lt;/BOOK&gt;

&lt;SEQUENCE id=&quot;sequence1&quot;&gt;
  &lt;ORD UNIT=&quot;book1&quot;&gt;
    &lt;PRICE value=<strong>&quot;5.95&quot;</strong>/&gt;
  &lt;/ORD&gt;
  &lt;ORD UNIT=&quot;cd1&quot;&gt;
    &lt;PRICE value=<strong>&quot;12.95&quot;</strong>/&gt;
  &lt;/ORD&gt;
  &lt;ORD UNIT=&quot;book2&quot;&gt;
    &lt;PRICE value=<strong>&quot;6.95&quot;</strong>/&gt;
  &lt;/ORD&gt;
  &lt;ORD UNIT=&quot;food1&quot;&gt;
    &lt;PRICE value=<strong>&quot;1.50&quot;</strong>/&gt;
  &lt;/ORD&gt;
&lt;/SEQUENCE&gt;

&lt;COFFEE id=&quot;food1&quot;&gt;
  &lt;size value=&quot;<strong>small</strong>&quot;/&gt;
  &lt;style value=&quot;<strong>cafe macchiato</strong>&quot;/&gt;
&lt;/COFFEE&gt;

&lt;RECORD id=&quot;cd1&quot;&gt;
  &lt;TITLE value=&quot;<strong>Rachmaninoff's Second Piano
Concerto</strong>&quot;/&gt;
  &lt;ARTIST unit=&quot;<strong>http:/people#person3</strong>&quot;/&gt;
&lt;/RECORD&gt;

&lt;BOOK id=&quot;book2&quot;&gt;
  &lt;TITLE value=&quot;<strong>The Evolution of
Complexity</strong>&quot;/&gt;
  &lt;AUTHOR unit=&quot;<strong>http:/people#person4</strong>&quot;/&gt;
&lt;/BOOK&gt;</code></pre>
        </td>
        <td valign="top"><pre>
<code>&lt;ORDER id=&quot;order1&quot;&gt;
  &lt;SOLD-TO
href=&quot;<strong>http:/people#person1</strong>&quot;/&gt;
  &lt;SOLD-ON value=&quot;<strong>19970317</strong>&quot;/&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>5.95</strong>&lt;/PRICE&gt;
    &lt;BOOK id=&quot;book1&quot;&gt;
      &lt;TITLE &gt;<strong>Number, the Language of
Science</strong>&lt;/TITLE&gt;
      &lt;AUTHOR
href=&quot;<strong>http:/people#person2</strong>&quot;/&gt;
    &lt;/BOOK&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>12.95</strong>&lt;/PRICE&gt;
    &lt;RECORD id=&quot;cd1&quot;&gt;
      &lt;TITLE &gt;<strong>Rachmaninoff's Second Piano
Concerto</strong>&lt;/TITLE&gt;
      &lt;ARTIST
href=&quot;<strong>http:/people#person3</strong>&quot;/&gt;
    &lt;/RECORD&gt;
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>6.95</strong>&lt;/PRICE&gt;
    &lt;BOOK id=&quot;book2&quot;&gt;
      &lt;TITLE &gt;<strong>The Evolution of
Complexity</strong>&lt;/TITLE&gt;
      &lt;AUTHOR
href=&quot;<strong>http:/people#person4</strong>&quot;/&gt;
    &lt;/BOOK&gt;  
  &lt;/ITEM&gt;
  &lt;ITEM&gt;
    &lt;PRICE&gt;<strong>1.50</strong>&lt;/PRICE&gt;
    &lt;COFFEE&gt;
      &lt;SIZE&gt;<strong>small</strong>&lt;/SIZE&gt;
      &lt;STYLE&gt;<strong>cafe macchiato</strong>&lt;/STYLE&gt;
    &lt;/COFFEE&gt;
  &lt;/ITEM&gt;
&lt;/ORDER&gt;</code></pre>
        </td>
    </tr>
</table>

<p>&nbsp;</p>

<h2 align="left">6. Conclusion</h2>

<p><font color="#000000" size="3">Future applications of the
Internet will focus on adding user value to information through
semantic annotation. Semantics will permit information to be
discovered, targeted, reused, and integrated. Not only does this
make the content more usable, but it opens up opportunities for
software developers to build components that exploit these
semantics. Such components could include applications as prosaic
as application or user logging, or as futuristic as user agents
that assist in finding or organizing content, World-Wide Web
&quot;surf buddies&quot; that accompany a user's browsing and
add valuable or entertaining comments, or natural language
query systems. Semantic annotation turns the Internet into a
platform for programming powerful and valuable applications.</font></p>

<p><font color="#000000" size="3">This proposal lays the
foundation for how applications can annotate their information
content. The proposal adds powerful new constructs for
representing semantics, sufficiently advanced for use in
artificial intelligence and natural language systems, yet retains
the architecture and investment of existing XML and the
efficiency of its representation.</font></p>

<hr>

<h2 align="left">Appendix A - The XML DTD for a schema</h2>

<pre><code>
&lt;!ENTITY % nodeattrs 'id ID #IMPLIED'  &gt;
&lt;!-- href is as per XML-LINK, but is not required unless there is
      no content --&gt;

&lt;!ENTITY % exattrs   'extends CDATA #IMPLIED'  &gt;

&lt;!ENTITY % linkattrs 'id ID #IMPLIED
                      href CDATA #IMPLIED' &gt;

&lt;!-- The shared content model of elementType, linkType and
relationType --&gt;
&lt;!-- Omitted element type same as &quot;empty.&quot; --&gt;
&lt;!ENTITY % extendedmodel 'implements*,
                          (elt|group|empty|any|pcdata|mixed)?,
                          (relation|attribute)*'&gt;

&lt;!-- The top-level container --&gt;
&lt;!element schema         ((elementType|propertyOf|linkType|
                          relationType|extendType|augmentElementType|
                          intEntityDcl|extEntityDcl|
                          notationDcl|extDcls|c)*)&gt;
&lt;!attlist schema %nodeattrs;&gt;

&lt;!-- Element Type Declarations --&gt;
&lt;!element elementType   (%extendedmodel;)&gt;
&lt;!-- Either name or id must be present - - absent name defaults to id
--&gt;
&lt;!attlist elementType %nodeattrs;
                      %exattrs;
                name    CDATA      #IMPLIED&gt;

&lt;!-- Element types allowed in content model --&gt;
&lt;!-- Note this is just short for a model group with only one elt in
it --&gt;
&lt;!element elt           EMPTY&gt;
&lt;!-- Elements can have exponents as well as groups --&gt;
&lt;!-- The href is required --&gt;
&lt;!attlist elt   %linkattrs;
                occurs     (required|optional|star|plus) 'required'&gt;

&lt;!-- A group in a content model, sequential or disjunctive --&gt;
&lt;!element group         ((group|elt)+)&gt;
&lt;!attlist group         %nodeattrs;
                groupType (seq|or) 'seq'
                occurs  (required|optional|star|plus) 'required'&gt;

&lt;!element any           EMPTY&gt;
&lt;!element empty         EMPTY&gt;
&lt;!element pcdata	EMPTY&gt;

&lt;!-- mixed content is just a flat, non-empty list of elts --&gt;
&lt;!-- We don't need to say anything about #pcdata, it's implied --&gt;
&lt;!element mixed         (elt+)&gt;
&lt;!attlist mixed         %nodeattrs;&gt; 

&lt;!-- Attributes --&gt;
&lt;!-- default value must be present iff presence is specified or fixed
--&gt;
&lt;!-- presence defaults to specified if default is present, else
implied --&gt;
&lt;!-- name attribute is locally unique, defaults to id if absent
--&gt;
&lt;!element attribute  EMPTY&gt;
&lt;!attlist attribute  %linkattrs;
                name    CDATA #IMPLIED
                type    (id|idref|idrefs|entity|entities|nmtoken|nmtokens|
                         enumeration|notation|cdata) 'cdata'
                default CDATA #IMPLIED
                values NMTOKENS #IMPLIED
                presence (implied|specified|required|fixed) #IMPLIED
                lextypeIs CDATA #IMPLIED&gt;

&lt;!-- Relations - - relationTypes are pointed to from relations,
            just as elementTypes are pointed to from elts --&gt;
&lt;!element relationType  (%extendedmodel;,
                         range*)&gt;
&lt;!attlist relationType  %nodeattrs;
                        %exattrs;
                        name CDATA #IMPLIED &gt;

&lt;!element range EMPTY &gt;
&lt;!attlist range %linkattrs; &gt;

&lt;!element relation  EMPTY&gt;
&lt;!attlist relation  %linkattrs;
                    default CDATA #IMPLIED
                    occurs (required|optional|star|plus) 'optional'&gt;

&lt;!-- For adding attributes to existing element types --&gt;
&lt;!element propertyOf    EMPTY&gt;
&lt;!attlist propertyOf    href CDATA #REQUIRED&gt;

</code><font color="#000000" size="3">&lt;!element augmentElementType
((relation|attribute)*)&gt;
&lt;!attlist augmentElementType %linkattrs;
                             %</font><code>exattrs</code><font
color="#000000" size="3">;&gt;</font><code>

&lt;!-- Shorthand for simple XML-LINKs --&gt;
&lt;!element linkType (%extendedmodel;)&gt;
&lt;!attlist linkType %nodeattrs;
                   %exattrs;
                   name CDATA #IMPLIED
                   role CDATA #IMPLIED
                   title CDATA #IMPLIED
                   show (embed|replace|new) #IMPLIED
                   actuate (auto|user) #IMPLIED
                   behaviour CDATA #IMPLIED &gt;
</code><code>
&lt;!element implements EMPTY&gt;
&lt;!attlist implements href CDATA #REQUIRED&gt;

&lt;!-- Entity Declarations --&gt;
&lt;!-- Note as this is written only external entities
      can have structure without escaping it --&gt;
&lt;!-- Name defaults to id if absent --&gt;
&lt;!element intEntityDcl     (#PCDATA)&gt;
&lt;!attlist intEntityDcl %nodeattrs;
                name    CDATA #IMPLIED&gt;

&lt;!-- The entity will be treated as binary if a notation is present
--&gt;
&lt;!-- systemID and publicId (if present) must have the required syntax
--&gt;
&lt;!element extEntityDcl    ( systemId, publicId?)&gt;
&lt;!attlist extEntityDcl %nodeattrs;
                name    CDATA #IMPLIED
		notation CDATA #IMPLIED&gt;

&lt;!-- Pointers for above --&gt;
&lt;!element systemID      EMPTY&gt;
&lt;!attlist systemID      %linkattrs;&gt;
&lt;!-- Must be empty if href is used --&gt;
&lt;!element publicID      (#PCDATA) &gt;
&lt;!attlist publicID      %linkattrs;&gt;

&lt;!-- Notation Declarations --&gt;
&lt;!-- systemID and publicId (if present) must have the required syntax
--&gt;
&lt;!element notationDcl        (systemId, publicId?)&gt;
&lt;!attlist notationDcl   %linkattrs;
                name    CDATA #IMPLIED&gt;

&lt;!-- External entity with declarations to be included --&gt;
&lt;!-- systemID and publicId (if present) must have the required syntax
--&gt;
&lt;!element extDcls       empty&gt;
&lt;!attlist extDcls
                systemId CDATA #REQUIRED
                publicId CDATA #IMPLIED&gt;

&lt;!-- Namespace Declarations --&gt;
&lt;!-- systemID and publicId (if present) must have the required syntax
--&gt;
&lt;!element namespaceDcl  EMPTY&gt;
&lt;!attlist namespaceDcl  %linkattrs;
                name    CDATA #IMPLIED&gt;

</code></pre>
</body>
</html>


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From lex at www.copsol.com  Mon Jun 23 15:19:57 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:03 2004
Subject: XML Property Set
In-Reply-To: <01BC7EC3.70CDF6C0.jtauber@jtauber.com> from "James K. Tauber" at Jun 22, 97 04:11:02 am
Message-ID: <199706231317.IAA16211@copsol.com>

> > In my grove-illiterate opinion, yes!  The PropertySet is a sword of Damocles
> > hanging over these discussions.  It's clear that we can't have all 70+
> > properties.  IF (and I hope it's not a big IF) we can agree on a subset
> > of the property set then we don't have this problem dissipating the
> > discussion every time we get close :-)
> 
> It shouldn't be a big IF at all. Deciding what to rip out isn't too difficult. 
> The only problem lies in agreeing on how to do the additional classes (like 
> XMLDECL) needed and how (or if) the properties should be modularised.
> 
> > James Clark came up with a grove subset about 3 months back (have a look in
> > March xml-dev) in response to one of my typical blunderings for information.
> 
> I'll go back and check that. JamesC would be in a MUCH better position to write 
> an XML property set than me!

Well, I'm going to make an offer.  I've spent the better part of a year 
working on and with a Java-based API for groves.  I am certain that I can
create an interface from this (if not take it wholesale) for the XAPI and
groves.  So, my offer is that I can come up with a draft and "the James's"
and the lot can validate if I am on the right track.

I am fairly certain that at this point in time we should not say "maybe later"
to groves.  We should standardize parser access, event interfaces, and groves
at the same time.  We have enough developers with experience in all of these.

An API architecture that I propose is:

|---------------|
|   Grove API   |
|---------------|
| Grove Builder |
|     API       |
|----------------------------------|
|          XML Event API           |
|----------------------------------|
|          XML Parser API          |
|----------------------------------|

They are described as follows:

XML Parser API:

   Provides interfaces for instantiating and using XML parsers such that
   a new XML parser can be integrated with existing applications, potentially
   without their knowledge.  This might allow a user to configure an
   application with (in Java) the class name of the XML Factory or whatever.

XML Event API:

   Provides an interface to allow XML parsers to deliver events to arbitrary
   applications.  My suggestion here is that we consider two kinds of APIs,
   or at least constructs.  First, there is the idea of the "document string",
   which is the exact character-for-character representation of each
   construct.  Second, there is the semantic event, like "start element".
   Both are useful depending on what one is doing.

Grove Builder API:

   This API bridges the gap between the event API and a grove.  Essentially,
   the algorithm for building a grove is most likely the same regardless of
   the implementation technology used to create the grove.  Hence, a standard
   event handler could be defined as well as an interface to allow different
   grove implementations to be used (for example, a JDBC grove and an in
   memory grove).

Grove API:

   This API, obviously, provides access to XML groves!
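
As a rough illustration of how the four layers might fit together, here is a
minimal Java sketch.  Every name in it (XmlEventHandler, GroveNode,
GroveBuilder, parseSample) is a hypothetical placeholder, not a proposed XAPI
signature, and the "parser" simply replays a fixed event sequence.

```java
import java.util.ArrayList;
import java.util.List;

// All names here are hypothetical; this only illustrates the layering.
final class LayerSketch {

    /** XML Event API: semantic events delivered by a parser. */
    interface XmlEventHandler {
        void startElement(String type);
        void characters(String data);
        void endElement(String type);
    }

    /** Grove API: a node with a label, a parent, and ordered children. */
    static final class GroveNode {
        final String label;
        final GroveNode parent;
        final List<GroveNode> children = new ArrayList<>();

        GroveNode(String label, GroveNode parent) {
            this.label = label;
            this.parent = parent;
        }
    }

    /** Grove Builder API: an event handler that assembles an in-memory grove. */
    static final class GroveBuilder implements XmlEventHandler {
        private final GroveNode root = new GroveNode("#document", null);
        private GroveNode current = root;

        @Override public void startElement(String type) {
            GroveNode node = new GroveNode(type, current);
            current.children.add(node);
            current = node;            // descend into the new element
        }

        @Override public void characters(String data) {
            current.children.add(new GroveNode("#text:" + data, current));
        }

        @Override public void endElement(String type) {
            current = current.parent;  // climb back to the enclosing element
        }

        GroveNode grove() { return root; }
    }

    /** Stand-in for the Parser API: replays the events for <doc><p>hi</p></doc>. */
    static void parseSample(XmlEventHandler handler) {
        handler.startElement("doc");
        handler.startElement("p");
        handler.characters("hi");
        handler.endElement("p");
        handler.endElement("doc");
    }

    public static void main(String[] args) {
        GroveBuilder builder = new GroveBuilder();
        parseSample(builder);
        GroveNode doc = builder.grove().children.get(0);
        System.out.println(doc.label + " children: " + doc.children.size());
    }
}
```

A JDBC-backed grove would implement the same handler interface but write
nodes to a database instead of a list, which is the reason for keeping the
builder separate from the grove itself.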

Again, my suggestion is that we take advantage of interfaces in XAPI.  
Interfaces will allow us to mix inheritance hierarchies in the above four APIs.

Now, I feel strongly that the above APIs, or whatever they become, should be
developed together.  They can certainly affect how each other is designed.

If we have these four APIs, we have the fundamental building blocks for all
kinds of XML applications--both simple and complex.  In addition, we have
the basic infrastructure for DSSSL!  (Ah, you can see my motivation now!)

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From jtauber at jtauber.com  Mon Jun 23 18:49:26 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:58:03 2004
Subject: XML Property Set
Message-ID: <01BC8038.C146D560.jtauber@jtauber.com>


On Monday, June 23, 1997 6:18 AM, Alex Milowski [SMTP:lex@www.copsol.com] 
wrote:
> Well, I'm going to make an offer.  I've spent the better part of a year
> working on and with a Java-based API for groves.  I am certain that I can
> create an interface from this (if not take it wholesale) for the XAPI and
> groves.  So, my offer is that I can come up with a draft and "the James's"
> and the lot can validate if I am on the right track.

Sounds good. JamesC's post from March pretty much outlines properties and
classes for a document instance, which leaves the prolog and also the sort of
nodes that would be necessary (for editors, etc.) to ensure that a processor
can output character-for-character what was input.

JamesT


From k.grimes at liant.com  Mon Jun 23 20:12:35 1997
From: k.grimes at liant.com (Kevin Grimes)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
Message-ID: <97Jun23.101149edt.32261-1@stelmo.liant.com>


From: Kevin Grimes@LIANT on 06/23/97 02:14 PM

May I suggest that the XML API be expressed in language neutral IDL rather
than Java. I believe the main impact this would have on current
interfaces/implementations would be to the member function that actually
loads/processes the document--you'd probably want to replace the Java
InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in
the following lines from my xml.idl...

HRESULT processDocument([in] BSTR filename);
HRESULT processDocumentURL([in] BSTR url);

...and let processDocument create the InputStream or whatever. I currently
have APIs defined by IDL, with the XML processor implemented in Java, and
clients written in C++ and Java. The C++ client-Java processor combination
uses COM and the Microsoft Java Virtual Machine, but the Java client-Java
processor pair runs under either Sun or Microsoft (same XML processor). The
client can use an IGrove or IXMLApplication (callback) interface or both. I
haven't attempted events yet, but believe this will require making the XML
processor into a Java Bean.
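
On the Java side, the same idea might look like the sketch below: the public
entry points take only Strings, and the stream is opened internally.  The
names StreamCore, Processor and countBytes are hypothetical, and the "core"
just counts bytes as a stand-in for a real XML processor.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class StringEntryPoints {

    /** Hypothetical stream-based core that stays private to the Java side. */
    interface StreamCore {
        int process(InputStream in) throws IOException;
    }

    /** IDL-friendly facade: only String parameters cross the interface. */
    static final class Processor {
        private final StreamCore core;

        Processor(StreamCore core) { this.core = core; }

        int processDocument(String filename) {
            try (InputStream in = Files.newInputStream(Paths.get(filename))) {
                return core.process(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        int processDocumentURL(String url) {
            try (InputStream in = URI.create(url).toURL().openStream()) {
                return core.process(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Stand-in for real parsing: counts the bytes in the stream. */
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    /** Writes a tiny document to a temp file and reads it back both ways. */
    static int demo() {
        try {
            Path tmp = Files.createTempFile("xapi-sketch", ".xml");
            try {
                Files.write(tmp, "<doc/>".getBytes(StandardCharsets.US_ASCII));
                Processor p = new Processor(StringEntryPoints::countBytes);
                int viaFile = p.processDocument(tmp.toString());
                int viaUrl = p.processDocumentURL(tmp.toUri().toString());
                return viaFile == viaUrl ? viaFile : -1;
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes read: " + demo());
    }
}
```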

Regards, Kevin Grimes Liant Software (k.grimes@liant.com)



From Peter at ursus.demon.co.uk  Mon Jun 23 21:46:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:03 2004
Subject: XML Property Set
Message-ID: <8420@ursus.demon.co.uk>

In message <199706231317.IAA16211@copsol.com> lex@www.copsol.com (Alex Milowski) writes:
[...]
> 
> Well, I'm going to make an offer.  I've spent the better part of a year 
> working on and with a Java-based API for groves.  I am certain that I can
> create an interface from this (if not take it wholesale) for the XAPI and
> groves.  So, my offer is that I can come up with a draft and "the James's"
> and the lot can validate if I am on the right track.

I think this is an excellent way forward and many thanks to all who are
contributing to this effort.  I am prepared to make the effort to understand
it and find ways of interfacing it with JUMBO.  

> 
> I am fairly certain that at this point in time we should not say "maybe later"
> to groves.  We should standardize parser access, event interfaces, and groves
> at the same time.  We have enough developers with experience in all of these.
> 

Just to check I have it right...

> An API architecture that I propose is:
> 
> |---------------|
> |   Grove API   |   <<< I assume this has similarities to JamesClark's 
> |---------------|             ReallySimple API ...
> | Grove Builder |
> |     API       |   <<< different memory/storage models are implemented here.
> |----------------------------------|
> |          XML Event API           |  << presumably fairly similar to NXP?
> |----------------------------------|
> |          XML Parser API          |  << Corresponds to John Tigue's analysis?
> |----------------------------------|
> 
[...]
> Now, I feel strongly that the above APIs, or whatever they become, should be
> developed together.  They can certainly affect how each other is designed.

I'd agree with this.  Can they be developed rapidly or in 
parallel so that there aren't bottlenecks/hold-ups?

> If we have these four APIs, we have the fundamental building blocks for all
> kinds of XML applications--both simple and complex.  In addition, we have
> the basic infrastructure for DSSSL!  (Ah, you can see my motivation now!)

If I get this right it makes the DSSSL approach and the JavaClass-per-Element
(as in JUMBO), very closely connected.  The Grove API serves both purposes?
If so, that looks very exciting.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From paul at arbortext.com  Mon Jun 23 22:20:40 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun  7 16:58:03 2004
Subject: XML internal text entity replacement text
Message-ID: <3.0.32.19970623151756.00686b44@pophost.arbortext.com>

In the XML spec (31-Mar-97), the paragraph in section 1.5
just prior to production [9] says:

	Literal data is any quoted string containing neither a left
	angle bracket nor the quotation mark used as a delimiter for
	that string.  It may contain entity and character references.
	Literals are used for specifying the replacement text of
	internal entities (EntityValue)....

Production [9] itself, which defines EntityValue, doesn't forbid "<".
The paragraph following productions 9-15 talks about parameter entity
and character refs, but not about element markup.

Section 4.3 [production 64] uses EntityValue, and section 4.3.1 talks
about internal entities, but says nothing about whether the replacement
text can contain elements.

I don't remember hearing that internal entities couldn't contain
element markup, and appendix A doesn't list it as a difference
from SGML, so I suspect the production is correct and the wording
that says "literal data can't have '<' and EntityValue is literal data"
is wrong.  Can anyone provide confirmation or denial of my assumption
that the above quoted text is wrong in suggesting that internal text
entity replacement text cannot contain element markup?



From lex at www.copsol.com  Mon Jun 23 22:27:12 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:03 2004
Subject: XML Property Set
In-Reply-To: <8420@ursus.demon.co.uk> from "Peter Murray-Rust" at Jun 23, 97 08:04:52 pm
Message-ID: <199706232025.PAA16524@copsol.com>

> Just to check I have it right...
> 
> > An API architecture that I propose is:
> > 
> > |---------------|
> > |   Grove API   |   <<< I assume this has similarities to JamesClark's 
> > |---------------|             ReallySimple API ...
> > | Grove Builder |
> > |     API       |   <<< different memory/storage models are implemented here.
> > |----------------------------------|
> > |          XML Event API           |  << presumably fairly similar to NXP?
> > |----------------------------------|
> > |          XML Parser API          |  << Corresponds to John Tigue's analysis?
> > |----------------------------------|
> > 
> [...]
> > Now, I feel strongly that the above APIs, or whatever they become, should be
> > developed together.  They can certainly affect how each other is designed.
> 
> I'd agree with this.  Can they be developed rapidly or in 
> parallel so that there aren't bottlenecks/hold-ups?

Yes, I believe that they can be developed in parallel.  I for one can make the
commitment that we can develop a reference implementation of groves in
Java given that the Event API is standardized across XML Java parsers.

> > If we have these four APIs, we have the fundamental building blocks for all
> > kinds of XML applications--both simple and complex.  In addition, we have
> > the basic infrastructure for DSSSL!  (Ah, you can see my motivation now!)
> 
> If I get this right it makes the DSSSL approach and the JavaClass-per-Element
> (as in JUMBO), very closely connected.  The Grove API serves both purposes?
> If so, that looks very exciting.

Well, almost.  The Grove API is *one* component of the infrastructure necessary
for a DSSSL system.  In some senses, it is the most important.  In the DSSSLTK
I opted for several APIs--one for groves, one for flow objects and flow object
trees, and one for the DSSSL engine.  In the next version there will be
one for the parser implementation as well.

Groves allow us to deliver SDQL and DSSSL engines with minimal effort, but
there is still more to standardize.

For example, a simple DSSSL engine API might be:

public interface Processor {
   SGMLDocument transform(SGMLDocument transformation,SGMLDocument doc);
   FlowObject format(SGMLDocument style,SGMLDocument doc);
}

The DSSSLTK is a little more complex than this because it provides the ability
to "compile" transformations and stylesheets into Transformation and
StyleSheet objects.
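
To show what "compile once, apply many times" could look like, here is a toy
sketch.  It is not the DSSSLTK API: the StyleSheet interface, the compile
method, and the "type=flowobject" rule syntax are all invented for this
example, and a real engine would compile DSSSL rather than a string of pairs.

```java
import java.util.HashMap;
import java.util.Map;

final class CompiledStyleSketch {

    /** Hypothetical compiled form, reusable across many documents. */
    interface StyleSheet {
        String flowObjectFor(String elementType);
    }

    /**
     * Parses a toy rule string such as "p=paragraph;a=link" once and
     * returns an object that answers lookups without re-parsing.
     */
    static StyleSheet compile(String spec) {
        Map<String, String> rules = new HashMap<>();
        for (String rule : spec.split(";")) {
            String[] pair = rule.split("=", 2);
            if (pair.length == 2) {
                rules.put(pair[0].trim(), pair[1].trim());
            }
        }
        // Element types without a rule fall back to a default flow object.
        return elementType -> rules.getOrDefault(elementType, "sequence");
    }

    public static void main(String[] args) {
        StyleSheet style = compile("p=paragraph;a=link");
        System.out.println(style.flowObjectFor("p"));
        System.out.println(style.flowObjectFor("unknown"));
    }
}
```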

My main point was that other APIs (the DSSSL engine, for example) are *users*
of the Parser and Grove APIs.  Hence, these should be able to be developed
first, independently of the others.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608


From lex at www.copsol.com  Mon Jun 23 22:33:37 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
In-Reply-To: <97Jun23.101149edt.32261-1@stelmo.liant.com> from "Kevin Grimes" at Jun 23, 97 02:14:27 pm
Message-ID: <199706232031.PAA16544@copsol.com>

> 
> From: Kevin Grimes@LIANT on 06/23/97 02:14 PM
> 
> May I suggest that the XML API be expressed in language neutral IDL rather
> than Java. I believe the main impact this would have on current
> interfaces/implementations would be to the member function that actually
> loads/processes the document--you'd probably want to replace the Java
> InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in
> the following lines from my xml.idl...
> 

<snip>

I like the idea of using IDL.  I have to confess that I haven't had much
of an opportunity to use it (although I would have liked to have).  So, where 
are the IDL "experts" that we can bring onboard to get that part correct?

...hey, I still have to support C++ no matter how much I like Java!  ;-)

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608


From jwrobie at mindspring.com  Mon Jun 23 23:06:37 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
Message-ID: <1.5.4.32.19970623210523.008c6594@pop.mindspring.com>

At 03:31 PM 6/23/97 -0500, Alex Milowski wrote:
>> 
>> From: Kevin Grimes@LIANT on 06/23/97 02:14 PM
>> 
>> May I suggest that the XML API be expressed in language neutral IDL rather
>> than Java. I believe the main impact this would have on current
>> interfaces/implementations would be to the member function that actually
>> loads/processes the document--you'd probably want to replace the Java
>> InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in
>> the following lines from my xml.idl...
>
>I like the idea of using IDL.  I have to confess that I haven't had much
>of an opportunity to use it (although I would have liked to have).  So, where 
>are the IDL "experts" that we can bring onboard to get that part correct?

Hmmmm...I just talked to one of our IDL experts, who wasn't convinced that
this would be a helpful direction.

Is there really an advantage to defining it in IDL first? The IDL could be
created after the specification is finished in Java, and the Java-based
specification is probably easier to create, understand, and test. I *like*
making things language independent, but at this stage, I'm leery of adding
complexity that doesn't add any new conceptual power. A Java-based
specification can be translated into IDL later, and until the specification
has actually been *implemented* in more than one language, the IDL doesn't
buy you much.

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************



From Peter at ursus.demon.co.uk  Mon Jun 23 23:24:11 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:03 2004
Subject: Documentation for DTDs
Message-ID: <8438@ursus.demon.co.uk>

In message <199706231819.LAA28781@mehitabel.eng.sun.com> Murray Altheim writes:
[...]
> 
> I would imagine that a small hack to the perl code would allow for a minor
> translation to XML:
> 
>     <?XML-DTDDOC identifier>
>     <P>
>     Description of identifier here.
>     </P>
>     <?XML-DTDDOC identifier>
>     <P>
>     Description of identifier here.
>     </P>
>     ...
> 
> I haven't checked to see what other changes might be necessary in the 
> change from full SGML to XML DTDs, but I suspect these might be minor.
> Since this is free, functional, and suits the purpose, I'd say go
> with the leader...

I would also agree - I don't know if Earl Hood reads this list - were you 
suggesting we ask him to think about the problem?

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From k.grimes at liant.com  Tue Jun 24 00:26:27 1997
From: k.grimes at liant.com (Kevin Grimes)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
Message-ID: <97Jun23.105823edt.32261-1@stelmo.liant.com>


From: Kevin Grimes@LIANT on 06/23/97 06:30 PM

Limiting the types used in the Java XAPI to basic Java types like int,
boolean, String, plus the Java interfaces that you've actually implemented
in Java (IElement, IXMLProcessor etc.) will make the job of porting the
Java XAPI to IDL easier. InputStream was the first thing I tripped over
when I translated my Java APIs into IDL--to avoid defining an InputStream
interface in IDL I rewrote the member function processDocument to use
String. Here is my version of xml.idl. This is my first use of IDL, and
XML, and Java for that matter, so please let me know if I'm doing something
stupid. I compile this with the MIDL compiler that comes with Microsoft's
Visual Studio. We have C++ and Java clients that implement the
IXMLApplication callbacks or use the IGrove and INode interfaces to
traverse the parse tree.

Regards, Kevin (k.grimes@liant.com)

// xml.idl
[
    uuid (14859300-E953-11d0-B96A-00A024f2C5E0),
    version (0.0),
    helpstring("LXMLProcessor Type Library")
]
library LXMLProcessor
{
    importlib("stdole32.tlb");

    interface INodeList;

    [
        object,
        uuid (14859302-E953-11d0-B96A-00A024f2C5E0),
        helpstring("INode Interface"),
    ]
    interface INode : IDispatch
    {
        HRESULT addChild([in] INode* child);
        HRESULT getChild([in] int i, [out, retval] INode** child);
        HRESULT getChildren([out, retval] INodeList** children);
        HRESULT getNumberOfChildren([out, retval] int* count);
        HRESULT getParent([out, retval] INode** parent);
        HRESULT setParent([in] INode* parent);
    }

    [
        object,
        uuid (14859303-E953-11d0-B96A-00A024f2C5E0),
        helpstring("INodeList Interface"),
    ]
    interface INodeList : IDispatch
    {
        HRESULT addItem([in] INode* item);
        HRESULT getCount([out, retval] int* count);
        HRESULT getItem([in] int i, [out, retval] INode** item);
    }

    [
        object,
        uuid (14859304-E953-11d0-B96A-00A024f2C5E0),
        helpstring("ICharacterData Interface"),
    ]
    interface ICharacterData : IDispatch
    {
        HRESULT toString([out, retval] BSTR* cdata);
    }

    [
        object,
        uuid (14859305-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IElement Interface"),
    ]
    interface IElement : IDispatch
    {
        HRESULT addAttribute([in] BSTR name, [in] BSTR value);
        HRESULT getAttributeValue([in] BSTR name, [out, retval] BSTR* value);
        HRESULT getId([out, retval] BSTR* id);
        HRESULT getType([out, retval] BSTR* type);
        HRESULT isEmpty([out, retval] VARIANT_BOOL* empty);
        HRESULT setId([in] BSTR id);
        HRESULT setIsEmpty();
        HRESULT toString([out, retval] BSTR* retval);
    }

    [
        object,
        uuid (14859306-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IGrove Interface"),
    ]
    interface IGrove : IDispatch
    {
        HRESULT getDocumentRoot([out, retval] INode** root);
        HRESULT setDocumentRoot([in] INode* root);
    }

    [
        object,
        uuid (14859307-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IXMLApplication Interface"),
    ]
    interface IXMLApplication : IDispatch
    {
        HRESULT doBinaryEntity([in] BSTR systemId,
                               [in] BSTR notationName,
                               [in] BSTR notationSystemId);
        HRESULT doCharacterData([in] BSTR data);
        HRESULT doEmptyElement([in] IElement* e);
        HRESULT doEndOfDocument([in] BSTR docname);
        HRESULT doEndTag([in] IElement* e);
        HRESULT doFatalError([in] BSTR error);
        HRESULT doProcessingInstruction([in] BSTR pi);
        HRESULT doReportableError([in] BSTR error);
        HRESULT doStartOfDocument([in] BSTR docname);
        HRESULT doStartTag([in] IElement* e);
        HRESULT doWarning([in] BSTR warning);
    }

    [
        object,
        uuid (14859308-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IXMLProcessor Interface"),
    ]
    interface IXMLProcessor : IDispatch
    {
        HRESULT buildParseTree([in] VARIANT_BOOL build);
        HRESULT checkValidity([in] VARIANT_BOOL check);
        HRESULT getGrove([out, retval] IGrove** grove);
        HRESULT processExternalEntities([in] VARIANT_BOOL process);
        HRESULT processDocument([in] BSTR filename);
        HRESULT processDocumentURL([in] BSTR spec);
        HRESULT setApplication([in] IXMLApplication* app);
    }

    [
        uuid (1485930A-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Node Class"),
        appobject
    ]
    coclass Node
    {
        interface INode;
    }

    [
        uuid (1485930B-E953-11d0-B96A-00A024f2C5E0),
        helpstring("NodeList Class"),
        appobject
    ]
    coclass NodeList
    {
        interface INodeList;
    }

    [
        uuid (1485930C-E953-11d0-B96A-00A024f2C5E0),
        helpstring("CharacterData Class"),
        appobject
    ]
    coclass CharacterData
    {
        interface INode;
        interface ICharacterData;
    }

    [
        uuid (1485930D-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Element Class"),
        appobject
    ]
    coclass Element
    {
        interface INode;
        interface IElement;
    }

    [
        uuid (1485930E-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Grove Class"),
        appobject
    ]
    coclass Grove
    {
        interface IGrove;
    }

    [
        uuid (1485930F-E953-11d0-B96A-00A024f2C5E0),
        helpstring("XMLApplication Class"),
        appobject
    ]
    coclass XMLApplication
    {
        interface IXMLApplication;
    }

    [
        uuid (14859310-E953-11d0-B96A-00A024f2C5E0),
        helpstring("XMLProcessor Class"),
        appobject
    ]
    coclass XMLProcessor
    {
        interface IXMLProcessor;
    }
};
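
For readers more at home in Java than in IDL, a client walking the parse tree
through INode might look roughly like this.  The Java rendering of INode is
an assumed mapping of the IDL above, and SimpleNode, countNodes and
sampleTree are hypothetical helpers written only for this example.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeTraversalSketch {

    /** Assumed Java counterpart of the INode interface in xml.idl. */
    interface INode {
        void addChild(INode child);
        INode getChild(int i);
        int getNumberOfChildren();
        INode getParent();
        void setParent(INode parent);
    }

    /** Hypothetical in-memory implementation, for illustration only. */
    static final class SimpleNode implements INode {
        private final List<INode> children = new ArrayList<>();
        private INode parent;

        @Override public void addChild(INode child) {
            children.add(child);
            child.setParent(this);
        }
        @Override public INode getChild(int i) { return children.get(i); }
        @Override public int getNumberOfChildren() { return children.size(); }
        @Override public INode getParent() { return parent; }
        @Override public void setParent(INode parent) { this.parent = parent; }
    }

    /** Depth-first walk using only the interface methods. */
    static int countNodes(INode node) {
        int total = 1;
        for (int i = 0; i < node.getNumberOfChildren(); i++) {
            total += countNodes(node.getChild(i));
        }
        return total;
    }

    /** Builds a root with two children, one of which has a child of its own. */
    static INode sampleTree() {
        INode root = new SimpleNode();
        INode first = new SimpleNode();
        INode second = new SimpleNode();
        root.addChild(first);
        root.addChild(second);
        first.addChild(new SimpleNode());
        return root;
    }

    public static void main(String[] args) {
        System.out.println("nodes: " + countNodes(sampleTree()));
    }
}
```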



From lauren at sqwest.bc.ca  Tue Jun 24 00:46:47 1997
From: lauren at sqwest.bc.ca (Lauren Wood)
Date: Mon Jun  7 16:58:03 2004
Subject: XML API and the DOM
Message-ID: <m0wgHsU-0009XBC@sqailor.sqwest.bc.ca>

I thought I would post some clarification of the DOM work here,
since it's an acronym that's been mentioned a couple of times.

W3C has a working group called the Document Object Model Working
Group. See http://www.w3.org/MarkUp/DOM/.
To quote the activity statement on the User Interface domain
page on the W3C site: (http://www.w3.org/UI/)
"The Document Object Model is a platform- and language-neutral 
interface that will allow programs and scripts to dynamically access 
and update the content, structure and style of documents. 
The document can be further processed and the results of that 
processing can be incorporated back into the presented page."

The name of the DOM group is a little misleading, since we are trying
to standardize an interface rather than the underlying model.

Obviously there will be more than a little overlap between the
DOM and XAPI. The DOM will be more general - it has to work
with HTML documents as well as XML documents, and it has to 
be platform- and language-independent. We are writing the
interface in IDL, and will also do language bindings to Java and
probably C++ and JavaScript.

Most of the people on the DOM group are also on the xml-dev mailing 
list, as we want to be sure that whatever API is decided on here flows 
into the DOM specification. The full DOM specification will contain 
a lot more and take a lot longer than the basic XAPI being talked 
about here. The first draft of level one is due to be ready by the 
end of August and I will post the URL here when it is ready. 

cheers,

Lauren

---

Lauren Wood, SoftQuad, Inc.  (posting as chair of the W3C DOM WG)



From sbb at Eng.Sun.COM  Tue Jun 24 04:19:14 1997
From: sbb at Eng.Sun.COM (Steve Byrne)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
In-Reply-To: <97Jun23.105823edt.32261-1@stelmo.liant.com>
References: <97Jun23.105823edt.32261-1@stelmo.liant.com>
Message-ID: <199706240218.TAA07038@javinator.eng.sun.com>

Kevin,

Thank you for sending out your initial interface definition.  Unfortunately, I
believe that the IDL that people are talking about using as an interface
definition standard is OMG's IDL, which is different from Microsoft's
proprietary IDL (MIDL).  If you retrieve the CORBA 2.0 specification from
www.omg.org, you'll find that Chapter 3 defines the OMG IDL, and this is, I
believe, what you should be using to define your interfaces.

Steve


From ht at cogsci.ed.ac.uk  Tue Jun 24 10:55:30 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:03 2004
Subject: XML internal text entity replacement text
In-Reply-To: Paul Grosso's message of Mon, 23 Jun 1997 15:19:09 -0500
References: <3.0.32.19970623151756.00686b44@pophost.arbortext.com>
Message-ID: <559.199706240855@grogan.cogsci.ed.ac.uk>

Paul asks:
>  [Question about '<' in EntityValue]

I believe the text is out of sync. with the productions, and the
productions are correct, and general entities can contain markup (or
rather, characters which will be treated as markup in the right
context).

That's the way we've implemented it in LT XML.
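
A small test case makes the behaviour easy to check.  The sketch below is not
LT XML; it uses the standard Java DOM parser (javax.xml.parsers) as a
stand-in, and the entity name "warn" and the sample document are made up for
the example.  It declares an internal entity whose replacement text contains
an element and counts the elements that result when the reference is
expanded in content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class EntityMarkupSketch {

    // Internal general entity whose replacement text contains element markup.
    static final String SAMPLE =
        "<!DOCTYPE doc [\n"
        + "  <!ENTITY warn \"<em>careful</em>\">\n"
        + "]>\n"
        + "<doc>&warn;</doc>";

    /** Parses SAMPLE and returns how many <em> elements the tree contains. */
    static int countEmElements() {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(SAMPLE)));
            return doc.getElementsByTagName("em").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("sample should be well-formed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("em elements: " + countEmElements());
    }
}
```

If the reference is expanded as markup, the count is one; a parser that
treated "<" in the entity value as an error would fail before reaching the
count.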

ht


From digitome at iol.ie  Tue Jun 24 12:36:14 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
Message-ID: <199706241036.LAA19984@GPO.iol.ie>


[Jonathan Robie]
>
>Is there really an advantage to defining it in IDL first? The IDL could be
>created after the specification is finished in Java, and the Java-based
>specification is probably easier to create, understand, and test. I *like*
>making things language independent, but at this stage, I'm leery of adding
>complexity that doesn't add any new conceptual power.

Hmmm. IDL == Language independent spec of an API....might this be better
approached as an XML application? I.e. a DTD for the XML API spec. A doc
conforming to that spec that can be down-translated to Java, C++, Python
and (gasp) IDL!

APIs are structured docs. Let's practice what we are preaching and capture the
API in XML. Unless there are compelling reasons why this does not make sense.

Just thinking out loud and looking forward to a discussion on the issue.


Sean




From akirkpatrick at ims-global.com  Tue Jun 24 15:05:56 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:03 2004
Subject: DTD invented by Microsoft?!
Message-ID: <E0wgVI3-00074J-00@punch.ic.ac.uk>

The following extract is from the MS white paper on XML. Are they describing
what we all understand as the DTD here or is it something else? If the former,
what are Microsoft doing taking credit for it, I wonder...

> Microsoft has proposed a "Document Type Definition" (DTD) syntax for
> expressing the schema for an XML document directly within XML itself,
> allowing XML data to describe its own structure. Expressing schemata
> within XML adds great power to the XML format because it makes it
> possible for software examining certain data to understand its structure
> without earlier knowledge about the data or its meaning.

The section on white-space also seems oversimplified at best.

BTW: I don't have anything against Microsoft, even if their developments
do seem to be all over the place at the moment.

Alfie.  



From nmikula at edu.uni-klu.ac.at  Tue Jun 24 15:40:18 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:58:03 2004
Subject: XAPI
References: <1.5.4.32.19970623210523.008c6594@pop.mindspring.com>
Message-ID: <33B0135D.6163@edu.uni-klu.ac.at>

Jonathan Robie wrote:
> Hmmmm...I just talked to one of our IDL experts, who wasn't convinced that
> this would be a helpful direction.
> 
> Is there really an advantage to defining it in IDL first? The IDL could be
> created after the specification is finished in Java, and the Java-based
> specification is probably easier to create, understand, and test. 
<snip/>

I would agree with this statement. IDL is fine but not really
necessary right now. I personally would be too afraid
that we would lose momentum if we introduced yet another
obstacle - meaning having to get a full understanding of IDL. 

If there is somebody here that is willing to take our material
and transform it into an IDL spec., that'd be great, though.
Nevertheless, let's talk Java and then we proceed from there.

IMHO :) 

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From nmikula at edu.uni-klu.ac.at  Tue Jun 24 15:40:37 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:58:04 2004
Subject: XML Property Set
References: <199706232025.PAA16524@copsol.com>
Message-ID: <33B02038.495A@edu.uni-klu.ac.at>

|---------------|
|   Grove API   |   <<< I assume this has similarities to James Clark's
|---------------|       ReallySimple API ...
| Grove Builder |
|     API       |   <<< different memory/storage models are implemented here.
|----------------------------------|
|          XML Event API           |  << presumably fairly similar to NXP?
|----------------------------------|
|          XML Parser API          |  << Corresponds to John Tigue's analysis?
|----------------------------------|
I guess the two bottom layers and the connection between 
Event API and Grove Builder API are my call.

Let me ask you :

Should we go for a pure event-oriented API, as is now implemented
in NXP (leaving it up to the next layer to create the objects), or
should we have creator methods that would be set in the event API,
just as the Esis object is now set (setEsis) in the parser? These
methods would be called in case of a specific event, and the result
of the method call would be sent to the application via the event
interface.

For instance we would have an interface :

public interface Constructors
{
   public Element createElement();
   public Attribute createAttribute();
   ...
}

Element and Attribute etc. will probably be subclasses of Node,
as per James' simple API. Node, however, should be defined
very generally so that we don't *have* to think about 
DSSSL when we want to talk about/use a node.

An event-producer class conforming to the Esis(++) interface would need
to implement a method : 

public void setCreator(Constructors constr);

It would work then like : 

a.) parser recognises a certain tag
b.) calls the appropriate creator method to create an 
object of class element
c.) sends the created object to the next layer via
the event producer
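
A minimal sketch of steps a.) to c.), for illustration only. The Node
subclasses, the NodeEvents interface and ToyProducer are hypothetical
stand-ins, and the name parameter on the creator methods is an added
assumption so that the created objects can be told apart; none of this
is NXP's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Placeholder node types; the real hierarchy is still under discussion.
class Node {
    final String name;
    Node(String name) { this.name = name; }
}
class Element extends Node { Element(String name) { super(name); } }
class Attribute extends Node { Attribute(String name) { super(name); } }

// The Constructors interface from above, with an assumed name parameter.
interface Constructors {
    Element createElement(String name);
    Attribute createAttribute(String name);
}

// Hypothetical event interface that hands created objects to the next layer.
interface NodeEvents {
    void element(Element e);
}

// Hypothetical event producer showing the three steps in miniature.
class ToyProducer {
    private Constructors creator;
    private final NodeEvents sink;

    ToyProducer(NodeEvents sink) { this.sink = sink; }

    void setCreator(Constructors constr) { this.creator = constr; }

    // Stands in for the parser recognising a start tag (step a).
    void startTagSeen(String tagName) {
        Element e = creator.createElement(tagName); // step b: creator method
        sink.element(e);                            // step c: pass it on
    }
}

class DefaultConstructors implements Constructors {
    public Element createElement(String name) { return new Element(name); }
    public Attribute createAttribute(String name) { return new Attribute(name); }
}

class CreatorDemo {
    // Feeds the given tag names through the producer and records what arrives.
    static List<String> run(String... tags) {
        List<String> seen = new ArrayList<>();
        ToyProducer producer = new ToyProducer(e -> seen.add(e.name));
        producer.setCreator(new DefaultConstructors());
        for (String t : tags) producer.startTagSeen(t);
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(run("doc", "para"));
    }
}
```

Swapping DefaultConstructors for another implementation changes which
classes get built without touching the producer, which is the point of
routing creation through setCreator.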

An alternative to the creator methods would be to
set the objects to be created via a Hashtable of Strings.
Then the objects would be created via their "name".

For instance an entry in the hashtable would look 
like : "Element" -----> "dsssl.Element" and the
result would be the creation of an object of type
dsssl.Element for the "event" Element.
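
One way to read the Hashtable idea, sketched with Java reflection. The
NamedCreator class and the placeholder Element and Attribute classes
are hypothetical; in practice the registered name would be something
like "dsssl.Element", and the class would need a no-argument
constructor for this to work.

```java
import java.util.Hashtable;

// Placeholders for classes such as dsssl.Element.
class Element { }
class Attribute { }

// Maps an event name to the name of the class to instantiate for it.
class NamedCreator {
    private final Hashtable<String, String> classNames = new Hashtable<>();

    void register(String event, String className) {
        classNames.put(event, className);
    }

    // Looks up the class name for the event and instantiates it by name.
    Object create(String event) {
        String className = classNames.get(event);
        if (className == null) {
            throw new IllegalArgumentException("no class registered for " + event);
        }
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot create " + className, e);
        }
    }
}

class NamedCreatorDemo {
    public static void main(String[] args) {
        NamedCreator creator = new NamedCreator();
        creator.register("Element", Element.class.getName());
        System.out.println(creator.create("Element").getClass().getName());
    }
}
```

Compared with creator methods, the table can be filled from a
configuration file, at the cost of losing compile-time type checking.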

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From lex at www.copsol.com  Tue Jun 24 16:21:04 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:04 2004
Subject: XML Property Set
In-Reply-To: <33B02038.495A@edu.uni-klu.ac.at> from "Norbert H. Mikula" at Jun 24, 97 12:30:00 pm
Message-ID: <199706241418.JAA16855@copsol.com>

> 
> |---------------|
> |   Grove API   |   <<< I assume this has similarities to JamesClark's
> |---------------|             ReallySimple API ...

I'm not certain what the ReallySimple API is.  I was thinking along the lines
of the DSSSLTK dsssl.grove package.

> | Grove Builder |
> |     API       |   <<< different memory/storage models are implemented here.

Yes, exactly.  The Grove API is abstract and the Grove Builder API hides the
exact implementation methodologies for constructing grove objects from the
rest of the world.


> |----------------------------------|
> |          XML Event API           |  << presumably fairly similar to NXP?

Potentially.  I assume there will be some "grand convergence" on this between
parsers and the needs of a grove.


> |----------------------------------|
> |          XML Parser API          |  << Corresponds to John Tigue's analysis?
> |----------------------------------|

Yes, and maybe more infrastructure if we are up to it.  IMHO, this API should
provide the ability for different parsers to be *configured* for use within
an application.  Hence, this should be an abstract component that is
complete enough to allow most (if not all) applications to not have to
know the implementation details.  

We could use a factory design pattern here.
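
A rough sketch of what a factory could look like here, assuming
applications see only an abstract parser interface. XmlParser,
ParserFactory and the two parser classes are hypothetical names chosen
for the example, not part of any existing parser.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical abstract parser interface; applications depend only on this.
interface XmlParser {
    String describe();
}

class ValidatingParser implements XmlParser {
    public String describe() { return "validating"; }
}

class NonValidatingParser implements XmlParser {
    public String describe() { return "non-validating"; }
}

// Parsers are registered under a key, so the choice becomes configuration.
class ParserFactory {
    private final Map<String, Supplier<XmlParser>> registry = new HashMap<>();

    void register(String key, Supplier<XmlParser> maker) {
        registry.put(key, maker);
    }

    XmlParser newParser(String key) {
        Supplier<XmlParser> maker = registry.get(key);
        if (maker == null) {
            throw new IllegalArgumentException("unknown parser: " + key);
        }
        return maker.get();
    }
}

class ParserFactoryDemo {
    public static void main(String[] args) {
        ParserFactory factory = new ParserFactory();
        factory.register("validating", ValidatingParser::new);
        factory.register("fast", NonValidatingParser::new);
        System.out.println(factory.newParser("fast").describe());
    }
}
```

The application asks for a parser by key and never names a concrete
class, so a different parser can be configured without code changes.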

> I guess the two bottom layers and the connection between 
> Event API and Grove Builder API are my call.

Potentially.  I would guess that there could be a reference implementation of
an event handler that "knows" how to interface a grove builder.  It is
probably not true that *all* grove builders can be accessed the same.  For
example, in a database situation, extra work may be necessary in the connection
of the events to the grove builder. 

> Let me ask you :
> 
> Should we go for a pure event oriented API, like it is now 
> implemented in NXP (and leave it up to the next layer to create the 
> objects) or should we have creator methods that would be set in the
> event API like now the Esis object is set (setEsis) in the parser. These
> methods would be called in case of a specific event and the result of this 
> method call would be sent to the application via the event interface. 

I think the event API is the most abstract and lowest level for a parser.  In
this manner, applications that do not need "grove objects" will not have to
have them created within some implementation. 
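
To make the point concrete, here is a small sketch of an application
that consumes events directly and never builds a node. The XmlEvents
interface and its method names are assumptions made for this example;
they are not the NXP or SP interfaces.

```java
// Hypothetical event interface; method names are assumptions.
interface XmlEvents {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// An application that only wants statistics: no Element or Node objects exist.
class ElementCounter implements XmlEvents {
    int elements;
    int textLength;

    public void startElement(String name) { elements++; }
    public void characters(String text) { textLength += text.length(); }
    public void endElement(String name) { }
}

// Stands in for a parser: replays the events for
// <doc><p>Hello</p><p>XML</p></doc>.
class ToyEventSource {
    static void replay(XmlEvents handler) {
        handler.startElement("doc");
        handler.startElement("p");
        handler.characters("Hello");
        handler.endElement("p");
        handler.startElement("p");
        handler.characters("XML");
        handler.endElement("p");
        handler.endElement("doc");
    }
}

class EventDemo {
    public static void main(String[] args) {
        ElementCounter counter = new ElementCounter();
        ToyEventSource.replay(counter);
        System.out.println(counter.elements + " elements, "
                + counter.textLength + " characters");
    }
}
```

A grove builder would simply be another implementation of the same
event interface, layered on top.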

SP, for example, has quite an extensive event-oriented API.  Each event has
a great deal of detail (basically, everything there is to know).  It is
fairly easy to access the high level semantics of these events.  Low level
semantics like document strings--character for character representations of
the event--are a little more work.  This is a design decision that we have
to make.

We could have two event APIs--one for document string access and one for
high level access including document string information, but that could
get far too complex.  One might also raise the question of why we need the
document string separated out when you can get it from the high-level
events.

> 
> For instance we would have an interface :
> 
> public interface Constructors
> {
>    public Element createElement();
>    public Attribute createAttribute();
>    ...
> }
> 
> Element and Attribute etc. will probably be subclasses of Node,
> as per James' simple API. Node, however, should be defined
> very generally so that we don't *have* to think about 
> DSSSL when want to talk about/use a node.
> 
> An event-producer class conforming to the Esis(++) interface would need
> to implement a method : 
> 
> public void setCreator(Constructors constr);
> 
> It would work then like : 
> 
> a.) parser recognises a certain tag
> b.) calls the appropriate creator method to create an 
> object of class element
> c.) sends the created object to the next layer via
> the event producer

Well, the above example is similar to the GroveConstructor class in the
DSSSLTK.  The GroveConstructor is different in that it tries to only allow
sub-node objects to be created from appropriate parents.  For example, the
document element can only be created by passing in the SGMLDocument node.
An element can only be created by passing in the parent of the element.
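
The parent-passing rule can be illustrated with a small sketch. This is
not the DSSSLTK GroveConstructor itself; ToyGroveConstructor and the
node classes below are hypothetical and only show how requiring the
parent as an argument restricts what can be created where.

```java
import java.util.ArrayList;
import java.util.List;

// Every node records its parent and registers itself as a child.
abstract class Node {
    final Node parent;
    final List<Node> children = new ArrayList<>();

    Node(Node parent) {
        this.parent = parent;
        if (parent != null) parent.children.add(this);
    }
}

class DocumentNode extends Node {
    DocumentNode() { super(null); }
}

class ElementNode extends Node {
    final String name;
    ElementNode(Node parent, String name) { super(parent); this.name = name; }
}

class ToyGroveConstructor {
    DocumentNode createDocument() { return new DocumentNode(); }

    // The document element can only be created from the document node,
    // and only once.
    ElementNode createDocumentElement(DocumentNode doc, String name) {
        if (!doc.children.isEmpty()) {
            throw new IllegalStateException("document element already exists");
        }
        return new ElementNode(doc, name);
    }

    // Any other element needs an existing element as its parent.
    ElementNode createElement(ElementNode parent, String name) {
        return new ElementNode(parent, name);
    }
}

class GroveDemo {
    public static void main(String[] args) {
        ToyGroveConstructor gc = new ToyGroveConstructor();
        DocumentNode doc = gc.createDocument();
        ElementNode root = gc.createDocumentElement(doc, "doc");
        ElementNode para = gc.createElement(root, "para");
        System.out.println(para.name + " is a child of " + root.name);
    }
}
```

Because the parent types appear in the method signatures, most misuse
is caught by the compiler rather than at run time.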

> 
> An alternative to the creator methods would be to
> set the objects to be created via a Hashtable of Strings.
> Then the objects would be create via their "name".
> 
> For instance an entry in the hashtable would look 
> like : "Element" -----> "dsssl.Element" and the
> result would be the creation of an object of type
> dsssl.Element for the "event" Element.

I'm not certain I understand what you mean.  Can you give a more
detailed example?

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From eliot at isogen.com  Tue Jun 24 17:05:42 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:04 2004
Subject: DTD invented by Microsoft?!
Message-ID: <3.0.32.19970624095340.0077f968@swbell.net>

At 02:06 PM 6/24/97 +0000, akirkpatrick@ims-global.com wrote:
>> Microsoft has proposed a "Document Type Definition" (DTD) syntax for   
>expressing the schema for an > XML document directly within XML itself,   
>allowing XML data to describe its own structure. 

In Microsoft's defense, they have correctly used the term "document type
definition", which is what the acronym "DTD" expands to, to mean the
overall definition of a document type.  As SGML only defines part of the
total mechanism one needs to define a document type (the declarations
allowed within a DOCTYPE declaration, what we are now calling "DTD
declarations"), you are free to define additional formalisms for defining
schemas however you want.

Many people have defined "DTDs for DTDs" (including myself)--the only thing
you can't do is claim to be *replacing* the declarations defined by 8879.
Of course, since XML (and the WebSGML TC) allow the DOCTYPE declaration (or
its contained declarations) to be omitted, there's nothing preventing the
use of some alternate syntax for schema representation as an *application
convention*.

The XML ERB is on record as stating that while it might be useful to have a
"better" syntax for DTD declarations, the definition of such is out of
scope for XML, and in any case is a tar pit second only to name spaces (and
thus best left to the SGML revision).

Cheers,

E.



From matthewg at poet.de  Tue Jun 24 17:41:49 1997
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
Message-ID: <01BC80C5.7984F300@matthewg@poet.de>

> [Jonathan Robie]
> >
> >Is there really an advantage to defining it in IDL first? The IDL could be
> >created after the specification is finished in Java, and the Java-based
> >specification is probably easier to create, understand, and test. I *like*
> >making things language independent, but at this stage, I'm leery of adding
> >complexity that doesn't add any new conceptual power.
>
> Hmmm. IDL == Language independent spec of an API....might this be better
> approached as an XML application? I.e. a DTD for the XML API spec. A doc
> conforming to that spec that can be down-translated to Java, C++, Python
> and (gasp) IDL!
>
> APIs are structured docs. Let's practice what we are preaching and capture
> the API in XML. Unless there are compelling reasons why this does not make
> sense.
>
> Just thinking out loud and looking forward to a discussion on the issue.
>
> Sean

If I may be so bold, this sounds like a great idea to me. Producing an API 
in Java is a valid approach and is more than defensible considering the 
current Internet climate. However, there is also an argument to be made for 
a language-independent approach (as evidenced by the discussion in this 
thread). If this approach is to be favored, it seems to me to make far more 
sense to develop a generalized DTD for API specifications and make the 
specification itself in XML. This would have the following advantages:

1) Make a truly language-independent spec which conforms to the XML 
philosophy. (I am not going to talk about the "spirit of XML". :-)
2) Produce a reusable DTD which would have significant value in its own 
right.
3) Provide the perfect basis for generating documentation directly from the 
API specification.
4) Ensure that every "user" has the necessary expertise to understand the 
formulation of the spec. I am not sure how many people really master IDL. 
Presumably anyone using XAPI will be able to read and understand XML.
5) Provide a demonstration to the outside world as to how XML can be used 
to facilitate language/application independence and information reuse.

It couldn't be that hard to write a DSSSL app to produce a concrete 
language implementation from the XML-based spec, right?
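
As a rough illustration of how small such a down-translation step can
be, here is a sketch in Java rather than DSSSL. The interface/method
vocabulary and the ApiSpecToJava class are invented for this example;
no such DTD has been proposed in this thread, and a real one would need
parameters, exceptions and documentation at the very least.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ApiSpecToJava {
    // Turns <interface name="..."><method name="..." returns="..."/></interface>
    // (a hypothetical vocabulary) into Java interface source text.
    static String translate(String specXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(
                            specXml.getBytes(StandardCharsets.UTF_8)));
            Element iface = doc.getDocumentElement();
            StringBuilder out = new StringBuilder();
            out.append("public interface ")
               .append(iface.getAttribute("name"))
               .append(" {\n");
            NodeList methods = iface.getElementsByTagName("method");
            for (int i = 0; i < methods.getLength(); i++) {
                Element m = (Element) methods.item(i);
                out.append("    ")
                   .append(m.getAttribute("returns"))
                   .append(' ')
                   .append(m.getAttribute("name"))
                   .append("();\n");
            }
            out.append("}\n");
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not read API spec", e);
        }
    }

    public static void main(String[] args) {
        String spec = "<interface name=\"Constructors\">"
                + "<method name=\"createElement\" returns=\"Element\"/>"
                + "<method name=\"createAttribute\" returns=\"Attribute\"/>"
                + "</interface>";
        System.out.print(translate(spec));
    }
}
```

Emitting C++ or IDL from the same document would be a matter of a
second output routine over the same tree.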

Cheers,

Matthew

------------------------------------------------
Matthew Gertner
Project Manager/Architect, Internet/Document Management
POET Software GmbH

Tel: +49 (40) 609 90254
Fax: +49 (40) 609 90115
E-mail: matthewg@poet.de
------------------------------------------------




From clloyd at gorge.net  Tue Jun 24 17:52:49 1997
From: clloyd at gorge.net (Chris Lloyd)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI and IDL
Message-ID: <01BC807B.A8998F60@chaosmobile.com.chaos>

Kevin,
I might be wrong but I think we are getting mixed up between pure IDL and implementing an API that uses Microsoft's IDispatch interface. IDispatch is limited by VARIANT types and LPUNKNOWNs, but IDL, and even COM for that matter, is not. 

Even if Alex passes a pointer to a stream or some other object, that object can be wrapped with IUnknown if you are trying to provide a COM interface for the object.

I know that it's easier to provide a COM interface using IDispatch and basic types but I think it's just far too limiting for an API. Maybe we should let the interface be designed in Java first and worry about the IDL later. :)

Chris Lloyd

From: Kevin Grimes@LIANT on 06/23/97 06:30 PM

Limiting the types used in the Java XAPI to basic Java types like int,
boolean, String, plus the Java interfaces that you've actually implemented
in Java (IElement, IXMLProcessor etc.) will make the job of porting the
Java XAPI to IDL easier. InputStream was the first thing I tripped over
when I translated my Java APIs into IDL--to avoid defining an InputStream
interface in IDL I rewrote the member function processDocument to use
String. Here is my version of xml.idl. This is my first use of IDL, and
XML, and Java for that matter, so please let me know if I'm doing something
stupid. I compile this with the MIDL compiler that comes with Microsoft's
Visual Studio. We have C++ and Java clients that implement the
IXMLApplication callbacks or use the IGrove and INode interfaces to
traverse the parse tree.
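
A sketch of the trade-off described above, under stated assumptions:
the processDocument(String) signature is guessed from the description
(the actual xml.idl is not reproduced here), and StreamBridge is a
hypothetical helper that lets Java callers keep using an InputStream
while the interface itself stays limited to basic types. UTF-8 is
assumed for the stream.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Assumed shape of the String-based member function; easy to express in IDL.
interface IXMLProcessor {
    void processDocument(String document);
}

// Hypothetical Java-side helper: adapts a stream to the String-based call.
class StreamBridge {
    // Reads the whole stream into memory, decoding it as UTF-8 (an assumption).
    static String readAll(InputStream in) {
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static void process(IXMLProcessor processor, InputStream in) {
        processor.processDocument(readAll(in));
    }
}
```

The cost of the simpler interface is that the whole document must be
held in memory before parsing starts, which may matter for large files.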

Regards, Kevin (k.grimes@liant.com)

  



From akirkpatrick at ims-global.com  Tue Jun 24 17:58:50 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
Message-ID: <E0wgXzR-0001k4-00@punch.ic.ac.uk>

Matthew Gertner wrote...

If I may be so bold, this sounds like a great idea to me. Producing an API
in Java is a valid approach and is more than defensible considering the
current Internet climate. However, there is also an argument to be made for
a language-independent approach (as evidenced by the discussion in this
thread). If this approach is to be favored, it seems to me to make far more
sense to develop a generalized DTD for API specifications and make the
specification itself in XML. This would have the following advantages:

 ------------------------
I think the concern was that this kind of overhead might
hold up the API development (some were already arguing
that it is too early to think about a grove API at all).

I agree that Java may be too web-orientated but would
rather see the API take shape in this language than not at all.

Having said that, the people doing the API work should
try to make it easy to get the structures/methods/documentation
into other formats and should certainly minimise any
language specific areas (I guess Java is a good environment
in this respect?!).

Alfie.  



From peter at techno.com  Tue Jun 24 19:39:08 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
Message-ID: <199706241729.NAA16378@exocomp.techno.com>

> [Jonathan Robie]
>
> Hmmm. IDL == Language independent spec of an API....might this be
> better approached as an XML application? I.e. a DTD for the XML API
> spec. A doc conforming to that spec.  that can be down-translated to
> Java, C++, Python and (gasp) IDL!

Designing a generic language (or DTD) for API description is
non-trivial (consider the work that has gone into creating IDL) and
therefore cannot be done satisfactorily in time for XAPI, given that
XAPI is needed _now_.

Having said that, I agree with you completely.  In fact, this has
already been done to some extent for SGML: it's called the SGML
property set.  A property set is nothing more than interface
specifications for classes of objects using the "grove" object model.

Property sets are not, however, suited for generic interface
descriptions.  There is no way, for instance, to describe an action
method such as "parse this" in a property set.  Property sets are
abstract interfaces to static groves.

In my own work, I have developed another SGML language (otherwise
known as a DTD) that allows me to describe both interfaces and
implementations of object classes using a more generic object model.
This language also allows me to tie some of these classes and methods
to properties in property sets, thus providing a framework for
implementation of property sets, but also giving me a platform and
language-neutral representation of my entire API and implementation.
This representation is then compiled down to APIs and implementations
for specific platforms and languages.

This system has taken some time to develop, (and is still under
development), but has already shown its worth in terms of ease of
coding, maintenance, porting, and documentation.  However, it is still
not a fully generic and complete system (it may never be), as I have
geared it towards implementing property sets, and have only added and
implemented those features needed for doing so.

In the long run, I suggest taking a similar approach: create an XML
language for describing APIs and use it to describe the XAPI, linking
it to the relevant classes and properties from the SGML and/or XML
property sets (for use with DSSSL and/or HyTime).  Develop the API
description language along with the XAPI described with it; add to the
API description language only those features needed for XAPI, while
leaving the door open for further enhancements needed for other
applications.  Study the object models used by IDL, Java, C++, Python,
and others, especially with regards to how they impact API
development.

In the short term, let us develop XAPI-J with the above in mind
(somewhere near the back) so that people can use it now, and so that
it can be used as a model for future development.

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
233 Spruce Avenue                       P.O. Box 23795
Rochester, NY 14611-4041 USA            Rochester, New York 14692-3795 USA
+1 716 529 4303 (home)                  +1 716 464 8696 (direct)
+1 716 755 8698 (cell)                  +1 716 271 0796 (main)
+1 716 529 4304 (fax)                   +1 716 271 0129 (fax)
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From jwrobie at mindspring.com  Tue Jun 24 20:08:23 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
Message-ID: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com>

At 01:29 PM 6/24/97 -0400, Peter Newcomb wrote:
>> [Jonathan Robie]
>>
>> Hmmm. IDL == Language independent spec of an API....might this be
>> better approached as an XML application? I.e. a DTD for the XML API
>> spec. A doc conforming to that spec.  that can be down-translated to
>> Java, C++, Python and (gasp) IDL!

For the record, I didn't say that. Sean McGrath did, quoting my earlier
message, which went like this:

>Is there really an advantage to defining it in IDL first? The IDL could be
>created after the specification is finished in Java, and the Java-based
>specification is probably easier to create, understand, and test. I *like*
>making things language independent, but at this stage, I'm leery of adding
>complexity that doesn't add any new conceptual power.

So not only am I in agreement with the rest of your message, your message
actually agrees with what I said earlier!

Incidentally, Alex Milowski referred to the "factory design pattern". Using
design patterns as a basis for the design is really helpful, because there
is a book which describes each of these patterns in detail, complete with
diagrams, scenarios, etc. For instance, there are 9 pages on the factory
design pattern that Alex mentioned. This makes it much easier to communicate
about design choices on a high conceptual level. POET's Wildflower API,
which was developed completely independently of Alex's software, also uses a
design patterns approach to parse, manage, and navigate SGML documents in
the document repository. I wonder if some of the rest of us are also design
patterns critters?


Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From jtigue at datachannel.com  Tue Jun 24 21:31:09 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
References: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com>
Message-ID: <33B020B2.65235836@datachannel.com>

Jonathan Robie wrote:

> Incidentally, Alex Milowski referred to the "factory design pattern".
> Using design patterns as a basis for the design is really helpful,
> because there is a book which describes each of these patterns in
> detail, complete with diagrams, scenarios, etc. For instance, there are
> 9 pages on the factory design pattern that Alex mentioned. This makes
> it much easier to communicate about design choices on a high conceptual
> level. POET's Wildflower API, which was developed completely
> independently of Alex's software, also uses a design patterns approach
> to parse, manage, and navigate SGML documents in the document
> repository. I wonder if some of the rest of us are also design patterns
> critters?

The book is Design Patterns: Elements of Reusable Object-Oriented Software
by Gamma, Helm, Johnson, and Vlissides (Addison-Wesley), ISBN
0-201-63361-2, or see http://st-www.cs.uiuc.edu/users/patterns/ for a
blurb. XAPI included IXMLProcessorFactory which uses the Factory Method
pattern on page 107. Having common handles to design concepts definitely
helps the conversation and I have used Gamma et al. where I can in XAPI.

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

From cbullard at hiwaay.net  Wed Jun 25 01:45:28 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:58:04 2004
Subject: DTD invented by Microsoft?!
References: <E0wgVI3-00074J-00@punch.ic.ac.uk>
Message-ID: <33B05BF1.47B3@hiwaay.net>

akirkpatrick@ims-global.com wrote:
> 
> The following extract is from the MS white paper on XML. Are they
> describing what we all understand as the DTD here or is it something
> else? If the former, what are Microsoft doing taking credit for it,
> I wonder...
> 
> > Microsoft has proposed a "Document Type Definition" (DTD) syntax for
> > expressing the schema for an XML document directly within XML itself,
> > allowing XML data to describe its own structure. Expressing schemata
> > within XML adds great power to the XML format because it makes it
> > possible for software examining certain data to understand its
> > structure without earlier knowledge about the data or its meaning.
> 
> The section on white-space also seems oversimplified at best.
> 
> BTW: I don't have anything against Microsoft, even if their developments
> do seem to be all over the place at the moment.

This appears to be the long awaited and somewhat dreaded 
attempt to use instance syntax for type definitions.  It 
is an idea that has been floated several times on the 
XML WG list and generally resisted.

It is a bad idea and may be the reason SGML community 
members finally withdraw from XML development.

len



From Peter at ursus.demon.co.uk  Wed Jun 25 01:51:22 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:04 2004
Subject: XML Property Set
Message-ID: <8492@ursus.demon.co.uk>

I think we have the basis of agreement on the overall architecture and it's 
important to move reasonably quickly with it.  From what I can gather the main
players are in agreement with the 4-block structure and - as a typical webhacker
- it seems to make sense to me.  [There are two places that I would hook JUMBO
into - the Event API and the Grove API.]

It's VERY important to retain focus.  The 'Grove' approach should be similar
to the ReallySimple API.  [This was a proposed Interface from James Clark on 
this list back in March under the 'Simple API' thread.]  James produced an
interface that even I can understand, and that's what I hope we are
taking forward (in spirit at least).

In message <199706241418.JAA16855@copsol.com> lex@www.copsol.com (Alex Milowski) writes:
> > 
> > |---------------|
> > |   Grove API   |   <<< I assume this has similarities to JamesClark's
> > |---------------|             ReallySimple API ...
> 
> I'm not certain what the ReallySimple API is.  I was think along the lines
> of the DSSSLTK dsssl.grove package.

I think it's very important to keep this as lightweight as possible at this 
stage.  We're building prototypes (the language isn't stable - we don't know
what July 1 might include/omit :-).  So this API must make sense to a wide
range of people - it will be their main interaction with a parser.

> 
> > | Grove Builder |
> > |     API       |   <<< different memory/storage models are implemented
> > here.
> 
> Yes, exactly.  The Grove API is abstract and the Grove Builder API hides the
> exact implementation methodologies for constructing grove objects from the
> rest of the world.

I suspect that it will be quite a small community that needs to interact with
this: specialist developers who care about the memory model, caching,
interaction with OBDs etc.

> 
> 
> > |----------------------------------|
> > |          XML Event API           |  << presumably fairly similar to
> > NXP?
> 
> Potentially.  I assume there will be some "grand convergence" on this between
> parsers and the needs of a grove.

Good.
> 
> 
> > |----------------------------------|
> > |          XML Parser API          |  << Corresponds to John Tigue's
> > analysis?
> > |----------------------------------|
> 
> Yes, and maybe more infrastructure if we are up to it.  IMHO, this API should
> provide the ability for different parsers to be *configured* for use within
> an application.  Hence, this should be an abstract component that is
> complete enough to allow most (if not all) applications to not have to
> know the implementation details.  

Yes.  It hasn't been difficult to interact with the current parsers, but as
more come we shall get terminological slippage and this may cause confusion.
These APIs should hold the terminology fixed.
> 
[...]
> 
> I'm not certain I understand what you mean.  Can you give a more
> detailed example?

I think it would be very valuable to have some examples as soon as reasonable.
We shall then get a feel for the size of the property set (hopefully very small),
the factory model and so forth.  It would be very useful to have a V0.1 to
concentrate discussion and to get a feel for scale.

[WRT other discussions, I agree with those who are suggesting pure Java at 
present.  Although the extensions to other languages are probably fairly
straightforward, it all adds effort.  Java will prove the concept, show the
problems, and it is then much easier to extend and generalise.]

Remember also that there is/will_be a lot of work on XML-LINK, XML-TYPE,
XML-STYLE and we shall all get diverted when these crystallise.  A relatively
solid processing API will help all of these efforts as well.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From peter at techno.com  Wed Jun 25 05:44:26 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:04 2004
Subject: XAPI
In-Reply-To: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com> (message
	from Jonathan Robie on Tue, 24 Jun 1997 14:07:06 -0400)
Message-ID: <199706250321.XAA16541@exocomp.techno.com>

> Date: Tue, 24 Jun 1997 14:07:06 -0400
> From: Jonathan Robie <jwrobie@mindspring.com>
> Cc: xml-dev@ic.ac.uk
> 
> At 01:29 PM 6/24/97 -0400, Peter Newcomb wrote:
> >> [Jonathan Robie]
> >>
> >> Hmmm. IDL == Language independent spec of an API....might this be
> >> better approached as an XML application? I.e. a DTD for the XML API
> >> spec. A doc conforming to that spec.  that can be down-translated to
> >> Java, C++, Python and (gasp) IDL!
> 
> For the record, I didn't say that. Sean McGrath did, quoting my earlier
> message, which went like this:

Sorry about that... I receive CTS as email and had already deleted
your post before I read Sean's post and was moved to write mine.  I
got confused by the quoting.

> >Is there really an advantage to defining it in IDL first? The IDL could be
> >created after the specification is finished in Java, and the Java-based
> >specification is probably easier to create, understand, and test. I *like*
> >making things language independent, but at this stage, I'm leery of adding
> >complexity that doesn't add any new conceptual power.
> 
> So not only am I in agreement with the rest of your message, your message
> actually agrees with what I said earlier!

Yes... I had read your post, and meant mine as a second to yours.

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
233 Spruce Avenue                       P.O. Box 23795
Rochester, NY 14611-4041 USA            Rochester, New York 14692-3795 USA
+1 716 529 4303 (home)                  +1 716 464 8696 (direct)
+1 716 755 8698 (cell)                  +1 716 271 0796 (main)
+1 716 529 4304 (fax)                   +1 716 271 0129 (fax)
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From ht at cogsci.ed.ac.uk  Wed Jun 25 10:17:11 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:04 2004
Subject: DTD invented by Microsoft?!
In-Reply-To: len bullard's message of Tue, 24 Jun 1997 18:44:50 -0500
References: <E0wgVI3-00074J-00@punch.ic.ac.uk> <33B05BF1.47B3@hiwaay.net>
Message-ID: <730.199706250817@grogan.cogsci.ed.ac.uk>

Len writes:

>  This appears to be the long awaited and somewhat dreaded 
>  attempt to use instance syntax for type definitions.  It 
>  is an idea that has been floated several times on the 
>  XML WG list and generally resisted.

It was resisted, correctly in my view, as a component of XML-lang
itself, and in the decision the point was made several times that the
right place for this was one level up, as a generic application.
That's what the schema proposal in the XML-data document is aimed at
providing.

>  
>  It is a bad idea and may be the reason SGML community 
>  members finally withdraw from XML development.

I'd be interested to hear your reasons for thinking it's a bad idea --
not surprisingly I think it's a good idea -- it puts flexibility in
the place it ought to be, and provides clean mechanisms for dealing
with precisely the tasks which PEs are messily used for now.  Why
disagreement about this point should implicate the relation between
XML and SGML is unclear to me.

ht




From ht at cogsci.ed.ac.uk  Wed Jun 25 10:27:52 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:04 2004
Subject: XML Property Set
In-Reply-To: "Norbert H. Mikula"'s message of Tue, 24 Jun 1997 12:30:00 -0700
References: <199706232025.PAA16524@copsol.com>
	<33B02038.495A@edu.uni-klu.ac.at>
Message-ID: <734.199706250827@grogan.cogsci.ed.ac.uk>

I think this is all useful discussion, but I also think that it's
getting too monolithic.  In our experience with using a simple API to
access a (normalised) SGML document stream, which led to our LT XML
tool, we found that most of the quick and simple tools (often
DTD-specific) we wanted to build were most easily constructed on top
of an I/O model, not an event model or a grove model.  That is, one
where the basic APPLICATION structure was

while (bit=GetBit(xmlStream)) {
   select (bit.type) {
    case startTag: ...
    case endTag: ...
    case textData: ...
    case PI: ...
   }
}

I'd be sorry to lose this level, but am not clear where it fits in the
developing picture.  Is this the 'XML Parser API'?
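
As a rough illustration of the I/O-style loop above, here is a minimal
sketch in Python.  It assumes the standard-library xml.dom.pulldom module
as a stand-in for GetBit (it is not LT XML), and the names get_bits and
count_bits are hypothetical.  The point is only that the application pulls
one "bit" at a time and switches on its type.

```python
from xml.dom import pulldom


def get_bits(xml_text):
    """Yield (kind, payload) pairs in document order, one per 'bit'.

    Hypothetical helper: the kind names mirror the cases in the loop above.
    """
    for event, node in pulldom.parseString(xml_text):
        if event == pulldom.START_ELEMENT:
            yield ("startTag", node.tagName)
        elif event == pulldom.END_ELEMENT:
            yield ("endTag", node.tagName)
        elif event == pulldom.CHARACTERS:
            yield ("textData", node.data)
        elif event == pulldom.PROCESSING_INSTRUCTION:
            yield ("PI", node.target)
        # Other event kinds (document start/end, comments) are skipped.


def count_bits(xml_text):
    """A tiny 'application': the caller drives the loop, not the parser."""
    counts = {}
    for kind, _payload in get_bits(xml_text):
        counts[kind] = counts.get(kind, 0) + 1
    return counts
```

A quick DTD-specific tool would replace count_bits with its own loop body;
the parser never calls back into the application.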

ht



From akirkpatrick at ims-global.com  Wed Jun 25 11:08:56 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:04 2004
Subject: DTD invented by Microsoft?!
Message-ID: <E0wgo4J-0003F2-00@punch.ic.ac.uk>

Could someone explain what "instance syntax for type definitions"
means. Thanks!

ht@cogsci.ed.ac.uk writes:

Len writes:

>  This appears to be the long awaited and somewhat dreaded
>  attempt to use instance syntax for type definitions.  It
>  is an idea that has been floated several times on the
>  XML WG list and generally resisted.

It was resisted, correctly in my view, as a component of XML-lang
itself, and in the decision the point was made several times that the
right place for this was one level up, as a generic application.
That's what the schema proposal in the XML-data document is aimed at
providing.



From lex at www.copsol.com  Wed Jun 25 15:38:21 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun  7 16:58:05 2004
Subject: XML Property Set
In-Reply-To: <734.199706250827@grogan.cogsci.ed.ac.uk> from "Henry S. Thompson" at Jun 25, 97 09:27:44 am
Message-ID: <199706251336.IAA00363@copsol.com>

> I think this is all useful discussion, but I also think that it's
> getting too monolithic.  In our experience with using a simple API to
> access a (normalised) SGML document stream, which led to our LT XML
> tool, we found that most of the quick and simple tools (often
> DTD-specific) we wanted to build were most easily constructed on top
> of an I/O model, not an event model or a grove model.  That is, one
> where the basic APPLICATION structure was
> 
> while (bit=GetBit(xmlStream)) {
>    select (bit.type) {
>     case startTag: ...
>     case endTag: ...
>     case textData: ...
>     case PI: ...
>    }
> }
> 
> I'd be sorry to lose this level, but am not clear where it fits in the
> developing picture.  Is this the 'XML Parser API'?

IMHO, this would be the XML Event API.  Somewhere previous to this you would
use the XML Parser API to setup the parser and tell it what document to
parse.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/   alex@copsol.com
Copernican Solutions Incorporated                  (612) 379 - 3608



From nmikula at edu.uni-klu.ac.at  Wed Jun 25 16:28:01 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:58:05 2004
Subject: XML Property Set
References: <199706232025.PAA16524@copsol.com>
		<33B02038.495A@edu.uni-klu.ac.at> <734.199706250827@grogan.cogsci.ed.ac.uk>
Message-ID: <33B185E1.3D25@edu.uni-klu.ac.at>

Henry S. Thompson wrote:
> I'd be sorry to lose this level, but am not clear where it fits in the
> developing picture.  Is this the 'XML Parser API'?

What you have in mind is pretty much the level of the
event based API. 

To my understanding the XML parser API
is meant to tell the parser how to behave (i.e.
validate/non-validate, set optional error reporting..)
and to set things like the object dealing with
the Esis stream, set the input stream etc.
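
To make the split concrete, here is a minimal sketch in Python.  It uses
the standard-library xml.sax module purely as an analogy for the two layers
discussed here (it is not the API being designed on this list).  The class
TagLogger and the function configure_and_parse are hypothetical names.  The
expat-based reader does not validate, so the sketch configures a different
feature instead of validation.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, ErrorHandler, feature_namespaces


class TagLogger(ContentHandler):
    """'Event API' side: receives the ESIS-like stream of events."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def endElement(self, name):
        self.events.append(("end", name))


def configure_and_parse(xml_text):
    """'Parser API' side: choose a parser, tell it how to behave, feed input."""
    parser = xml.sax.make_parser()                 # pick an implementation
    parser.setFeature(feature_namespaces, False)   # set a behaviour switch
    handler = TagLogger()
    parser.setContentHandler(handler)              # object receiving events
    parser.setErrorHandler(ErrorHandler())         # error reporting policy
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))  # set the input stream
    return handler.events
```

Everything in configure_and_parse is set-up; the application logic lives
only in the handler.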

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From jeanpa at microsoft.com  Wed Jun 25 19:04:24 1997
From: jeanpa at microsoft.com (Jean Paoli)
Date: Mon Jun  7 16:58:05 2004
Subject: XML-Data
Message-ID: <78DFE33066ABD0118B9200805FD431BA93299D@RED-16-MSG.dns.microsoft.com>

The current version of XML-Data, the Microsoft Position Paper, is at
http://www.microsoft.com/standards/xml, along with a white paper on XML.
Together, these documents present our vision for the use of structured
data on the web. 
I hope these are easier to use than the version I mailed to you on
Sunday.

The XML-Data paper is also in http://www.w3.org/XML/Group/9706/xml-data
(thanks to Dan Connolly).

-Jean Paoli

> ----------
> From: 	Jean Paoli
> Sent: 	Sunday, June 22, 1997 10:37 PM
> To: 	'w3c-sgml-wg@w3.org'; 'xml-dev@ic.ac.uk';
> 'w3c-sgml-erb@hpsgml.fc.hp.com'
> Cc: 	Andrew Layman; Thomas Reardon; Adam Bosworth; Hadi Partovi
> Subject: 	XML-Data
> 
> I am pleased to present XML-Data, a Position Paper from Microsoft.
> XML-Data is an application of XML for exchanging 
> structured data and metadata on the Internet. 
> This position paper is sent to multiple working groups
> in the W3C dealing with this subject (XML, meta-data)
> and we expect this paper to be discussed and improved
> by these working groups.
> The current proposal needs namespaces and uses the Layman/Bray
> proposal.
> 
> The URL of this paper (on the Microsoft site) will be posted tomorrow.
> -Jean Paoli
> 
> ----------------
> 
> 



From Ingo.Macherius at TU-Clausthal.de  Wed Jun 25 19:54:44 1997
From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius)
Date: Mon Jun  7 16:58:05 2004
Subject: DTD invented by Microsoft?!
In-Reply-To: <E0wgo4J-0003F2-00@punch.ic.ac.uk>
Message-ID: <199706251753.TAA18214@sinfonix.rz.tu-clausthal.de>

> Could someone explain what "instance syntax for type definitions"
> means. Thanks!

<meta>
	I am aware this is a beginner's question. Is xml-dev the right place
	to answer ? If not, where is the place for such Q/A ? 
</meta>

Anyway:
In valid XML there are two distinct parts of a document, the DTD and 
the "document instance". Both serve different purposes. The 
"instance" is the marked up text the user produces. (So any valid 
HTML page is an "instance" of the HTML DTD). The tags allowed in the 
instance are declared in the DTD using a different syntax.

The term "instance syntax for type definitions" means that the same 
syntax is used for both DTD and instance. Compare:

	<!doctype aaa [
		<!element aaa	(bbb+)		> <!-- This is XML DTD syntax -->
		<!element bbb	(ccc*)		>
		<!element ccc	(#PCDATA)	>
	]>

with

	<doctype>					<!-- This is the same structure	-->
		<element>				<!-- expressed in instance syntax -->
			<name>aaa</name>	<!-- (example only, invalid)	--> 
			<model><plus>bbb</plus></model>
		</element>
		<element>
			<name>bbb</name>
			<model><rep>ccc</rep></model>
		</element>
		<element>
			<name>ccc</name>
			<model><rni type="PCDATA"/></model>
		</element>
	</doctype>

Using the second case there has to be a mechanism to tell 
meta-structure-defining tags (<element>, <doctype>, ...) from 
user-defined ones, e.g.	
	1. namespaces (proposed mechanism for XML)
	2. reserved attributes (like the current XML-Link draft)
	3. reserved names (like with HTML)
	4. processing instructions (shudder)
	5. ...
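
The comparison above can be sketched in code.  The following Python is only
an illustration under stated assumptions: the declaration table DECLS, the
WRAPPERS mapping and the function to_instance_syntax are hypothetical, the
element names (doctype, element, name, model, plus, rep, rni) are the
made-up ones from the example, and only single-child content models are
handled.  It shows that once declarations are held as data, the
"instance syntax" form is an ordinary element tree.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified declarations: (element name, (child, occurrence)).
DECLS = [
    ("aaa", ("bbb", "+")),
    ("bbb", ("ccc", "*")),
    ("ccc", ("#PCDATA", "")),
]

# Made-up wrapper names taken from the example: '+' -> plus, '*' -> rep.
WRAPPERS = {"+": "plus", "*": "rep"}


def to_instance_syntax(decls):
    """Build the instance-syntax tree for a list of declarations."""
    doctype = ET.Element("doctype")
    for name, (child, occurrence) in decls:
        element = ET.SubElement(doctype, "element")
        ET.SubElement(element, "name").text = name
        model = ET.SubElement(element, "model")
        if child == "#PCDATA":
            # The reserved-name indicator becomes an empty element.
            ET.SubElement(model, "rni", type="PCDATA")
        else:
            ET.SubElement(model, WRAPPERS[occurrence]).text = child
    return doctype
```

Calling ET.tostring(to_instance_syntax(DECLS), encoding="unicode") gives
the serialised form, which any generic XML tool can then read back.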

Q: Has Microsoft published the intended syntax for "Schemata"  
(the MS name for "marked up" DTD) to the public ? I can't find the 
link, help is welcome.

	++im
--
Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
mailto:Ingo.Macherius@tu-clausthal.de    http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)



From marcus at lab.com  Wed Jun 25 20:13:53 1997
From: marcus at lab.com (Wendell Piez)
Date: Mon Jun  7 16:58:05 2004
Subject: XML DTD for HTML?
Message-ID: <33B160F0.5091363B@lab.com>

List members:

Is there an XML DTD for HTML publicly available? We would be much
obliged to take a look....

Regards,

Wendell Piez
HuskyLabs
marcus@lab.com



From ebaatz at barbaresco.East.Sun.COM  Wed Jun 25 21:26:17 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:05 2004
Subject: Character encoding questions
Message-ID: <libSDtMail.9706251524.12690.ebaatz@barbaresco>

I was struck by the following sentence in the Microsoft XML White Paper:

  XML supports a range of encodings...subject only to the restriction
  that an entire document must share the same encoding.
  
My immediate reaction was that that wasn't correct, although the
definition of "document" above isn't obvious to me (for example, are
external entities part of a document?).  However, when checking into the
XML April specification, I got in over my head.  I am hoping that someone
here will help me out of my hole.

If my XML document is a simple Unicode text file then I begin it like
the following

  a Byte Order Mark
  <?XML version="1.0" encoding="ISO-10646-UCS-2"?>
  ...

with the Byte Order Mark being required even though an EncodingDecl is
used?  (I would have said "yes" until I got to Appendix E "Autodetection
of Character Sets," which worries about detecting UCS-2 when there
is no Byte Order Mark.)  Is the EncodingDecl necessary if the file
starts with a Byte Order Mark?
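
For what it is worth, the kind of check that autodetection implies can be
sketched as below.  This is a minimal Python illustration, not the text of
Appendix E: the function name sniff_encoding is hypothetical, the fallback
to UTF-8 is an assumption, and only the 16-bit cases are covered.  It looks
for a Byte Order Mark first and, failing that, for the byte pattern that
"<?" produces in each byte order.

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding family of an entity from its first bytes."""
    # 1. An explicit Byte Order Mark settles the byte order.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    # 2. No BOM: '<?' encoded in 16 bits has a recognisable zero-byte pattern.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # 3. Assumption: anything else is treated as UTF-8 (or an ASCII-compatible
    #    encoding whose name must then be read from the declaration itself).
    return "utf-8"
```

In this reading the declaration can only be read once the byte order is
known, which is why the BOM and the encoding declaration answer different
questions.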

Where can I have an EncodingPI?  Section 4.3.3 talks about their being
"at the beginning of a system entity, before any other character data or
markup" but doesn't define "system entity" (perhaps one that has an
ExternalID that contains "SYSTEM"?).  If my document references an
external entity, then I believe that the external entity must start
with an EncodingPI (see Appendix E "Autodetection of Character Sets")
if it isn't in UTF-8 or start with a Byte Order Mark.

If I wanted to take the external entity and, for portability reasons,
bundle it into my XML document as an internal entity, what do I do with
the external entity's EncodingPI?  It doesn't seem to be allowed in the
internal entity declaration, somewhat like:

  <!ENTITY Pub-Status <?XML encoding="ISO-10646-UCS-2"?>"text here">
  
I presume that the answer is that I cannot convert an external entity
into an internal unless the external entity and my XML document have the
same encoding.

What is the motivation for not allowing a change of encoding within
an entity?  The mechanism for handling that seems no different than
that needed to handle different encodings in external entities, which
I think of as being logically a part of the referencing document.




From Peter at ursus.demon.co.uk  Wed Jun 25 23:53:04 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:05 2004
Subject: XML-Data
Message-ID: <8518@ursus.demon.co.uk>

In message <78DFE33066ABD0118B9200805FD431BA93299D@RED-16-MSG.dns.microsoft.com> Jean Paoli writes:
> The current version of XML-Data, the Microsoft Position Paper, is at
> http://www.microsoft.com/standards/xml, along with a white paper on XML.
> Together, these documents present our vision for the use of structured
> data on the web. 
> I hope these are easier to use than the version I mailed to you on
> Sunday.

Thanks very much Jean,
	In fact the mail that was sent last week was unreadable on my mailer
(completely) although it still had to be downloaded.

	It would be appreciated if *very* large documents were mounted on the 
WWW and not directly mailed to this list, because not everyone can read them 
easily, and some of us have to pay for connect time.

	I discovered the URL independently and it's certainly a useful 
resource for the XML community - abstracters and curators will no doubt add 
it to their pages.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed Jun 25 23:53:10 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:05 2004
Subject: DTD invented by Microsoft?!
Message-ID: <8519@ursus.demon.co.uk>

In message <199706251753.TAA18214@sinfonix.rz.tu-clausthal.de> "Ingo Macherius" writes:
> > Could someone explain what "instance syntax for type definitions"
> > means. Thanks!
> 
> <meta>
> 	I am aware this is a beginner's question. Is xml-dev the right place
> 	to answer ? If not, where is the place for such Q/A ? 

The first place to go would be Peter Flynn's FAQ if it is relevant to XML.
www.ucc.ie/xml/
Peter has a form where you can post questions and/or answers.  When the FAQ 
was set up he was urging people to post.  It's now a very impressive site
with colours for versions, etc. and I'm not sure whether there is still a call
for material. ???Peter.

	There can be times when "beginners' questions" are appropriate on this 
list (*I* ask enough :-).  This is when there is a real danger that a general
lack of understanding might lead to fuzziness in *implementation*.  I think
this is particularly true when terminology is not clear.  So, for example,
'What is a resource in XML-LINK?' is worth discussing, and EliotK has given
an excellent working reference document for us.  

> </meta>
> 
	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed Jun 25 23:53:37 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:05 2004
Subject: XML-LINK
Message-ID: <8520@ursus.demon.co.uk>

I recently posted some concerns about XML-LINK on XML-WG and it was suggested
that XML-DEV would be more appropriate; I agree. The main question is to
what extent *generic* XML-link-processors can be built which are 
application-independent. They would rely totally on the XML-LINK spec for their
implementation. I have been rereading Eliot Kimber's posting of 1997-05-31 
on this list and have found it extremely helpful.  Since no-one has challenged 
any of the ideas or terminology there, I shall take that as a reference point
and try to use his terms consistently. (I am aware that July 1 may bring 
additional clarification, but discussion here will help).

In many respects, XML-LINK behaviour has parallels to our discussions on 
XML-LANG APIs.  The draft specifies what goes in, but leaves more fluid 'what
comes out'.  It's critical that we have consistent terminology in all of these
endeavours and outline the areas of complete agreement.

My primary concern is with the terms 'resource' and 'embed', where I believe
there is scope for added precision, and where I am not clear that all the 
discussion on XML-WG about these has been consistent with Eliot's document.

It seems clear that link traversal requires us to have parsed documents in
'memory' (this could also mean persistent storage, etc.) A link connects
'nodes in trees' and 'resource' is essentially synonymous with 'node' (EK, P1.).
I will build a simple example, and then ask how it might be implemented:

a.xml:

<P>This is <A ID="A" XML-LINK="SIMPLE" HREF="b.xml#ID(B1)" TITLE="l1">
a <B>link</B></A> in a paragraph</P>

can be parsed to a tree (I use '-' to indicate childOf in a TOC-like structure,
and PC(string) indicates a child with #PCDATA content; whitespace problems are
ignored).

P
-PC(This is )
-A
--PC(a )
--B
---PC(link)
-PC( in a paragraph)
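
The outline notation above can be produced mechanically.  The sketch below
is a minimal Python illustration using the standard-library ElementTree
module (an assumption; it is not JUMBO).  The names A_XML and outline are
hypothetical, and the fragment is written on one line so that the
whitespace problems mentioned above do not arise.

```python
import xml.etree.ElementTree as ET

# The a.xml fragment, joined onto one logical line to sidestep whitespace.
A_XML = (
    '<P>This is <A ID="A" XML-LINK="SIMPLE" HREF="b.xml#ID(B1)" TITLE="l1">'
    'a <B>link</B></A> in a paragraph</P>'
)


def outline(element, depth=0):
    """Return the TOC-like lines for an element and its descendants."""
    lines = ["-" * depth + element.tag]
    if element.text:
        # Text before the first child is a #PCDATA child of this element.
        lines.append("-" * (depth + 1) + "PC(" + element.text + ")")
    for child in element:
        lines.extend(outline(child, depth + 1))
        if child.tail:
            # Text after a child still belongs to the parent element.
            lines.append("-" * (depth + 1) + "PC(" + child.tail + ")")
    return lines
```

print("\n".join(outline(ET.fromstring(A_XML)))) reproduces the seven-line
outline shown above.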

Now, from Eliot's posting I identify the node A as the resource at one end of 
the link l1. The content of A is not relevant to the resource, since a node
is a point.  [However some XML-WG postings seemed to imply that the content
of A is a resource, which is at variance with Eliot's explanation.]

For EXTENDED, INLINE="TRUE" I am less clear what the resource is in:

<MYLINK XML-LINK="EXTENDED" ID="family" INLINE="true">
<P>Here is the
<A XML-LINK="LOCATOR" ID="father" HREF="b.xml#ID(father)">father</A> and the
<A XML-LINK="LOCATOR" ID="mother" HREF="b.xml#ID(mother)">mother</A> and the
<A XML-LINK="LOCATOR" ID="baby" HREF="b.xml#ID(baby)">baby</A>
</P>
</MYLINK>

which parses to:

MYLINK
-P
--PC(Here is the)
--A
---PC(father)
--PC(and the)
--A
---PC(mother)
--PC(and the)
--A
---PC(baby)

Now this is a single link, with (presumably) a single end at the INLINE end.
So does this mean that the 'resource' of this link is the MYLINK node
with ID=family? Or are there three 'resources' at this end, the A nodes with
IDs of 'father', 'mother' and 'baby'?

*-*-*-*

Now for the other end of the link, and EMBED.  I have implemented EMBED in JUMBO
like IMG in HTML:
<A HREF="foo.xml#ID(MOL)" SHOW="EMBED"/>
would locate the ID=MOL in foo.xml, process() it to create an object, which 
would then display() itself in the document at the position where the A link 
would be rendered.  But I am more concerned about when the located node is
a (sub)tree which [XML-LINK] 'should be embedded, for the purposes of display
or processing in the body of the resource and at the location where the
traversal started'

Taking the first example (a.xml) which links to b.xml and assume b.xml 
contains:

b.xml:

<P>This is <NODE ID="B1">
a <B>node</B></NODE> in a paragraph</P>

which is parsed to:

P
-PC(This is )
-NODE
--PC(a )
--B
---PC(node)
-PC( in a paragraph)

The link from ID="A" in a.xml links to node ID="B1" in b.xml.  In one
interpretation, that's it - 'embed'ding is up to the application. But is there
any reasonable default behaviour?

(A) it could be traversed as if it were physically part of the a.xml document
(i.e. as if NODE were a child of A, and presumably the eldest sibling).  The
processor would encounter A, process the node (only), find it had a LINK,
process that, then find A had content and process that.  Note that the content
of A remains and it would be application-dependent whether the *content* of A
was hidden or remained visible.

(B) Nothing happens unless BEHAVIOR is set.  In which case are there reasonable
values for it?  And does the concept of embedding have any meaning?

My own feeling is that (A) is the most reasonable default.  With NEW we have a
separate window, with a separate namespace (so it doesn't matter if b.xml has
a different DTD from a.xml).  So this 'window' has to be transported into the
current 'window'.
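
Interpretation (A) can be sketched as follows.  This is a minimal Python
illustration, not JUMBO's implementation and not behaviour defined by the
XML-LINK draft: the DOCS table stands in for fetching documents, the
functions resolve and embed are hypothetical, only the "file#ID(name)" form
of HREF is handled, and b.xml is assumed to close NODE with </NODE>.  The
located node is copied in as the eldest child of the linking element, and
the original content of A is kept.

```python
import copy
import re
import xml.etree.ElementTree as ET

# Hypothetical in-memory stand-in for retrieving the two documents.
DOCS = {
    "a.xml": '<P>This is <A ID="A" XML-LINK="SIMPLE" HREF="b.xml#ID(B1)" '
             'TITLE="l1">a <B>link</B></A> in a paragraph</P>',
    "b.xml": '<P>This is <NODE ID="B1">a <B>node</B></NODE> in a paragraph</P>',
}


def resolve(href):
    """Return the element addressed by an HREF of the form file#ID(name)."""
    match = re.fullmatch(r"([^#]+)#ID\(([^)]+)\)", href)
    if match is None:
        raise ValueError("unsupported locator: " + href)
    doc_name, target_id = match.groups()
    for node in ET.fromstring(DOCS[doc_name]).iter():
        if node.get("ID") == target_id:
            return node
    raise LookupError("no element with ID " + target_id)


def embed(link):
    """Insert a copy of the link's target as the eldest child of the link."""
    target = copy.deepcopy(resolve(link.get("HREF")))
    # The link's own leading text now follows the embedded node.
    target.tail = link.text
    link.text = None
    link.insert(0, target)
```

After embed() a generic tree walker meets A, then the NODE subtree, then
the original content of A; whether that original content is shown is left
to the application, as suggested above.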

I'd value comments.  If this seems to be a consensus view then I'll try to 
implement it in JUMBO.  At present I suspect that JUMBO has got this partly
right and partly wrong.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From cbullard at hiwaay.net  Thu Jun 26 02:06:11 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:58:05 2004
Subject: DTD invented by Microsoft?!
References: <E0wgVI3-00074J-00@punch.ic.ac.uk> <33B05BF1.47B3@hiwaay.net> <730.199706250817@grogan.cogsci.ed.ac.uk>
Message-ID: <33B1B254.1A55@hiwaay.net>

Henry S. Thompson wrote:
> 
> Len writes:
> 
> >  This appears to be the long awaited and somewhat dreaded
> >  attempt to use instance syntax for type definitions.  It
> >  is an idea that has been floated several times on the
> >  XML WG list and generally resisted.
> 
> It was resisted, correctly in my view, as a component of XML-lang
> itself, and in the decision the point was made several times that the
> right place for this was one level up, as a generic application.
> That's what the schema proposal in the XML-data document is aimed at
> providing.
> 
> >
> >  It is a bad idea and may be the reason SGML community
> >  members finally withdraw from XML development.
> 
> I'd be interested to hear your reasons for thinking it's a bad idea --

Why do we need two ways to do the same thing?  Rick Jelliffe 
provided the example in the SGML DTD syntax we know now.  
If simplicity is the goal, why introduce this now?

len



From galiard at let.rug.nl  Thu Jun 26 06:52:52 1997
From: galiard at let.rug.nl (Harry Gaylord)
Date: Mon Jun  7 16:58:05 2004
Subject: Character encoding questions 
Message-ID: <199706260452.GAA22614@odur.let.rug.nl>

> I was struck by the following sentence in the Microsoft XML White Paper:
> 
>   XML supports a range of encodings...subject only to the restriction
>   that an entire document must share the same encoding.
>   
> My immediate reaction was that that wasn't correct, although the
> definition of "document" above isn't obvious to me (for example, are
> external entities part of a document?).  However, when checking into the
> XML April specification, I got in over my head.  I am hoping that someone
> here will help me out of my hole.
> 
> If my XML document is a simple Unicode text file then I begin it like
> the following
> 
>   a Byte Order Mark
>   <?XML version="1.0" encoding="ISO-10646-UCS-2"?>
>   ...
> 
> with the Byte Order Mark being required even though an EncodingDecl is
> used?  (I would have said "yes" until I got to Appendix E "Autodetection
> of Character Sets," which worries about detecting UCS-2 when there
> is no Byte Order Mark.)  Is the EncodingDecl necessary if the file
> starts with a Byte Order Mark?
> 
> Where can I have an EncodingPI?  Section 4.3.3 talks about their being
> "at the beginning of a system entity, before any other character data or
> markup" but doesn't define "system entity" (perhaps one that has an
> ExternalID that contains "SYSTEM"?).  If my document references an
> external entity, then I believe that the external entity must start
> with an EncodingPI (see Appendix E "Autodetection of Character Sets")
> if it isn't in UTF-8 or start with a Byte Order Mark.
>
In classical SGML this info is contained in the system declaration where
one or more character sets can be declared and the control characters
used to switch between them, using the ISO 2022 and related standard
systems. These are read in before the dtd.

However, if I understand the XML proposals correctly, they do not envisage
a system declaration. The best info on system declarations is a white 
paper from OmniMark and an article in TAG by Wayne Wohler. On character
sets you might have a look at my article in CHUM a couple of years ago.
I have a preprint in ps available by ftp if you want to see it. It does
not have the character set tables which ISO claims the copyright for.

With the implementation of unicode/ucs we don't need all those things with
control characters which are too succeptible to corruption. All the
characters you need (or almost all in my case) are in the new character set.
   
The other option in classic SGML is to use a subdoc, but as far as I can 
remember it can contain its own dtd, but I don't think it can have a
system declaration. My docs are at the office.
 
>
Harry Gaylord
former chair TEI committee on character sets
member ISO SC2 and NNI shadow committee
 
> 


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From north at synopsys.com  Thu Jun 26 09:29:17 1997
From: north at synopsys.com (Simon North)
Date: Mon Jun  7 16:58:05 2004
Subject: PUBLIC identifiers in XML?
In-Reply-To: <78DFE33066ABD0118B9200805FD431BA932987@RED-16-MSG.dns.microsoft.com>
Message-ID: <199706260728.JAA12969@cadis.de>

This is most likely an RTFM question, but the XML FAQ says:

    "No public identifiers in entity and notation declarations"

While the XML-lang says (page 22 in the dead tree version or see 
http://www.w3.org/pub/WWW/TR/WD-xml-lang#secA. for the 
borrowed electrons version):

    "No public identifiers in ENTITY, DOCTYPE, and NOTATION 
      declarations".  

but at the same time, XML-lang explicitly includes PUBLIC in the 
production rule (Section 4.3.2, page 18 or see 
http://www.w3.org/pub/WWW/TR/WD-xml-lang#sec4.3.2) AND 
has an example of an external entity declaration that *does* use a 
public identifier.
 
I've also seen public identifiers used in DOCTYPE declarations for 
XML, and I had understood that this was OK but should still be 
supported by a SYSTEM identifier. 

I had also heard/read somewhere that a resolution mechanism for 
public identifiers was being worked on and that the restriction might 
then go away. 

Could someone please enlighten me on this?

Thanks. 


Simon North - COSSAP Technical Writer, Synopsys
Synopsys GmbH, Kaiserstr. 100, 52134 Herzogenrath
Germany. +49 2407 955873 -- north@synopsys.com
Voice mail: +1 415 694 4141 55055



From ht at cogsci.ed.ac.uk  Thu Jun 26 11:44:53 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:06 2004
Subject: DTD invented by Microsoft?!
In-Reply-To: len bullard's message of Wed, 25 Jun 1997 19:05:40 -0500
References: <E0wgVI3-00074J-00@punch.ic.ac.uk> <33B05BF1.47B3@hiwaay.net>
	<730.199706250817@grogan.cogsci.ed.ac.uk> <33B1B254.1A55@hiwaay.net>
Message-ID: <1113.199706260944@grogan.cogsci.ed.ac.uk>

Len writes:
>  > Len wrote
>  > >  It is a bad idea and may be the reason SGML community
>  > >  members finally withdraw from XML development.
>  > 
>  Henry S. Thompson wrote:
>  > I'd be interested to hear your reasons for thinking it's a bad idea --
>  
>  Why do we need two ways to do the same thing?  Rick Jelliffe 
>  provided the example in the SGML DTD syntax we know now.  
>  If simplicity is the goal, why introduce this now?

Just because I can express any logical formula using the Sheffer stroke,
or any program in assembler, doesn't mean I should.  Using PEs to
encode an element-type hierarchy not only obscures the author's
intention, it invites accidental error, encourages hacking at the
margins, and makes it harder for non-specialist users to augment the
hierarchy cleanly.  Compare, for example

  <!entity % x.phrase 'myCrystal |'>
  <!element myCrystal (. . .)>

with

  <elementType id='myCrystal' extends='#phrase'>
    . . .
  </elementType>

Understanding why and how the first of these does its work requires
considerable specialist knowledge, and if you don't believe me ask Lou
Burnard and Michael Sperberg-McQueen how easy they have found it to
educate TEI users to make such extensions themselves.

An explicit type hierarchy also simplifies things for the original
author, making the schema easier to maintain, to explain, and to read
for the ordinary user.  Compare:

<!ENTITY % paraContent '(#PCDATA | %m.phrase; | %m.inter;)*'    >
<!ENTITY % m.phrase '%x.phrase; %m.data; . . .'>
<!ENTITY % a.global '        id ID #IMPLIED
                             . . .'>
<!ELEMENT p         - O  (%paraContent;)                    >
<!ATTLIST p              %a.global;
          TEIform            CDATA               'p'            >

with

<elementType id='p' extends='#global'>
  <mixed>
   <elt href='#phrase'/>
   <elt href='#inter'/>
  </mixed>
  <attribute id='TEIform' presence='fixed' default='p'/>
</elementType>

<elementType id='phrase'>
 . . .
</elementType>

<elementType id='global'>
 <attribute name='id' type='id'/>
 . . .
</elementType>

Note finally that the PE method only easily allows a single layer of
specialisation -- once you've defined x.phrase in the above example,
you can't give the results to someone else and say "And to add stuff
to paragraph's content model, define x.phrase to what you want".  If
you want to do that, you have to

<!entity % x.phrase '%y.phrase; myCrystal |'>

and so on.  The element-type hierarchy approach allows multiple
independent specialisations, with a free choice of attachment points
(i.e. extends='#phrase' or extends='#myCrystal').
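
To make the multi-layer point concrete, here is a minimal Python sketch of how a processor might follow an `extends` chain. The dictionary layout and the function names (`ancestors`, `inherited_attributes`, `is_a`) are hypothetical and are not part of any proposal; only the element-type names come from the examples above.

```python
# Hypothetical in-memory form of the elementType declarations above.
# Each entry names the type it extends (or None) and its own attributes.
ELEMENT_TYPES = {
    "global": {"extends": None, "attributes": ["id"]},
    "phrase": {"extends": None, "attributes": []},
    "p": {"extends": "global", "attributes": ["TEIform"]},
    "myCrystal": {"extends": "phrase", "attributes": []},
}


def ancestors(name, types=ELEMENT_TYPES):
    """Return the chain of types that `name` extends, nearest first."""
    chain = []
    seen = {name}
    parent = types[name]["extends"]
    while parent is not None:
        if parent in seen:
            raise ValueError("cycle in extends chain at " + parent)
        chain.append(parent)
        seen.add(parent)
        parent = types[parent]["extends"]
    return chain


def inherited_attributes(name, types=ELEMENT_TYPES):
    """Collect attributes from the root of the chain down to `name`."""
    result = []
    for type_name in reversed(ancestors(name, types)):
        result.extend(types[type_name]["attributes"])
    result.extend(types[name]["attributes"])
    return result


def is_a(name, ancestor, types=ELEMENT_TYPES):
    """True if `name` is `ancestor` or extends it, directly or indirectly."""
    return name == ancestor or ancestor in ancestors(name, types)
```

A third party can add another layer simply by adding an entry that extends `myCrystal`; nothing in the existing entries has to be redefined, which is the contrast with the PE approach.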

Hope this helps communicate the value I see in this approach.

ht



From ebaatz at barbaresco.East.Sun.COM  Thu Jun 26 16:04:18 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:06 2004
Subject: Character encoding questions
Message-ID: <libSDtMail.199706261003.2567.ebaatz@barbaresco>

Thanks for your wide-ranging and graceful reply.

>  (There have been some suggestions that ... encoding
>  declarations [be] optional if there is an external carrier with a
>  character-encoding label...

I hope those sentiments are resisted.  Having something like
a declaration that is transport or operating environment
independent seems a lot simpler, more reliable, and more understandable.
If the declaration is redundant, it is harmless.


>  The reason I, for one, didn't lobby for allowing change of encoding
>  within an entity...

For what it is worth, I'm in agreement.  In practice I don't see
much need for such a feature and there exist straightforward ways
of handling such problems when they do exist.




From flammia at sls.lcs.mit.edu  Thu Jun 26 16:49:51 1997
From: flammia at sls.lcs.mit.edu (Giovanni Flammia)
Date: Mon Jun  7 16:58:06 2004
Subject: DTD invented by Microsoft?!
References: <E0wgVI3-00074J-00@punch.ic.ac.uk> <33B05BF1.47B3@hiwaay.net>
		<730.199706250817@grogan.cogsci.ed.ac.uk> <33B1B254.1A55@hiwaay.net> <1113.199706260944@grogan.cogsci.ed.ac.uk>
Message-ID: <33B278E4.4867D782@sls.lcs.mit.edu>

As someone who is not used to writing DTDs, I appreciate the
simplifications proposed by Henry Thompson. With XML, less is more. So,
for example, I can see why constraining XML documents to be trees is
better than allowing people to encode arbitrary object graphs.

Aren't XML and its extensions meant to become "SGML for the masses,
without DTDs"?

If you keep a gentle learning curve for people to create new tags, I am
sure the popularity of XML will spread like wildfire. I apologize if this
comment seems misplaced, but if one has to learn full-blown SGML syntax
and how to write DTDs, then most people who are afraid to get into SGML
now (and are currently occasional users of SGML without delving into
DTDs) will also be afraid to work with XML.

I am a little bit confused about how much expressive power XML should
have. If an XML document encodes detailed semantics about how to process
its elements, like a full-blown programming language, and you have to use
an IDL for it, isn't XML competing with distributed object communication
(e.g., CORBA) and distributed object databases (e.g., ObjectStore), but
much less efficiently (requiring parsing to communicate with objects,
rather than calling the objects' methods directly)? How does all this
fit together?

Shouldn't XML be specialized to expose just enough of the semantics
necessary to improve indexing, searching, and multi-modal display of Web
documents?

Giovanni Flammia
flammia@sls.lcs.mit.edu


From north at synopsys.com  Thu Jun 26 17:08:34 1997
From: north at synopsys.com (Simon North)
Date: Mon Jun  7 16:58:06 2004
Subject: DTD invented by Microsoft?!
In-Reply-To: <33B278E4.4867D782@sls.lcs.mit.edu>
Message-ID: <199706261508.RAA18154@cadis.de>

Giovanni Flammia wrote:

>  With XML, less is more. 

My 2 cents ... yes, and possibly even less than HTML. While it isn't 
(yet) an XML application, the HDML DTD gives a hint of how you can 
use a very meagre set of elements to create an application.

> if one has to learn
> full-blown SGML syntax and how to write DTDs, then most people who
> are afraid to get into SGML now (and are currently occasional users
> of SGML w/o dwelling into DTDs) will be also afraid to work with
> XML.

Why should we/they be exposed to it? ... come on, tool developers! 
Writing CSS style sheets (I'll leave DSSSL style sheets out of this)
can be pretty complicated, but there are some fair WYSIWYG 
tools coming on the market already. 

> If an XML document encodes detailed semantics about how to
> process its elements, 

But wasn't that the whole point of dropping LINK, LINKTYPE and 
USELINK? XML doesn't need the semantics, IMHO, these need to be 
provided externally to xml-lang; hence the need for XAPI.

> How does all this fit together?

Java?

> Shouldn't XML be specialized to expose just enough of the semantics
> necessary to improve indexing, searching, and multi-modal display of
> Web documents?

isn't it already?

Simon North - COSSAP Technical Writer, Synopsys
Synopsys GmbH, Kaiserstr. 100, 52134 Herzogenrath
Germany. +49 2407 955873 -- north@synopsys.com
Voice mail: +1 415 694 4141 55055



From ebaatz at barbaresco.East.Sun.COM  Thu Jun 26 17:41:03 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:06 2004
Subject: Character encoding questions
Message-ID: <libSDtMail.199706261139.28028.ebaatz@barbaresco>

>  >Having something like
>  >a declaration that is transport or operating environment
>  >independent seems a lot simpler, reliable, and understandable.
>  >If the declaration is redundant, it is harmless.
>  
>  If they are in conflict, it can be harmful.

Quite true if a program that is confronted with the conflict
does something harmful.  I'm speculating that fewer harmful
results will occur in the real world if #1 below occurs
than if #2 occurs.

#1. A program trusts the XML information unless it results in
the XML document not looking like an XML document.  Then the
program can give up or try some environment-driven methods.

#2. A program keeps a lot of specialized information about
environments it might run in (let's assume that the program
is somewhat portable so the environments include at least
Windows, Apple, and some sort of Unix) and how an XML document
might reach it.  If the program has the right information
and correctly winds its way through it, then it gets a good XML
document.

I think that #2 is much harder to do than being told what to do
by the document.
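
Strategy #1 might be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function name `decode_entity`, the `environment_hint` parameter, and the "does it start with `<`" sanity check are all made up here, and the declaration is only looked for in ASCII-compatible encodings.

```python
import re

# Byte order marks for the two UCS-2/UTF-16 byte orders.
BOM_BE = b"\xfe\xff"
BOM_LE = b"\xff\xfe"

# Matches encoding="..." in a leading declaration (either case of "XML").
_DECL = re.compile(
    rb"""^<\?xml[^>]*?encoding\s*=\s*["']([A-Za-z0-9._-]+)["']""",
    re.IGNORECASE,
)


def decode_entity(data, environment_hint=None):
    """Decode bytes, trusting the document's own label first (strategy #1)."""
    if data.startswith(BOM_BE) or data.startswith(BOM_LE):
        candidates = ["utf-16"]  # Python's codec reads and strips the BOM
    else:
        match = _DECL.match(data)
        candidates = [match.group(1).decode("ascii")] if match else ["utf-8"]
    if environment_hint:
        # Only consulted if the document's own information does not work out.
        candidates.append(environment_hint)
    for name in candidates:
        try:
            text = data.decode(name)
        except (LookupError, UnicodeDecodeError):
            continue
        if text.lstrip().startswith("<"):
            return text, name
    raise ValueError("could not decode entity as XML")
```

The environment-driven information is still usable, but only as a fallback, so a program that knows nothing about its environment still handles every correctly labelled document.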




From matt at wdi.disney.com  Thu Jun 26 18:12:18 1997
From: matt at wdi.disney.com (Matthew Fuchs)
Date: Mon Jun  7 16:58:06 2004
Subject: DTD invented by Microsoft?!
In-Reply-To: Giovanni Flammia <flammia@goldilocks.lcs.mit.edu>
        "Re: DTD invented by Microsoft?!" (Jun 26, 10:12am)
References: <E0wgVI3-00074J-00@punch.ic.ac.uk>  <33B05BF1.47B3@hiwaay.net> 
	<730.199706250817@grogan.cogsci.ed.ac.uk> 
	<33B1B254.1A55@hiwaay.net> 
	<1113.199706260944@grogan.cogsci.ed.ac.uk> 
	<33B278E4.4867D782@sls.lcs.mit.edu>
Message-ID: <9706260914.ZM15268@scrumpox.rd.wdi.disney.com>

On Jun 26, 10:12am, Giovanni Flammia wrote:
> Subject: Re: DTD invented by Microsoft?!
>
> As someone who is not used to write DTDs, I appreciate the
> simplifications proposed by Henry Thompson. With XML, less is more. So,
> for example, I can see why constraining XML documents to be trees is
> better than allowing people to encode arbitrary object graphs.
>
You can't constrain them to be trees.  However the element structure has to be
a tree because that is the only graph structure which can be linearized to a
text document.  A major use of attributes is to indicate the back and cross
edges in the original graph.
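
As a small illustration of that last point, here is a Python sketch that recovers both kinds of edge from a document. The `node` element and the `id`/`ref` attribute names are invented for the example; any attribute convention would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: containment gives the tree, while the `id` and
# `ref` attributes carry the extra (back and cross) edges.
DOC = """
<graph>
  <node id="a"><node id="b" ref="c"/></node>
  <node id="c" ref="a"/>
</graph>
"""


def edges(xml_text):
    """Return (tree_edges, cross_edges) as sorted lists of (from, to) pairs."""
    root = ET.fromstring(xml_text)
    tree_edges, cross_edges = [], []

    def visit(element):
        for child in element:
            # Containment between two identified elements is a tree edge.
            if element.get("id") and child.get("id"):
                tree_edges.append((element.get("id"), child.get("id")))
            visit(child)
        # A ref attribute points outside the containment hierarchy.
        if element.get("id") and element.get("ref"):
            cross_edges.append((element.get("id"), element.get("ref")))

    visit(root)
    return sorted(tree_edges), sorted(cross_edges)
```

The element structure alone yields only the edge from `a` to `b`; the cycle through `c` exists only because of the attributes.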

> Isn't XML and its extensions to become "SGML for the masses, without
> DTDs"?
>
No.  DTDs were not created just to cause pain and suffering.  They are actually
not hard to create.  Well-formedness was to allow useful processing to occur
without the parser requiring the DTD and to make parsers easier to write, not
to inspire tag salad.

> If you keep a gentle learning curve for people to create new tags, I am
> sure the popularity of XML will spread like wildfire. I apologize if this
> comment might seem misplaced, but if one has to learn full-blown SGML
> syntax and how to write DTDs, then most people who are afraid to get into
> SGML now (and are currently occasional users of SGML w/o dwelling into
> DTDs) will be also afraid to work with XML.
>
Lack of XML has not prevented people from introducing new tags.  But tag salad
is like spaghetti code.  Tags (elements) are not independent of each other if
they have any semantics.  DTDs help keep this under control.

> I am a little bit confused about how much power of expression should XML
> have.

I have an acquaintance who loves to say that ASCII is computationally complete.
 You can express arbitrary computations in ASCII.  Why should XML be less?

> If an XML document encodes detailed semantics about how to process its
> elements, like a full blown programming language, and you have to use an
> IDL for it, isn't XML competing with distributed object communication
> (e.g., CORBA), and distributed object databases (e.g., ObjectStore) but
> much less efficient (requiring parsing to communicate with objects,
> rather than calling the objects' methods directly)? How does all this
> fit together?
>
No.  The document doesn't encode these semantics, but you need an API to allow
semantics to be applied to the document.  Your contrast with distributed
objects is also false.  Look at CORBA under the hood, and you'll see there's
parsing going on.  Also, one of the points of mobile agent technology is that
distributed invocation is not necessarily cheap.  Sending a document can be
very much like sending an agent.  (Check out my paper "Let's Talk" at
http://cs.nyu.edu/phd_students/fuchs).

> Shouldn't XML be specialized to expose just enough of the semantics
> necessary to improve
> indexing, searching, and multi-modal display of Web documents?
>
Then it's just HTML++ Yuck!  :-(

Matthew Fuchs
matt@wdi.disney.com

-- 



From rrseibel at att.com  Thu Jun 26 18:56:48 1997
From: rrseibel at att.com (Seibel, Robert R)
Date: Mon Jun  7 16:58:06 2004
Subject: XML and HTML Intermixed
Message-ID: <9706261658.AB03565@hoccson.ho.att.com>

XML Dev. Team:

In my application, I see the need to be able to mix XML (my own tags)
and HTML tags in a core content database. I plan on using a DTD
at various authoring points to validate structure and tags.

Do you see mixing tags as reasonable? The XML tags could be converted
to the appropriate HTML tags if sent to a browser. Then again, all of the
tags or information could be formatted for the appropriate output device
on the fly.

For instance, I may have a tag called PROBLEM and another called
SOLUTION. As I'm explaining the solution, it would be nice to use HTML
tags to explain the solution.

Example:

<PROBLEM>Problem description</PROBLEM>
<SOLUTION>
<OL>
    <LI>Do this first</LI>
    <LI>This is second</LI>
</OL>
<P>Call me on questions.</P>
</SOLUTION>

Let's say I used a style sheet to display the contents. It seems to me
that using HTML tags intermixed with XML tags is a good thing. I don't
have to reinvent my own tags when HTML already defines them.
Comments?

Thanks,
Bob Seibel



From tbray at textuality.com  Thu Jun 26 23:59:09 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:06 2004
Subject: PUBLIC identifiers in XML?
Message-ID: <3.0.32.19970626145528.00a8d560@pop.intergate.bc.ca>

At 09:29 AM 26/06/97 +0000, Simon North wrote:
>This is mostly likely a RTFM question, but the XML FAQ says:
>
>    "No public identifiers in entity and notation declarations"
...
>
>Could someone please enlighten me on this?

Yes, XML has public identifiers.  The docs are wrong.  It's in the errata
file and will get fixed. -T.



From tbray at textuality.com  Fri Jun 27 02:07:04 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:06 2004
Subject: Lark 0.90 available, with an application
Message-ID: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca>

Hi - Lark 0.90 is now available at 
 http://www.textuality.com/Lark

Differences:
 - now does entity references in attribute values
 - does &#X style hex character references
 - has draconian error handling
 - the Handler has an element() method to serve as an element factory
 - lots of bug fixes
 - it's all in a package, textuality.lark

Doesn't do PE's yet.
It's now over 40k, sigh.

For me, the interesting thing is that it now comes with an application
named XH.  It was bothering me that I was writing but not using the
software, so I created xh, which reads the XML form of all the docs
I'm working on (XML-lang, XML-link, MCF, etc etc etc) and generates 
the HTML.  This used to be done with a mouldy tumorous perl program -
nothing against perl, but xh is a lot cleaner and nicer.  Also it
produces valid HTML, which the perl didn't.

Xh is interesting as it is probably a canonical customer for XAPI
(why did we lose JAX, I liked it?) - it doesn't use the event stream,
it lets the parser build the tree and then just runs around the
elements and attributes.
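
For readers who haven't seen the tree-walking style, a rough Python sketch of the idea follows. It is not how xh is written; the `HTML_TAGS` mapping and the element names in it are made up, and the standard library's ElementTree stands in for the parser.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source element types to HTML tags.
HTML_TAGS = {"spec": "body", "head": "h1", "p": "p", "term": "em"}


def to_html(element):
    """Render an element and its subtree; unknown types become <span>."""
    tag = HTML_TAGS.get(element.tag, "span")
    parts = ["<%s>" % tag, escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        # Text that follows a child inside this element is the child's tail.
        parts.append(escape(child.tail or ""))
    parts.append("</%s>" % tag)
    return "".join(parts)


def convert(xml_text):
    """Let the parser build the whole tree, then walk it."""
    return to_html(ET.fromstring(xml_text))
```

No event handlers are involved: the parser finishes before any output is produced, and the converter only ever sees elements, attributes, and text.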

For Xh, I also, after getting it working, realized that I had re-used
Peter Murray-Rust's trick of just having a .class per element-type
(Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if
this is just a coincidence or is this the basic paradigm on which XML 
software is going to be built?  If so, it might make sense to wire
a standard class-finder call into XAPI.
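
A class-finder of that kind could look something like the Python sketch below. The explicit `HANDLERS` table plays the part that Class.forName() plays in Java; the class names, the table, and `find_class` itself are hypothetical and are not anything XAPI defines.

```python
import xml.etree.ElementTree as ET


class GenericNode:
    """Fallback used when no class is registered for an element type."""

    def __init__(self, element):
        self.element = element

    def to_html(self):
        return "<div>%s</div>" % (self.element.text or "")


class Molecule(GenericNode):
    """Hypothetical handler for a MOLECULE element type."""

    def to_html(self):
        return "<b>molecule: %s</b>" % (self.element.text or "")


# Element-type name -> class; stands in for a lookup by class name.
HANDLERS = {"MOLECULE": Molecule}


def find_class(element_type, handlers=HANDLERS):
    """Return the class for an element type, or the generic fallback."""
    return handlers.get(element_type, GenericNode)


def make_node(element):
    """Element factory: instantiate whichever class handles this type."""
    return find_class(element.tag)(element)
```

The fallback matters as much as the lookup: a document with no matching classes still gets default behaviour for every element.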

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From cbullard at hiwaay.net  Fri Jun 27 04:20:55 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:58:06 2004
Subject: Lark 0.90 available, with an application
References: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca>
Message-ID: <33B32355.14F0@hiwaay.net>

Tim Bray wrote:
> 
> If so, it might make sense to wire
> a standard class-finder call into XAPI.

I'm reading "Late Night VRML 2.0 with Java".  
The same approach (class per nodeType) seems 
to be recommended there.  Granted, VRML is 
designed to be an object-oriented format, but 
I suspect you are right.

Good going.

len



From galiard at let.rug.nl  Fri Jun 27 06:34:33 1997
From: galiard at let.rug.nl (Harry Gaylord)
Date: Mon Jun  7 16:58:06 2004
Subject: character encoding questions
Message-ID: <199706270434.GAA10659@odur.let.rug.nl>

Paul Grosso has sent me a note that I was talking about the SGML
declaration, not the system declaration yesterday. He is right.

The preprint is available from the following address:
ftp://let.rug.nl/pub/Galiard/chum7.ps

Let me know if you have any problems in getting it.

Harry Gaylord




From Jon.Bosak at Eng.Sun.COM  Fri Jun 27 06:38:58 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:06 2004
Subject: XML and HTML Intermixed
In-Reply-To: <9706261658.AB03565@hoccson.ho.att.com> (rrseibel@att.com)
Message-ID: <199706270437.VAA05158@boethius.eng.sun.com>

[Bob Seibel:]

| Let's say I used a style sheet to display the contents. It seems to me
| that using HTML tags intermixed with XML tags is a good thing. I don't
| have to reinvent my own tags when HTML already defines them.

You can mix tags all you want; with the exception of a handful of
reserved names, the XML name space belongs to you.  But this means
that "UL" has no more meaning to an XML processor than "SOLUTION"
does; in both cases, you must use some other mechanism to specify the
semantics.  The most common mechanisms are going to be Java classes or
stylesheets.

Some people have suggested the definition of an "HTXML" to grandfather
existing HTML tags but let you define any other tag that is not HTML.
One great big problem with this approach is that if HTXML is based on
HTML 4.0 (say), and HTML 4.0 has no SOLUTION tag, and therefore you
use SOLUTION all through your documents assuming a particular meaning
for SOLUTION, and then in HTML 4.1 a tag named SOLUTION with a
different meaning is defined, you're hosed.

A possible way out of this would be to define a reserved attribute to
tell an XML browser that you want some element type to have the
semantics of some HTML tag:

   <BULLETLIST XML-HTML-EQUIV="UL"> ... </BULLETLIST>

This has some advantages, but given the speed with which XML is
moving, I personally am not persuaded that it's worth the trouble.
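
For what it's worth, honouring such an attribute would take very little code. A Python sketch, assuming the attribute is spelled exactly as in the example above and that elements without it fall back to a neutral `div` (both choices are assumptions made for the illustration):

```python
import xml.etree.ElementTree as ET

# Attribute name taken from the example above; nothing defines it yet.
EQUIV_ATTR = "XML-HTML-EQUIV"


def html_tag_for(element, default="div"):
    """Return the HTML tag an element asks to be treated as."""
    return element.get(EQUIV_ATTR, default).lower()


def rename_to_html(xml_text):
    """Rewrite element names using the equivalence attribute."""
    root = ET.fromstring(xml_text)
    for element in root.iter():
        element.tag = html_tag_for(element)
        # Drop the hint once it has been applied.
        element.attrib.pop(EQUIV_ATTR, None)
    return ET.tostring(root, encoding="unicode")
```

Because the author states the equivalence explicitly, a later HTML revision that introduces its own BULLETLIST or SOLUTION tag cannot silently change what the document means.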

Jon




From Jon.Bosak at Eng.Sun.COM  Fri Jun 27 07:11:52 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:06 2004
Subject: Lark 0.90 available, with an application
In-Reply-To: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> (message from Tim Bray on Thu, 26 Jun 1997 17:04:52 -0700)
Message-ID: <199706270510.WAA05183@boethius.eng.sun.com>

[Tim Bray:]

> (why did we lose JAX, I liked it?)

Because we were informed that it means "toilet" in Ireland -- clearly
a variant of "jakes," which is why JAX bothered me for reasons that I
couldn't identify until the Irish usage was pointed out.

Jon




From Peter at ursus.demon.co.uk  Fri Jun 27 09:31:42 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:06 2004
Subject: XML and HTML Intermixed
Message-ID: <8543@ursus.demon.co.uk>

In message <9706261658.AB03565@hoccson.ho.att.com> "Seibel, Robert R" writes:
> XML Dev. Team:

There is no 'team' other than the public-spirited members of this list and
others :-).  Everyone is invited to join in - no entrance qualifications - just
a willingness to help the development process.

> 
> In my application, I see the need to be able to mix XML (my own tags)
> and HTML tags in a core content database. I plan on using a DTD
> at various authoring points to validate structure and tags.

This is an absolutely key question - which some of us raise at regular 
intervals.  My analysis - which I hope others will challenge or amplify - is
something like this:

HTML2.0 and HTML3.2 *at present* are SGML-compatible (if properly authored,
with balanced tags, quoted attributes, etc.)  They are not XML-compatible
for reasons which have been discussed here (inclusions/exclusions, '&' content
models, etc. in the DTD, and some EMPTY tags which require the <FOO/> syntax
in XML).  We all expect that 'someone' will convert common DTDs to XML and HTML
is a leading candidate but so far no-one has actually done it. (IMO it needs
to have the (in)formal blessing of the W3C, since HTML is a W3C protégé).

So the question might break down to:

(a) can I mix HTML(non-XML) with XML in the same document?
	This would not be a valid XML document overall, but it might be valid
input to an HTML browser which recognised XML markup.  It's up to the browser
(or other software) creator as to whether that's meaningful.

(b) can I refer to an XML document from an HTML document?
	This is simple if there is a MIME type for XML, since standard helper
technology can be used. [This is what I do for CML (Chemical Markup Language)
and I use the browser to call a viewer for text/xml or chemical/x-cml]. It
is generally believed that 'someone' is submitting an application to IETF/IANA 
for registration of the text/xml MIME type (??Progress??).

(c) can I XML-ise HTML and mix it with my own DTD?
	Yes.  It depends on how this is done.  I have edited HTML2.0 to be 
XML-compliant for my own purposes.  CML 'contains' HTML2.0 as part of the
CML DTD.  This guarantees there are no namespace problems (i.e. CML cannot
have identical ELEMENTs to those in HTML).  So this allows CML documents to 
contain chunks of XML-ised HTML.  Rendering these is non trivial, because it
is not easy to pass HTML to the browser without using Javascript and I do
not like doing this (non-portable, flaky, etc.)  Moreover I have tweaked
my HTML to use the full XML-LINK syntax for tags such as <A>.

(d) Can I use HTML with my document if I have an ElementType which clashes
with one in HTML?
	Not easily.  The question of combining DTDs and document fragments
has exercised the ERB/WG and generated megabytes of opinion.  A solution
will appear at some time in the future.

(e) Can I use XML-ised HTML and include XML-LINKs to other XML documents?
	Yes, if the HTML has been extended to use XML-LINK.  This is what I
do to avoid namespace clashes.  It may have its detractors. Be warned that 
there is not much software which can display XML documents using two different
DTDs at the same time; I'm working out how JUMBO will do this - if I get some
answers to my LINK queries it should be fairly straightforward.
> 
> Do you see mixing tags as reasonable? The XML tags could be converted
> to the appropriate HTML tags if sent to a browser. Then again

There are normally no default 'appropriate HTML tags'.  How would you convert
<FOO>
<BAR>276+354/872=6354?</BAR>
</FOO>

to HTML? One way to tackle this is through stylesheets (CSS1 or DSSSL) where 
appropriate formatting/rendering is applied to each tag, including context.
Alternatively (as in JUMBO) Java classes can be supplied for each ElementType
which might convert to HTML. (For example, MOLecule in CML has 1500 lines of 
Java which among many other things will render it as HTML).
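
The stylesheet route can be sketched in a few lines of Python. This is only a toy: the `RULES` table and the choice of HTML tags are invented, and real CSS1 or DSSSL rules are far richer, but it shows formatting chosen per tag with the parent as context.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical rules: (parent type, element type) -> HTML tag.
# A parent of None is the fallback for that element type in any context.
RULES = {
    ("FOO", "BAR"): "code",
    (None, "BAR"): "span",
    (None, "FOO"): "div",
}


def render(element, parent_tag=None):
    """Render an element using the most specific rule that applies."""
    tag = RULES.get((parent_tag, element.tag))
    if tag is None:
        tag = RULES.get((None, element.tag), "span")
    body = escape(element.text or "")
    for child in element:
        body += render(child, element.tag) + escape(child.tail or "")
    return "<%s>%s</%s>" % (tag, body, tag)
```

With these rules a BAR inside a FOO comes out as `code`, while a BAR anywhere else comes out as `span`; the document itself says nothing about either.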

> all of the tags or information could be formatted for the appropriate
> output device
> on the fly.
> 
> For instance, I may have a tag called PROBLEM and another called
> SOLUTION.
> As I'm explaining the solution, it would be nice to use HTML tags to
> explain the
> solution.
> 
> Example:
> 
> <PROBLEM>Problem description</PROBLEM>
> <SOLUTION>
> <OL>
>     <LI>Do this first</LI>
>     <LI>This is second</LI>
> </OL>
> <P>Call me on questions.</P>
> </SOLUTION>
> 
> Let's say I used a style sheet to display the contents. It seems to me
> that
> using HTML tags intermixed with XML tags is a good thing. I don't have
> to
> reinvent my own tags when HTML already defines them. 
> Comments?

I am strongly in favour of re-using DTDs and document fragments.  So many
chemical documents will draw from 3 DTDs:
	- HTML for the main text
	- MathML for the mathematics
	- CML for the chemistry
The ERB/WG has debated this at great length and accepts it as very desirable and
high-priority.  No actual mechanism is given at present.  An additional 
character has been reserved for NAMEs in case we need to use it for namespace
in the future, but we're not allowed to use it yet [I think that is the correct
position??]. 

To summarise, I believe that mix-and-match from different DTDs is a valid and
useful approach to XML.  It means that there can be 'islands of validity'
[an idea from the WG] within XML documents, so that XML-WF docs will not
be semantically void tag soup.  The difficulty at present is how those
islands are identified - there is no consensus yet.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Fri Jun 27 10:40:39 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:06 2004
Subject: Lark 0.90 available, with an application
Message-ID: <8554@ursus.demon.co.uk>

In message <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> Tim Bray writes:
> Hi - Lark 0.90 is now available at 
>  http://www.textuality.com/Lark
> 
> Differences:
>  - now does entity references in attribute values
>  - does &#X style hex character references
>  - has draconian error handling
>  - the Handler has an element() method to serve as an element factory
>  - lots of bug fixes
>  - it's all in a package, textuality.lark

Great!!  I was waiting for the 'package' to bolt it into JUMBO.
[I'm writing this before I have downloaded it.]

> 
> Doesn't do PE's yet.
> It's now over 40k, sigh.

We can't easily get round this problem.  XML takes a *lot* of code.  I have
found that JUMBO has huge classes (e.g. 100 member functions) for Node, Tree
and TOC.  Trouble is that they all have to be loaded even if only a small
amount of functionality is used - e.g. you have to have mouseDrag(), 
mouseMove() even if the user might not drag the mouse :-)

> 
> For me, the interesting thing is that it now comes with an application
> named XH.  It was bothering me that I was writing but not using the
> software, so I created xh, which reads the XML form of all the docs
> I'm working on (XML-lang, XML-link, MCF, etc etc etc) and generates 
> the HTML.  This used to be done with a mouldy tumerous perl program -
> nothing against perl, but xh is a lot cleaner and nicer.  Also it
> produces valid HTML, which the perl didn't.
> 
> Xh is interesting as it is probably a canonical customer for XAPI
> (why did we lose JAX, I liked it?) - it doesn't use the event stream,

So did I! There are 30K+ references to JAX on the net including jax.org
(where the mouse genome is being explored).  

> it lets the parser build the tree and then just runs around the
> elements and attributes.

Yes.  JUMBO does this by having a generic SGMLNode (named before XML was 
invented) which has default actions for attributes, contents, etc. It has
routines such as process(), toHTML(), toString(), display(Graphics g), etc.,
so that when reading a DTD-less XML document it can still do something with
it.

> 
> For Xh, I also, after getting it working, realized that I had re-used
> Peter Murray-Rust's trick of just having a .class per element-type
> (Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if
> this is just a coincidence or is this the basic paradigm on which XML 
> software is going to be built?  If so, it might make sense to wire
> a standard class-finder call into XAPI.

I suspected we were quite close to this with ElementFactory. I've been 
slightly reluctant to post JUMBO code for this part because JUMBO has evolved
rather than been planned (it wasn't intended to be graphical to start with :-)

The basic steps are:
	- parse the document into a Tree of Nodes (actually Elements at present)
		This is all that can be done with a DTD-less document.
		If NXP or Lark is given as an argument, JUMBO will use them
		as the parser.  It creates Elements as it encounters them (even
		with Lark - this is historical).
	- if a DTD is given, it downloads a *.class file for that DTD. [This is
		resolved locally at present, but if we agree on catalogs and 
		other naming conventions, then we can resolve it globally.]
	- the class file gives a list of ElementTypes ('GI's) for which there
		are *.class files available.  Thus in PLAYDTD.class there are
		references to STAGEDIRNode.class, SPEECHNode.class.  The DTD does
		NOT have to have a class for each type unless that is seen as
		essential; otherwise the default Node methods are used.
	- if a Node has a GI in the DTD class, it is specifically created.  Thus
		the PLAYDTD.class has code like:

	if (gi.equals("SPEECH")) {
		node = DTD.createSubclassedNode("SPEECH", content, attributes);
	} else {
		node = new Node(content, attributes);
	}

Then the subclassed Nodes have node-specific methods, and display() will show
specific icons, etc.

This is done at, or immediately after, parse time.  So JUMBO will create a 
subclassed Node from a generic Lark element if required.  If this is what 
ElementFactory does, then great!
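
A minimal sketch of the class-per-element-type lookup, not JUMBO's or Lark's
actual code.  It assumes a naming convention of <GI>Node for specialised
classes; Node, SPEECHNode and NodeFactory here are hypothetical stand-ins.
It uses Class.forName() and falls back to the generic Node when no
specialised class can be loaded:

```java
import java.util.Map;

// Generic node: default behaviour for any element type.
class Node {
    String gi;                        // element type name ("generic identifier")
    String content;
    Map<String, String> attributes;

    String describe() { return "generic <" + gi + ">"; }
}

// Specialised node for SPEECH elements (hypothetical example class).
class SPEECHNode extends Node {
    @Override
    String describe() { return "speech: " + content; }
}

// Hypothetical factory: tries a class named <GI>Node, else uses plain Node.
class NodeFactory {
    static Node create(String gi, String content, Map<String, String> attributes) {
        Node node;
        try {
            // Build the class name relative to Node so that the lookup still
            // works if these classes are moved into a package.
            String base = Node.class.getName();
            String prefix = base.substring(0, base.length() - "Node".length());
            Class<?> cls = Class.forName(prefix + gi + "Node");
            node = (Node) cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            node = new Node();        // no usable specialised class: use defaults
        }
        node.gi = gi;
        node.content = content;
        node.attributes = attributes;
        return node;
    }
}
```

With this, NodeFactory.create("SPEECH", ...) yields a SPEECHNode, while an
element type with no class of its own (say STAGEDIR here) gets a plain Node.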

There are the following performance hits:
	(a) it is slower to parse since the specialised nodes are created
	at that time
	(b) all the specialised code is loaded at parse time even if the user
	doesn't require it.  Since performance is hit by code size, some 
	applications run very slowly.  
So perhaps there needs to be lazy creation of specialised Elements??  IOW
everything is generic until it's actually referenced, when it gets a 
specialised Element from the factory.
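
One possible shape for such lazy creation, as a sketch only.  GenericElement
and LazyElement are hypothetical names, and the factory is passed in as a
plain function rather than being any real parser's ElementFactory:

```java
import java.util.function.Function;

// Generic element as delivered by the parser (hypothetical minimal shape).
class GenericElement {
    final String gi;
    final String content;

    GenericElement(String gi, String content) {
        this.gi = gi;
        this.content = content;
    }
}

// Wrapper that builds the specialised object only when it is first asked for.
class LazyElement<T> {
    private final GenericElement source;
    private final Function<GenericElement, T> factory;
    private T specialised;            // stays null until first requested

    LazyElement(GenericElement source, Function<GenericElement, T> factory) {
        this.source = source;
        this.factory = factory;
    }

    GenericElement generic() { return source; }

    boolean isSpecialised() { return specialised != null; }

    T specialised() {
        if (specialised == null) {
            specialised = factory.apply(source);   // cost is paid only here
        }
        return specialised;
    }
}
```

Parsing then only ever creates GenericElements; the specialised class (and
the code behind it) is touched the first time specialised() is called.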

Maybe I will post the code for PLAYDTD if it would help the process.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From n.bradley at pindar.co.uk  Fri Jun 27 10:43:30 1997
From: n.bradley at pindar.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:06 2004
Subject: XML and HTML Intermixed
Message-ID: <v03007801afd93baeb7ea@[193.123.165.122]>

>[Bob Seibel:]
>
>| Let's say I used a style sheet to display the contents. It seems to me
>| that using HTML tags intermixed with XML tags is a good thing. I don't
>| have to reinvent my own tags when HTML already defines them.
>
>You can mix tags all you want; with the exception of a handful of
>reserved names, the XML name space belongs to you.  But this means
>that "UL" has no more meaning to an XML processor than "SOLUTION"
>does; in both cases, you must use some other mechanism to specify the
>semantics.  The most common mechanisms are going to be Java classes or
>stylesheets.

Is there a general assumption that the browser vendors will support XML in
the near future? If this is so, I would think that HTML tags can be
avoided. Just use cascading style sheets on XML tags instead.

The one big exception to this would be tables, which are not covered by CSS
(yet?). I know I have mentioned this before, but if browser vendors are
going to lead the XML revolution, and I think (or at least hope) they are,
then we should expect that they would want to retain their investment in
HTML tables, rather than adopt some new standard. Could we not therefore
accept this reality but, to maintain flexibility, state that the Table
element definition must include a fixed attribute, say -XML-TABLE, whenever
this assumption is to be made?

_________________________________________________________
       Neil Bradley, SGML Consultant, Pindar plc

                      Author of
             "The Concise SGML Companion"
      Addison-Wesley Longman (ISBN: 0-201-41999-8)

The third-rate mind thinks with the majority;
the second-rate mind thinks with the minority;
the first-rate mind is only happy thinking (A. A. Milne)

Tel:   +44 (0)1904 330162
EMail: neil@bradley.co.uk
URL:   http://www.bradley.co.uk
_________________________________________________________




From richard at light.demon.co.uk  Fri Jun 27 13:44:33 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:58:07 2004
Subject: XML and HTML Intermixed
In-Reply-To: <8543@ursus.demon.co.uk>
Message-ID: <Nn+R1EAwr3szEwGG@light.demon.co.uk>

In message <8543@ursus.demon.co.uk>, Peter Murray-Rust
<Peter@ursus.demon.co.uk> writes
>...
>(e) Can I use XML-ised HTML and include XML-LINKs to other XML documents?
>       Yes, if the HTML has been extended to use XML-LINK.  This is what I
>do to avoid namespace clashes.  It may have its detractors. Be warned that 
>there is not much software which can display XML documents using two different
>DTDs at the same time; I'm working out how JUMBO will do this - if I get some
>answers to my LINK queries it should be fairly straightforward.
>...
>To summarise, I believe that mix-and-match from different DTDs is a valid and
>useful approach to XML.  It means that there can be 'islands of validity'
>[an idea from the WG] within XML documents, so that XML-WF docs will not
>be semantically void tag soup.  The difficulty at present is how those
>islands are identified - there is no consensus yet.

I would suggest, looking at the XML-Link spec, that the clean way to mix
and match is to use simple links with the attribute specifications: 

        SHOW="EMBED" ACTUATE="AUTO"

Have your chunk of HTML as a separate document (which can be valid or
well-formed, as you wish), and just point to it. This is a fragment from
an object record in a museum catalogue, where the artist's biographical
details are stored in a separate XML document with a different DTD:

<production>
 <person ROLE="artist">Mathias, William
  <description HREF="mathiasw.xml" SHOW="EMBED" ACTUATE="AUTO"/>
 </person>

On hitting the empty <description> element, the XML processor will go
off and read mathiasw.xml. This will be parsed separately, and probably
held separately in memory.  It is a genuinely separate document with its
own namespace and so on.  But _for_the_purposes_of_display_and_processing_
it is 'inserted' into the source document at the point the
<description> element occurs.  And ACTUATE="AUTO" says that it is, in
effect, a necessary part of the catalogue record.
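
To make that reading concrete, here is a rough sketch of a display pass that
treats SHOW="EMBED" ACTUATE="AUTO" this way.  It is only one interpretation of
the draft, not a statement of what XML-Link requires.  Elem, EmbedRenderer and
the loader function (which stands in for "fetch and parse the target
separately") are all hypothetical, and there is no guard against documents
that embed each other in a cycle:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Minimal element tree (hypothetical; stands in for a real parse tree).
class Elem {
    final String name;
    final Map<String, String> attrs = new HashMap<>();
    final List<Elem> children = new ArrayList<>();
    String text = "";

    Elem(String name) { this.name = name; }
}

class EmbedRenderer {
    // Renders the tree; an auto-actuated embed link is followed at display
    // time.  The source tree itself is never modified.
    static String render(Elem e, Function<String, Elem> loader) {
        StringBuilder out = new StringBuilder();
        out.append('<').append(e.name).append('>').append(e.text);
        String href = e.attrs.get("HREF");
        if (href != null
                && "EMBED".equals(e.attrs.get("SHOW"))
                && "AUTO".equals(e.attrs.get("ACTUATE"))) {
            // The target is a separate document; it only appears to be inserted.
            out.append(render(loader.apply(href), loader));
        }
        for (Elem child : e.children) {
            out.append(render(child, loader));
        }
        out.append("</").append(e.name).append('>');
        return out.toString();
    }
}
```

The point of the sketch is that the two trees stay separate in memory; only
the rendered output shows the target at the position of the linking element.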

If you want to physically embed a chunk of HTML into your document which
conforms to a non-HTML DTD, surely you have to extend your DTD as Peter
has done. And once you have put the HTML element types into your DTD,
you're not really 'mixing and matching' in the sense of this discussion:
just borrowing a bunch of element types from another DTD.

Richard Light



From akirkpatrick at ims-global.com  Fri Jun 27 15:57:42 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an applica
Message-ID: <E0whbWp-0002rw-00@punch.ic.ac.uk>

Sorry if this has been gone over before, but these are my
thoughts on the class-per-element-type idea (mentioned
recently in Tim Bray's post about Lark).

I did something very similar recently (admittedly in C++)
and abandoned it. My application was an SGML->RTF
convertor. It read the events using SP and created a tree
of elements derived from SGMLElement but specialised
towards RTF. The hierarchy looked something like:

  SGMLElement
      RtfFile
      RtfContainer
      RtfPara
         RtfTitle
            RtfTitleTarget
         RtfAdmonition
      RtfInline (parametrised)
         RtfLink
      etc.

I found the following drawbacks:

1. Leads to "class spaghetti" with similar code being spread
all over the place.

2. There is usually a large degree of dependence between the
elements and the driving application. Often the elements need
to access the driving application directly and there is no obvious
and efficient way to provide this interface.

3. You need to create a new class for each new element type
(less of a problem in Java?). For C++, this means recompiling
the application.

It was actually when I looked at the prospect of creating a whole
new raft of classes for the HTML output that I decided to start again.
I rewrote my application to use the following process:

1. SgmlReader reads document and creates tree of generic elements.
Each element has an SgmlRule member variable/class.

2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
properties with the elements based on gi, position, etc. These properties
are added to the SgmlRule for each element.

3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
deciding what to do based on the properties applied by the stylesheet.

(I realise this is similar to the way Jade operates but our RTF writer
also handles WinHelp and has other output/app-specific features).

Ideally, this should be generalised further with a SgmlElementPlusRule
class which just contains a pointer to the SgmlElement and the SgmlRule
(otherwise the SgmlElement has a dependency on SgmlRule).

The stylesheet mechanism is (just about) independent of the output format.
All the code to handle RTF/HTML/whatever is centralised in the XxxWriter
class. I've found this much easier to enhance and maintain than the previous
implementation. I've also found that 90% of the time we can do things with
the stylesheet without recompiling the application.
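
The three-step process above can be sketched roughly as follows.  This is an
illustration in Java rather than the C++ application described; the class
names, the single "tag" property and the default of "div" are assumptions made
up for the example, and the writer does no escaping of text:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Step 1: generic element; its properties play the role of the SgmlRule.
class StyledElement {
    final String gi;
    final String text;
    final List<StyledElement> children = new ArrayList<>();
    final Map<String, String> properties = new HashMap<>();

    StyledElement(String gi, String text) {
        this.gi = gi;
        this.text = text;
    }
}

// Step 2: the stylesheet attaches properties to elements, keyed here by gi only.
class SimpleStylesheet {
    private final Map<String, Map<String, String>> rules = new HashMap<>();

    void addRule(String gi, String property, String value) {
        rules.computeIfAbsent(gi, k -> new HashMap<>()).put(property, value);
    }

    void apply(StyledElement e) {
        Map<String, String> matched = rules.getOrDefault(e.gi, Collections.emptyMap());
        e.properties.putAll(matched);
        for (StyledElement child : e.children) {
            apply(child);
        }
    }
}

// Step 3: the writer looks only at properties, never at element types.
class SimpleHtmlWriter {
    static String write(StyledElement e) {
        String tag = e.properties.getOrDefault("tag", "div");
        StringBuilder out = new StringBuilder("<" + tag + ">" + e.text);
        for (StyledElement child : e.children) {
            out.append(write(child));
        }
        return out.append("</").append(tag).append('>').toString();
    }
}
```

Changing the output for an element type then means changing a rule, not
adding or recompiling a class.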

I'd be really interested to hear views in favour of the class approach.

Alfie.



From eliot at isogen.com  Fri Jun 27 16:17:01 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an application
Message-ID: <3.0.32.19970627075258.00f35098@mail.swbell.net>

At 05:04 PM 6/26/97 -0700, Tim Bray wrote:
>For Xh, I also, after getting it working, realized that I had re-used
>Peter Murray-Rust's trick of just having a .class per element-type
>(Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if
>this is just a coincidence or is this the basic paradigm on which XML 
>software is going to be built?  If so, it might make sense to wire
>a standard class-finder call into XAPI.

It makes sense to have one class per element type (after all, you usually
have distinct types because they have distinct semantics and thus distinct
behavior). You can take this one step further if you add architectures:
your element-type-specific classes can themselves be derived from
arch-form-specific classes.

In other words, given an architecture hierarchy, it should be natural to
define a corresponding implementing class hierarchy.  If your processor has
a fallback scheme for mapping element types to objects that includes
mapping element types to the objects for the architectural forms from which
they are derived when they don't have their own object, it should be
possible to build fairly generic processors for common architectures that
can be quickly applied to new documents simply by adding the architectural
mapping to the documents.

For example, in Tim's case he's mapping to HTML.  It's probably the case
that most of the mapping is a simple one-to-one mapping, which can be
represented by deriving the base document from HTML as an "architecture". [I
say "architecture" because HTML is not really suitable as an architecture
as it is not sufficiently general--in particular, the lack of generic,
nesting divisions with generic titles makes it difficult, if not
impossible, to derive from HTML document types that themselves use
recursive divisions because the mapping to HTML is dependent on the nesting
context.  You could use the SGML implicit LINK feature to do such a
mapping, but I'm not suggesting that as a general solution.]  Not all the
mappings are this simple, but probably 80% are.  Given this, instead of
having one object class per element type in the base documents, you could
have one class per HTML "form" plus unique classes only for those base
elements that require more complex mappings.  If you've got a DTD, you can
do most of the mapping there:

<!ELEMENT Chapter - - (CTitle, Section+) >
<!ATTLIST Chapter
    HTML  NAME  #FIXED "div"
>
<!ELEMENT CTitle   - - (#PCDATA)* >
<!ATTLIST CTitle
   HTML   NAME  #FIXED "h1"
>

Thus, expressed in procedural syntax (I don't know Java), you can have
logic like:

switch (element_type()) {
  case "specialized-A":
     map_specialized_A();
     break;
  case "specialized-B":
     map_specialized_B();
     break;
  default:
    # Is element derived from the HTML arch?
    if (if_derived_from_arch(current_node(), "HTML")) {
       switch (arch_form(current_node(), "HTML")) {
          case "html":
            print "<html>";
            break;
          case "h1":
            print "<h1>";
            break;
          default:
            # no mapping
       }
    }
}

Of course, without a formal mechanism to refer to a hierarchy of
architectures in XML (e.g., using data attributes as defined by the AFDR of
the HyTime standard [review temporarily at
www.drmacro.com/hythtml/clause-A.3.html]), you can only define one level of
architectural inheritance, but that's probably enough to simplify 80% of
the mappings people need to do.
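
For those who do want it in Java, the same fallback might look like the sketch
below.  ArchMapper and its method names are invented for the illustration; the
archForm argument is assumed to be the value of the element's fixed HTML
attribute (null if the element is not derived from the architecture), and
only one level of architectural inheritance is handled:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical mapper: element-type-specific handlers win; otherwise the
// handler for the element's architectural form is used as a fallback.
class ArchMapper {
    private final Map<String, Function<String, String>> byElementType = new HashMap<>();
    private final Map<String, Function<String, String>> byArchForm = new HashMap<>();

    void onElementType(String gi, Function<String, String> handler) {
        byElementType.put(gi, handler);
    }

    void onArchForm(String form, Function<String, String> handler) {
        byArchForm.put(form, handler);
    }

    // gi: element type name; archForm: architectural form or null.
    String map(String gi, String archForm, String content) {
        Function<String, String> handler = byElementType.get(gi);
        if (handler == null && archForm != null) {
            handler = byArchForm.get(archForm);   // fall back to the arch form
        }
        return handler == null ? content : handler.apply(content);  // no mapping
    }
}
```

A CTitle declared with HTML="h1" and no handler of its own would then be
mapped by whatever was registered for the "h1" form.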

It should be clear as well that browsers can provide easy-to-invoke default
styles by defining an HTML-like architecture that reflects their formatting
semantics and then provide built-in architectural recognition for at least
that architecture (this is essentially what we do when we down-translate to
HTML--we're invoking the browsers' built-in formatting semantics associated
with the HTML "architecture"--but there's no reason the transform can't be
done in the browser).  But note that HTML itself (at least in its current
form) won't work for the reasons given above.

Cheers,

E.
     
--
<Address HyTime=bibloc>
W. Eliot Kimber, Senior Consulting SGML Engineer
Highland Consulting, a division of ISOGEN International Corp.
2200 N. Lamar St., Suite 230, Dallas, TX 95202.  214.953.0004
www.isogen.com
</Address>



From nmikula at edu.uni-klu.ac.at  Fri Jun 27 16:33:01 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an application
In-Reply-To: <3.0.32.19970627075258.00f35098@mail.swbell.net>
Message-ID: <Pine.OSF.3.93.970627162429.950H-100000@edusrv.edu.uni-klu.ac.at>

On Fri, 27 Jun 1997, W. Eliot Kimber wrote:

> At 05:04 PM 6/26/97 -0700, Tim Bray wrote:
> >For Xh, I also, after getting it working, realized that I had re-used
> >Peter Murray-Rust's trick of just having a .class per element-type
> >(Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if
> >this is just a coincidence or is this the basic paradigm on which XML 
> >software is going to be built?  If so, it might make sense to wire
> >a standard class-finder call into XAPI.
> 
> It makes sense to have one class per element type (after all, you usually
> have distinct types because they have distinct semantics and thus distinct
> behavior). You can take this one step further if you add architectures:
> your element-type-specific classes can themselves be derived from
> arch-form-specific classes.

I also had a similar idea a few days ago.*1
I would like to know whether you guys think it makes
sense to go even further and also have this kind of
call for attributes and other potential "nodes" in 
our parse tree. I would think so.

Now the question remains whether this approach should replace
the event-based stream that formed the bottom layer of
our XAPI-J discussion. I think the event-based approach
should still form the base. Many people, I believe,
still feel very comfortable with it.

*1 http://www.lists.ic.ac.uk/hypermail/xml-dev/9706/0133.html

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From ebaatz at barbaresco.East.Sun.COM  Fri Jun 27 16:41:37 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:07 2004
Subject: Character encoding questions
Message-ID: <libSDtMail.199706271040.17603.ebaatz@barbaresco>

Oh, oh (as my 18-month old daughter says :-).  Your email message
addresses me as if I was an expert on SGML's and XML's use of character
sets.  I am not, so I will not be attempting to answer your queries.
Fortunately, your email copied xml-dev, and many SGML and XML experts
exist there.  I just want to explicitly invite them to respond to your
queries, as I, alas, cannot.




From Peter at ursus.demon.co.uk  Fri Jun 27 20:45:27 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:07 2004
Subject: XML and HTML Intermixed
Message-ID: <8576@ursus.demon.co.uk>

In message <Nn+R1EAwr3szEwGG@light.demon.co.uk> Richard Light writes:
> In message <8543@ursus.demon.co.uk>, Peter Murray-Rust
> <Peter@ursus.demon.co.uk> writes
> >...
> >(e) Can I use XML-ised HTML and include XML-LINKs to other XML documents?
> >       Yes, if the HTML has been extended to use XML-LINK.  This is what I
> >do to avoid namespace clashes.  It may have its detractors. Be warned that 
> >there is not much software which can display XML documents using two different
> >DTDs at the same time; I'm working out how JUMBO will do this - if I get some
> >answers to my LINK queries it should be fairly straightforward.
> >...
> >To summarise, I believe that mix-and-match from different DTDs is a valid and
> >useful approach to XML.  It means that there can be 'islands of validity'
> >[an idea from the WG] within XML documents, so that XML-WF docs will not
> >be semantically void tag soup.  The difficulty at present is how those
> >islands are identified - there is no consensus yet.
> 
> I would suggest, looking at the XML-Link spec, that the clean way to mix
> and match is to use simple links with the attribute specifications: 
> 
>         SHOW="EMBED" ACTUATE="AUTO"
> 
> Have your chunk of HTML as a separate document (which can be valid or
> well-formed, as you wish), and just point to it. This is a fragment from
> an object record in a museum catalogue, where the artist's biographical
> details are stored in a separate XML document with a different DTD:

This is exactly what I was suggesting above in (e).  I left out the details
only because I posted them in gory detail a few postings ago under
'Re: XML-LINK'.  
> 
> <production>
>  <person ROLE="artist">Mathias, William
>   <description HREF="mathiasw.xml" SHOW="EMBED" ACTUATE="AUTO"/>
>  </person>
> 
> On hitting the empty <description> element, the XML processor will go
> off and read mathiasw.xml. This will be parsed separately, and probably
> held separately in memory.  It is a genuinely separate document with its
> own namespace and so on.  But _for_the_purposes_of_display_and_processing_
> it is 'inserted' into the source document at the point the
> <description> element occurs.  And ACTUATE="AUTO" says that it is, in

This is what I am waiting for guidance on :-).  Some people such as Eliot
(and I as a humble follower), see 'resource' as a point.  Others appear to
use 'resource' to represent a finite piece of information.  If the latter
is, in fact, the ERB's view, then the question of where to 'insert' the
other information is critical.

If you find my earlier analysis useful, I'd be grateful for comments as this 
would give me confidence to implement it (or not!).
 
	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Fri Jun 27 20:45:37 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an application
Message-ID: <8577@ursus.demon.co.uk>

In message <Pine.OSF.3.93.970627162429.950H-100000@edusrv.edu.uni-klu.ac.at> Norbert Mikula writes:
> On Fri, 27 Jun 1997, W. Eliot Kimber wrote:
> 
[...]
> 
> I also had a similar idea a few days ago.*1
> I would like to know whether you guys think it makes
> sense to go even further and also have this kind of
> call for attributes and other potential "nodes" in 
> our parse tree. I would think so.

Yes.  Definitely.  The more of this that can be generalised, the better.
Essentially quite a lot of JUMBO is involved in this sort of processing
and I'd be more than happy to try to migrate JUMBO's ideas towards an API.
My Node class (== Element, more or less) has nearly 100 member functions,
and I'll try to post them as a javadoc API (just needs locating on the WWW).


> 
> Now the question remains whether this approach should replace
> the event-based stream that formed the bottom layer of
> our XAPI-J discussion. I think the event-based approach
> should still form the base. Many people, I believe,
> still feel very comfortable with it.

I feel very comfortable with the event stream API and I would certainly
not substitute it.  Essentially JUMBO can consume Elements from either an
NXP-like event stream, or a Lark-like tree structure.  It may be useful to
identify those objects such as Element and Attribute that are relevant to both
environments, and there could be an Element/Attribute Factory sitting on top
of both (but leaving them exposed as well for those who need the lower level).
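
As a rough sketch of how a tree layer and an element factory could sit on top
of an event stream while leaving the events exposed: the ElementEvents
interface below is a made-up stand-in, not the XAPI-J, NXP or Lark interface,
and the builder assumes well-formed, properly nested input:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

// Hypothetical low-level event interface; callers may also use it directly.
interface ElementEvents {
    void startElement(String gi);
    void characters(String text);
    void endElement(String gi);
}

// Minimal tree node produced by the factory.
class TreeNode {
    final String gi;
    final StringBuilder text = new StringBuilder();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String gi) { this.gi = gi; }
}

// Consumes events and builds a tree, delegating node creation to a factory.
class TreeBuilder implements ElementEvents {
    private final Function<String, TreeNode> factory;
    private final Deque<TreeNode> open = new ArrayDeque<>();
    private TreeNode root;

    TreeBuilder(Function<String, TreeNode> factory) { this.factory = factory; }

    @Override
    public void startElement(String gi) {
        TreeNode node = factory.apply(gi);      // factory may return a subclass
        if (open.isEmpty()) {
            root = node;
        } else {
            open.peek().children.add(node);
        }
        open.push(node);
    }

    @Override
    public void characters(String text) {
        if (!open.isEmpty()) {
            open.peek().text.append(text);
        }
    }

    @Override
    public void endElement(String gi) {
        open.pop();                             // assumes matching start/end
    }

    TreeNode root() { return root; }
}
```

An application that wants only events implements ElementEvents itself; one
that wants a tree plugs in TreeBuilder with whatever factory it likes.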

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Fri Jun 27 20:46:03 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an applica
Message-ID: <8578@ursus.demon.co.uk>

In message <E0whbWp-0002rw-00@punch.ic.ac.uk> akirkpatrick@ims-global.com writes:
> Sorry if this has been gone over before, but these are my

No, it's a new and useful discussion :-)

> thoughts on the class-per-element-type idea (mentioned
> recently in Tim Bray's post about Lark).
> 
> I did something very similar recently (admittedly in C++)
> and abandoned it. My application was an SGML->RTF
> convertor. It read the events using SP and created a tree
> of elements derived from SGMLElement but specialised
> towards RTF. The hierarchy looked something like:
> 
>   SGMLElement
>       RtfFile
>       RtfContainer
>       RtfPara
>          RtfTitle
>             RtfTitleTarget
>          RtfAdmonition
>       RtfInline (parametrised)
>          RtfLink
>       etc.
> 
> I found the following drawbacks:

I think the primary problem is that the mapping of SGML to RTF is formally
impossible.  If the SGML application was MathML, the content might be a second
order differential equation; if CML, it might be the active site of HIV 
protease.  Neither of these has the concept of 'paragraph' :-)

It's very common that people use 'SGML' as a shorthand for 'a-conventional-
human-readable-textual-document-marked-up-with-a-set-of-tags-that-make-textual-
sense'.  They then devise SGML2XYZ translators.  These can only be generic if
they have heuristics about how commonly encountered markup maps onto XYZ
constructs.

JUMBO has a small number of such heuristics.  It tries to find the title of
an Element (for display) as follows:
	- use the TITLE attribute
	- else find a child with TITLE elementType
	- else use the ID attribute
	- else take the first 30 characters of PCDATA
	- else take the elementType
but this is only to try to help human navigators - it's not a formal 
transformation.
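
In outline the heuristic looks something like this.  It is a sketch of the
list above, not the actual JUMBO source; TitledNode and its fields are
simplified stand-ins for the real Node class:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified node with just enough state for the title heuristic.
class TitledNode {
    final String elementType;
    final Map<String, String> attributes = new HashMap<>();
    final List<TitledNode> children = new ArrayList<>();
    String pcdata = "";

    TitledNode(String elementType) { this.elementType = elementType; }

    // Tries each source of a display title in turn, as in the list above.
    String getTitle() {
        String title = attributes.get("TITLE");
        if (title != null) {
            return title;                       // TITLE attribute
        }
        for (TitledNode child : children) {
            if ("TITLE".equals(child.elementType)) {
                return child.pcdata;            // child of type TITLE
            }
        }
        String id = attributes.get("ID");
        if (id != null) {
            return id;                          // ID attribute
        }
        if (!pcdata.isEmpty()) {
            // first 30 characters of PCDATA (or all of it, if shorter)
            return pcdata.substring(0, Math.min(30, pcdata.length()));
        }
        return elementType;                     // last resort
    }
}
```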

> 
> 1. Leads to "class spaghetti" with similar code being spread
> all over the place.

This isn't necessary if inheritance is used.  JUMBO has a superclass Node which
has default procedures (e.g. getTitle() above).  By default all Elements display
or are processed using this.  There are a lot of useful defaults a Node can 
have.
> 
> 2. There is usually a large degree of dependence between the
> elements and the driving application. Often the elements need
> to access the driving application directly and there is no obvious
> and efficient way to provide this interface.

No.  In JUMBO there is very little coupling between subclassed Nodes and
JUMBO.  Yes, they have to be subclassed from Node, because that's what they
are, but beyond that they have their own behaviour (or none).
> 
> 3. You need to create a new class for each new element type
> (less of a problem in Java?). For C++, this means recompiling
> the application.

My MOLNode class is 1500 lines of Java because molecules are complex.  There
are routines like orthogonaliseFractionalCoordinates, getMolecularWeight,
countHydrogenAtoms, etc.  These would have to be written whatever structure was 
used.  There is actually very little duplicated code.   Similarly Matrix, Graph
and so forth require distinct code.

If classes share common functions then they can be subclasses of an 
intermediate class.  Thus in PLAYDTD, both ACT and SCENE could be subclassed
from PlayDivision.  This class would know that both ACT and SCENE had a child
TITLE.  [Indeed they might both be instances of PlayDivision directly.]  
Many elements can get by with just the generic Node class.
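
A small sketch of that arrangement, with hypothetical classes (PlayNode
standing in for the generic Node, and a made-up display() that returns a
label instead of drawing anything):

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for the generic Node: default behaviour for every element.
class PlayNode {
    final String gi;
    final List<PlayNode> children = new ArrayList<>();
    String text = "";

    PlayNode(String gi) { this.gi = gi; }

    // Returns the first child of the given element type, or null if none.
    PlayNode firstChild(String childGi) {
        for (PlayNode child : children) {
            if (child.gi.equals(childGi)) {
                return child;
            }
        }
        return null;
    }

    String display() { return gi; }             // generic default label
}

// Shared behaviour for ACT and SCENE: both are known to carry a TITLE child.
class PlayDivision extends PlayNode {
    PlayDivision(String gi) { super(gi); }

    @Override
    String display() {
        PlayNode title = firstChild("TITLE");
        return title == null ? gi : gi + ": " + title.text;
    }
}
```

Here ACT and SCENE are simply instances of PlayDivision; separate ACTNode and
SCENENode subclasses would only be needed if their behaviour diverged.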

> 
> It was actually when I looked at the prospect of creating a whole
> new raft of classes for the HTML output that I decided to start again.
> I rewrote my application to use the following process:
> 
> 1. SgmlReader reads document and creates tree of generic elements.
> Each element has an SgmlRule member variable/class.
> 
> 2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
> properties with the elements based on gi, position, etc. These properties
> are added to the SgmlRule for each element.
> 
> 3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
> deciding what to do based on the properties applied by the stylesheet.
> 
> (I realise this is similar to the way Jade operates but our RTF writer
> also handles WinHelp and has other output/app-specific features).

It sounds as if you would be better off using DSSSL, since it handles
transformations.  It's possible to do the same thing in Java - and probably
takes the same amount of code - but you may need to define some formatting
classes (Div, Para, etc.).

> 
[...]
> 
> I'd be really interested to hear views in favour of the class approach.

I hope I've given some above.  Wherever the object is complex, then it makes
sense for its behaviour to be attached closely to it.  I wouldn't like to
write a 3-D geometry program in DSSSL (though it would be possible) just as
I'd prefer not to do typesetting in Java.

The difficult part comes with element-in-context.  If an element has different
behaviours in different contexts, then code can become hairy.  This is often a 
problem with CML-like DTDs where there are only 10-20 elements per DTD.  

The other difficult bit is with relations between objects.  This can be managed
generically with XML-LINK, but usually semantics have to be added.  I am 
trying to make XML-LINK as generic as possible in JUMBO, but I suspect there
will be places within one Node where links to another have to be specifically
considered.

	HTH 

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Fri Jun 27 21:27:10 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:07 2004
Subject: API and JUMBO
Message-ID: <8584@ursus.demon.co.uk>

Following the discussion of the API and the elements-as-classes I have posted
my API for JUMBO as javadoc classes.  Since there are over 3000 member functions
please excuse some awful documentation - some of it was done at an early stage!
(Also I haven't been able to copy the javadoc icons).

It's at:

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/api/

and the most relevant file there is

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/api/sgml.html

(It's only called SGML because I started it before XML:-)

The key class is SGMLNode, possibly followed by SGMLTree and DTD and 
SGMLAttlist.  If you read the API I hope you will get some feel for the sorts of
member functions that I have found necessary.  Note that JUMBO is tree-oriented.
[The graphical functions are in DrawableSGMLNode and SGMLTOC, so avoid those
if you are interested in abstract functions only.]

I'd value comments, and I'll try to help with the admittedly bad documentation.
In my defence, I had no idea where all this was going when I started - JUMBO
was not intended to be graphical, an editor, or to support hyperlinks :-)

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From richard at light.demon.co.uk  Fri Jun 27 22:55:51 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:58:07 2004
Subject: XML and HTML Intermixed
In-Reply-To: <8576@ursus.demon.co.uk>
Message-ID: <PuU5kDADIBtzEwP9@light.demon.co.uk>

In message <8576@ursus.demon.co.uk>, Peter Murray-Rust
<Peter@ursus.demon.co.uk> writes
>
>This is what I am waiting for guidance on :-).  Some people such as Eliot
>(and I as a humble follower), see 'resource' as a point.  Others appear to
>use 'resource' to represent a finite piece of information.  If the latter
>is, in fact, the ERB's view, then the question of where to 'insert' the
>other information is critical.

Well, the XML-Link draft defines 'resource' as "an addressable unit of
information or service which is participating in a link", e.g. "files,
images, documents, programs and query results". In that sense, surely, a
'resource' is definitely a finite piece of information rather than a
point.

I think I can understand the 'point' point-of-view, in that linking (to
XML documents or elements within them) always addresses nodes, which are
points. However, the XML locator syntax carefully ensures that the
target of a link within an XML document is always an element (or
sometimes a little clutch of elements) - it can never be part of an
element (as it can with TEI extended pointers).  

So your 'point' is always an element node(s) in the tree structure.
This being the case, I have assumed myself that the intention is to _be
able to_ treat the target resource (element) as a finite thing which can
be delivered to the client.  

Surely the wording for ?XML-XPTR= syntax shows an intent to actually
deliver the whole element: "... the host should perform the XPointer
processing to extract the sub-resource [= element], and that only the
sub-resource should be transmitted to the client".

Richard Light.




From tbray at textuality.com  Fri Jun 27 23:42:01 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 available, with an applica
Message-ID: <3.0.32.19970627143941.00a6c890@pop.intergate.bc.ca>

At 02:08 PM 27/06/97 +0000, akirkpatrick@ims-global.com wrote:
>I did something very similar recently (admittedly in C++)
>and abandoned it...
>
>1. Leads to "class spaghetti" with similar code being spread
>all over the place.

In the XtoH application, the ElementLogic class from which all the 
element classes are subclassed has an atStart(), an atEnd(), and
a doText().  In a lot of cases, the atStart/atEnd amounted to "emit the
following string, interpolating the following attribute values".  So
yes, a lot of parallelism, but this seemed a fair price to pay for
the independence and modularity.
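
A minimal Java sketch of that shape follows.  Only the names
ElementLogic, atStart(), atEnd() and doText() come from the
description above; the method signatures, the TemplateLogic subclass
and the {name} placeholder syntax are assumptions made for
illustration, not the actual XtoH code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Base class for per-element logic. The signatures here are assumed.
abstract class ElementLogic {
    abstract void atStart(Map<String, String> attributes, StringBuilder out);

    abstract void atEnd(StringBuilder out);

    // Default text handling: copy character data through unchanged.
    void doText(String text, StringBuilder out) {
        out.append(text);
    }
}

// Hypothetical subclass covering the common "emit this string,
// interpolating these attribute values" case.
class TemplateLogic extends ElementLogic {
    private final String startTemplate; // e.g. "<a href=\"{target}\">"
    private final String endText;       // e.g. "</a>"

    TemplateLogic(String startTemplate, String endText) {
        this.startTemplate = startTemplate;
        this.endText = endText;
    }

    @Override
    void atStart(Map<String, String> attributes, StringBuilder out) {
        String s = startTemplate;
        // Replace each {name} placeholder with the attribute's value.
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            s = s.replace("{" + e.getKey() + "}", e.getValue());
        }
        out.append(s);
    }

    @Override
    void atEnd(StringBuilder out) {
        out.append(endText);
    }
}
```

Elements that need real custom logic would still subclass
ElementLogic directly; the template class only absorbs the repetitive
cases.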

>2. There is usually a large degree of dependence between the
>elements and the driving application. Often the elements need
>to access the driving application directly and there is no obvious
>and efficient way to provide this interface.

Not always true.  I don't do C++, but in Java, after the controller
cooks up the per-element object, he calls its method 
registerController(this) - the per-element classes all have
a mController member, thus they can callback to the controller.
The amount they had to do so was pretty small.
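
The callback arrangement can be sketched roughly as below.  The names
registerController() and mController are from the description above;
the Controller and ElementHandler classes, and the emit() service the
controller offers, are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical driving application.
class Controller {
    private final List<String> output = new ArrayList<>();

    // Creates the per-element object and hands it a reference to itself.
    ElementHandler create(String gi) {
        ElementHandler handler = new ElementHandler(gi);
        handler.registerController(this);
        return handler;
    }

    // A service the per-element objects can call back into (assumed).
    void emit(String s) {
        output.add(s);
    }

    List<String> emitted() {
        return output;
    }
}

// Hypothetical per-element class holding the back-reference.
class ElementHandler {
    private final String gi;
    private Controller mController;

    ElementHandler(String gi) {
        this.gi = gi;
    }

    void registerController(Controller controller) {
        mController = controller;
    }

    // Calls back to the controller instead of writing output itself.
    void atStart() {
        mController.emit("<" + gi + ">");
    }
}
```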

>3. You need to create a new class for each new element type
>(less of a problem in Java?). For C++, this means recompiling
>the application.

Non-problem in Java... in fact, you don't even need to know what
you've got when you start; when you find a new element, you can
dynamically see if there's a class for it.
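
A rough sketch of that dynamic lookup, using the standard
Class.forName() call.  The naming convention (capitalised gi plus
"Logic"), the DefaultLogic fallback and the LogicLoader helper are
assumptions for illustration; the gi is assumed to be non-empty.

```java
// Fallback used when no element-specific class exists (hypothetical).
class DefaultLogic {
    String describe() {
        return "default";
    }
}

// Element-specific class found by name at run time (hypothetical).
class ParaLogic extends DefaultLogic {
    @Override
    String describe() {
        return "para";
    }
}

class LogicLoader {
    // Maps a gi such as "para" to a class named "ParaLogic" located
    // alongside DefaultLogic, falling back to DefaultLogic if absent.
    static DefaultLogic logicFor(String gi) {
        String base = DefaultLogic.class.getName();
        String prefix = base.substring(0, base.length() - "DefaultLogic".length());
        String name = prefix + Character.toUpperCase(gi.charAt(0))
                + gi.substring(1) + "Logic";
        try {
            Class<?> cls = Class.forName(name);
            if (DefaultLogic.class.isAssignableFrom(cls)) {
                return (DefaultLogic) cls.getDeclaredConstructor().newInstance();
            }
        } catch (ReflectiveOperationException e) {
            // No usable class for this element type; use the default.
        }
        return new DefaultLogic();
    }
}
```

A new element type then only needs a new class on the class path; the
loader itself is not recompiled.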

>I'd be really interested to hear views in favour of the class approach.

Why I wrote this.  I would say that while we'd all prefer a declarative
stylesheet approach, it is my belief that in a lot of cases it's going
to be common to use, at least occasionally, some per-element custom
logic.  Java makes this easy enough to be very appealing as a general
framework.

 - Tim




From tbray at textuality.com  Sun Jun 29 03:39:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:07 2004
Subject: Lark 0.90 refreshed, argh
Message-ID: <3.0.32.19970628183455.00aa22f0@pop.intergate.bc.ca>

It wanted to use the olde-fashioned <!--* *--> comments.  Big hole in
the test suite. -T.



From north at synopsys.com  Mon Jun 30 11:35:22 1997
From: north at synopsys.com (Simon North)
Date: Mon Jun  7 16:58:07 2004
Subject: Request: more SGML restriction explanations
Message-ID: <199706300935.LAA25658@cadis.de>

Please accept my apologies if these are non-developmental questions, 
and my heartfelt thanks to all that have answered my earlier 
question. 

I am trying to establish a few points and would appreciate some 
enlightened answers ... 

1. Is <!ENTITY #DEFAULT ""> allowed? This was really useful SGML 
and would probably be pretty handy in XML. 

2. Are MS, MD, STARTTAG and ENDTAG forbidden in declarations?

3. What exactly is meant by "no attribute value specs on ENTITY 
declarations"? 

4. I'm not allowed data attributes on NOTATIONs, and I'm not allowed 
name groups in ATTLISTs, but can I cheat and use the following:

    <!ATTLIST #NOTATION element_name attribute_name attribute_spec> 

or would this be illegal too? Could I get round it by using a data
attribute spec?


Thanks in advance, 

Simon North




From elm at arbortext.com  Mon Jun 30 16:07:24 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun  7 16:58:07 2004
Subject: Request: more SGML restriction explanations
Message-ID: <3.0.32.19970630100938.00ad3220@village.doctools.com>

At 11:35 AM 6/30/97 +0000, Simon North wrote:
>Please accept my apologies if these are non-developmental questions, 
>and my heartfelt thanks to all that have answered my earlier 
>question. 
>
>I am trying to establish a few points and would appreciate some 
>enlightened answers ... 
>
>1. Is <!ENTITY #DEFAULT ""> allowed? This was really useful SGML 
>and would probably be pretty handy in XML. 

#DEFAULT entity declaration isn't allowed as part of XML.

>2. Are MS, MD, STARTTAG and ENDTAG forbidden in declarations?

Yes; bracketed text entities aren't allowed as part of XML.

>3. What exactly is meant by "no attribute value specs on ENTITY 
>declarations"? 

In SGML, data attributes are declared for notations and their values
specified as part of NDATA entity declarations.  Data attributes are not
allowed as part of XML.

>4. I'm not allowed data attributes on NOTATIONs, and I'm not allowed 
>name groups in ATTLISTs, but can I cheat and use the following:
>
>    <!ATTLIST #NOTATION element_name attribute_name attribute_spec> 
>
>or would this be illegal too? Could I get round it by using a data
>attribute spec?

Nope, not allowed as part of XML.  Sorry!

	Eve



From jtigue at datachannel.com  Mon Jun 30 17:26:27 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:07 2004
Subject: XAPI-J refinement proposal
Message-ID: <33B7D06D.4A0DDD57@datachannel.com>

Currently in XAPI there is the interface IContent which has the methods
relevant to being a "node" in a parse tree/grove; accessors for parent
and children. Recent work has shown that there would be some benefit to
breaking IContent into IContent (child) and IContainer (parent). Also
there was feedback asking for addContent() to be extended to
appendContent() and insertContent(). This would look like:

package xml;
import java.util.Enumeration;
public interface IContainer
    {
    public Enumeration getContents();
    public void insertContent( IContent aContent, IContent preceedingContent );
    //  appendContent() puts aContent at the end of the list
    public void appendContent( IContent aContent );
    public void removeContent( IContent aContent );
    }

AND

package xml;
import java.util.Enumeration;
public interface IContent
    {
    public void setParent( IContainer aContainer );
    public IContainer getParent();
    public String getData();
    }

So a Document class (not currently part of XAPI-J) would implement
IContainer but not IContent. IElement would implement both. A Text or
Data class would implement only IContent. I don't see how this
interferes with any existing processors. I hope I have not missed
anything.
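
To make the split concrete, here is a minimal sketch of the three
kinds of class.  The two interfaces repeat the proposal (without the
package statement, and with generics added); everything else is an
assumption: the class names DocumentNode, ElementNode and TextNode,
the shared AbstractContainer, insertContent() placing the new item
immediately after preceedingContent (appending if it is not found),
and an element's getData() returning the concatenated data of its
children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

interface IContainer {
    Enumeration<IContent> getContents();
    void insertContent(IContent aContent, IContent preceedingContent);
    void appendContent(IContent aContent);
    void removeContent(IContent aContent);
}

interface IContent {
    void setParent(IContainer aContainer);
    IContainer getParent();
    String getData();
}

// Shared child-list handling (hypothetical helper).
abstract class AbstractContainer implements IContainer {
    private final List<IContent> children = new ArrayList<>();

    public Enumeration<IContent> getContents() {
        return Collections.enumeration(children);
    }

    // Assumed semantics: insert right after preceedingContent, else append.
    public void insertContent(IContent aContent, IContent preceedingContent) {
        int i = children.indexOf(preceedingContent);
        if (i < 0) {
            children.add(aContent);
        } else {
            children.add(i + 1, aContent);
        }
        aContent.setParent(this);
    }

    public void appendContent(IContent aContent) {
        children.add(aContent);
        aContent.setParent(this);
    }

    public void removeContent(IContent aContent) {
        if (children.remove(aContent)) {
            aContent.setParent(null);
        }
    }
}

// A document is a container only: it has no parent.
class DocumentNode extends AbstractContainer {
}

// An element is both a container and content.
class ElementNode extends AbstractContainer implements IContent {
    private final String gi;
    private IContainer parent;

    ElementNode(String gi) { this.gi = gi; }

    String getGi() { return gi; }
    public void setParent(IContainer aContainer) { parent = aContainer; }
    public IContainer getParent() { return parent; }

    // Assumed: an element's data is the concatenation of its children's data.
    public String getData() {
        StringBuilder sb = new StringBuilder();
        for (Enumeration<IContent> e = getContents(); e.hasMoreElements();) {
            sb.append(e.nextElement().getData());
        }
        return sb.toString();
    }
}

// Text is content only: it cannot have children.
class TextNode implements IContent {
    private final String data;
    private IContainer parent;

    TextNode(String data) { this.data = data; }

    public void setParent(IContainer aContainer) { parent = aContainer; }
    public IContainer getParent() { return parent; }
    public String getData() { return data; }
}
```

The compiler then rejects calls such as appending a child to a
TextNode, which a single combined interface would only catch at run
time.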

Another point is IContent.getData(). This is how, for example, PCData
would be retrieved. Grove terminology refers to non marked up text as
"data" so we have getData(). Except for this detail the method could
just as well have been called getText() (which was my first choice),
getString(), or some such.

Any comments?

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999
