From dmeggins at uottawa.ca  Sat May  3 19:43:14 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun  7 16:57:46 2004
Subject: Entity Value
Message-ID: <199705031139.HAA00326@localhost>

In WD-xml-lang-970331, the following production describes an entity
value:

[9] EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
                  | "'" ([^%&'] | PEReference | Reference)* "'"

Later on, however, clause 4.5 ("Predefined Entities"), states the
following:

  If the entities in question are declared, they must be declared as
  internal entities whose replacement text is the single character
  being escaped, as shown below:

  <!ENTITY lt		"<">
  <!ENTITY gt		">">
  <!ENTITY amp		"&">
  <!ENTITY apos		"'">
  <!ENTITY quot		'"'>

I'm afraid that I do not understand how the entity values for &lt;
and &amp; satisfy the EntityValue production.  Am I missing something
elsewhere in the draft?
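
The mismatch can be checked mechanically.  Here is a minimal sketch, in
Python, of production [9] as a regular expression.  Assumptions: Name is
approximated with ASCII characters only, Reference covers only entity
references and decimal character references, and the helper name
is_entity_value is made up for this example.

```python
import re

# Simplified Name: an ASCII-only approximation (assumption, not the draft's
# full Letter/NameChar definitions).
NAME = r"[A-Za-z_][A-Za-z0-9._-]*"
PE_REFERENCE = rf"%{NAME};"
# Reference: an entity reference or a decimal character reference only.
REFERENCE = rf"&(?:{NAME}|#[0-9]+);"

# Production [9]: a double-quoted or single-quoted literal in which "%" and
# "&" may appear only as part of a PEReference or Reference.
ENTITY_VALUE = re.compile(
    rf'"(?:[^%&"]|{PE_REFERENCE}|{REFERENCE})*"'
    rf"|'(?:[^%&']|{PE_REFERENCE}|{REFERENCE})*'"
)


def is_entity_value(literal: str) -> bool:
    """Return True if the quoted literal matches the simplified production."""
    return ENTITY_VALUE.fullmatch(literal) is not None


# The five literals from clause 4.5, in the order lt, gt, amp, apos, quot.
for literal in ['"<"', '">"', '"&"', '"\'"', "'\"'"]:
    print(literal, is_entity_value(literal))
```

Under these assumptions the literals for lt, gt, apos and quot are
accepted, while the one for amp is rejected, because a bare & is excluded
by [^%&"] and is not the start of a complete Reference.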


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Peter at ursus.demon.co.uk  Sat May  3 23:40:54 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:46 2004
Subject: XML and HTML browsers
Message-ID: <6110@ursus.demon.co.uk>

I would like to re-use *existing* browser functionality rather
than continuing to extend the *generic* aspects of a browser in JUMBO.
I'm interested in exploring the general question of how a specialist
Java application interacts with a Java-enabled HTML browser.  I'm not an
expert here, but clearly JavaScript is a potential solution.  (I hacked a
bit, and can't yet say I feel happy with the process - but perhaps that's 
because I haven't got the feel that JavaScript is going to continue to be
around and usable in a standard form.)  Anyway...

At present I can define my requirements quite simply:

I have a chunk of XML that I can transform into HTML and I want to show
it in the browser.  However, the browser must:
	- add hotspots to hyperlinks where appropriate
	- send me back a message/callback when a link is activated
	- allow me to:
		use paint(Graphics g) to a specific area of its screen
			(allowing for scrolling) OR:
		let me supply it with an IMG for that area
	- manage the multiplicity of windows (NEW, REPLACE, etc)
	- allow mouse events within 'my' graphics area and return them to me
	- allow me to post menus of some sort and return events
(There is probably stuff I have forgotten...)

There is also the question of spawning an XML helper application when the
*browser* encounters a text/xml file (which might turn out to have
DOCTYPE CML in it and so require a more specific helper).  Also, if
I have an XML-LINK to a *.html file (rather than an XML file), can I
instruct the browser to display that and keep the XML application for
further action?

	This is, not surprisingly, a bit rambling - any light would be valuable.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tallen at sonic.net  Sun May  4 02:18:37 1997
From: tallen at sonic.net (Terry Allen)
Date: Mon Jun  7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
Message-ID: <199705040021.RAA22786@bolt.sonic.net>


Now available at

   http://www.sonic.net/~tallen/xdb03.zip

is a DTD derived from DocBook 3.0 that I believe is valid XML.  Thanks
to Norbert Mikula for pointing out that the %local.foo.foo; parameter
entities, which are defined with the content "", are illegal per
XML-lang production 46.  The sole difference between XDB 0.3 and
XDB 0.2 (which I've removed) is the deletion of these parameter
entities and an updated copyright notice.

This change blows away the DocBook customization mechanism.  The
purpose of distributing this DTD is solely to determine what 
constitutes a valid XML DTD.  Obviously some other method of
customization will be required (unless the SGML ERB can be
persuaded to relax its strictures on empty parameter entities).
The quick hack that comes to mind is to define these pe's as
containing a placeholder element ZZZZZ, which would be declared

<!ELEMENT ZZZZZ EMPTY>

without an ATTLIST.  As I remarked earlier, I'd be happy to
hear of other solutions.  As the DTD is fairly useless without
the customization mechanism, this version cannot be considered
progress on the road to a proper XMLlated DocBook that the
Davenport Group would want to distribute.  Mea culpa.

Regards,

  Terry Allen    Electronic Publishing Consultant    tallen[at]sonic.net
                   http://www.sonic.net/~tallen/
    Davenport and DocBook:  http://www.ora.com/davenport/index.html
          T.A. at Passage Systems:  terry.allen[at]passage.com 




From nmikula at edu.uni-klu.ac.at  Sun May  4 12:20:28 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
In-Reply-To: <199705040021.RAA22786@bolt.sonic.net>
Message-ID: <Pine.OSF.3.93.970504115037.21721A-100000@edusrv.edu.uni-klu.ac.at>

On Sat, 3 May 1997, Terry Allen wrote:
 
> Now available at
> 
>    http://www.sonic.net/~tallen/xdb03.zip
> 
> is a DTD derived from DocBook 3.0 that I believe is valid XML.  

And NXP says "You bet it is" ;-)

I have used the latest release of NXP (not yet published) and
tried it with the "Blue" test file. 

I only needed to take care of filenames (problem with
case-sensitivity), but then everything went fine, at the 
first attempt !

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From dmeggins at uottawa.ca  Sun May  4 13:14:13 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun  7 16:57:46 2004
Subject: [Correction] Entity Value
In-Reply-To: <132239552@toto.iv>
Message-ID: <199705041112.HAA00233@localhost>

David Megginson writes:

 >   If the entities in question are declared, they must be declared as
 >   internal entities whose replacement text is the single character
 >   being escaped, as shown below:
 > 
 >   <!ENTITY lt		"<">
 >   <!ENTITY gt		">">
 >   <!ENTITY amp		"&">
 >   <!ENTITY apos		"'">
 >   <!ENTITY quot		'"'>
 > 
 > I'm afraid that I do not understand how the entity values for &lt;
 > and &amp; satisfy the EntityValue production.  Am I missing something
 > elsewhere in the draft?

That last paragraph should read

 > I'm afraid that I do not understand how the entity value for
 > &amp; satisfies the EntityValue production.  Am I missing something
 > elsewhere in the draft?


Thanks,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins



From dgd at cs.bu.edu  Sun May  4 20:04:47 1997
From: dgd at cs.bu.edu (David Durand)
Date: Mon Jun  7 16:57:46 2004
Subject: XDB 0.3 available (XMLized Docbook 3.0)
In-Reply-To: <199705040021.RAA22786@bolt.sonic.net>
Message-ID: <v03007802af927a0f4c24@[205.181.197.77]>

At 5:21 PM -0700 5/3/97, Terry Allen wrote:
>Thanks
>to Norbert Mikula for pointing out that the %local.foo.foo; parameter
>entities, which are defined with the content "", are illegal per
>XML-lang production 46.  The sole difference between XDB 0.3 and
>XDB 0.2 (which I've removed) is the deletion of these parameter
>entities and an updated copyright notice.
>
>This change blows away the DocBook customization mechanism.  The
>purpose of distributing this DTD is solely to determine what
>constitutes a valid XML DTD.  Obviously some other method of
>customization will be required (unless the SGML ERB can be
>persuaded to relax its strictures on empty parameter entities).

I want to ask what justification there is, if any, for ruling out empty
PEs? I don't remember discussion of this point clearly (though I do
remember shock when the rule was pointed out).

   This seems like a very bad idea, as Terry's desire to add entities in
is very reasonable.

   Is this fallout of the weird SGML interleaving of entity structure and
content model parsing?

   -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From ht at cogsci.ed.ac.uk  Sun May  4 21:23:02 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:57:46 2004
Subject: Entity Value
In-Reply-To: David Megginson's message of Sat, 3 May 1997 07:39:57 -0400
References: <199705031139.HAA00326@localhost>
Message-ID: <5316.199705041922@grogan.cogsci.ed.ac.uk>

I believe the spec. is inconsistent at the point David identifies, and
must be corrected to read

<!ENTITY amp	"&#38;">

I've told Tim and Michael this, but have never had a definitive reply.

ht



From nmikula at edu.uni-klu.ac.at  Mon May  5 10:22:53 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:57:46 2004
Subject: Empty PE - was Re: XDB 0.3 available (XMLized Docbook 3.0)
References: <v03007802af927a0f4c24@[205.181.197.77]>
Message-ID: <336DFC2B.3F53@edu.uni-klu.ac.at>

David Durand wrote:
> I want to ask what justification there is, if any, for ruling out empty
> PEs? I don't remember discussion of this point clearly (though I do
> remember shock when the rule was pointed out).
> 
> This seems like a very bad idea, as Terry's desire to add entities in
> is very reasonable.

I do agree that Terry's desire is very reasonable.  However, if we cannot
find a formal and concise way to express it, we have problems.  One of the
objectives of XML was that it should be "easy" to implement and that it
should incorporate "contemporary" disciplines of computer science like
formal languages etc.  As you probably know, I use JavaCC, a Lex/Yacc-like
approach, to build NXP.  I really had difficulties transforming this
production to LL(1), and I am still not sure if there is a clean way to
bring it to LL(n) (a way that I could live with).

Furthermore it violates the general idea that %a should actually
satisfy S? a S?. With an empty PE this would not work.

<FN>As much as I first liked the %a idea, it caused a lot of headaches
(ouuuch) when I had to implement it, and I am still not satisfied with
my current solution.</FN>

So in short, I don't mind the idea of having empty PEs, if it is possible
to implement/express it in a reasonable way.  Any ideas would be
appreciated!

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From Peter at ursus.demon.co.uk  Mon May  5 15:25:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:46 2004
Subject: JUMBO
Message-ID: <6155@ursus.demon.co.uk>

Here is a summary of some progress with JUMBO.  My intention is to have
it all tidied by Barcelona.

JUMBO has its own parser (Mus Michaelis algorithm), but can use NXP via
a command-line switch (and will hopefully grok Lark in the same way in a
few hours).  This means that it gives a visual rendering of current XML files
assuming they parse with NXP or Lark.

Errors.  NXP does not throw catchable errors but (I think) produces a null 
output stream.  Lark requires JUMBO to handle doSyntaxError().  JUMBO has no
heuristics to turn a broken WF-document into something valid.  If the parser
writer wishes to pass JUMBO an Esis (NXP) or a Tree (Element*, Lark) then
JUMBO will treat that as valid if it isn't thrown an error.  Without being
thrown an error, or being passed an error flag, it's not easy for JUMBO
to know it's got one. 

If you think this list has become slightly sleepy, and you haven't been 
reading the WG discussions, you can always ask: 'How do we treat parse
errors?'.  The general feeling is it's an implementation matter, so we
have to have a means of passing it to applications.
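
One possible shape for "a means of passing it to applications" is a result
that carries either a tree or an error flag.  This is only a sketch in
Python: the standard xml.etree.ElementTree parser stands in for NXP or
Lark, and the names ParseResult and parse_for_application are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class ParseResult(NamedTuple):
    """Either a parsed tree or an error message, never both."""
    tree: Optional[ET.Element]
    error: Optional[str]


def parse_for_application(text: str) -> ParseResult:
    """Parse text and report failure as data instead of a null result."""
    try:
        return ParseResult(ET.fromstring(text), None)
    except ET.ParseError as exc:
        # The application can inspect the message rather than guess.
        return ParseResult(None, str(exc))


good = parse_for_application("<a><b/></a>")
bad = parse_for_application("<a><b></a>")
print(good.error is None, bad.tree is None)
```

With an explicit error field the application never has to infer a failure
from an empty output stream.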

JUMBO has been rewritten internally to remove the grotesque architecture
that I started with.  This has not added functionality, but I am at least
prepared to show some code in public.  JUMBO is now in a state where it is 
possible to use it to convert legacy files to WF-XML.  Some limited
validation of content model and attributes is also possible but it is
not automatically DTD-driven, because DTDs have a poor API for programming.

JUMBO implements XML-LINK as far as I understand it.  I have NOT done:
	- spans and '..' because I am unsure of the semantics
	- GROUP/DOCUMENT - because I can't see *what* to implement
	- some of the trickier bits of negative addressing in PREVIOUS, etc.
		because I'm waiting till that's stable and everyone actually
		agrees on its operation

Most XML-LINK implementation is application-dependent.  *I AM MISSING ANY 
EXAMPLES OF XML-LINK other than my own.*  I can implement my own, but they
are probably JUMBO-specific.  

JUMBO has primitive editing facilities, especially for WF-docs.  I have not
done attribute editing, because programming little boxes in Java is horrible.
In principle JUMBO can validate content models if it uses NXP's code, but I
need to discuss this with Norbert.

JUMBO is aimed at supporting *INFORMATION COMPONENTS* rather than traditional
DTDs (which are not much use in technical and scientific subjects).  An
information component is an Element linked to code for displaying, processing, 
etc. (I use Java).  JUMBO can manage many of the common information components
such as hypertext, images, tables, graphs, bibliography, etc.  If you are
interested in learning more about this, I am launching an 8-week virtual
course on this at:
	http://www.vsms.nottingham.ac.uk/vsms/java
and a CDROM with the new JUMBO, examples, API, etc. will be included in the
course materials.  No previous knowledge of XML and Java is assumed, but
you need programming skills.  All necessary information is on the WWW pages.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From dmeggins at uottawa.ca  Mon May  5 19:04:06 1997
From: dmeggins at uottawa.ca (David Megginson)
Date: Mon Jun  7 16:57:46 2004
Subject: More Entity-Value fun
In-Reply-To: <199705051627.LAA49280@tigger.cc.uic.edu>
References: <199705041112.HAA00233@localhost>
	<199705051627.LAA49280@tigger.cc.uic.edu>
Message-ID: <199705051702.NAA00381@localhost>

Here is another XML quandary: how can I declare an internal entity with
"25%" as its replacement text -- without using a character reference
-- when "%" is not allowed to appear in an entity value?  

Perhaps it would make sense to add &percnt; to the predefined entities
(4.5).
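
For reference, the character-reference route that the question rules out
can be sketched as follows.  This assumes a present-day XML 1.0 parser
(Python's standard xml.etree.ElementTree), not a parser for the 1997
draft; the entity name pct and the element name d are made up for
illustration.

```python
import xml.etree.ElementTree as ET

# "pct" and "d" are hypothetical names chosen for this example.
# &#37; is the decimal character reference for "%".
DOCUMENT = '<!DOCTYPE d [<!ENTITY pct "25&#37;">]><d>&pct;</d>'


def expand(document: str) -> str:
    """Parse the document and return the text of its root element."""
    return ET.fromstring(document).text


print(expand(DOCUMENT))
```

A predefined &percnt; entity would remove the need for the numeric
reference in the declaration.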


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
University of Ottawa            dmeggins@uottawa.ca
        http://www.uottawa.ca/~dmeggins



From tbray at textuality.com  Tue May  6 02:11:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:46 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca>

Ever since about 15 minutes after SGML was born, database people have been
discovering, to their surprise, that it contains no facilities for
strong data typing.  You can have an element named <BIRTH-DATE>, and
SGML will have no problem accepting 

<birth-date>purple bananas rule</birth-date>.

Whenever more than two people start talking about the future of SGML, 
someone starts complaining about typing.  With the advent of XML, the 
volume has increased.  As an old database guy, I've been one of the loud 
complainers.  

While we're really not ready for this on the WG, it is something that
we're going to have to do something about before too long.  So I've posted
a modest proposal at:

    http://www.textuality.com/xml/typing.html

Overview points:

1. This only types elements, not attributes.  It's easier.
2. It's based on SQL types, not HyTime lextypes.  That's what the
   database world is used to.  This could probably be implemented
   using lextypes.
3. The syntax for dates and so on should match some ISO standard,
   but I haven't found which one yet.
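
The gap between well-formedness and typing can be illustrated with a short
Python sketch.  It does not use the attribute names of the proposal; it
assumes that ISO 8601 calendar dates are the intended syntax, and the
function name birth_date_is_valid is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date


def birth_date_is_valid(xml_text: str) -> bool:
    """Return True if the element content is an ISO 8601 calendar date."""
    # Parsing only checks well-formedness; any character data is accepted.
    element = ET.fromstring(xml_text)
    try:
        date.fromisoformat((element.text or "").strip())
    except ValueError:
        return False
    return True


for sample in (
    "<birth-date>1997-05-06</birth-date>",
    "<birth-date>purple bananas rule</birth-date>",
):
    print(sample, birth_date_is_valid(sample))
```

Both samples parse without complaint; only the separate type check tells
them apart.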

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From Peter at ursus.demon.co.uk  Tue May  6 11:29:53 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:46 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6204@ursus.demon.co.uk>

In message <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca> Tim Bray writes:
> Ever since about 15 minutes after SGML was born, database people have been
> discovering, to their surprise, that it contains no facilities for
> strong data typing.  You can have an element named <BIRTH-DATE>, and
> SGML will have no problem accepting 
> 
> <birth-date>purple bananas rule</birth-date>.
> 
> Whenever more than two people start talking about the future of SGML, 
> someone starts complaining about typing.  With the advent of XML, the 
> volume has increased.  As an old database guy, I've been one of the loud 
> complainers.  


I agree fully with this proposal.  This also highlights one of the essential
aspects of XML-DEV, which is going to come up repeatedly.  This is that
there are things that the ERB/WG is going to consider in the future, but
people want ways forward right now.  XML-DEV provides a forum so that:
	- people can find what previous approaches already exist
	- groups of people can point in the same direction if they wish to
	- problems can be identified before the ERB/WG process, making that
		faster and more effective.

This is an area which I've had to address in CML.  CML uses strong-data-typing
but I made it up myself.  It has STRING, INTEGER, FLOAT, DATE and various
others that XML-LINK has made obsolete.  So it's very easy to change to
the approach suggested here.

Wherever possible concepts should be re-used and I like the use of SQL.
(I don't like *SQL*, but that's a different matter).  I'm assuming, Tim,
that some of the proposal was carried nearly verbatim, because parts of
it are slightly opaque to those who don't know the SQL standard.

> 
> While we're really not ready for this on the WG, it is something that
> we're going to have to do something about before too long.  So I've posted
> a modest proposal at:
> 
>     http://www.textuality.com/xml/typing.html

Good start.  I don't think it needs expanding in scope, just some reworking
in places.

> 
> Overview points:
> 
> 1. This only types elements, not attributes.  It's easier.

Agree 100%.  I started with typed attributes and there is an enormous amount
of work in managing them as well as typed content.  You have to be able to
search them, transform them (at least in CML), qualify them with attributes
and so on.

> 2. It's based on SQL types, not HyTime lextypes.  That's what the
>    database world is used to.  This could probably be implemented
>    using lextypes.

What you have seems fine.  I assume that it is virtually an automatic 
translation.

> 3. The syntax for dates and so on should match some ISO standard,
>    but I haven't found which one yet.

Do you mean that there are several and you haven't decided between them?
I thought that people had converged on a single one (I can't remember
the number, it's something like 8601).

Detailed points:

I don't find SQLSIZE 'obvious' - it's essentially the character-string 
length, and if starting from scratch it should be more like SQLMAXLENGTH.
But if everyone uses it and learns to love it, I suppose we have to.

In box 2 you have XML-MIN - I assume this is a typo.

I found SIZE, MIN and MAX, very confusing.  I *think* that the text is 
correct, but it's very easy to get lost.  Are we stuck with these?

4.5  Presumably SQLMIN<=SQLMAX? etc...

4.6 Reference to SQL SCALE was unclear.  Is there a requirement for SQLSCALE
as well or does this simply need rewriting?

4.7 I am not happy without exponential notation.  For example do we 
really have to represent Avogadro's number (6.023E+23) as
602300000000000000000000?  Surely we can use IEEE notation?
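
As a rough illustration that the two spellings can be treated as the same
value, here is a Python sketch using the standard decimal module.  It says
nothing about what the proposal should adopt; the variable names are made
up for the example.

```python
from decimal import Decimal

exponential = Decimal("6.023E+23")
written_out = Decimal("602300000000000000000000")

# Decimal compares by numeric value, so the two spellings are equal.
print(exponential == written_out)
# Converting the exponential form to an integer gives the long spelling.
print(int(exponential))
```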

Is equality defined/definable for floating point?

4.8 I go along with 8601 or whatever it is.  That also defines TIME.

4.9 SQLSIZE was bad enough before.  Overloading it to manage the timezone
is really horrible.  Is this not defined in 8601 in which case we can use it?

4.10 Again I think this is covered by the ISO standard.

But this is an excellent start.  Again I raise the idea that XML should
introduce Generally Accepted Conventions.  This could be one.  
Later it might become part of the standard.  This way we help point people
in the right direction.

We have a lot of readers of XML-DEV.  This sort of area is an excellent one
to be contributing to.  Volunteers to summarise resources of this sort
(e.g. pointers to the ISO data standard, SQL datatyping, etc.) would be 
much appreciated.

	P.


> 
> Cheers, Tim Bray
> tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
> 
> 
> 
-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From DPawson at rnib.org.uk  Tue May  6 12:26:44 1997
From: DPawson at rnib.org.uk (David Pawson)
Date: Mon Jun  7 16:57:46 2004
Subject: DOCTYPE misunderstood
Message-ID: <s36f14c2.031@rnib.org.uk>

In developing some demonstration XML we have come across an issue
we would like to resolve; perhaps the experts out there could help.

From a single source document, marked up in XML, we
need to produce 4 output transforms: braille, large print, html
and typeset.

Additionally, we want (for local use) to be able to 'create'
'document type' (our own definition).

Question: Should we be using the doctype as the switch,
or an input to the output processing application (perhaps as
a command line option)?

Our definition of document type goes something along the 
lines of (for one particular use) - an editors note, a report,
a memo. [Seems logical to talk about document type in this
way].

The spec doesn't give 'usage hints' for doctype, what are 
the perceptions from the authors?

We are happy with selectable DTDs (or not), but should
we be altering the source document simply to obtain 
output variants? That seems to go against the idea of many
outputs from single source.
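
The command-line alternative mentioned in the question might look like the
following Python sketch, in which the source document stays untouched and
the output form is chosen per run.  The option name --output, the format
labels and the function names are all assumptions made for illustration.

```python
import argparse
from typing import Sequence

# The four output transforms named in this message.
OUTPUT_FORMATS = ("braille", "large-print", "html", "typeset")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that takes one source file and one output format."""
    parser = argparse.ArgumentParser(
        description="Render one XML source in a chosen output form."
    )
    parser.add_argument("source", help="path to the XML source document")
    parser.add_argument(
        "--output",
        choices=OUTPUT_FORMATS,
        required=True,
        help="which transform to apply to the unchanged source",
    )
    return parser


def describe(argv: Sequence[str]) -> str:
    """Return a short description of the requested run."""
    args = build_parser().parse_args(argv)
    return f"{args.source} -> {args.output}"


print(describe(["report.xml", "--output", "braille"]))
```

The same report.xml can then be rendered four ways without editing its
DOCTYPE.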

Advice would be appreciated

Regards, DaveP
dpawson@rnib.org.uk




From chaotic at maths.tcd.ie  Tue May  6 16:08:26 1997
From: chaotic at maths.tcd.ie (Alan Spencer)
Date: Mon Jun  7 16:57:46 2004
Subject: DOCTYPE misunderstood 
In-Reply-To: Your message of "Tue, 06 May 1997 11:20:36 -0000."
             <s36f14c2.031@rnib.org.uk> 
Message-ID: <9705061508.aa23613@salmon.maths.tcd.ie>

In message <s36f14c2.031@rnib.org.uk> David writes:

> ...
> From a single source document, marked up in XML, we
> need to produce 4 output transforms, braille, large print, html
> and typeset.
> 
> Additionally, we want (for local use) to be able to 'create'
> 'document type' (our own definition).
> 
> Question: Should we be using the doctype as the switch,
> or an input to the output processing application (perhaps as
> a command line option)?
> 
> ...
> Regards, DaveP
> dpawson@rnib.org.uk
> 
I would also be interested in an answer to this question as I am developing
a similar system.
Also, I have a similar question: Is it possible to have
a system where, if a particular entity is required for rendering a 'document',
it may be included from a master document?  This question comes from
a problem (which I have solved very inelegantly using perl) which involves
many levels of document definition.
The system I have in place lets you define (currently through HTML)
properties of a whole set of documents, for example the background colour,
or maybe keywords, and a set of sets. I have implemented this as a tree
of documents where properties are inherited down from parent to child.
The system is very hacky and not too robust.
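
The inheritance scheme described above can be sketched in a few lines of
Python.  This is not the perl system itself; the class name DocumentNode
and the property names are invented for the example.

```python
from collections import ChainMap


class DocumentNode:
    """A document whose properties fall back to those of its parent."""

    def __init__(self, name, parent=None, **properties):
        self.name = name
        own = dict(properties)
        if parent is None:
            self.properties = ChainMap(own)
        else:
            # Lookups try this node's own map first, then the ancestors'.
            self.properties = parent.properties.new_child(own)

    def get(self, key):
        return self.properties[key]


root = DocumentNode("site", background="white", keywords="maths")
section = DocumentNode("notes", parent=root, background="grey")
page = DocumentNode("lecture1", parent=section)

print(page.get("background"), page.get("keywords"))
```

Here the page takes its background from the nearest ancestor that sets one
and its keywords from the root.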

I could see XML being very useful in these (and many other) problems, and
for this reason I am trying to implement the system using it.  I think this may be a
question which concerns DSSSL and some sort of 'parent linking', but I'm
not familiar enough with the way these work to say.

Thanks,
Alan Spencer.

################################################################################
chaotic@maths.tcd.ie
http://www.maths.tcd.ie/~chaotic/
Trinity College Dublin - Maths Student.
################################################################################



From tbray at textuality.com  Tue May  6 19:24:11 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970506102133.007a1bf0@pop.intergate.bc.ca>

At 09:44 AM 5/6/97 GMT, Peter Murray-Rust wrote:

>> 3. The syntax for dates and so on should match some ISO standard,
>>    but I haven't found which one yet.
>
>Do you mean that there are several and you haven't decided between them?
>I thought that people had converged on a single one (I can't remember
>the number, it's something like 8601).

I mean I spent half an hour poking around the Web and didn't come
up with anything right away.  If someone will send me a pointer to
the standard syntax, I'll put it in the draft.

>I don't find SQLSIZE 'obvious'

OK, all of the types but one need a single parameter; each parameter
is numeric, except for DATE, which is a boolean for timezone
existence.  I didn't want to make up different attributes for each one.
Yes, it's hopelessly overloaded.  Maybe it should just be called
XML-SQLPARAM.  It is *not* the case that there is a single concept
in SQL to which all these parameters map.

>In box 2 you have XML-MIN - I assume this is a typo.

Right. 

>I found SIZE, MIN and MAX, very confusing.  I *think* that the text is 
>correct, but it's very easy to get lost.  Are we stuck with these?

Not stuck; this is the first ever draft.  Improvements welcome.

>4.5  Presumably SQLMIN<=SQLMAX? etc...

Yes.

>4.6 Reference to SQL SCALE was unclear.  Is there a requirement for SQLSCALE
>as well or does this simply need rewriting?

Scale of a decimal fraction is the number of digits to the right of
the decimal point.
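
A small Python sketch of that definition, using the standard decimal
module; the function name scale is an assumption, and only finite decimal
literals are handled.

```python
from decimal import Decimal


def scale(literal: str) -> int:
    """Number of digits to the right of the decimal point."""
    # For a finite Decimal the exponent is minus the count of
    # fractional digits (or non-negative when there are none).
    exponent = Decimal(literal).as_tuple().exponent
    return max(0, -exponent)


for literal in ("12.345", "42", "0.50"):
    print(literal, scale(literal))
```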

>4.7 I am not happy without exponential notation.  For example do we 
>really have to represent Avogadro's number (6.023E+23) as
>602300000000000000000000?  Surely we can use IEEE notation?

Yes, this has to be supported.  Somebody else pointed that out too.

>Is equality defined/definable for floating point?

Yes, because in the real world, there are no real numbers [sorry,
math joke] - what I mean is that floating point numbers exist either
as fixed-size binary objects in computer storage, or as strings of
digits, decimal points, and exponents, also in storage.  Either
way, equality tests are meaningful.  Given good implementations of
the IEEE rules, they are even useful.
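
A brief Python sketch of both halves of that statement.  The first
comparison uses the number from the question; the second is an added,
well-known caveat about binary floating point and is not part of the
draft.

```python
import math

# Two spellings of the same value convert to the same stored number.
a = float("6.023E+23")
b = float("602300000000000000000000")
print(a == b)

# Equality is still well defined after arithmetic, but rounding in
# binary means it may not give the answer a decimal reader expects.
print(0.1 + 0.2 == 0.3)
print(math.isclose(0.1 + 0.2, 0.3))
```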

 -Tim



From ebaatz at barbaresco.East.Sun.COM  Tue May  6 20:24:28 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <libSDtMail.9705061419.19450.ebaatz@barbaresco>

My application accepts plain text.  If its client wants it to do
a better job, it can markup the text using an XML syntax.

So, the client could want to send the application something like:

  This is plain text.
  
However, if the application is expecting XML markup, then it would
be nice if everything a client sent was an XML document.  So, for
the sake of clarity and consistency, I can force the client to send:

  <?XML version="1.0" encoding="UCS-2"?>
  This is plain text.
  
Well, that doesn't work, because that isn't a well-formed XML document
because it doesn't have an element, see:

  [23] document ::= Prolog element Misc*
  
So I could force the client to send:

  <?XML version="1.0" encoding="UCS-2"?>
  <foobar/>
  This is plain text.

where "foobar" is the client's choice of a legal name:

  [5] Name ::= (Letter | '_') (NameChar)*

But forcing the inclusion of characters that don't convey any
useful information to the application goes against my sense of
cleanliness.

Why must an XML document include at least one element?




From tbray at textuality.com  Tue May  6 20:47:39 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <3.0.32.19970506114453.009f63f0@pop.intergate.bc.ca>

At 02:19 PM 5/6/97 -0400, Eric Baatz - Sun Microsystems Labs BOS wrote:

>  [23] document ::= Prolog element Misc*
>So I could force the client to send:
>  <?XML version="1.0" encoding="UCS-2"?>
>  <foobar/>
>  This is plain text.

This would still not work, because "This is plain text." doesn't
match the nonterminal 'Misc'.  You need:

<foobar>This is plain text.</foobar>

>But forcing the inclusion of characters that don't convey any
>useful information to the application goes against my sense of
>cleanliness.
>
>Why must an XML document include at least one element?

If you don't have any useful info to convey, then don't put in
the tag.  It's not XML, but the text is presumably still useful.

It is a defining characteristic of XML that any "character data", 
i.e. non-markup text, has to be part of an element.  In other words,
a document must have a logical structure, and all its text must
have a place in that logical structure.  One benefit: you know
unambiguously when the message has ended, without waiting for
sockets to close and so on.  One of the things that makes XML
processors simple is they can look simple-mindedly for begin and
end tags, no exceptions.
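
A quick way to see the rule in action is to hand the three candidate messages to a parser.  The sketch below uses Python's xml.etree.ElementTree, which follows XML 1.0 as eventually published rather than this draft, so the XML declaration is left out (its spelling changed between the draft and the final recommendation).  The helper name is_well_formed is made up.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as a complete document."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

bare = "This is plain text."
trailing = "<foobar/>\nThis is plain text."
wrapped = "<foobar>This is plain text.</foobar>"

for sample in (bare, trailing, wrapped):
    print(is_well_formed(sample), repr(sample))
```

Only the last form, with the text inside the element, is accepted.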

I can accept that there are tons of useful documents that do not
have an explicitly-marked up logical structure, and an important
place in the world for plain text.  And, we hope, an important 
place for XML.  But they're not the same thing. -Tim




From ebaatz at barbaresco.East.Sun.COM  Tue May  6 21:05:34 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <libSDtMail.9705061500.25812.ebaatz@barbaresco>

After I had sent my message, I was staring at 

  [23] document ::= Prolog element Misc*
  [36] element  ::= EmptyElement | STag content ETag
  [35] content  ::= (element | PCData | ... | Comment)*

when the light went on and I said "Oh, everything in one
element.  Wish I hadn't sent that last message."

Thanks for such a quick, gentle, value-added response.

  


From Peter at ursus.demon.co.uk  Tue May  6 21:23:06 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <6219@ursus.demon.co.uk>

Hi Eric,
	I see Tim has answered some of your queries.  I'll take another
(implied one).
In message <libSDtMail.9705061419.19450.ebaatz@barbaresco> Eric Baatz - Sun Microsystems Labs BOS writes:
> My application accepts plain text.  If its client wants it to do

I think you have an assumption here that you know what software will be 
processing your document at the other end.  So far that isn't defined in
XML - it may be later.  At present all that the client knows is:
	- the document is XML
	- *possibly* what the DOCTYPE is
	- *possibly* what stylesheets are associated with the document.

At present there is no mechanism in XML for saying 'please process this document
with FOOBAZ software'.  That's more like a plugin requirement.  The most
that XML can say is:
	- please apply this stylesheet to the document.  And the stylesheet
		can have sophisticated algorithmic behaviour through DSSSL
	- OR please apply this behavior to the document (or some part of the
		document).  At present the syntax isn't defined.  My current
		approach in JUMBO is to apply a separate Java class per 
		element.  Other people may have different strategies.

Let's assume your document is
<FOO> This is the first line

and there was a newline
</FOO>
If you sent your document to JUMBO, it would capture the text, including
spaces and newlines, and store it as a PCDATA element.  If you wanted to
output it, it would be output as you sent it.  If you wanted to display
it, it would look exactly the same.  If, however, you used <HTML> instead,
it would try (rather crudely) to format it as HTML.  The original newline
would disappear, and line breaks would be inserted in the display wherever
the text hit the right edge.  At present JUMBO is not sophisticated enough
to manage the DEFAULT|PRESERVE attribute - by default it's DEFAULT, which is
the application's default w/s processing mode (which happens to be PRESERVE!!).
Remember also that a 'plain text' document has a lot of implied structure
which the application cannot be expected to pick up without careful
conventions.
> a better job, it can markup the text using an XML syntax.
> 
> So, the client could want to send the application something like:
> 
>   This is plain text.
>   
> However, if the application is expecting XML markup, then it would

I am not quite sure whether I understand your use of client and application.
My model is:
	WWW --->doc---> parser --> application
If you use a WWW browser (?client) to interface to the WWW, then you
might have:
	WWW--->doc---> browser -->parser --> application
Some people would call the whole of the client-side stuff a client, whereas
others might just use it for the browser.  I think this is an important 
point and have urged the XML community to try to identify these components
precisely.  For my own part, I separate parser and application in the
architecture, and this is a useful model.

	What does your application get from the browser/parser?  We're still
trying to work that out.  NXP gives me an Esis stream [Norbert, I need a 
handle to extract the DOCTYPE, since that's not in Esis].  Lark gives me a 
root element of a tree, which I can navigate myself.  Some people want to
pass groves to the application, but I'm not sure of the status of those
developments.

	P.

> be nice if everything a client sent was an XML document.  So, for
> the sake of clarity and consistency, I can force the client to send:

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From peter at techno.com  Tue May  6 21:33:10 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:57:47 2004
Subject: Strong Typing in SGML and XML
In-Reply-To: <3.0.32.19970505170907.009f46a0@pop.intergate.bc.ca> (message
	from Tim Bray on Mon, 05 May 1997 17:09:21 -0700)
Message-ID: <199705061929.PAA11039@exocomp.techno.com>

> Date: Mon, 05 May 1997 17:09:21 -0700
> From: Tim Bray <tbray@textuality.com>
> 
> Ever since about 15 minutes after SGML was born, database people have been
> discovering, to their surprise, that it contains no facilities for
> strong data typing.  You can have an element named <BIRTH-DATE>, and
> SGML will have no problem accepting 
> 
> <birth-date>purple bananas rule</birth-date>.

It seems to me that this sort of data typing can already be
accomplished:

<!NOTATION SQL-DATE PUBLIC "...//NOTATION SQL DATE//EN">
<!ELEMENT birth-date - - (#PCDATA)>
<!ATTLIST birth-date
   notation NOTATION (SQL-DATE) #FIXED SQL-DATE
>

Using data attributes, and HyTime's Data Attributes For Elements
(DAFE) facility (which acts as if the data content notation for an
element were an architectural form), one could implement a scheme like
the one you (Tim) propose:

<!NOTATION SQL-DATE PUBLIC "...//NOTATION SQL DATE//EN">
<!ATTLIST #NOTATION SQL-DATE
   sql-min CDATA #IMPLIED
   sql-max CDATA #IMPLIED
>
<!ELEMENT birth-date - - (#PCDATA)>
<!ATTLIST birth-date
   sql-type NOTATION (SQL-DATE) #FIXED SQL-DATE
   sql-min CDATA #FIXED "1900-01-01" -- some bogus restrictions --
   sql-max CDATA #FIXED "1999-12-31"
>

Or better yet:

<!NOTATION SQL-DATATYPE PUBLIC "...//NOTATION SQL DATATYPE//EN">
<!ATTLIST #NOTATION SQL-DATATYPE
   sql-type NAME #REQUIRED
   sql-size NUMBER 0
   sql-min CDATA #IMPLIED
   sql-max CDATA #IMPLIED
>
<!ELEMENT birth-date - - (#PCDATA)>
<!ATTLIST birth-date
   notation NOTATION (sql-datatype) #FIXED sql-datatype
   sql-type NAME #FIXED sql-date
   sql-min CDATA #FIXED "1900-01-01" -- some bogus restrictions --
   sql-max CDATA #FIXED "1999-12-31"
>

Of course, both of these require data attributes, which XML does not
(yet!) support.
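
For what it's worth, once the attributes reach an application the check itself is short.  Here is a minimal sketch in Python, assuming the sql-min and sql-max values arrive as ordinary attributes on the element and that dates are written as YYYY-MM-DD.  The function name date_in_range is hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import date

def date_in_range(element: ET.Element) -> bool:
    """Check element text against its sql-min / sql-max attributes, if any."""
    try:
        value = date.fromisoformat((element.text or "").strip())
    except ValueError:
        return False  # not a date at all
    lower = element.get("sql-min")
    upper = element.get("sql-max")
    if lower is not None and value < date.fromisoformat(lower):
        return False
    if upper is not None and value > date.fromisoformat(upper):
        return False
    return True

# The attributes are written out here; in the DTDs above they would be #FIXED.
ATTRS = 'sql-min="1900-01-01" sql-max="1999-12-31"'
good = ET.fromstring(f"<birth-date {ATTRS}>1961-07-04</birth-date>")
late = ET.fromstring(f"<birth-date {ATTRS}>2004-06-07</birth-date>")
junk = ET.fromstring(f"<birth-date {ATTRS}>purple bananas rule</birth-date>")

print(date_in_range(good), date_in_range(late), date_in_range(junk))
```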

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
233 Spruce Avenue                       P.O. Box 23795
Rochester, NY 14611-4041 USA            Rochester, New York 14692-3795 USA
+1 716 464 8696 (home)                  +1 716 464 8696 (direct)
+1 716 755 8698 (cell)                  +1 716 271 0796 (main)
+1 716 529 4304 (fax)                   +1 716 271 0129 (fax)
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From ebaatz at barbaresco.East.Sun.COM  Tue May  6 22:19:42 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <libSDtMail.9705061613.25462.ebaatz@barbaresco>

>  I think you have an assumption here that you know what software
>  will be processing your document at the other end.

Yes.  The markup is intended for private communication between
an application and  a speech synthesizer via an API.  I wouldn't
expect any kind of browser to be involved.




From Peter at ursus.demon.co.uk  Tue May  6 23:10:24 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:47 2004
Subject: Why must an XML document contain an element?
Message-ID: <6223@ursus.demon.co.uk>

In message <libSDtMail.9705061500.25812.ebaatz@barbaresco> Eric Baatz - Sun Microsystems Labs BOS writes:
> After I had sent my message, I was staring at 
> 
>   [23] document ::= Prolog element Misc*
>   [36] element  ::= EmptyElement | STag content ETag
>   [35] content  ::= (element | PCData | ... | Comment)*
> 
> when the light went on and I said "Oh, everything in one
> element.  Wish I hadn't sent that last message."

I'm glad you did, because it raised some important issues.
XML-DEV, in the tradition of SGML, welcomes contributions from
those who are exploring the language.  (comp.text.sgml and XML-WG
are littered with postings from me which I might have been better
to suppress :-).  It's very important that we get this traffic because
it shows where the presentation of the language and its tools is
deficient.  If people find things hard to understand, then there is 
an onus on the documenters to make more effort.

Anyway, it's been far too quiet on this list :-)  We need to know how
people are finding XML and what they want from it.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Tue May  6 23:47:15 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6228@ursus.demon.co.uk>

In message <3.0.32.19970506102133.007a1bf0@pop.intergate.bc.ca> Tim Bray writes:
> At 09:44 AM 5/6/97 GMT, Peter Murray-Rust wrote:
> 
> >> 3. The syntax for dates and so on should match some ISO standard,
> >>    but I haven't found which one yet.
> >
> >Do you mean there are several and you haven't decided between them?
> >I thought that people had converged on a single one (I can't remember
> >the number, it's something like 8601).
> 
> I mean I spent half an hour poking around the Web and didn't come
> up with anything right away.  If someone will send me a pointer to
> the standard syntax, I'll put it in the draft.

ISO 8601.  Being ISO it isn't on the WWW, but there is a very concise
summary which I found at http://www.mcs.vuw.ac.nz/  - just look
for ISO8601 in the search engine.

It manages timezones within the date, and dates and times both absolute and
relative. 
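
As an illustration of the timezone-in-the-date idea, here is a short sketch using Python's datetime module, which accepts a subset of the ISO 8601 extended format.  The timestamp is an arbitrary example.

```python
from datetime import datetime, timedelta, timezone

# Extended-format ISO 8601 timestamp with a numeric time-zone offset.
stamp = datetime.fromisoformat("1997-05-05T17:09:21-07:00")

# The offset travels with the value, so conversion to UTC is unambiguous.
as_utc = stamp.astimezone(timezone.utc)

# A relative time (a duration) is a separate kind of value.
half_hour_later = stamp + timedelta(minutes=30)

print(stamp.utcoffset(), as_utc.isoformat(), half_hour_later.isoformat())
```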
> 
> >I don't find SQLSIZE 'obvious'
> 
> OK, all of the types but one need a single parameter; each parameter
> is numeric, except for DATE, which is a boolean for timezone
> existence.  I didn't want to make up different attributes for each one.
> Yes, it's hopelessly overloaded.  Maybe it should just be called
> XML-SQLPARAM.  It is *not* the case that there is a single concept

I much prefer this.  OTOH some might require two?

[...]
> 
> Yes, this has to be supported.  Somebody else pointed that out too.
> 
> >Is equality defined/definable for floating point?
> 
> Yes, because in the real world, there are no real numbers [sorry,
> math joke] - what I mean is that floating point numbers exist either
> as fixed-size binary objects in computer storage, or as strings of
> digits, decimal points, and exponents, also in storage.  Either
> way, equality tests are meaningful.  Given good implementations of
> the IEEE rules, they are even useful.

As always this has to be precisely specified.  It should be clear whether
a number in memory is being compared as its IEEE representation or something
else.
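
The distinction matters in practice.  A minimal sketch in Python, assuming IEEE 754 doubles: comparing the stored bit patterns and comparing the numeric values give different answers for zero and for NaN.  The helper name ieee_bits is made up.

```python
import struct

def ieee_bits(x: float) -> bytes:
    """Return the 8-byte big-endian IEEE 754 double encoding of x."""
    return struct.pack(">d", x)

nan = float("nan")

# Positive and negative zero: numerically equal, different bit patterns.
zeros_equal = 0.0 == -0.0
zeros_same_bits = ieee_bits(0.0) == ieee_bits(-0.0)

# NaN: the same bit pattern, yet never numerically equal to itself.
nan_equal = nan == nan
nan_same_bits = ieee_bits(nan) == ieee_bits(nan)

print(zeros_equal, zeros_same_bits, nan_equal, nan_same_bits)
```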

> 
>  -Tim

A general point about validation, which I keep labouring without making
much headway, is: where does all this happen?  It can happen at authoring,
at parsing, or at the application.  My concern is that unless this is defined
it's likely to fall through the net.  And, having built strong typing into
CML, I know it's not always trivial to implement (in fact I'm sure it's not
correct in places).  For example, should the system always hold a string
value regardless of the original type?  And if it converts back to a 
string representation presumably it should use the original string rather
than reconvert.  What happens with transformations is certainly not trivial,
because it can involve precision and output format.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed May  7 03:00:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:47 2004
Subject: PCDATA
Message-ID: <6230@ursus.demon.co.uk>

I am trying to interface JUMBO with NXP and Lark.  I have bolted them both
in, but get different answers (I think) for PCDATA on WF documents.

How many PCDATA elements would be expected in the file?
<?XML VERSION="1.0"?>
<!DOCTYPE CML>
<CML>
<XVAR>
This is a variable
</XVAR>
</CML>

And what would be their values?

P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Wed May  7 03:15:30 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:47 2004
Subject: PCDATA
Message-ID: <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca>

At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
>How many PCDATA elements would be expected in the file?
<?XML VERSION="1.0"?>
<!DOCTYPE CML>
<CML>
<XVAR>
This is a variable
</XVAR>
</CML>

Let's flatten that.  Clearly there can't be any PCDATA before <CML>, so:

<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>
     11      2222222222222222222222       33

Three pieces of PCDATA.  Uh, I'll check Lark now... if it says anything
else, that's a bug. -T
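
The count can be reproduced with a present-day parser.  The sketch below uses Python's xml.etree.ElementTree on the flattened string (without the declaration and DOCTYPE, which follow the draft syntax); it is an illustration, not a statement about what Lark or NXP report.

```python
import xml.etree.ElementTree as ET

flat = "<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>"
root = ET.fromstring(flat)
xvar = root.find("XVAR")

# ElementTree stores character data as .text (before the first child)
# and .tail (after an element's end tag), which gives the three pieces.
pieces = [root.text, xvar.text, xvar.tail]

for number, piece in enumerate(pieces, start=1):
    print(number, repr(piece))
```

Pieces 1 and 3 are the lone newlines inside CML; piece 2 is the content of XVAR with its leading and trailing newline.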




From eric_albright at sprynet.com  Wed May  7 05:46:25 1997
From: eric_albright at sprynet.com (Eric Albright)
Date: Mon Jun  7 16:57:47 2004
Subject: Strong Typing in SGML and XML
Message-ID: <199705070346.UAA02957@m9.sprynet.com>

First, I'd like to concur with the need for a formal specification for data
typing.

I had hoped that HyTime's lextype feature would be sufficient. I for one
would like to hear from the HyTime experts about how they would implement
the parallel data typing. -- No use reinventing any standard. It may only
need simplifying and explaining.

Having said that, I ask when is strong data typing necessary? As far as I
can tell there is only one place where it is useful -- when the document is
being created or altered. There will always be data validation that cannot
be handled by data typing and as such must be delegated to a validating
application or a human. e.g.
<NAME><FIRST>Albright</FIRST><LAST>Eric</LAST></NAME>

As for comments about the proposal:

I would like to see a simplified version of the data types. It is very
important for databases to know the exact size in bytes that a data element
will occupy. SGML/XML deals with a character string and therefore does not
care. More important to me are the constraints on the data implied by a
given type. I think we need to determine the types of constraints that each
data type requires and allow for the maximum flexibility without
sacrificing precision.

As far as I can tell, there are three basic types--character, numeric, and
temporal. Each type requires its own unique constraints:

CHARACTER - an alphabet, length constraint, content constraint (regular
expressions)

NUMERIC - a maximum value, a minimum value, some type of rounding/precision

TEMPORAL - a maximum value, minimum value, (the maximum and minimum values
may be constrained in relation to the current value), some type of
rounding/precision

I think that the CHARACTER data type should be able to specify the alphabet
and length constraint within the content constraint. However some
modification to the standard regular expression writing would be necessary.
I for one do not want to have to type
\([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] for a phone number.
Perhaps \([0-9](+3)\)[0-9](+3)-[0-9](+4) would be better.
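
Many regular-expression dialects already have a counted-repetition quantifier, written {n}, that does the job of the (+n) notation suggested here.  A minimal sketch in Python's re syntax, assuming the backslashed parentheses are meant as literal parentheses around the area code; the sample numbers are invented.

```python
import re

# Spelled out digit by digit, as in the long form above.
long_form = re.compile(
    r"\([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9]")

# The same constraint with counted repetition, {n}.
short_form = re.compile(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}")

samples = ["(613)555-0123", "(613)555-012", "613-555-0123"]
for sample in samples:
    print(sample, bool(long_form.fullmatch(sample)),
          bool(short_form.fullmatch(sample)))
```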

To allow maximum flexibility and precision for numeric values, we should be
able to specify the form (roman/arabic) and a base. The rounding allows us
to constrain the significant digits to some factor of the base. A rounding
type would be needed for the greatest flexibility (round/ceiling/floor).
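
A rough sketch of what the three rounding types and the base could mean in code, using Python's decimal module.  The function names are made up, and reading "round" as round-half-up is an assumption; the proposal does not say how ties are broken.

```python
from decimal import Decimal, ROUND_CEILING, ROUND_FLOOR, ROUND_HALF_UP

# Hypothetical mapping from the proposed method names to decimal modes.
METHODS = {"round": ROUND_HALF_UP, "ceiling": ROUND_CEILING, "floor": ROUND_FLOOR}

def round_to(value: str, step: str, method: str = "round") -> Decimal:
    """Round `value` to a multiple of `step` using the named method."""
    unit = Decimal(step)
    # Count in whole steps, then scale back up to the original unit.
    steps = (Decimal(value) / unit).quantize(Decimal(1), rounding=METHODS[method])
    return steps * unit

def parse_in_base(digits: str, base: int) -> int:
    """Read an integer written in the given base (2 to 36)."""
    return int(digits, base)

print(round_to("12.345", "0.01"), round_to("12.341", "0.01", "ceiling"),
      round_to("12.349", "0.01", "floor"), parse_in_base("ff", 16))
```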

Temporal values can specify either an instant of time or an extent of time.
They should also be able to be rounded. When an instant is rounded, the
significant digits are to the left; when an extent is rounded, the
significant digits are to the right. To signify that an instant is precise
to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
signify that an extent is precise to the nearest tenth of a second, it
would be rounded by 0000/00/00 00:00:00.1 .
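
Here is a minimal sketch of temporal rounding in Python, assuming steps no longer than a day (calendar-sized steps such as "five years" need calendar arithmetic and are not handled).  The function names are hypothetical.

```python
from datetime import datetime, timedelta

def round_extent(extent: timedelta, step: timedelta) -> timedelta:
    """Round a duration to the nearest multiple of `step` (ties round up)."""
    # Add half a step, then floor-divide to count whole steps.
    half = step // 2
    return ((extent + half) // step) * step

def round_instant(instant: datetime, step: timedelta) -> datetime:
    """Round a naive datetime to the nearest `step` counted from midnight."""
    midnight = instant.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + round_extent(instant - midnight, step)

flight_time = round_extent(timedelta(hours=2, minutes=8), timedelta(minutes=15))
departure = round_instant(datetime(1997, 5, 7, 5, 46, 25), timedelta(minutes=1))
print(flight_time, departure)
```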

Given this the "architectural form" for data typing would become:

<!ATTLIST AnyElement
    XML-TYPE         (character|numeric|temporal) #IMPLIED
        -- if omitted, default is character with no other
           constraints applied --
    XML-TYPE-CONTENT CDATA                 #IMPLIED
        -- For CHARACTER types only; default is no constraint --
    XML-TYPE-MIN     CDATA                 #IMPLIED
        -- For NUMERIC/TEMPORAL; default is no constraint --
    XML-TYPE-MAX     CDATA                 #IMPLIED
        -- For NUMERIC/TEMPORAL; default is no constraint --
    XML-TYPE-ROUNDTO CDATA                 #IMPLIED
        -- For NUMERIC/TEMPORAL; default is no constraint --
    XML-TYPE-RNDMETH (round|ceiling|floor) #IMPLIED
        -- Round method; for NUMERIC/TEMPORAL; default is "round" --
    XML-TYPE-FORM    (roman|arabic)        #IMPLIED
        -- For NUMERIC; default is "roman" --
    XML-TYPE-BASE    CDATA                 #IMPLIED
        -- For NUMERIC; default is "10" --
    XML-TYPE-TYPE    (instant|extent)      #IMPLIED
        -- required for TEMPORAL --
>

This changes the number of attributes from 4 to 9 but provides for higher
precision for data constraint.

The examples would become:

For a bank loan; balance, interest rate, and maturity date: 

<!ELEMENT BALANCE  (#PCDATA)>
<!ATTLIST BALANCE  XML-TYPE         CDATA #FIXED "NUMERIC"
                   XML-TYPE-ROUNDTO CDATA #FIXED "0.01"
                   XML-TYPE-MIN     CDATA #FIXED "0.00">
<!ELEMENT INTEREST (#PCDATA)>
<!ATTLIST INTEREST XML-TYPE         CDATA #FIXED "NUMERIC"
                   XML-TYPE-MAX     CDATA #FIXED "100"
                   -- in practice we may want this to be much lower --
                   XML-TYPE-MIN     CDATA #FIXED "0">
<!ELEMENT MATURITY (#PCDATA)>
<!ATTLIST MATURITY XML-TYPE         CDATA #FIXED "TEMPORAL"
                   XML-TYPE-TYPE    CDATA #FIXED "INSTANT"
                   XML-TYPE-ROUNDTO CDATA #FIXED "0000/00/01 00:00:00">
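
To show how an application might apply the NUMERIC constraints, here is a minimal sketch in Python.  The function check_numeric and its parameter names are hypothetical; it assumes that "respects ROUNDTO" means the value is an exact multiple of the step.

```python
from decimal import Decimal, InvalidOperation

def check_numeric(text, minimum=None, maximum=None, round_to=None):
    """Return True if `text` is a number that meets the given constraints."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False
    if not value.is_finite():
        return False
    if minimum is not None and value < Decimal(minimum):
        return False
    if maximum is not None and value > Decimal(maximum):
        return False
    # A value respects ROUNDTO when it is an exact multiple of the step.
    if round_to is not None and value % Decimal(round_to) != 0:
        return False
    return True

# Constraints copied from the BALANCE and INTEREST declarations.
print(check_numeric("1520.25", minimum="0.00", round_to="0.01"))
print(check_numeric("1520.255", minimum="0.00", round_to="0.01"))
print(check_numeric("7.5", minimum="0", maximum="100"))
print(check_numeric("purple bananas", minimum="0"))
```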

For an airline departure: passenger name, seat number, and departure time: 

<!ELEMENT LAST-NAME (#PCDATA)>
<!ATTLIST LAST-NAME XML-TYPE         CDATA #FIXED "CHARACTER"
                    XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)"
                    -- up to 20 repetitions of [A-Z] -->
<!ELEMENT FIRST-INITIAL (#PCDATA)>
<!ATTLIST FIRST-INITIAL XML-TYPE         CDATA #FIXED "CHARACTER"
                        XML-TYPE-CONTENT CDATA #FIXED "[A-Z]">
<!ELEMENT SEAT-ROW (#PCDATA)>
<!ATTLIST SEAT-ROW XML-TYPE         CDATA #FIXED "NUMERIC"
                   XML-TYPE-MIN     CDATA #FIXED "1"
                   XML-TYPE-MAX     CDATA #FIXED "36"
                   XML-TYPE-ROUNDTO CDATA #FIXED "1">
<!ELEMENT SEAT-LETTER (#PCDATA)>
<!ATTLIST SEAT-LETTER XML-TYPE         CDATA #FIXED "CHARACTER"
                      XML-TYPE-CONTENT CDATA #FIXED "[A-F]">
<!ELEMENT DEPARTURE (#PCDATA)>
<!ATTLIST DEPARTURE XML-TYPE         CDATA #FIXED "TEMPORAL"
                    XML-TYPE-TYPE    CDATA #FIXED "INSTANT"
                    XML-TYPE-ROUNDTO CDATA #FIXED "0000/00/00 00:01:00"
                    -- to the nearest minute -->
<!ELEMENT FLIGHT-TIME (#PCDATA)>
<!ATTLIST FLIGHT-TIME XML-TYPE         CDATA #FIXED "TEMPORAL"
                      XML-TYPE-TYPE    CDATA #FIXED "EXTENT"
                      XML-TYPE-ROUNDTO CDATA #FIXED "0000/00/00 00:15:00"
                      -- to the nearest 15 minutes -->
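
The CHARACTER constraints can be checked by translating the suggested repetition suffixes into an existing regular-expression dialect.  A minimal sketch in Python, assuming (+n) means exactly n repetitions and (*n) means zero to n; both function names are made up.

```python
import re

def to_python_regex(pattern: str) -> str:
    """Translate the proposed (+n) and (*n) suffixes into {n} and {0,n}."""
    pattern = re.sub(r"\(\+(\d+)\)", r"{\1}", pattern)    # exactly n
    pattern = re.sub(r"\(\*(\d+)\)", r"{0,\1}", pattern)  # up to n
    return pattern

def check_character(text: str, content: str) -> bool:
    """Return True if `text` matches the XML-TYPE-CONTENT pattern in full."""
    return re.fullmatch(to_python_regex(content), text) is not None

print(to_python_regex("[A-Z](*20)"))
print(check_character("ALBRIGHT", "[A-Z](*20)"))
print(check_character("Albright", "[A-Z](*20)"))
print(check_character("C", "[A-F]"))
```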


Well, what do you think?

Eric



From Peter at ursus.demon.co.uk  Wed May  7 08:22:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:47 2004
Subject: PCDATA
Message-ID: <6256@ursus.demon.co.uk>

In message <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca> Tim Bray writes:
> At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
> >How many PCDATA elements would be expected in the file?
> <?XML VERSION="1.0"?>
> <!DOCTYPE CML>
> <CML>
> <XVAR>
> This is a variable
> </XVAR>
> </CML>
> 
> Let's flatten that.  Clearly there can't be any PCDATA before <CML>, so:
> 
> <CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>
>      11      2222222222222222222222       33
> 
> Three pieces of PCDATA.  Uh, I'll check Lark now... if it says anything
> else, that's a bug. -T

No bug.  And Michael SMcQ gave the same answer.  I am not sure what NXP
gives at the moment, I'll have to check.  So *I*, and most of the people
who will be using CML, have a potentially serious problem and I don't know what 
to do.

Ancillary Question:
If this had been run through a validating parser and the DTD had contained
<!ELEMENT CML (FOO|XVAR)*>
I assume the above document would be invalid?  (#PCDATA does not occur in the
CML content model).

But am I not right in thinking that in SGML the 'additional' newlines
are discarded?  If I run this document through sgmls with the above
DTD, doesn't it validate (I'm doing this from memory, so please be
gentle), and at the same time throw away the 'spurious' #PCDATA elements?

Problem 1.
For a DTD which makes a restricted use of PCDATA, most documents are going to
have lines hundreds or thousands of characters long.  The lines above
would have to be:
<CML><XVAR>This is a variable</XVAR></CML>
and this could easily - in some of my applications - be very much longer.
This makes such documents tricky to edit by hand and could cause problems
with some text processing software.

Problem 2.
It is going to be almost impossible to educate an HTML2XML community that the
two documents above are different.  I have only just realised this problem
today, although I seem to remember in earlier versions of the spec the
behaviour was different?  So I now haven't the slightest idea what I should
be doing - and I thought this was all solved...

Problem 3.
This seems to imply that a WF document *produces different output* if it is 
validated against a DTD.  I accept this is true for SGML, but is it also
true for XML?  If so, I think we shall have an awful problem educating
people.


You will appreciate that I may have clung onto ideas which were parts of 
earlier versions of the draft.  I'd be very grateful for an 'extremely
simple' explanation of what happens with various input of the type above.
If it's what I think, then at the very least I think that the current draft
needs to address this more directly.  Personally I would like some sort
of XML-based switch that allowed a simple behaviour and allowed newlines
for formatting.
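
Until something like that exists in the spec, an application can do the equivalent itself.  Here is a minimal sketch in Python with xml.etree.ElementTree, assuming a simple heuristic: whitespace-only text is dropped wherever an element has child elements.  Without a DTD the program cannot know whether such an element really has mixed content, so this is an application convention, not parser behaviour.  The function name is made up.

```python
import xml.etree.ElementTree as ET

def drop_formatting_whitespace(element: ET.Element) -> None:
    """Remove whitespace-only text around child elements, in place."""
    if len(element):  # the element has child elements
        if element.text is not None and not element.text.strip():
            element.text = None
        for child in element:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
            drop_formatting_whitespace(child)

pretty = ET.fromstring("<CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>")
drop_formatting_whitespace(pretty)
print(ET.tostring(pretty, encoding="unicode"))
```

The newlines inside XVAR survive because XVAR has no child elements; only the two newlines directly inside CML are removed.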

The spec says that DEFAULT means that the *application's* default white-space
processing modes (why plural?) are acceptable.  Is the application a DTD or
a program?  If the latter, then we are potentially going to have serious
problems.  If the former, then I don't see how the information is conveyed
from the DTD to the program, if the program is generic (like JUMBO).

P.  (somewhat confused :-).


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From nmikula at edu.uni-klu.ac.at  Wed May  7 08:22:35 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:57:47 2004
Subject: PCDATA
References: <3.0.32.19970506181242.009f2d80@pop.intergate.bc.ca>
Message-ID: <33709E46.157D@edu.uni-klu.ac.at>

Tim Bray wrote:
> 
> At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
> >How many PCDATA elements would be expected in the file?
> <?XML VERSION="1.0"?>
> <!DOCTYPE CML>
> <CML>
> <XVAR>
> This is a variable
> </XVAR>
> </CML>
> 
> Let's flatten that.  Clearly there can't be any PCDATA before <CML>, so:
> 
> <CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>
>      11      2222222222222222222222       33
> 
> Three pieces of PCDATA.  Uh, I'll check Lark now... if it says anything
> else, that's a bug. -T

I do agree with Tim. I will also check with NXP tonight, to make
sure that this is the answer I get.

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From Peter at ursus.demon.co.uk  Wed May  7 09:37:18 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6259@ursus.demon.co.uk>

In message <199705070346.UAA02957@m9.sprynet.com> "Eric Albright" writes:
> First, I'd like to concur with the need for a formal specification for data
> typing.
> 
> I had hoped that HyTime's lextype feature would be sufficient. I for one
> would like to hear from the HyTime experts about how they would implement
> the parallel data typing. -- No use reinventing any standard. It may only
> need simplifying and explaining.
> 
> Having said that, I ask when is strong data typing necessary? As far as I
> can tell there is only one place where it is useful -- when the document is
> being created or altered. There will always be data validation that cannot

You may all regard this as poor design, but CML requires the documents to
carry the data types.  To save increasingly complex content models, CML
has only two elements to carry typed data, XVAR (a scalar) and ARRAY
(to carry large amounts of XVARs - an ARRAY looks like

<ARRAY TYPE="FLOAT" SIZE="3">1.2 2.3 3.4</ARRAY>

- remember that some arrays can run to several powers of 10).  At present 
CML uses 4 types (others are obsolete):  STRING, FLOAT,
INTEGER, DATE.  I agree that in principle I can convert to 
<INTEGER>, <FLOATARRAY> and so on, but it makes things more complex (and
the current processing software has to be rewritten).  However, if we are
working towards re-usable components and the whole of the XML community
says they like (say) 4 unique types, then in the interests of interoperability 
I would be shouting for that.  If they prefer to type their variables by
attribute, I'll shout for that.  Neither is trivial to process.
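
To illustrate the attribute-typed approach, here is a minimal Python sketch
of how an application might read the ARRAY element shown above.  The
converter table and the function name parse_array are assumptions made for
illustration; they are not part of CML or of any existing processing
software, and DATE is passed through as a plain string because no date
syntax has been fixed yet.

```python
import xml.etree.ElementTree as ET

# Hypothetical converters for the four CML types named above.
# DATE stays a plain string: no date syntax is specified here.
CONVERTERS = {"STRING": str, "FLOAT": float, "INTEGER": int, "DATE": str}


def parse_array(xml_text):
    """Return the typed values carried by a single ARRAY element."""
    elem = ET.fromstring(xml_text)
    convert = CONVERTERS[elem.get("TYPE")]
    # Values are whitespace-separated tokens in the element content.
    values = [convert(token) for token in (elem.text or "").split()]
    declared = int(elem.get("SIZE"))
    if len(values) != declared:
        raise ValueError(f"SIZE={declared} but found {len(values)} values")
    return values


values = parse_array('<ARRAY TYPE="FLOAT" SIZE="3">1.2 2.3 3.4</ARRAY>')
```

Typing by element name (<FLOATARRAY> and so on) would need much the same
table, keyed on the tag instead of the TYPE attribute.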
		
> be handled by data typing and as such must be delegated to a validating
> application or a human. e.g.
> <NAME><FIRST>Albright</FIRST><LAST>Eric</LAST></NAME>
> 
> As for comments about the proposal:
> 
> I would like to see a simplified version of the data types. It is very
> important for databases to know the exact size in bytes that a data element
> will occupy. SGML/XML deals with a character string and therefore does not
> care. More important to me are the constraints on the data implicit by a
> given type. I think we need to determine the types of constraints that each
> data type requires and allow for the maximum flexibility without
> sacrificing precision.

I understand the force of your argument.  For both your requirement and mine,
the question is 'should XML support this, or is it up to the "application"?'.
Personally I am in favour of XML steering people towards a common way of
doing things, whether it be in the spec, or Generally Accepted Conventions.
> 
> As far as I can tell, there are three basic types--character, numeric, and
> temporal. Each type requires its own unique constraints:
> 
> CHARACTER - an alphabet, length constraint, content constraint (regular
> expressions)
> 
> NUMERIC - a maximum value, a minimum value, some type of rounding/precision

Some people will feel that the INTEGER/FLOAT distinction is important.  I think
I can live without it.

> 
> TEMPORAL - a maximum value, minimum value, (the maximum and minimum values
> may be constrained in relation to the current value), some type of
> rounding/precision
> 
> I think that the CHARACTER data type should be able to specify the alphabet
> and length constraint within the content constraint. However some

Again I keep asking the XML community the question as to where these 
constraints are applied. Editor (obviously), parser(??), application 
(presumably?).

> modification to the standard regular expression writing would be necessary.
> I for one do not want to have to type
> \([0-9][0-9][0-9]\)[0-9][0-9][0-9]-[0-9][0-9][0-9][0-9] for a phone number.
> Perhaps \([0-9](+3)\)[0-9](+3)-[0-9](+4) would be better.
> 
> To allow maximum flexibility and precision for numeric values, we should be
> able to specify the form (roman/arabic) and a base. The rounding allows us
> to constrain the significant digits to some factor of the base. A rounding
> type would be needed for the greatest flexibility (round/ceiling/floor).
> 
> Temporal values can specify either an instant of time or an extent of time.
> They should also be able to be rounded. When an instant is rounded, the
> significant digits are to the left; when an extent is rounded, the
> significant digits are to the right. To signify that an instant is precise
> to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
> signify that an extent is precise to the nearest tenth of a second, it
> would be rounded by 0000/00/00 00:00:00.1 .

I assume this must be a frequently solved problem and we shouldn't try to 
reinvent it.  If someone more knowledgeable than me says - 'use the FOO
approach' I'll probably buy it if it's stable and implementable.

[...]
> 
> <!ELEMENT LAST-NAME (#PCDATA)>
> <!ATTLIST LAST-NAME XML-TYPE         CDATA #FIXED "CHARACTER"
>                     XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)"
>                                            -- up to 20 repetitions of

There has been a regular and repeated cry for regular expressions.  If 
someone comes up with one that is available, I'll buy it.  Surely one of the
very many readers of this list is authoritative about this?

This is a very critical discussion for me, and I expect for others and shows
some of the new things that XML will be used for.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From geirog at falch.no  Wed May  7 10:14:27 1997
From: geirog at falch.no (Geir Ove Gronmo)
Date: Mon Jun  7 16:57:48 2004
Subject: PCDATA
Message-ID: <3.0.1.32.19970507101433.0068fed4@falch.no>

At 02:13 07.05.97 +0100, Tim Bray wrote:
>At 01:21 AM 5/7/97 GMT, Peter Murray-Rust wrote:
>>How many PCDATA elements would be expected in the file?
><?XML VERSION="1.0"?>
><!DOCTYPE CML>
><CML>
><XVAR>
>This is a variable
></XVAR>
></CML>
>
>Let's flatten that.  Clearly there can't be any PCDATA before <CML>, so:
>
><CML>\n<XVAR>\nThis is a variable\n</XVAR>\n</CML>
>     11      2222222222222222222222       33
>
>Three pieces of PCDATA. 

Should this also be true if the XML-SPACE attribute is set to DEFAULT for
the CML element?

What would be the result of the following cases? Should there also be three
pieces of PCDATA, since PCDATA can be an empty string? ([16] PCData ::= [^<&]*)

<CML><XVAR>\nThis is a variable\n</XVAR></CML>

And what about this one:

<CML><XVAR></XVAR></CML>

These are probably some pretty stupid questions, but ... :')

</Gr0ve>

------------------  Geir Ove Grønmo  ------------------
  Falch Infotek as, Stanseveien 21, 0902 Oslo, Norway
      Phone: +47 22 90 27 36 Fax: +47 22 90 25 99
 [grove@falch.no | http://www.falch.no/people/geirog]
-------------------------------------------------------



From chaotic at maths.tcd.ie  Wed May  7 11:41:54 1997
From: chaotic at maths.tcd.ie (Alan Spencer)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML 
In-Reply-To: Your message of "Wed, 07 May 1997 07:26:06 GMT."
             <6259@ursus.demon.co.uk> 
Message-ID: <9705071041.aa05447@salmon.maths.tcd.ie>

In message <6259@ursus.demon.co.uk>Peter writes:

> > To allow maximum flexibility and precision for numeric values, we should be
> > able to specify the form (roman/arabic) and a base. The rounding allows us
> > to constrain the significant digits to some factor of the base. A rounding
> > type would be needed for the greatest flexibility (round/ceiling/floor).
> > 
> > Temporal values can specify either an instant of time or an extent of time.
> > They should also be able to be rounded. When an instant is rounded, the
> > significant digits are to the left; when an extent is rounded, the
> > significant digits are to the right. To signify that an instant is precise
> > to the nearest five years, it would be rounded to 0005/00/00 00:00:00. To
> > signify that an extent is precise to the nearest tenth of a second, it
> > would be rounded by 0000/00/00 00:00:00.1 .
> 
> I assume this must be a frequently solved problem and we shouldn't try to 
> reinvent it.  If someone more knowledgeable than me says - 'use the FOO
> approach' I'll probably buy it if it's stable and implementable.
> 
> [...]
> > 
> > <!ELEMENT LAST-NAME (#PCDATA)>
> > <!ATTLIST LAST-NAME XML-TYPE         CDATA #FIXED "CHARACTER"
> >                     XML-TYPE-CONTENT CDATA #FIXED "[A-Z](*20)"
> >                                            -- up to 20 repetitions of
> 
> There has been a regular and repeated cry for regular expressions.  If 
> someone comes up with one that is available, I'll buy it.  Surely one of the
> very many readers of this list is authoritative about this?

Hi,
I'm certainly not an authority on regular expressions, but I have been using
the one in perl for many years now and I find it meets all of my requirements.
It can be a bit messy (but aren't all regular expressions!). I'm
sure most of you know how it works, it is quite like the one outlined above.
It may be too complicated for what is necessary, as I'm sure that is a goal
here, to make things as simple as possible and only as complicated as necessary.

The ideas may need to be changed a bit, but the underlying structure is
definitely there, the 'telephone number' example would be similar to that
suggested.
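
As a rough illustration, here is the telephone number pattern written in
Perl-style syntax, which Python's re module largely follows.  The counted
repetition {3} does the job of the proposed (+3) notation.  The function
name is_phone_number is made up for this sketch.

```python
import re

# Counted repetition {n} replaces the repeated [0-9] classes.
# The parentheses are escaped because they are literal characters here.
PHONE = re.compile(r"\([0-9]{3}\)[0-9]{3}-[0-9]{4}")


def is_phone_number(text):
    """True only if the whole string has the (ddd)ddd-dddd shape."""
    return PHONE.fullmatch(text) is not None


ok = is_phone_number("(613)555-0123")
bad = is_phone_number("613-555-0123")
```

Note that the whole string has to match; a validator that accepted a
matching substring would let through values with trailing junk.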

What is the plan as regards things not matching the constraints? I presume
it is just a strict error, i.e. not a valid XML document. Are there any plans
to give some flexibility to the rules, so as to make corrupted data, for example,
parseable, as is the case with HTML? Most browsers are fairly smart when
it comes to 'guessing'. This *is* a bad thing most of the time in HTML, as
it promotes guess-work on the part of the inexperienced author. I have 
experienced this with co-workers using WYSIWYG editors - 'It looks good
on my computer, what's wrong with yours'. So I suggest this very lightly,
I don't want to promote that.

As regards strong typing, could there be generic types which a particular
application/Style would define, or which even go undefined throughout? There
are applications which work with arbitrary precision calculations (like calc
on UNIX); this would need a generic *real* type. For example,
I have an interest in mathematical formatting, similar to that done by LaTeX,
but with a more structured approach, i.e. these documents could be parsed as
formatted text or as real mathematical equations/functions/...
For example, in TeX the code: "x^{ijk}_{lmn}" will produce:
      ijk
     x
      lmn

This doesn't define what this *means*, just what it looks like; it could be
powers/indices.... So if I was to try to define a generic variable *x* and
add the functionality to it, it would make sense.
If I am actually making sense myself, any input on this would be helpful.
As far as I see it, if there were generic types, or maybe *a* generic type,
people could extend the basic types using styles to add the necessary
functionality to these types.

I'm not sure if I am starting to tend towards a type of programming language, but
what the hell.

Thanks,
Alan Spencer.
> 
> This is a very critical discussion for me, and I expect for others and shows
> some of the new things that XML will be used for.
> 
> 	P.



From Peter at ursus.demon.co.uk  Wed May  7 12:40:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML 
Message-ID: <6268@ursus.demon.co.uk>

Hi Alan, 
	Thanks very much for your contribution. It raises several points.

In message  <9705071041.aa05447@salmon.maths.tcd.ie> Alan Spencer writes:
> In message <6259@ursus.demon.co.uk>Peter writes:
[...]
> 
> Hi,
> I'm certainly not an authority on regular expressions, but I have been using
> the one in perl for many years now and I find it meets all of my requirements.
> It can be a bit messy (but aren't all regular expressions!). I'm
> sure most of you know how it works, it is quite like the one outlined above.
> It may be too complicated for what is necessary, as I'm sure that is a goal
> here, to make things as simple as possible and only as complicated as necessary.

Tim Bray (ERB) has been looking for an RE tool for XML.  The point is (I
think, Tim) that they're not trivial to write and that it's critical that
everyone uses the same one.  So we don't want to build into XML a RE that
isn't easily available.  If someone says, 'here's one in 
Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd
make progress.
> 
> The ideas may need to be changed a bit, but the underlying structure is
> definitely there, the 'telephone number' example would be similar to that
> suggested.
> 
> What is the plan as regards things not matching the constraints? I presume
> it is just a strict error, i.e. not a valid XML document. Are there any plans
> to give some flexibility to the rules, so as to make corrupted data, for example,
> parseable, as is the case with HTML? Most browsers are fairly smart when
> it comes to 'guessing'. This *is* a bad thing most of the time in HTML, as
> it promotes guess-work on the part of the inexperienced author. I have 
> experienced this with co-workers using WYSIWYG editors - 'It looks good
> on my computer, what's wrong with yours'. So I suggest this very lightly,
> I don't want to promote that.

***ERB***
This matter has been discussed at very great length on the WG and the ERB
is closing in on a position.  ERB, I think it could be very useful to 
cross post your position here (or modify it appropriately).
***XML-DEV***
The treatment of errors is an extremely important issue, but it will not
be profitable to discuss it till the ERB has pronounced.  I would also
ask XML-DEV to accept that the ERB position has required much midnight oil
and to try not to repeat the discussions on XML-WG.


> 
> As regards strong typing, could there be generic types which a particular
> application/Style would define, or which even go undefined throughout? There
> are applications which work with arbitrary precision calculations (like calc
> on UNIX); this would need a generic *real* type. For example,
> I have an interest in mathematical formatting, similar to that done by LaTeX,
> but with a more structured approach, i.e. these documents could be parsed as
> formatted text or as real mathematical equations/functions/...
> For example, in TeX the code: "x^{ijk}_{lmn}" will produce:
>       ijk
>      x
>       lmn
> 
> This doesn't define what this *means*, just what it looks like; it could be
> powers/indices.... So if I was to try to define a generic variable *x* and
> add the functionality to it, it would make sense.
> If I am actually making sense myself, any input on this would be helpful.

You are!

I have been asking for some time for 'parsable math' - i.e. something that
can be input to a machine, rather than being typeset for a human.  I 
accept that math is a wide spectrum and covers everything from research
maths papers to teaching 3 year-olds.  I doubt that a single DTD will
cover this.

The W3C group on math will report on May 15 (HTML-MATH).  This will be 
XML-compatible.  The group is aware of the need for interoperability with
other DTDs and the need for 'parsable' math.  I believe they will cross-post
to this list.

> As far as I see it, if there were generic types, or maybe *a* generic type,
> people could extend the basic types using styles to add the necessary
> functionality to these types.
> 
> I'm not sure if I am starting to tend towards a type of programming language, but
> what the hell.

I think it's very important to make sure that none of us re-invent the wheel.
I like Tim's idea of mining SQL for elements, not because I like SQL 
(I don't much) but because lots of people have thought hard about it.  For the
same reason I have suggested that dates be ISO8601 compatible, because
the authors of that have thought of most of the problems.  Similarly *if*
any of the math groups working on DTDs come up with recommendations we
should treat them very seriously.  If there is (and there must be) a 
standard for string representations of the types described here, let's
use that.
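
As a small illustration of what relying on an existing standard buys us,
here is a Python sketch that checks a DATE value against the ISO 8601
extended calendar format (YYYY-MM-DD) using the standard library.  The
helper names are invented for this example, and it covers only calendar
dates, not the rest of ISO 8601 (times, week dates, durations).

```python
from datetime import date


def parse_iso_date(text):
    """Parse an ISO 8601 extended-format calendar date (YYYY-MM-DD)."""
    return date.fromisoformat(text)


def is_iso_date(text):
    """True if the text is accepted as an ISO 8601 calendar date."""
    try:
        parse_iso_date(text)
    except ValueError:
        return False
    return True


posted = parse_iso_date("1997-05-07")
```

A side benefit of the YYYY-MM-DD form is that the strings sort in
chronological order without being parsed at all.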

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Guy.Teasdale at bibl.ulaval.ca  Wed May  7 15:26:16 1997
From: Guy.Teasdale at bibl.ulaval.ca (Guy Teasdale)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970507082452.006b6100@hermes.ulaval.ca>

>> >> 3. The syntax for dates and so on should match some ISO standard,
>> >>    but I haven't found which one yet.
>> >
>> >Do you mean there are several and you haven't decided between them?
>> >I thought that people had converged on a single one (I can't remember
>> >the number, it's something like 8601).
>> 
>> I mean I spent half an hour poking around the Web and didn't come
>> up with anything right away.  If someone will send me a pointer to
>> the standard syntax, I'll put it in the draft.
>
>ISO 8601.  Being ISO it isn't on the WWW, but there is a very concise
>summary which I found at http://www.mcs.vuw.ac.nz/  - just look
>for ISO8601 in the search engine.

You will find an article on this standard at:
http://www.ft.uni-erlangen.de/~mskuhn/iso-time.html
"A Summary of the International Standard Date
and Time Notation" by Markus Kuhn  
With other links at the end of this article. The link mentioned there to Gary
Houston's text doesn't work; try this one:
http://www.mcs.vuw.ac.nz/comp/Technical/SGML/doc/iso8601/ISO8601.html

The official source is in the ISO catalogue at:
http://www.iso.ch/cate/cat.html

ISO 8601:1988
Data elements and interchange formats -- Information interchange --
Representation of dates and times

Edition: 1 (monolingual) 
Number of pages: 14
Price code: G
ICS: 01.140.30
Descriptors: calendar dates, data representation, documentation, hours
(time), information interchange


Technical Corrigendum 1:1991 to ISO 8601:1988

Number of pages: 1


Last updated on 1997-05-03 
Guy Teasdale 						tél: (418) 656-2131 - 2090
Bibliothèque de l'Université Laval			fax: (418) 656-7897
Sainte-Foy, Québec
G1K 7P4						Guy.Teasdale@bibl.ulaval.ca



From Peter at ursus.demon.co.uk  Wed May  7 15:44:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: PCDATA
Message-ID: <6276@ursus.demon.co.uk>

In message <199705071303.JAA12106@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes:
[...]
> 
> I would suggest that your application look to see if the PCDATA contains
> only a single \n, and if so, toss it.

Thanks for encouraging me on this.  I think it will be such a common 
occurrence that it should be in the XML-lang spec.  I will raise my
head over the parapet again...

> 
> >Problem 3.
> >This seems to imply that a WF document *produces different output* if it is 
> >validated against a DTD.  I accept this is true for SGML, but is it also
> >true for XML?  If so, I think we shall have an awful problem educating
> >people.
> 
> Yes. This is why I said we should keep *all* PCDATA; at least the application
> will always know what to expect. RE delenda est (David Durand's and my idea)
> also gets around this problem nicely in a slightly different way.
> 
> I am somewhat dissatisfied with this aspect of XML, but can live with it.
                                                          ^^^^^^^^^^^^^^^^^
I think this is true for anyone brought up in the tradition of SGML.  It's
much tougher for a webhacker.  It's not easy to realise that:
<CML>
<!-- a comment -->
<XVAR>...
inserts TWO separate newlines in the parser output from a WF document.  It
fooled me.  Most people would assume there weren't any.
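
For anyone who wants to see the effect, here is a small Python sketch
using the standard library's minidom, which is a non-validating DOM
builder and so stands in here for a parser reading a WF document.  The
XVAR content "x" is a placeholder for the elided content in the fragment
above.

```python
from xml.dom import minidom

# The fragment above, closed off so that it is well-formed.
SOURCE = "<CML>\n<!-- a comment -->\n<XVAR>x</XVAR></CML>"

doc = minidom.parseString(SOURCE)
children = doc.documentElement.childNodes

# Node names in document order: the comment splits the two newlines
# into two separate text nodes instead of one.
names = [node.nodeName for node in children]
newlines = [node.data for node in children if node.nodeType == node.TEXT_NODE]
```

Here names comes out as ['#text', '#comment', '#text', 'XVAR'], so an
application walking the children of CML meets two one-character text
nodes before it reaches XVAR.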

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From nmikula at edu.uni-klu.ac.at  Wed May  7 16:22:22 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:57:48 2004
Subject: PCDATA
In-Reply-To: <6256@ursus.demon.co.uk>
Message-ID: <Pine.OSF.3.93.970507161703.21299B-100000@edusrv.edu.uni-klu.ac.at>

On Wed, 7 May 1997, Peter Murray-Rust wrote:

> > >How many PCDATA elements would be expected in the file?
> > <?XML VERSION="1.0"?>
> > <!DOCTYPE CML>
> > <CML>
> > <XVAR>
> > This is a variable
> > </XVAR>
> > </CML>

I was running NXP with :

<?XML VERSION="1.0"?>
<!DOCTYPE CML> 
<CML>
<XVAR>
A variable
</XVAR>
</CML>

and the result was :

<CML>
"
"
<XVAR>
"
A variable
"
</XVAR>
"
"
</CML>

(\n is passed along to the application since the parser 
doesn't know what else to do with it.)

I also used the example with a simple DTD :

<?XML VERSION="1.0"?>
<!DOCTYPE CML [
<!ELEMENT CML (XVAR)*>
<!ELEMENT XVAR (#PCDATA)>
]>
<CML>
<XVAR>
A variable
</XVAR>
</CML>

and the result was :

<CML>
<XVAR>
"
A variable
"
</XVAR>
</CML>

-> the whitespace inside CML was recognized to be markup
only.
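
The difference between the two runs can be imitated with a short Python
sketch.  This is not how NXP works internally; it uses the standard
library's minidom (non-validating) and a hypothetical element_only set
that plays the role of the declaration <!ELEMENT CML (XVAR)*>.

```python
from xml.dom import minidom

SOURCE = "<CML>\n<XVAR>\nA variable\n</XVAR>\n</CML>"


def text_chunks(node, element_only=frozenset()):
    """Collect text nodes in document order.

    element_only names elements declared with element content, such as
    CML (XVAR)*.  Whitespace-only text directly inside them is skipped,
    which imitates what a DTD-aware parser reports.
    """
    chunks = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if node.nodeName in element_only and not child.data.strip():
                continue
            chunks.append(child.data)
        elif child.nodeType == child.ELEMENT_NODE:
            chunks.extend(text_chunks(child, element_only))
    return chunks


root = minidom.parseString(SOURCE).documentElement
without_dtd = text_chunks(root)                      # three pieces
with_dtd = text_chunks(root, element_only={"CML"})   # one piece
```

Without the declaration all three pieces of PCDATA are reported; with CML
marked as element content only the text inside XVAR survives, newlines
included.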

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From Peter at ursus.demon.co.uk  Wed May  7 16:27:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6280@ursus.demon.co.uk>

In message <199705071326.JAA12115@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes:
> >Tim Bray (ERB) has been looking for a RE tools for XML.  The point is (I
> >think, Tim) that they're not trivial to write and that it's critical that
> >everyone uses the same one.  So we don't want to build into XML a RE that
> >isn't easily available.  If someone says, 'here's one in 
> >Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd
> >make progress.
> 
> RE processors are easy to implement, and there are a great number of
> them available for free. There are a number of specifications that
> could be used: I would recommend something like the POSIX ones,
> suitably extended.

Good.  Where is a volunteer to crack up a Java one?

> 
> I really do prefer the notation method though. It's much cleaner, and
> only a little more complex to implement.

I'm not arguing against the notation method, though to my limited eyes it
seems to need a revised draft?  The suggestion of regular expressions was
simply that *if* we got one for TEI pointers (as has been urged) we can
use the same one for this.  But if notations are harder than REs then you
will probably have to look to someone else to implement it.

BTW - I am surprised that after nearly 3 months of this list there aren't
more people coming up with tools.  A lot of the stuff must already exist...


> 
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed May  7 16:46:17 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: PCDATA
Message-ID: <6289@ursus.demon.co.uk>

In message <Pine.OSF.3.93.970507161703.21299B-100000@edusrv.edu.uni-klu.ac.at> Norbert Mikula writes:
> On Wed, 7 May 1997, Peter Murray-Rust wrote:
> 
> > > >How many PCDATA elements would be expected in the file?
> > > <?XML VERSION="1.0"?>
> > > <!DOCTYPE CML>
> > > <CML>
> > > <XVAR>
> > > This is a variable
> > > </XVAR>
> > > </CML>
> 
> I was running NXP with :
[...examples deleted...]

Norbert's answers agree with what I got and also with the consensus
of the group.  It's clear that WF files can give *different* data from
those with some or all of the ELEMENT declarations.  I do not find the 
behaviour intuitive and believe we have to address it in some manner.

I am sympathetic to trashing the whitespace PCDATA elements, but there is
no clear idea of how.  An application like:
<PRE>
<IMG SRC="dot1">
<IMG SRC="dot2">
</PRE>
may wish the result to have 3 newlines as children (i.e. 5 elements in
all).  But equally an app may be frustrated by the extra elements.  It's
easy to ask for the TEI pointer "DESCENDANT(1,PRE)CHILD(1,*)" and expect to get
the dot1.  This can be criticised as bad style but it's as likely to arise
from ignorance as from sloppiness.  

There has rightly been concern about the conformance of parsers (esp. their
reaction to errors).  This is an area where I suspect conformance is 
non-trivial.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From richard at light.demon.co.uk  Wed May  7 17:14:40 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML
In-Reply-To: <199705070346.UAA02957@m9.sprynet.com>
Message-ID: <Y+mwBHAOuFczEwFM@light.demon.co.uk>

In message <199705070346.UAA02957@m9.sprynet.com>, Eric Albright
<eric_albright@sprynet.com> writes
>
>Having said that, I ask when is strong data typing necessary? As far as I
>can tell there is only one place where it is useful -- when the document is
>being created or altered. There will always be data validation that cannot
>be handled by data typing and as such must be delegated to a validating
>application or a human. e.g.
><NAME><FIRST>Albright</FIRST><LAST>Eric</LAST></NAME>

From a museum perspective, we have found the need for two types of data
validation/strong typing, which we call 'syntax control' and 'vocabulary
control'.  

Syntax control deals with things like the form of personal names.  These
are _not_ analysed in our application, but expressed in a consistent way
suitable for alphabetical sorting, e.g.:

        Light, Richard B.
rather than
        Richard B. Light

The syntax check would pick up non-capitalised words (apart from a 'stop
list' of known weak prefixes), inconsistent use of full stop and/or
spaces after initials, etc.  This starts to be hard work for a regular
expression, and might more easily be supported as a 'notation', for
which an external helper applet is called up in the context of editing.

Vocabulary control involves checking the data content against an
external authority, which could be a simple termlist or a complex
thesaurus.

Another use we make of data syntax is as a short-cut for markup.  (This
was before we knew about SGML, by the way!  The conventions were
originally devised to make optimal use of A5 catalogue cards ...)  We
use colons as a 'field separator', e.g.:

        <person>maker : Light, R.B.</person>
implies:
        <person>
                <role>maker</role>
                <persname>Light, R.B.</persname>
        </person>

and ampersands (definitely pre-SGML!) as keyword separators:

        <place>Burgess Hill & W. Sussex & U.K.</place>
implies:
        <place>
                <placename>Burgess Hill</placename>
                <placename>W. Sussex</placename>
                <placename>U.K.</placename>
        </place>
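
The two expansions above are mechanical enough to sketch in a few lines
of Python.  The function names are invented for illustration, and the
sketch assumes that every record follows the conventions exactly (one
colon in a person entry, ampersands only as separators).

```python
import xml.etree.ElementTree as ET


def expand_person(text):
    """Split 'role : name' at the first colon into role and persname."""
    role, name = (part.strip() for part in text.split(":", 1))
    person = ET.Element("person")
    ET.SubElement(person, "role").text = role
    ET.SubElement(person, "persname").text = name
    return person


def expand_place(text):
    """Turn each ampersand-separated keyword into a placename element."""
    place = ET.Element("place")
    for keyword in text.split("&"):
        ET.SubElement(place, "placename").text = keyword.strip()
    return place


person_xml = ET.tostring(expand_person("maker : Light, R.B."), encoding="unicode")
place_xml = ET.tostring(expand_place("Burgess Hill & W. Sussex & U.K."),
                        encoding="unicode")
```

Real records will of course contain exceptions (a colon inside a name,
say), which is where the syntax checking described earlier comes in.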

These practices tie in with the SGML concept of short references, which
are not available in XML.  So a general conclusion I have come to is
that ':' and '&' need to be mapped to suitable subelements, and our
users need to come to terms with more heavily tagged records than they
are used to.  

This is relevant (really!) in the context of Tim's suggestion that
strong typing should apply only to PCDATA-only elements.  In the more
general case of 'data validation' we might well want to validate
elements with substructure.

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From Peter at ursus.demon.co.uk  Wed May  7 17:40:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: PCDATA
Message-ID: <6298@ursus.demon.co.uk>

Thanks Gavin,

In message <199705071512.LAA12189@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes:
> >Norbert's answers agree with what I got and also with the consensus
> >of the group.  It's clear that WF files can give *different* data from
> >those with some or all of the ELEMENT declarations.  I do not find the 
> >behaviour intuitive and believe we have to address it in some manner.
> 
> Agreed. I believe that RE delenda est solves the problems.

I am not sure that I was on board for this discussion (I have been told 
that whitespace occupied a large amount of bytes last year :-)  A summary
could be useful - it clearly has a good pedigree.  Is it a language or an
implementation issue?

> 
> >I am sympathetic to trashing the whitespace PCDATA elements, but there is
> >no clear idea of how.
> 
> The SGML rules are not always intuitive either....
> 
> >There has rightly been concern about the conformance of parsers (esp. their
> >reaction to errors).  This is an area where I suspect conformance is 
> >non-trivial.
> 
> Validation of parsers should *certainly* extend to grove construction
> as well as error handling.
> 
Yes.  For those not on the WG, Jon has informed us that the likely major
implementors are keen on conformance, so this must surely be an early issue.
It suggests that we shall need some test data and while this already exists
(torture) I am not sure that the outputs have been rigorously investigated.
Of course there is more than one type of output, and when I compare NXP's
output to Lark's I am comparing an Esis stream to a tree of Elements
(but not a complete grove).  

The discussion here and elsewhere makes it very clear that the *parser*
is a fundamental unit and that wherever possible it should be 
self-contained and independent of the 'application'.  That makes it even more
important for us to specify an API.

Please correct this, but I see three possible outputs from a parser:
	- a grove
	- an esis_stream
	- a tree of elements, possibly with PIs, attached to nodes.
We ought to be able to give outputs for each of these so that implementers
can check.

What concerns me at present is that some of the functions (e.g. XML-SPACE)
may vary with parsers and that this could be extremely difficult to pin
down in a monolithic application.  I'd recommend that whichever of the 
methods above is used, it should be possible to tap into them.

It's also clear that applications must recognise certain *attributes*.  At
present these seem to be:
	XML-SPACE
	XML-LINK 
	ROLE
	HREF
	TITLE
	SHOW
	ACTUATE
	BEHAVIOR

Most of these are non-trivial (e.g. XML-SPACE extends to its
children, so they have to be stamped with it, but when editing a tree
the attribute may need to disappear from relocated children).  XML-LINK
is quite complex and affects content of elements (XML-LINK="EXTENDED").
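
One way to avoid stamping children is to compute the inherited value on
demand.  The sketch below is a guess at how that might look, not a statement
of what the draft requires: the attribute values "DEFAULT" and "PRESERVE" and
the function name effective_space are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def effective_space(root, attr="XML-SPACE", default="DEFAULT"):
    """Map each element to the attribute value in force for it.

    A value set on an element applies to its descendants until overridden.
    """
    result = {}

    def walk(elem, inherited):
        value = elem.get(attr, inherited)  # own value wins over the inherited one
        result[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, default)
    return result

doc = ET.fromstring('<A><B XML-SPACE="PRESERVE"><C/></B><D/></A>')
spaces = {elem.tag: value for elem, value in effective_space(doc).items()}

if __name__ == "__main__":
    print(spaces)
```

Because nothing is written onto the children, a child moved elsewhere in the
tree simply picks up whatever its new ancestors say.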

Is there a case for, and is it possible to have, a PRE-application module
that deals with attributes and other generic stuff?  This would also
help people to converge on a single interpretation.  I'd feel much happier 
about telling a pre-application with carefully argued semantics what to
do with whitespace or link structure validation than trusting to any old 
application.

	P.


> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Peter at ursus.demon.co.uk  Wed May  7 22:41:48 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML
Message-ID: <6311@ursus.demon.co.uk>

In message <libSDtMail.9705071132.28775.altheim@mehitabel> altheim writes:
> Peter@ursus.demon.co.uk (Peter Murray-Rust) writes:
> > In message <199705071326.JAA12115@nathaniel.ebt> gtn@eps.inso.com (Gavin Nicol) writes:
[...]
> > > 
> > > RE processors are easy to implement, and there are a great number of
> > > them available for free. There are a number of specifications that
> > > could be used: I would recommend something like the POSIX ones,
> > > suitably extended.
> > 
> > Good.  Where is a volunteer to crack up a Java one?
> 
> Well, after reading Jeffrey Friedl's book "Mastering Regular Expressions"
> (O'Reilly), I would heavily caution everyone to make sure we advocate and
> develop to a *single* RE specification, as it seems very evident that there

I thought this was taken for granted - that there would be a single RE in
all XML-l[ai]n[gk] specifications.  I also assumed (naively?) that POSIX
defined such an RE, and we merely needed an implementation.  There might well
be subsidiary questions such as 'do we want to implement a subset', 'are there
any clashes between RE syntax and XML syntax', 'are PEs expanded before
evaluating the RE :-)', etc.

> is such a variance between the RE processors in perl, Tcl, sed, awk, vi, etc.
> that having RE inconsistencies among XML applications would be worse than
> having no RE support at all.

Fully agreed.

> 
> If we choose a code base that contains more RE features than the minimal
> set supported by all RE processors, we need to be clear which features
> are part of and required by XML. (This sounds like a mess to me.)

Since XML-LINK-TEI has shrunk since its first airing, I suspect that there
is a desire not to overreach.  Certainly we should not force implementors
to work hard to comply with unnecessary features.  (For all I know 
some REs would be sufficiently powerful to act as XML parsers :-)

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From murata at apsdc.ksp.fujixerox.co.jp  Thu May  8 03:48:24 1997
From: murata at apsdc.ksp.fujixerox.co.jp (Murata Makoto)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML 
In-Reply-To: <6268@ursus.demon.co.uk>
Message-ID: <9705080103.AA00060@lute.apsdc.ksp.fujixerox.co.jp>

Peter Murray-Rust writes:
>Tim Bray (ERB) has been looking for RE tools for XML.  The point is (I
>think, Tim) that they're not trivial to write and that it's critical that
>everyone uses the same one.  So we don't want to build into XML a RE that
>isn't easily available.  If someone says, 'here's one in 
>Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd
>make progress.

There is a great tool called Grail, developed by Darrell Raymond and 
Derick Wood.  Grail is available from the URL below:

	http://www.csd.uwo.ca/research/grail/grail.html

One of the advantages of Grail is that you can modify the syntax of 
regular expressions.

However, Grail is not free (see below).

Makoto
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
-------------------------------------------------------------------------
Is Grail free?

No, Grail is not free. 

We don't charge scholars, students, or researchers for the use of Grail, and we don't charge people who simply want to play with it to satisfy their own curiosity. 

But no commercial use of Grail is permitted without our prior, express, written consent. No part of Grail may be included in a commercial product or used on a commercial problem without our prior, express, written consent. 

It's not that we have something against people making money---we just want to make sure that those who benefit financially from using Grail put some small part of that benefit back into the development and support of Grail. 



From tbray at textuality.com  Thu May  8 04:11:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:48 2004
Subject: Strong Typing in SGML and XML 
Message-ID: <3.0.32.19970507190920.00a02470@pop.intergate.bc.ca>

At 11:20 AM 5/7/97 GMT, Peter Murray-Rust wrote:
>Tim Bray (ERB) has been looking for RE tools for XML.  The point is (I
>think, Tim) that they're not trivial to write and that it's critical that
>everyone uses the same one.  So we don't want to build into XML a RE that
>isn't easily available.  If someone says, 'here's one in 
>Java/C/Scheme/what/ever' that has no copyright restrictions, I think we'd
>make progress.

Not quite... actually, there are several excellent free RE tools
around.  But I don't know of any good RE tools, free or commercial,
that support Unicode.  16-bit as opposed to 8-bit characters do 
change the scope of the problem.  It's not just regexp, we have
some tricky work to do if we want to do *anything* inside an
element, e.g. token or even character counting, in the 
internationalized environment. - Tim
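
To make the counting problem concrete, here is a small sketch in Python 3,
chosen purely as an illustration because its str type is a sequence of
Unicode code points.  The sample text is invented; the point is only that
the character count and the byte count diverge as soon as the text leaves
ASCII, and that a regexp engine has to know which one it is working on.

```python
import re

# "naive" and "cafe" with accented letters, followed by three CJK characters.
text = "na\u00efve caf\u00e9 \u65e5\u672c\u8a9e"

chars = len(text)                    # counted in code points
octets = len(text.encode("utf-8"))   # counted in bytes of one particular encoding
tokens = re.findall(r"\w+", text)    # \w is Unicode-aware for str patterns

if __name__ == "__main__":
    print(chars, octets, tokens)
```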



From flammia at sls.lcs.mit.edu  Thu May  8 04:18:36 1997
From: flammia at sls.lcs.mit.edu (Giovanni Flammia)
Date: Mon Jun  7 16:57:48 2004
Subject: regular expressions Java classes
Message-ID: <199705080218.WAA04866@maritimus.lcs.mit.edu>

A non-text attachment was scrubbed...
Name: not available
Type: text
Size: 684 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970508/578fc732/attachment.bat
From murata at apsdc.ksp.fujixerox.co.jp  Thu May  8 04:26:52 1997
From: murata at apsdc.ksp.fujixerox.co.jp (Murata Makoto)
Date: Mon Jun  7 16:57:49 2004
Subject: Strong Typing in SGML and XML 
In-Reply-To: <3.0.32.19970507190920.00a02470@pop.intergate.bc.ca>
Message-ID: <9705080227.AA00062@lute.apsdc.ksp.fujixerox.co.jp>

Tim Bray writes:
> But I don't know of any good RE tools, free or commercial,
>that support Unicode.  16-bit as opposed to 8-bit characters do 
>change the scope of the problem.  It's not just regexp, we have
>some tricky work to do if we want to do *anything* inside an
>element, e.g. token or even character counting, in the 
>internationalized environment. - Tim

Grail is type-parameterized.  So, you can create a regular expression 
over whatever classes.

Makoto
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp



From tbray at textuality.com  Thu May  8 17:21:48 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:49 2004
Subject: Strong Typing in SGML and XML
Message-ID: <3.0.32.19970508081643.00a02540@pop.intergate.bc.ca>

At 09:26 AM 5/8/97 -0400, Gavin Nicol wrote:
>Speaking of which. Do both Lark and NXP handle 16 bit characters?

Yes.  It's hard not to, in Java.  Mind you, Lark at the moment just
reads 8-bit characters from a file (through a method that you 
could subclass). -T.



From andrewl at microsoft.com  Thu May  8 19:25:21 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:57:49 2004
Subject: Strong Typing in SGML and XML
Message-ID: <7BB61B44F197D011892800805FD4F79291C579@RED-03-MSG.dns.microsoft.com>

Several people have written in recently asking, in effect, "What are the
purposes of this strong typing?  What needs does it solve?"  So I asked
around.  Here are the needs that have been advanced:

1.	Storage optimization.  Various clients want to be able to
optimize storage by keeping numbers in a binary format, strings in a
preallocated structure, etc.

2.	Implied semantics.  E.g. numbers can be added together, if you
know they are numbers. Also, knowing that a number is meant to have a
fixed versus floating precision affects how operations are performed,
what kind of precision is retained during calculations, what errors are
reported, etc.

3.	Parsing and formatting rules. Dates are expected to be in some
standard representation, such as given by ISO 8601 (e.g.).  Floating
point numbers permit scientific notation. Etc.

4.	Different data types need different supplementary attributes,
such as number of digits precision, total size in characters, whether
time zones are present, etc.  (In Tim's proposal, these all overload a
single generic attribute.)

5.	Range restrictions. Dates and other kinds of things measured in
numbers can be limited to a range of values.  All types can be
potentially limited to a set of discrete values (by enumeration or
rule).




From andrewl at microsoft.com  Thu May  8 20:08:06 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:57:49 2004
Subject: Strong Typing in SGML and XML
Message-ID: <7BB61B44F197D011892800805FD4F7927DD92E@RED-03-MSG.dns.microsoft.com>

I accidentally sent this message before completing it.  The full message
is here:


> Several people have written in recently asking, in effect, "What are
> the
> purposes of this strong typing?  What needs does it solve?"  So I
> asked
> around.  Here are the needs that have been cited:
> 
> 1.	Storage optimization.  Various clients want to be able to
> optimize storage by keeping numbers in a binary format, strings in a
> preallocated structure, etc.
> 
> 2.	Implied semantics.  E.g. numbers can be added together, if you
> know they are numbers. Also, knowing that a number is meant to have a
> fixed versus floating precision affects how operations are performed,
> what kind of precision is retained during calculations, what errors
> are
> reported, etc. Knowing that a string is meant to be a 
> URL gives hints on its use. Etc. 
> 
> 3.	Parsing and formatting rules. Dates are expected to be in some
> standard representation, such as given by ISO 8601 (e.g.).  Floating
> point numbers permit scientific notation. Etc. For example,   
> though the number "0.1234E+20" could have been represented as 
> "<mantissa>1234</mantissa><exponent>20</exponent>", and the date
> "19970508T10:47" could have been similarly broken into year, month,
> etc.,
> and this markup would eliminate the need for special parsers for
> numbers and
> dates, it has obvious readability and bloat problems.
> An explicit data type can signal what the internal elements are and
> how
> to parse for them without tags. 
> 
> 4.	Different data types need different supplementary attributes,
> such as number of digits precision, total size in characters, whether
> time zones are present, etc.  (In Tim's proposal, these all overload a
> single, generic attribute.)
> 
> 5.	Range restrictions. Dates and other kinds of things measured in
> numbers can be limited to a range of values.  All types can be
> potentially limited to a set of discrete values (by enumeration or
> rule). For example, an attribute expressing a color
> in terms of wavelength could be limited to 400..700 (nanometers). An
> attribute listing a US egg size could be limited to be among
> "medium," "large," "extra large" and "jumbo." If represented
> numerically,
> it could be limited to "1," "2," "3" and "4."  
> 
> 6.	Passthrough. Sometimes XML is a carrier syntax between systems
> where some participants need to convey implications not covered in
> points 
> 1 through 5 above. For example, a database may make a distinction 
> between CHAR and VARCHAR even though other readers of a document
> don't.
> 
> 
> 
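
Points 3 and 5 above can be sketched as a table of per-type parsers plus two
small checks.  This is a minimal illustration under stated assumptions, not
any proposal's actual mechanism: the type names "float", "date" and "string"
and the helpers parse_typed, in_range and in_enumeration are all hypothetical.

```python
from datetime import datetime

# Hypothetical type names mapped to functions that parse the untagged text.
PARSERS = {
    "float": float,                                          # accepts 0.1234E+20
    "date": lambda s: datetime.strptime(s, "%Y%m%dT%H:%M"),  # e.g. 19970508T10:47
    "string": str,
}

def parse_typed(type_name, text):
    """Parse element or attribute text according to its declared type."""
    return PARSERS[type_name](text)

def in_range(value, low, high):
    """Range restriction, e.g. a wavelength limited to 400..700 nm."""
    return low <= value <= high

EGG_SIZES = ("medium", "large", "extra large", "jumbo")

def in_enumeration(value, allowed):
    """Restriction to a discrete set of values."""
    return value in allowed

if __name__ == "__main__":
    print(parse_typed("float", "0.1234E+20"))
    print(parse_typed("date", "19970508T10:47"))
    print(in_range(550, 400, 700), in_enumeration("jumbo", EGG_SIZES))
```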



From bosak at atlantic-83.Eng.Sun.COM  Thu May  8 23:33:56 1997
From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
In-Reply-To: <s36f14c2.031@rnib.org.uk> (message from David Pawson on Tue, 06 May 1997 11:20:36 +0000)
Message-ID: <199705082132.OAA11351@boethius.eng.sun.com>

[Dave Pawson:]

| From a single source document, marked up in XML, we
| need to produce 4 output transforms, braille, large print, html
| and typeset.

This seems clear enough.

| Additionally, we want (for local use) to be able to 'create'
| 'document type' (our own definition).

Now I'm not quite sure what you're saying.

| Question: Should we be using the doctype as the switch,
| or an input to the output processing application (perhaps as
| a command line option).

An input to the output processing application.

| Our definition of document type goes something along the 
| lines of (for one particular use) - an editors note, a report,
| a memo. [Seems logical to talk about document type in this
| way].

Right.  Note, report, and memo are three different types of documents.
They can be described using three different document type definitions
(DTDs) that list the tags and attributes to be used with each type and
specify structural rules for how they can be used.  This is a fairly
heavy SGML concept that is specified in sections 2.9, 3.2, and 3.3 of
the xml-lang draft but not in a way that anyone would be expected to
understand without quite a bit more explanation.  You don't need to
specify a DTD in XML, and if you don't, you can omit the DOCTYPE line
and just use an XML header.

Since you are trying to coordinate the efforts of multiple authors,
you will eventually have to learn about DTDs, because they are the
primary tool for organizing projects that use XML tagging for
large-scale publishing.  So one of these days you should get a book on
designing SGML DTDs and check out some of the principles.  You will
find that many of the less frequently used DTD constructs have been
left out of XML to make it easier to implement, so you will probably
find the job of learning to construct XML DTDs from an SGML book
rather frustrating, but that's life for early implementors.  By this
time next year we should have some good books on this subject; hang in
there.

In the meantime, you should NOT use doctypes as a way to switch
between output formats from the same source file.  The source file is
conceptually one document type regardless of the output format.
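
As a rough sketch of "an input to the output processing application": the
same source is parsed once and the output format is just a parameter.  The
memo markup, the renderer names and the format keys are invented for the
example, and only two of the four outputs are shown.

```python
import html
import xml.etree.ElementTree as ET

SOURCE = "<memo><title>Budget</title><body>Costs &amp; savings</body></memo>"

def to_html(root):
    title = html.escape(root.findtext("title"))
    body = html.escape(root.findtext("body"))
    return "<h1>%s</h1><p>%s</p>" % (title, body)

def to_large_print(root):
    return root.findtext("title").upper() + "\n\n" + root.findtext("body")

# The output format selects a renderer; the document type never changes.
RENDERERS = {"html": to_html, "large-print": to_large_print}

def render(source, output_format):
    root = ET.fromstring(source)
    return RENDERERS[output_format](root)

if __name__ == "__main__":
    print(render(SOURCE, "html"))
    print(render(SOURCE, "large-print"))
```

The format name could equally come from a command line option; nothing in
the source document has to change to select it.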

Since we're only about halfway into the larger XML effort, you
presently have a standard way to specify syntax (Part 1, xml-lang) and
a very preliminary standard way to specify hypertext linking (Part 2,
xml-link), but no standard way yet to specify style (that will be Part
3, xml-style) or other output processing.  Some people on this list
are working on Java APIs for XML that would provide one avenue of
standardization, and maybe one of them will jump in at this point and
explain more.

If you want to get a preview of the style language and see some very
simple XML DTDs, check out these examples:

http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/10_mail/10_mail.zip
http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/20_tstmt/20_tstmt.zip
http://sunsite.unc.edu/pub/sun-info/standards/dsssl/egs/21_shaks/21_shaks.zip

(or .tar.gz).

Jon




From richard at light.demon.co.uk  Fri May  9 08:58:33 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
In-Reply-To: <199705082132.OAA11351@boethius.eng.sun.com>
Message-ID: <dc9jLEA+jsczEw38@light.demon.co.uk>

In message <199705082132.OAA11351@boethius.eng.sun.com>, Jon Bosak
<bosak@atlantic-83.Eng.Sun.COM> writes
>
>In the meantime, you should NOT use doctypes as a way to switch
>between output formats from the same source file.  The source file is
>conceptually one document type regardless of the output format.

I've been thinking about the issue of what comes at the head of an XML
document.  This may be stating the obvious, but ...

While it would be generally agreed that you can't gratuitously stick any
old <!DOCTYPE header onto a piece of well-formed XML, I think there is a
case for architecting XML so that you _can_ hold the naked XML without
_any_ header information, and prepend both DOCTYPE and style processing
instructions at delivery time.

One reason is that you might want to author a document in chunks, and
either publish/work with the chunks in their own right, or put those
chunks together via a 'master document' containing lots of entity
references to pull the chunks in.  For the first purpose, the free-
standing chunks will require a DOCTYPE header, not least so you can
create them in a structured XML-aware editor.  For the second purpose,
they need to be 'naked', since you can't pull in an entity with a
DOCTYPE at the beginning, and we don't have the SGML SUBDOC facility in
XML.

Another reason is that you might have slightly variant DTDs for the same
conceptual document type, and a production process whereby the documents
start off conforming to, say, an author-friendly DTD, and then progress to
conform to a stricter 'delivery' DTD.  Again, this can only happen if
you can switch in a DTD at document load time.

However, the reason I started along this line of thought was based
around the much more comfortable area of output formats, i.e. style
sheets.  We certainly need an easy way to prepend instructions to bind a
style sheet to a document at delivery time, so that its style is not
bound into the DTD declaration.  A processing instruction 'up front'
would be the obvious way to do this:

<?XML version="1.0"?>
<?XML-STYLE ...?>

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From Peter at ursus.demon.co.uk  Fri May  9 10:39:14 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
Message-ID: <6392@ursus.demon.co.uk>

In message <dc9jLEA+jsczEw38@light.demon.co.uk> Richard Light writes:
[...]
> 
> I've been thinking about the issue of what comes at the head of an XML
> document.  This may be stating the obvious, but ...
> 
> While it would be generally agreed that you can't gratuitously stick any
> old <!DOCTYPE header onto a piece of well-formed XML, I think there is a
> case for architecting XML so that you _can_ hold the naked XML without
> _any_ header information, and prepend both DOCTYPE and style processing
> instructions at delivery time.
> 
> One reason is that you might want to author a document in chunks, and
> either publish/work with the chunks in their own right, or put those
> chunks together via a 'master document' containing lots of entity
> references to pull the chunks in.  For the first purpose, the free-
> standing chunks will require a DOCTYPE header, not least so you can
> create them in a structured XML-aware editor.  For the second purpose,
> they need to be 'naked', since you can't pull in an entity with a
> DOCTYPE at the beginning, and we don't have the SGML SUBDOC facility in
> XML.

This is a problem I have come up against, and still concerns me.  I would like
to encourage authors to create documents in small reusable chunks, the 
question being whether we use a construction like:

<!DOCTYPE CML [
<!ENTITY chunk1 SYSTEM "chunk1.cml">
... etc...
]>
<CML>
...
&chunk1;
</CML>

with the chunks (say) being:
<MOL>
...
</MOL>


or whether we use something like

<!DOCTYPE CML [
<!ENTITY mini1 SYSTEM "mini1.cml">
]>
<CML>
<XLIST XML-LINK="EXTENDED">
<XVAR XML-LINK="LOCATOR" ACTUATE="AUTO" SHOW="EMBED" HREF="&mini1;"></XVAR>
</XLIST>
</CML>

with mini1.cml being:

<!DOCTYPE CML>
<MOL>
...
</MOL>

Now, I wrote this latter on the fly, and it looks horribly clunky and it's
much more difficult to implement.  And is it *legal*?  And will it do
what I want?  The advantage is that the mini version can be used in its
own right and we know what language it's in.  Chunks like:

<A>Foo
<B>Bar</B>
</A>

do not carry their DTD, and unwanted whitespace could easily creep in.
Constructions like:

<A
>Foo<B
>Bar</B
></A
>

might solve some, but not all of the whitespace problem.
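
The difference between the two layouts can be checked with any parser that
exposes text nodes.  The sketch below uses Python's xml.etree.ElementTree as
a stand-in; the helper name text_nodes is made up.  Line breaks placed
inside the tags are not part of the content, whereas line breaks between
tags are.

```python
import xml.etree.ElementTree as ET

pretty = "<A>Foo\n<B>Bar</B>\n</A>"       # line breaks in the content
tight = "<A\n>Foo<B\n>Bar</B\n></A\n>"    # line breaks inside the tags

def text_nodes(doc):
    """Return (text of A before B, text of B, text of A after B)."""
    root = ET.fromstring(doc)
    b = root.find("B")
    return (root.text, b.text, b.tail)

if __name__ == "__main__":
    print(text_nodes(pretty))
    print(text_nodes(tight))
```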

Since this must be a Well Investigated Problem, insight would be useful.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From DPawson at rnib.org.uk  Fri May  9 12:16:05 1997
From: DPawson at rnib.org.uk (David Pawson)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
Message-ID: <s37306c4.045@rnib.org.uk>

One line of my original message read something like
'What was the original intent' of DOCTYPE?

I love the idea of partitioning big docs to work on little ones.
This must be a good idea in any development.
Was there nothing in the thinking of the original geniuses
who started all this off? Or was it simply, this is the first
line of the spec, let's call it ....

#include works for me as a lower mortal, but it won't permit
me to compile an include file unless I draw up an empty
doc with the necessary gubbins in, then #include the
same file, simply to permit compilation.

Will the same mechanism work for XML,
i.e. 
<?XML version ....
<DOCTYPE empty ... dtd >
<empty>
#include sub-file <!-- you choose the words -->

</empty>

Sounds simple enough to do what I might want to do.

Come on gurus, what was it all about in the first place?
It wasn't so long ago that you have forgotten ... was it?

Regards, DaveP




From Boris.Moore at wanadoo.fr  Fri May  9 18:48:41 1997
From: Boris.Moore at wanadoo.fr (Boris Moore)
Date: Mon Jun  7 16:57:49 2004
Subject: XML and HTML browsers
Message-ID: <01BC5CA9.129BCC80@yellow-ami-145.wanadoo.fr>

On Sunday, 4 May 1997, Peter Murray-Rust wrote:

>>I would like to re-use *existing* browser functionality rather
>>than continuing to extend the *generic* aspects of a browser in JUMBO.
>>I'm interested in exploring the general question of how a specialist
>>Java application interacts with a Java-enabled HTML browser.

I cannot reply to the Java and JavaScript aspects of your questions, but I am struck by how closely your description of hoped-for interaction between Jumbo and the built-in HTML rendering of the browser relates to work we at RivCom have been doing on developing a Netscape Plug-in for XML.

The big difference (which is why this is not a direct response to your questions) is that we are working in C++, not Java, and we are at the moment catering to Java-_disabled_ browsers, and are therefore denying ourselves the use of JavaScript! 

Our plug-in, of which a prototype was demonstrated at the WWW6 XML demo session, takes an XML input stream, together with style-sheet data, and processes it, to generate different HTML streams for different Netscape instances, or different frames within Netscape. The user can click on hotspots or buttons, which send messages to the plug-in. This can result, for example, in modified style settings for one or more instances of one or more element types. (This can include contextual search criteria for the targeted elements). The plug-in then sends the resulting modified HTML to Netscape for display.

I anticipate that the plug-in will at a later point be split into two components.  Firstly, the plug-in dll itself, which will handle only the interfacing with Netscape, including much of the kind of interaction that you describe, plus a bit more. And secondly a component which does all the rest, including processing the XML and style-sheet data.  

The second component could then potentially be replaced by other modules, which would interface with the plug-in dll's API in order to use the Netscape HTML rendering functionality, and receive appropriate callbacks from user input.  Such a module could be written in Java. (Though we have opted for C++, partly for performance reasons).


-------------------------------------
Boris Moore
Software Development
Boris.Moore@Wanadoo.fr
-------------------------------------

  



From bosak at atlantic-83.Eng.Sun.COM  Fri May  9 20:02:36 1997
From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
In-Reply-To: <dc9jLEA+jsczEw38@light.demon.co.uk> (message from Richard Light on Fri, 9 May 1997 07:49:34 +0100)
Message-ID: <199705091800.LAA11983@boethius.eng.sun.com>

| Another reason is that you might have slightly variant DTDs for the
| same conceptual document type, and a production process whereby the
| documents start off conforming to, say, an author-friendly DTD, and then
| progress to conform to a stricter 'delivery' DTD.  Again, this can
| only happen if you can switch in a DTD at document load time.

Sure, there could be lots of good reasons to use variant doctypes.  My
only point was that switching output formats isn't one of them.

| However, the reason I started along this line of thought was based
| around the much more comfortable area of output formats, i.e. style
| sheets.  We certainly need an easy way to prepend instructions to bind
| a style sheet to a document at delivery time, so that its style is not
| bound into the DTD declaration.  A processing instruction 'up front'
| would be the obvious way to do this:
| 
| <?XML version="1.0"?>
| <?XML-STYLE ...?>

This method, proposed by James Clark in this list about a month ago,
is very close to what the ERB seems to be converging on as the way to
do simple stylesheet linking.

Jon



From Peter at ursus.demon.co.uk  Sat May 10 00:32:37 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:49 2004
Subject: DOCTYPE misunderstood
Message-ID: <6413@ursus.demon.co.uk>

In message <s37306c4.045@rnib.org.uk> David Pawson writes:
[...]
> 
> Will the same mechanism work for XML,
> i.e. 
> <?XML version ....
> <DOCTYPE empty ... dtd >
> <empty>
> #include sub-file <!-- you choose the words -->
> 
> </empty>

Hi,
	This was about the first question I asked on comp.text.sgml
2-3 years ago, so it's nice to be in a position to answer it.  

It's not intuitive to someone brought up on C - you keep looking for the
#include in SGML (XML) and it isn't there.  Instead there is a mechanism
involving *entities*.  In simple terms, parameter entities (which use
the %foo; notation) are mainly involved in DTDs, and general entities
(like &bar;) are used to include chunks of data.

You will find them in the draft in 4.2, 4.3 (and their treatment in 4.4).

In your example you might have something like:
<?XML VERSION="1.0"?>
<!DOCTYPE empty [
<!ENTITY chunk1 SYSTEM "chunk1.txt">
<!ENTITY chunk2 SYSTEM "chunk2.txt">
]>
<!-- note how the [...] is included in the DOCTYPE -->
<EMPTY>
<!-- include chunk1 -->
&chunk1;
<!-- include chunk2 -->
&chunk2;
</EMPTY>

This is OK so long as the contents of the files are well formed.  If 
the document is to be valid, then the complete document after inclusion
of the chunks has to be valid.  If the chunks contain entities, then 
those have to be (recursively) expanded.

Make sure the entities have been declared.  However, XML says that
for WF documents, the processor need not expand entities.  (4.4; 8).
I'm not sure whether the entity still has to be declared in this case.
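
For anyone who wants to try the mechanism, here is a minimal sketch using
Python's xml.etree.ElementTree.  It is an assumption-laden stand-in for the
example above: that module does not fetch external (SYSTEM) entities, so the
chunks are declared as internal entities instead, and the element names and
the helper expand are invented.

```python
import xml.etree.ElementTree as ET

# Internal entities stand in for the external chunk files.
DOC = """<!DOCTYPE CML [
<!ENTITY chunk1 "<MOL>water</MOL>">
<!ENTITY chunk2 "<MOL>ethanol</MOL>">
]>
<CML>
&chunk1;
&chunk2;
</CML>"""

def expand(doc):
    """Parse the document and list the text of every MOL element found."""
    root = ET.fromstring(doc)
    return [mol.text for mol in root.iter("MOL")]

if __name__ == "__main__":
    print(expand(DOC))
```

The parser replaces each reference with the entity's replacement text and
parses that text as content, so the MOL elements end up as children of CML.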

General point:  Example files of this sort of thing would be very valuable.

	P.


> 
> Sounds simple enough to do what I might want to do.
> 
> Come on gurus, what was it all about in the first place?
> It wasn't so long ago that you have forgotten ... was it?
> 
> Regards, DaveP

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Sat May 10 00:32:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:50 2004
Subject: XML and HTML browsers
Message-ID: <6414@ursus.demon.co.uk>

In message <01BC5CA9.129BCC80@yellow-ami-145.wanadoo.fr> Boris Moore writes:
> On Sunday, 4 May 1997, Peter Murray-Rust wrote:
> 
> >>I would like to re-use *existing* browser functionality rather
> >>than continuing to extend the *generic* aspects of a browser in JUMBO.
> >>I'm interested in exploring the general question of how a specialist
> >>Java application interacts with a Java-enabled HTML browser.
> 
> I cannot reply to the Java and JavaScript aspects of your questions, but 
> I am struck by how closely your description of hoped for interaction 
> between Jumbo and the built-in HTML rendering of the browser relates to 
> work we at RivCom have been doing on developing a Netscape Plug-in for XML.

Don't worry about the language aspect - a C++ plugin sounds a very useful 
way forward.  The main disadvantage is that it is platform-specific, but of 
course its performance is better.  Java is probably a better way to develop
methods over the WWW.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From richard at light.demon.co.uk  Mon May 12 17:58:01 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:57:50 2004
Subject: DOCTYPE misunderstood
In-Reply-To: <199705091222.HAA169162@tigger.cc.uic.edu>
Message-ID: <SmLQYIA$vzdzEwMk@light.demon.co.uk>

In message <199705091222.HAA169162@tigger.cc.uic.edu>, C M Sperberg-
McQueen <cmsmcq@tigger.cc.uic.edu> writes
>On Fri, 9 May 1997 07:49:34 +0100, Richard Light wrote: 
>
>>While it would be generally agreed that you can't gratuitously stick any
>>old <!DOCTYPE header onto a piece of well-formed XML, I think there is a
>>case for architecting XML so that you _can_ hold the naked XML without
>>_any_ header information, and prepend both DOCTYPE and style processing
>>instructions at delivery time.
>
>I think there is a case for saying XML has in fact been so designed,
>and that what you want to do is already possible.
>
>Am I missing something?  

No, it's me.  I think I've been muddled by the RMD rules.  It says that
"If no RMD is provided, an XML processor must behave as though an RMD
had been provided with value ALL".  So I was thinking that a "naked
chunk" would require an RMD of:

<?XML version="1.0" RMD="NONE"?>

in order that the XML processor knows that it "can parse the containing
document correctly without reading any part of the DTD".  But presumably
this is "parsing" in the sense of reading a _valid_ instance rather than
a _well-formed_ one?

Richard Light.




From bosak at atlantic-83.Eng.Sun.COM  Mon May 12 18:03:22 1997
From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:50 2004
Subject: DOCTYPE misunderstood
In-Reply-To: <s37306c4.045@rnib.org.uk> (message from David Pawson on Fri, 09 May 1997 11:10:16 +0000)
Message-ID: <199705121600.JAA13415@boethius.eng.sun.com>

[Dave Pawson:]

| One line of my original message read something like
| 'What was the original intent' of DOCTYPE?
| [...]
| Come on gurus, what was it all about in the first place?
| It wasn't that long ago that you have forgotten ... was it!

Well, back before 1986, anyway.  That's when the SGML standard was
published.

"Doctype" means "type of document."  Novels, telephone books, poems,
plays, bills of lading, and patient care records are types of
documents.  They have different structures and need differently named
tags if the tags are going to make sense to a human user or an
intelligent indexer.

You must start reading some of the easily available materials on this
subject before asking others for information.  You can start with the
XML page at W3C (go to http://www.w3.org and click on "XML", then
follow all the pointers from that page).  This reading will include
the XML FAQ and Robin Cover's magnificent SGML web site.  Then you can
hit a couple of the good beginning books on SGML.  I happen to be at
an SGML conference in Barcelona at the moment; I will take a look at
currently available introductory SGML books that might work well for
an XML newbie and reply back here with a list (probably a very short
one).

Some people are working on putting up a general-purpose public XML
mailing list that should become available in the next few weeks.
Until then, I suggest that you post inquiries of the variety "What is
a doctype?" to the newsgroup comp.text.sgml and not to this xml-dev
list, which is for the use of technical experts engaged in the
construction of XML software.

Jon




From Peter at ursus.demon.co.uk  Fri May 16 01:59:51 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:50 2004
Subject: SGML97 Europe
Message-ID: <6588@ursus.demon.co.uk>

I've just come back from SGML97 in Barcelona and thought some personal
comments might be useful.

Firstly I was delighted to meet SGML geekdom in the flesh :-)  It's marvellous
finding so many people who were hitherto virtual.  I came away with a strong
feeling of community, and many thanks to everyone who made me welcome.

Without question, XML was the major theme at the meeting, though associated
areas such as DSSSL were also causing a lot of interest.  The ERB has done
a fantastic job in getting this done so efficiently, quickly, and also 
making sure that the world knew how important this was.  It's now very clear
that the major WWW-related companies are taking a very active role in 
exploring the potential of XML, and a recent posting to XML-WG has confirmed
Netscape's interest.  [The XML-spec printing was sponsored by SUN and 
Microsoft.]

I think most people who are not used to virtual working will underestimate
how much time the ERB has spent on the process.  As Eliot Kimber said in 
his closing address, they had to choose between having a family life or an
XML-life.  There were literally hundreds of mails a week.  NOTE:  all mail
on the XML-WG gets read very thoroughly by the ERB - even if it doesn't get 
formally answered at the time.  It's only necessary to make a point once.
The ERB supplemented e-mail with weekly conference phone calls, and this
is how decisions were taken.  Quite apart from XML itself, I personally 
commend the efficiency of the ERB's virtual process and shall try to abstract
from it those aspects which make it successful.  Clear initial guidelines
help, as does a wider community which is well versed in abiding by a standard
drawn up under legal guidance :-)

Other points.  XML clearly fills many different roles for different people.
It's clear that people who sell complex SGML applications see different 
benefits (and some concerns) from those who see XML as the next step from 
HTML.  Taking too narrow a view might sometimes cause unnecessary conflicts.
Eliot described XML's position vis-a-vis SGML as low-cost/low-benefit versus
high-cost/high-benefit and stressed the need for the additional components
such as DSSSL, architectural forms, etc.  (Personally I would put XML as 
lowish-cost/medium-benefit :-)  It's important not to argue HTML vs
XML or XML vs SGML as such arguments are often meaningless or based on 
limited views.  Both DTD-less and DTD-full applications will benefit from
XML.  The *use* of XML falls in a spectrum with fuzzy borderlines.

It's clear that DSSSL has a great deal of impetus and the only question is
whether the ERB can work fast enough for everyone else's expectations.  
There are many other problems surfacing.  How does XML interact with HTML?
(are there XML plug-ins, should XML DTDs contain subsets of HTML, etc.) 
Strong typing, and APIs (both areas that XML-DEV could work on).  And perhaps
most excitingly for some of us the concept of Information Objects which was
mentioned in several talks.

My understanding of information objects (which has been designed into CML) 
is that documents will frequently contain 'chunks' from several different
sources.  For example chemistry papers frequently contain maths; but there
is no formal syntax for combining two different DTDs in the same document
(watch this space...).  IMO a robust DTD-less XML document will most
likely be an aggregation of well-defined information objects (i.e. 
individually parsable against a DTD), but where every document would be 
likely to have a differing formal DTD.  

It's clear that XML is able to greatly widen the market for *ML.  Since 
*ML will increasingly be co-existing with Object technologies it's
important that the applications are well designed and interoperate cleanly.
One great benefit of *ML is that people who deal with documents can 
understand the power of *ML, whilst they might well 'switch off' when 
confronted with objects.

Implementation is also very rapid - many companies have - or shortly will
have - XML implementations.  This will have high benefits - let's also 
push for low prices :-).  It's becoming even more important that this group
helps to create reference sites with test data, DTDs, etc. so that these
tools can be evaluated.

Once again, a lovely time.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Fri May 16 13:26:36 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:50 2004
Subject: Mathematical Markup Language (MathML)
Message-ID: <6629@ursus.demon.co.uk>

[To xml-dev, crossposted to CLIC and CHEMIME and Patrick Ion, AMS.
Patrick, please feel free to circulate this to HTML-Math-WG.
This posting is being addressed to both XML-DEV and the Chemical Informatics
community, so please excuse any confusions :-).]

The first draft of MathML was published on May 15th and is enormously
exciting.  It is written to be compatible with XML and to evolve as that
spec evolves, so that we have one of the very first DTDs that has been 
developed in that way.  Since math is common to a very large number of
vertical markets in the ScientificTechnicalMedical market (and many
others) MathML will highlight how domain-specific DTDs and documents can
be re-used in a variety of contexts.

The draft (long, impressive, and in several sections) is at:
http://www.w3.org/pub/WWW/TR/WD-math/
<GENTLE_REQUEST>
Could the authors provide a tar.gz or similar file so that all the sections
including the gifs can be downloaded?  Also could the appendices have names
unique under 8.3 format.  TIA :-)
</>

Note that the Math-WG has addressed the two key aspects of encoding
maths - presentation (cf. TeX) and content (cf. symbolic algebra, 
plotting packages, etc.).  The current document addresses both of these and
having had very useful discussions with Patrick Ion, Martin and Roy Pike and 
Steve Buswell I'm very confident that MathML will cater for a wide range 
of chemical requirements.  Certainly I hope to explore its use for
plotting graphs, extracting functions, using symbolic variables within
chemical discourse and tables, and much more.  In principle it should be
possible to (say) extract a set of force-field equations from a 
molecular mechanics paper and directly manipulate them into a computer
program.

The publication of MathML coincides with the XML-ERB's request for discussion
on multiple namespaces, and the very large emphasis at SGML97 on Information
Objects.  The XML community is now clearly working as fast as possible to
develop the spec so that an XML document can be composed of a variety
of fragments/subdocuments/objects - various names are chosen.  Chemical
Markup Language is being developed along some of the same lines as MathML 
- to be XML-compatible, to use common semantics where appropriate, and
to avoid namespace collisions.  Whatever syntactic mechanism is chosen it
will allow subcomponents of a document to be linked to the appropriate
DTD - very probably distributed over the WWW - and for appropriate semantics
and behaviour to be applied.

The math proposal has enormous implications for technical publications and
documents.  In CML a space has been deliberately left called 'MATH' and now
can be replaced by 'MathML', e.g. as:

<!ENTITY % mathml.dtd SYSTEM "http://www.w3.org/where/ever/mathml.dtd">
%mathml.dtd;

<!ELEMENT CML (lots|of|elements|%mol;)*>  <!-- note NOT math -->

<!ELEMENT XLIST ANY>   <!-- CML's generic container -->

and in a document instance something like:

<P>The <A HREF="#fn1" XML-LINK="simple" ACTUATE="AUTO" SHOW="EMBED">
function</A> relating...</P>
...
<XLIST TITLE="Equations in the text">
<EXPR ID="fn1" TITLE="quadratic"><MI>x</MI><POWER/><MN>2</MN></EXPR>
</XLIST>

would now insert the expression x^2 at the appropriate part of the
text.  Note that this use of XML-LINK avoids the complexity of tailoring
the parent DTD to accommodate infinitely variable content models.  Note
that the document suggested above *can be validated* if required, since all
the DTDs involved are included.  Of course this is only possible because
CML and Math have no namespace collisions, and this mechanism will not
be generally applicable.  (MathML has some very short GIs (have I got the
terminology right this time? :-) and is therefore likely to collide with
an arbitrarily chosen DTD).
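
To make the embedding step concrete, here is a minimal Python sketch of how a
processor might resolve such a link.  It is only an illustration under stated
assumptions: the wrapping DOC element and the function name embed_links are
invented for the example, and the treatment of SHOW="EMBED" (append a copy of
the target inside the linking element) is one plausible reading, not the
behaviour defined by the XML-link draft or implemented in JUMBO.

```python
import copy
import xml.etree.ElementTree as ET

# The instance from the text, wrapped in a hypothetical DOC root element.
DOC = """<DOC>
<P>The <A HREF="#fn1" XML-LINK="simple" ACTUATE="AUTO" SHOW="EMBED">function</A> relating...</P>
<XLIST TITLE="Equations in the text">
<EXPR ID="fn1" TITLE="quadratic"><MI>x</MI><POWER/><MN>2</MN></EXPR>
</XLIST>
</DOC>"""


def embed_links(root):
    """Append a copy of each in-document link target inside its <A> element.

    Returns the number of links that were embedded.
    """
    # Index every element that carries an ID attribute.
    targets = {el.get("ID"): el for el in root.iter() if el.get("ID")}
    embedded = 0
    for link in list(root.iter("A")):
        href = link.get("HREF", "")
        if link.get("SHOW") != "EMBED" or not href.startswith("#"):
            continue
        target = targets.get(href[1:])
        if target is None:
            continue  # dangling link: leave it alone
        clone = copy.deepcopy(target)
        clone.tail = None
        del clone.attrib["ID"]  # avoid a duplicate ID in the result
        link.append(clone)
        embedded += 1
    return embedded
```

Calling embed_links(ET.fromstring(DOC)) leaves the XLIST untouched and places
a copy of the EXPR inside the A element, so the parent DTD never needs a
content model that mentions the math elements directly.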

I'd welcome comments from the chemical community on this and suggest that
posting them to chemime@ic.ac.uk would be the most appropriate (please 
*don't* post directly to the math group!).   On the more general question
of interoperability of information objects we may need to wait a week
or two to see how the XML-WG discussions take that, but I'm very keen to
start trying to get an implementation of math in CML.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From gfrer at luna.nl  Fri May 16 20:41:50 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun  7 16:57:50 2004
Subject: Mathematical Markup Language (MathML)
Message-ID: <v03102804afa25883975e@[194.151.26.6]>

What will be the consequences when efforts like these will be transplanted
to other sectors like Medicine?

E.g. HL7-SGML  http://www.mcis.duke.edu/standards/HL7/committees/sgml/

Greetings

Gerard Freriks, MD


Gerard Freriks,huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands  		Telephone: (+31) (0)174-384296/ Fax: -386249
				Mobile   : (+31) (0)6-54792800
ARS LONGA, VITA BREVIS




From Peter at ursus.demon.co.uk  Sat May 17 21:05:53 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:50 2004
Subject: MathML (and implications for XML)
Message-ID: <6766@ursus.demon.co.uk>

I have read quickly through the MathML (970515) draft and have some
(hopefully constructive) comments to make - any crossmember of xml-dev
and html-math-wg is welcome to crosspost them.

Before giving detailed comments, I must say that I think it's an
extremely useful document and covers all of the areas that I - as a 
mathematically oriented scientist - would like to see.  The initial
discussion is very useful and I shall borrow some of the flavour of
it when redrafting Chemical Markup Language.

An archetypal XML DTD
---------------------

Since MathML is one of the very first XML DTDs to be published it naturally
sets a style which others may imitate.  In general I think it does this
well, though it is at the mercy of a still fluid XML-lang and XML-link spec.
I appreciate that some of this was probably written some time before the
latest XML drafts.

Specific comments in this area are:

3.1.4 'By default, XML processors remove all leading and trailing whitespace
... between the begin/end tags and collapse any internal w/s to a single
space character'.  My current understanding is that *validating* parsing
removes the start and end w/s but does not collapse the internal w/s, but
that WF-parsing passes the whole lot unchanged including the leading/trailing
w/s.  [I'm usually wrong on this, but it's a problem area :-)].
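
A quick way to check what a non-validating parser actually hands over is to
ask one.  The sketch below uses Python's standard expat-based parser, which
follows XML 1.0 as later finalized rather than the 1997 draft; the helper
names raw_text and collapse are made up for the example.

```python
import xml.etree.ElementTree as ET


def raw_text(doc):
    """Return the character data of the root element exactly as parsed."""
    return ET.fromstring(doc).text


def collapse(text):
    """Application-level normalisation: trim and collapse whitespace runs."""
    return " ".join(text.split())


SAMPLE = "<MI>  x \n  y  </MI>"
```

Here raw_text(SAMPLE) keeps every space and the newline, so any trimming or
collapsing of the kind MathML 3.1.4 describes would have to be done by the
application, for example with collapse().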

7.1 ['</' is not allowed in CDATA].  My reading of XML 2.7 is that '</'
is unrecognised within CDATA (indeed only ']]>' can terminate it).  This
might allow significant simplification to MathML 7.1 and allow the 
elimination of two sets of tags.
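
The CDATA point is easy to try out.  This is a small sketch with Python's
standard parser (XML 1.0 as finalized, so an assumption relative to the draft
under discussion); the NOTE element and the helper name are invented.

```python
import xml.etree.ElementTree as ET


def cdata_text(doc):
    """Return the root element's character data, CDATA sections included."""
    return ET.fromstring(doc).text


# '</', '<' and '&' all appear inside the CDATA section.
SAMPLE = "<NOTE><![CDATA[a </b> c <d> & e]]></NOTE>"
```

The parser passes the section through as plain character data; only ']]>'
ends it.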

MathML proposes two generic means of extending functionality, one through
attributes and the other through macros.

7.2.3  the OTHER attribute has the syntax:
	OTHER="name1='val1' name2='val2'..."
and essentially allows a means of adding additional attributes independently
of the DTD.  Personally I'm sympathetic to this (as long as the attributes
are ones *I*'ve thought of :-).  This is 'not to encourage software developers
to use this as a loophole for circumventing the MathML core markup'... but
as we all know this is the sort of unchecked semantics that people love
and which soon leads to non-interoperable documents and processors.  I'd be
frightened of it in the Chemical community.  This is a point which
is important for XML in general.
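
To show how little an XML parser sees of this, here is a minimal Python
sketch of the extra step an application would need in order to unpack an
OTHER value.  The regular expression is an assumption about the
micro-syntax (name='value' pairs separated by whitespace), not something
taken from the MathML draft, and parse_other is a made-up name.

```python
import re

# Assumed micro-syntax: name='value' pairs, values free of single quotes.
PAIR = re.compile(r"([A-Za-z_][\w.-]*)\s*=\s*'([^']*)'")


def parse_other(value):
    """Split an OTHER attribute value into a dict of name -> value."""
    return dict(PAIR.findall(value))
```

For example, parse_other("name1='val1' name2='val2'") gives a dictionary with
the two pairs.  Nothing in the DTD can constrain either the names or the
values, which is exactly the interoperability worry.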

5.3 Macros.  This is the ability to create macros to avoid repetition of
verbose markup and seems particularly appropriate to math.  (I think it has
a similar, but smaller, role in chemistry.)  As far as I can see it is
totally compatible with XML/SGML, ***BUT it requires a pre-processor***
(I have been calling this a pre-parser).  
<PROPOSAL>
There will be a role for a pre-parser in XML and one of its functions will be
to apply macros.  Can we work towards a standard set of operations that a
pre-parser might carry out?
</PROPOSAL>

XML-LINK.  The document is written with little reference to XML-link
(not surprising, since it's new and AFAIK JUMBO is the only tool that 
implements it even at prototype level).  However I think there are at least
the following areas where XML-link mechanisms might be alternatives:

7.1 Display and in-line notations.  The draft assumes that the MATH component
of a document is embedded in the HTML at the point that it occurs in natural 
reading.  XML-LINK gives a mechanism for separating the math and the text and
combining them under the flexibility of the linking mechanism.  The problem
occurs in exactly the same way in chemistry - do we encode HCl in-line
or as a display:
	HCl
This is a matter of style which may not be totally within the author's 
control - the publisher or renderer or reader may have the power to alter
it.  Since XML will approach this generically at the LINK level, I have
used constructs like:
<P>this is <A HREF="#HCl" XML-LINK="SIMPLE" ACTUATE="AUTO" SHOW="EMBED">
hydrogen chloride</A>...</P>
...
<MOL ID="HCl">
<FORMULA>
<XVAR CONVENTION="SMILES">Cl</XVAR>  <!-- yes, I really meant to omit H! -->
</FORMULA>
</MOL>

This - in the present JUMBO - will in-line the formula for HCl.  I am sure
that by use of stylesheets and BEHAVIOUR it would be possible to control
your equations to be at the para end, etc.

7.2.4 <MACTION>.  I am sure that it is possible to recast this tag in
terms of XML-LINK BEHAVIOR.  That saves a lot of hassle writing code because
it may already have been done...at least in part.

Commonality with future XML DTDs
--------------------------------

As XML develops, CML gets smaller.  This is wonderful.  There are a number
of general components of MathML that will help CML and probably other 
people as well.  A particular example is VECTOR and MATRIX (4.2.9).
It is clear from the XML-WG that many people want a method of representing
(multidimensional) regular arrays of strongly typed data and also the
means for addressing into these.  Some (including me) will try to push
for economy of expression and avoid the <SEP/> syntax.  (At present
CML uses the following matrix syntax:
<ARRAY ROWS="2" COLUMNS="3" TYPE="FLOAT">1 2 3 4 5 6</ARRAY>
and has a kludgy mechanism for repeated arrayElements or arrayElements 
with whitespace.  Since some of our matrices are large I'd quite like to
drop <SEP/>, though recent XML-WG discussion has emphasised that space is
not an issue.)
<PROPOSAL>
MathML, CML, and other XML enthusiasts should strive towards a common
*extensible* way of representing arrays and matrices
</PROPOSAL>
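
As a sketch of what reading the whitespace-separated form involves, the
Python below turns an ARRAY element of the kind shown above into a list of
rows.  It assumes row-major order and whitespace-free elements; the INTEGER
type and the function name parse_array are hypothetical additions for the
example and are not part of CML as described here.

```python
import xml.etree.ElementTree as ET

# FLOAT comes from the example in the text; INTEGER is a hypothetical extra.
CONVERTERS = {"FLOAT": float, "INTEGER": int}


def parse_array(xml_text):
    """Parse an <ARRAY> element into a list of rows (row-major order)."""
    el = ET.fromstring(xml_text)
    rows = int(el.get("ROWS"))
    cols = int(el.get("COLUMNS"))
    convert = CONVERTERS[el.get("TYPE")]
    values = [convert(tok) for tok in (el.text or "").split()]
    if len(values) != rows * cols:
        raise ValueError("expected %d values, got %d" % (rows * cols, len(values)))
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]
```

The dimension check is where the strong typing pays off: a processor can
reject a short or over-long array without knowing anything else about the
document.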

Interoperability with HTML
--------------------------

This is a key area and I'm not clear from the MathML spec exactly what the 
mechanism is.  AFAIK CML and MathML are the first DTDs to tackle the question
of how to interoperate with HTML.  As we know there are syntactic problems
of how to combine two or more DTDs (DTD fragments).
<AXIOM>
It should ultimately be possible to create a joint HTML/*ML document
which can be validated (i.e. not just well-formed).
</AXIOM>
This raises considerable problems in general since HTML content models 
do not allow for <MATH> or <CML> or other foreign tags.  In CML I 
'solve' this by embedding chunks of HTML within CML documents - i.e. the
CML document 'owns' the HTML.  It's not clear in MathML which document 
contains chunks of the other (this is a general XML/HTML problem which 
has to be addressed).
MathML also provides for a subset of HTML within the <MATH> container - I
assume it's a subset because it has to be processed and rendered by the 
MathML processor and I'm extremely sympathetic to this problem - I've spent
far too much time hacking HTML rendering.
At present I favour a solution where CML (and MathML) are separated from
the HTML and connected by XML-LINK as in the previous section.
<PROPOSAL>
XML should investigate mechanisms for HTML and *ML interoperability
</PROPOSAL>

Interoperability with CML
-------------------------

AFAICS there are no namespace collisions between the MathML tagset and
CML so it's straightforward to write:

<!DOCTYPE CML SYSTEM "cml.dtd" [
<!ENTITY % mathml SYSTEM "http://www.w3.org/some/where/mathml.dtd">
%mathml;
]>
and then use MathML tags.  This is more luck than good planning :-), but
CML has been careful to restrict its tagset.

Linking between variables
-------------------------

If I write:
	x = y + 3           (I)
and later
        2x = y + 4          (II)
I would 'normally' deduce that 
	x = 1               (III)
and
	y = -2              (IV)

However, there is nothing in MathML AFAICS that allows one to specify
that the 'x' in (I) is the same x as in (II).  [Please forgive me if I've
missed this].  For many applications we need to label a variable or function
as having the same value and semantics throughout a document, e.g.

'Determination of <A HREF="#c"> the velocity of light</A>'.

In this example I would point to some central target which represented
the variable 'c', though I'm not clear how MathML would manage this in
equations.  This is a very important requirement for re-usable scientific
publications, though perhaps ambitious at this stage.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jimg at digitalthink.com  Wed May 21 18:27:12 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun  7 16:57:50 2004
Subject: Good XML-Relevant SGML Books for Beginners?
Message-ID: <l03010d01afa8cc063979@[207.171.223.56]>

Hi all,

I have been reading like crazy on the web, and have a fair understanding of
the basic XML concepts.  However, I am still puzzled as to exactly how I
can accomplish desirable tasks such as:

+ Converting XML documents to HTML (preferably HTML that uses CSS, and
  preferably using something other than DSSSL, which seems overly complex).
+ Referencing dynamic data within XML documents, that is presumably stored
  in a database, such as student name, quiz scores, et cetera.

I don't expect anybody to answer these questions directly since I
understand that is not the focus of this list; however, I would really
appreciate some guidance in picking one or two good books that will answer
my questions.

Using Amazon.com, I have found the following books that seem relevant.  If
somebody could give me their thoughts on these, or others, I would be very
grateful.

Abcd...Sgml : A User's Guide to Structured Information
Liora Alschuler

Industrial-Strength Sgml: An Introduction to Enterprise Publishing
(Charles F. Goldfarb Series on Open Information Management)
Truly Donovan

The Sgml Implementation Guide: A Blueprint for Sgml Migration
Brian E. Travis, Dale C. Waldt

Sgml on the Web: Small Steps Beyond H.T.M.L.
(Charles F. Goldfarb Series on Open Information Management)
Yuri Rubinsky, Murray Maloney

Others?

Thanks in advance.


Jim Gindling
<jimg@digitalthink.com>
DigitalThink
Software Engineer




From Peter at ursus.demon.co.uk  Wed May 21 19:32:01 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:50 2004
Subject: Good XML-Relevant SGML Books for Beginners?
Message-ID: <7018@ursus.demon.co.uk>

In message <l03010d01afa8cc063979@[207.171.223.56]> Jim Gindling writes:
Thanks for posting, Jim - xml-dev has been a bit sleepy recently.

> Hi all,
> 
> I have been reading like crazy on the web, and have a fair understanding of
> the basic XML concepts.  However, I am still puzzled as to exactly how I
> can accomplish desirable tasks such as:
> 
> + Converting XML documents to HTML (preferably HTML that uses CSS, and
>   preferably using something other than DSSSL, which seems overly complex).

If your sole exposure to DSSSL has been the postscript description (~300
pages) I can sympathise.  However, there is a shortened version (DSSSL-O)
and there will soon be examples of how to tweak existing DSSSL documents.
(Jon Bosak has shown how to do this and you'll find his stuff under
www.sil.org/sgml in the DSSSL section.)  So get a DSSSL engine - Jade or YADE -
and run the examples.  There is general agreement that DSSSL is the only
real way forward for significant work and there are free implementations
of engines.

Remember of course that XML documents don't have to make textual sense and
that to format
<MOL><FORMULA>"1cccccc1"</FORMULA></MOL>  <!-- pseudo CML -->
or <EXPR>x<PLUS/>2</EXPR> <!-- simple MathML -->
you will need application-specific software.  So, in general, converting
XML to HTML depends very much on the XML application.
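
A minimal sketch of what "application-specific" means here, in Python rather
than DSSSL, and not taken from JUMBO or any DSSSL engine.  It walks a small
document and maps each element type to an HTML element carrying a CSS class.
The element names REPORT, TITLE and PARA and the mapping table are invented
for illustration; a real application would supply its own.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from element types to (HTML tag, CSS class).
HTML_MAP = {
    "REPORT": ("div", "report"),
    "TITLE": ("h1", "title"),
    "PARA": ("p", "para"),
}

def to_html(elem):
    """Recursively render an element as HTML, escaping character data."""
    # Unknown element types fall back to a <span> named after the element.
    tag, css = HTML_MAP.get(elem.tag, ("span", elem.tag.lower()))
    parts = ['<%s class="%s">' % (tag, css)]
    if elem.text:
        parts.append(escape(elem.text))
    for child in elem:
        parts.append(to_html(child))
        if child.tail:
            parts.append(escape(child.tail))
    parts.append("</%s>" % tag)
    return "".join(parts)

SOURCE = "<REPORT><TITLE>Rates</TITLE><PARA>x &lt; 2</PARA></REPORT>"
HTML = to_html(ET.fromstring(SOURCE))
print(HTML)
```

The mapping table is the application-specific part: for CML or MathML it
would have to be replaced by code that understands formulae, not just text.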

> + Referencing dynamic data within XML documents, that is presumably stored
>   in a database, such as student name, quiz scores, et cetera.

I expect that we shall see XML2SQL/SQL2XML applications very shortly.  There is
a lot of discussion on XML-WG about how to transport data rather than text
and there is a *proposal* from Tim Bray to have strongly-typed data in
XML (e.g. FLOAT, DATE).  Having said that, XML doesn't write the applications
for you - it provides a mechanism to hold information.
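
As a rough sketch of the database-to-XML direction (an assumption about what
such a tool might do, not a description of any existing one), the Python
below reads rows from an in-memory SQLite table and holds them in an XML
tree.  The table layout and the element names SCORES, STUDENT and QUIZ are
hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

def scores_to_xml(conn):
    """Build a <SCORES> element with one <STUDENT> child per table row."""
    root = ET.Element("SCORES")
    query = "SELECT name, quiz, score FROM scores ORDER BY name, quiz"
    for name, quiz, score in conn.execute(query):
        student = ET.SubElement(root, "STUDENT", {"NAME": name})
        result = ET.SubElement(student, "QUIZ", {"ID": quiz})
        result.text = str(score)
    return root

# Hypothetical data standing in for a real student database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (name TEXT, quiz TEXT, score REAL)")
conn.executemany("INSERT INTO scores VALUES (?, ?, ?)",
                 [("Ann", "q1", 8.5), ("Bob", "q1", 7.0)])

XML_TEXT = ET.tostring(scores_to_xml(conn), encoding="unicode")
print(XML_TEXT)
```

Note that the scores travel as plain character data; nothing in the XML
itself says they are numbers, which is the gap the typed-data proposal
is meant to address.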
> 
> I don't expect anybody to answer these questions directly since I
> understand that is not the focus of this list; however, I would really
                             ^^^^^^^^^^^^^^^^^^
That is correct!  But it's a slackish period.

> appreciate some guidance in picking one or two good books that will answer
> my questions.

There will doubtless be XML books fairly shortly.  ***Note that the spec is 
still a draft and will remain so for some months***.  It will change, without
doubt, as bugs are thrown up.  The timescale for the first frozen release
is late autumn sometime - any more precise dates anyone?

My own feeling is that 'Learn XML in 21 days/48 hours/without tears' will
present the syntax of the language, but won't reveal the full power of
the language.  It's really only by playing with it, talking to SGML geeks
(an honourable term), and tackling real problems that you really get fluent.
That's because managing information is a very rich subject. 
 
> Using Amazon.com, I have found the following books that seem relevant.  If
> somebody could give me their thoughts on these, or others, I would be very
> grateful.

I won't comment on the books mentioned - my impression is that there are about
a dozen specialist SGML books in common use - but be aware that XML 
deliberately does not use a large number of SGML features. 

WHAT WE REALLY NEED ON THIS LIST ARE SOME REAL EXAMPLES OF XML DTDs and DOCs.

I don't think we have soak tested the parsers yet - I have been converting
my DTDs today and think that I've found 2 (minor) bugs in a parser.  
Let's have some announcements of converted DTDs, ENTITY sets, documents, etc.
Without that it's much harder to learn the language.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From eldred at tiac.net  Wed May 21 19:48:54 1997
From: eldred at tiac.net (Eric Eldred)
Date: Mon Jun  7 16:57:50 2004
Subject: Good XML-Relevant SGML Books for Beginners?
Message-ID: <3.0.32.19970521134802.007bf370@tiac.net>

I recommend this book:

Sgml on the Web: Small Steps Beyond H.T.M.L.
(Charles F. Goldfarb Series on Open Information Management)
Yuri Rubinsky, Murray Maloney

It was written before XML, and really doesn't get into
a lot of the area where XML will be used.  But it does
show what SGML can do beyond HTML.  And it stimulates
and provokes us to take advantage of SGML's (or XML's)
power to publish in new ways for users that HTML can't.

It also comes with the full Panorama Pro browser, so you
can experiment immediately with writing and reading
SGML, in the interactive way that XML will be providing.

Plus, it is just a really nice, humane book that is easy
enough for SGML beginners to understand, where so many
of the other SGML books are far too technical for newbies.


--
"Eric"    Eric Eldred   
<URL:http://www.tiac.net/users/eldred/>
mailto:eldred@tiac.net   no fax  tel:+1 603 434 7746 x1
USPS:50 E Derry Rd #21, E Derry NH USA 03041-0021

From spinosaj at scripps.edu  Thu May 22 06:18:14 1997
From: spinosaj at scripps.edu (John C. Spinosa, MD, PhD)
Date: Mon Jun  7 16:57:50 2004
Subject: Meeting Anouncement for HL7-SGML Mixer
Message-ID: <3.0.32.19970521212222.007a0100@pop.mindspring.com>

I am posting this on behalf of Liora Alschuler who is out of the country at
present. This announcement has been cross-posted to several listservs as
well as comp.text.sgml.

************        ANNOUNCEMENT        ***************
  HL7 SGML Mixer: Medical Claims Processing with SGML
    ************   July 24 -- 25  **************

We are pleased to announce the upcoming "HL7 SGML Mixer: Medical Claims
Processing with SGML," a two-day seminar to take place July 24-25 in the San
Diego/La Jolla area. The event is cosponsored by:

* GCARI (Graphic Communications Association Research Institute;
http://www.gca.org)
* HL7 SGML SIG (Health Level 7, SGML Special Interest Group;
http://www.mcis.duke.edu/standards/HL7/committees/sgml/)
* SGMLOpen (http://www.sgmlopen.org)

The event is open to all participants, regardless of affiliation, for a
modest fee to cover our costs. 

AGENDA:	Showcase Tools, Respond to Federal RFP

The two-day session has a double agenda:
1) Introduce SGML-based tools and technology to developers and users of
healthcare information systems.
2) Address the manner in which HL7 can use SGML-based tools and technology
to respond to an RFP being issued by the US Health Care Financing Administration
(HCFA) regarding electronic submission and processing of Medicare and
Medicaid claims.

The HCFA RFP is the first to be issued in compliance with the requirements
of the Health Insurance Portability and Accountability Act (HIPAA, known as
Kennedy-Kassebaum) which mandates creation and use of standardized
electronic medical records. 

It is the intent of HL7, the parent organization of the HL7 SGML SIG, to
respond to the Federal RFP in conjunction with one of the major Medicare and
Medicaid providers. 

FORMAT:	Presentations, Tabletops, Working Sessions

The first day will start with presentations focusing on potential use of
SGML-based solutions in healthcare in general with a focus on the HCFA
scenario. After a kick-off session, two tracks may run concurrently, one
directed at management issues and a second at technical solutions. The
second portion of the day will provide a venue for table-top technology
demonstrations.

Presentations will be chosen by a peer review process with members of all
three organizations participating. The table-top demonstrations are open to
all for a small fee.

The second day will consist of HL7 working sessions to begin formulating our
response to the Federal requirements. These sessions are open to all, with
the caveat that active participants should be well versed in the work and
objectives of HL7 and the HL7 SGML SIG. (Again, all are welcome to observe
and to become active.)

MARK YOUR CALENDARS:

The Mixer RFP, detailing the requirements set out by the Federal government,
will be made available through these same channels the week of June 2.
Submissions for presentations will be due June 23.
Notification of acceptance will be sent out by July 7.

FOR MORE INFORMATION:

The full Mixer RFP with details on the HCFA scenario will be sent out
through these same channels no later than June 6 and will be posted on the
Web sites of the sponsor organizations. You may also contact the
organizations cosponsoring the event or:

Liora Alschuler, HL7 SGML Mixer Program Chair
mixer@the-word-electric.com or 802/785-2623 


=====================================================
John Spinosa, MD, PhD
spinosaj@scripps.edu


From amutel at ifhamy.insa-lyon.fr  Thu May 22 09:09:32 1997
From: amutel at ifhamy.insa-lyon.fr (Alexandre Mutel)
Date: Mon Jun  7 16:57:51 2004
Subject: XML & Entities inclusion against Inline Tag facilities.
Message-ID: <199705220709.JAA28224@ifhamy.insa-lyon.fr>

hello,

   The XML spec (like SGML) talks about including entities in
   a document... Something like:

   <!DOCTYPE book [
	<!ELEMENT book (#PCDATA) >
	<!ENTITY  including SYSTEM "http://server1.com/index.txt">
   ]>
   <book>
   &including;
   </book>

   Okay, they say that with XML/SGML a document can be built from included
   document parts using the entity facilities.
   HTML doesn't make use of external entities, but it can inline an image through
   a tag... In the XML spec I don't see any reference to a tag or special
   attributes that can handle inclusion of a document component (text, image, object).

   I would like to know:
	- whether in the future we will only use external entities to include a
	  document component?
	- otherwise, will XML support special attributes for a tag to specify
	  that a tag with these attributes can include something?
	- or will this feature be hardcoded in a browser, making the same
	  mistake as HTML?

   Thanks.

Regards,
Mutel Alexandre.
email: amutel@ifhamy.insa-lyon.fr


From Peter at ursus.demon.co.uk  Thu May 22 12:10:17 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:51 2004
Subject: General chat (Meeting Anouncement for HL7-SGML Mixer)
Message-ID: <7065@ursus.demon.co.uk>

In message <3.0.32.19970521212222.007a0100@pop.mindspring.com> "John C. Spinosa, MD, PhD" writes:
John,
	I don't know whether they have all gone to the list, but *I* got three
copies of this announcement :-)  More generally, this list is for XML
developers and this post doesn't relate directly to XML.  [I did have a long
and useful chat with Liora at SGML97 and so I know that she and other HL7
people are very interested in XML, and of course she has done a first class
job of publicising it.  One immediate concern is the multiple namespace/DTD
fragment/information object issue.]  Jon Bosak mentioned two weeks ago that
there were movements to
create an XML interest group (?comp.text.xml?) and I'd be happy to see that.  
Presumably it would require the USENET voting process?

*****

	Since I'm posting to the list anyway, I have been investigating
how to use NXP as a tool for turning JUMBO into a validating editor.  I
think I'm nearly there and will be posting back to Norbert what is required 
as an API.  One thing that I need very clearly back from both Lark and 
NXP is an error flag - 'parse this, please',  'sorry, error'.  The final
'result' will be a tree-structured editor rather than an event-stream
driven tool (I don't think Java is fast enough to allow character-by-character
validation of text processing).  

We haven't had any gossip about tools on this for a long time.  I could use
a nice simple non-validating XML editor - what have people got?  And what
has happened to all the enthusiasm for setting up MIRROR sites?


	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk  Thu May 22 12:10:31 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:51 2004
Subject: XML & Entities inclusion against Inline Tag facilities.
Message-ID: <7067@ursus.demon.co.uk>

In message <199705220709.JAA28224@ifhamy.insa-lyon.fr> Alexandre Mutel writes:

This is an important subject as I am currently wrestling with the XML linking
spec.  I'd be happy to see a clear exposition of how XML
includes/transcludes document/fragments, etc.

<INTERLUDE>
Although XML-DEV is not intended as a forum for beginners, there are a number
of questions - like the current one - which are legitimate to discuss if we 
don't have a lot of traffic.  I also think it's easy for developers to
misinterpret parts of the spec (I have done this in a major way and fairly
publicly with XML-LINK I think :-).  Also
<PLUG>
Since I am running a virtual course on XML and Java (see URL), it's useful to
know what questions come up :-)
</PLUG>
</INTERLUDE>

> hello,
> 
>    The XML spec (like SGML) talks about including entities in
>    a document... Something like:
> 
>    <!DOCTYPE book [
> 	<!ELEMENT book (#PCDATA) >
> 	<!ENTITY  including SYSTEM "http://server1.com/index.txt">
>    ]>
>    <book>
>    &including;
>    </book>

This is indeed correct and PARSERs are required to implement it.  For
many applications it will simply be an insertion of the text in index.txt
at the point of the entity reference.  So if index.txt contained:

<P>That's all folks!</P>

the parser would create an intermediate instance:

<book>
<P>That's all folks!</P>
</book>

Note that if there is whitespace in the entity, this whitespace is
included in the document.  Also, if there are entity references in the
entity, *these* are then processed.

This facility only works for entities which are XML documents (but see
NOTATIONS).  They cannot have a DOCTYPE or subset and must correspond to a
wellformed document.  For example, an entity containing only

<P>That's all folks!

would not be allowed.  However the spec 4.4(8) says that if the processor
(i.e. the parser) is NOT validating the document, it doesn't have to 
expand the entity.  I assume (contributions, please) that this would be
done through a parser switch (-E expand entities, or similar).  That means 
that your document could still parse (WF) even if the entity was not WF as
long as expansion was disabled.
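
A small sketch of the expansion and of the 'parse this, please' / 'sorry,
error' behaviour, using Python's standard library parser (this is not Lark
or NXP).  To keep it self-contained an *internal* entity stands in for the
external index.txt; the function name parse_flag is invented here.

```python
import xml.etree.ElementTree as ET

GOOD = """<!DOCTYPE book [
<!ENTITY including "<P>That's all folks!</P>">
]>
<book>&including;</book>"""

# Same declaration, but the replacement text lacks the </P> end tag.
BAD = GOOD.replace("</P>", "")

def parse_flag(text):
    """Return (ok, result): the root element, or the parser's message."""
    try:
        return True, ET.fromstring(text)
    except ET.ParseError as err:
        return False, str(err)

ok, book = parse_flag(GOOD)
bad_ok, message = parse_flag(BAD)

# The entity's replacement text has become a child element of <book>.
print(ok, [child.tag for child in book], book[0].text)
# The unbalanced replacement text is reported as an error when referenced.
print(bad_ok)
```

In the second case the error only arises because the entity is referenced
and therefore expanded; the declaration on its own is accepted.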

> 
>    Okay, they say that with XML/SGML a document can be built from included
>    document parts using the entity facilities.
>    HTML doesn't make use of external entities, but it can inline an image through
>    a tag... In the XML spec I don't see any reference to a tag or special
>    attributes that can handle inclusion of a document component (text, image, object).

This will be done through XML-LINK.  This is much more powerful than HTML as
it can be applied to any element.  Here's how HTML's IMG would look in XML:

<!ATTLIST IMG 
    XML-LINK CDATA #FIXED "SIMPLE"
    SHOW     CDATA #FIXED "EMBED"
    ACTUATE  CDATA #FIXED "AUTO"
    HREF     CDATA #REQUIRED
>

This defines IMG to be a SIMPLE XML-LINK.  Its target 'resource' is 
located through HREF just as in HTML's A.  <A> behaves with ACTUATE="USER"
SHOW="REPLACE", i.e. nothing happens till the user clicks it, and then
(usually) the display is replaced by the new 'resource'.  For IMG
the link is traversed as soon as it is encountered, and the resource is
embedded in the document (probably near the <IMG> element).
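
A sketch of how a generic tool might read these attributes, assuming the
draft attribute names used above (XML-LINK, SHOW, ACTUATE, HREF).  It is
written against Python's standard library parser, which supplies the #FIXED
values declared in the internal subset; the function describe and the file
names are hypothetical, and this is not how JUMBO does it.

```python
import xml.etree.ElementTree as ET

DOC = """<!DOCTYPE DOC [
<!ATTLIST IMG
    XML-LINK CDATA #FIXED "SIMPLE"
    SHOW     CDATA #FIXED "EMBED"
    ACTUATE  CDATA #FIXED "AUTO"
    HREF     CDATA #REQUIRED
>
<!ATTLIST A
    XML-LINK CDATA #FIXED "SIMPLE"
    SHOW     CDATA #FIXED "REPLACE"
    ACTUATE  CDATA #FIXED "USER"
    HREF     CDATA #REQUIRED
>
]>
<DOC><IMG HREF="logo.gif"/><A HREF="next.xml">next</A></DOC>"""

def describe(elem):
    """Summarise how a generic browser might treat one element."""
    if elem.get("XML-LINK") != "SIMPLE":
        return "%s: not a link" % elem.tag
    when = "at once" if elem.get("ACTUATE") == "AUTO" else "on user request"
    how = "embed" if elem.get("SHOW") == "EMBED" else "replace display with"
    return "%s: %s %s %s" % (elem.tag, how, elem.get("HREF"), when)

# Only HREF appears in the instance; the rest comes from the ATTLISTs.
PLAN = [describe(e) for e in ET.fromstring(DOC).iter()]
for line in PLAN:
    print(line)
```

The point is that the code never tests for the element names IMG or A;
the behaviour is driven entirely by the link attributes.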

> 
>    I would like to know :
> 	- whether in the future we will only use external entities to include a
> 	  document component?

No, you can use XML-LINK to refer to part of the current document, as well
as to external documents.  If the external documents are XML then it is
often straightforward to include them, but only if they have the same DOCTYPE.
If they have different DOCTYPEs we have a namespace problem and we are still
wrestling with that one (e.g.

<CML>
The rate of this reaction is given by 
<A HREF="eqn1.xml">equation 1</A>
</CML>
where eqn1.xml might be written in MathML.  
)

If the external entity is BINARY (i.e. not XML - it may still be ASCII) then
a NOTATION is required (e.g. for GIF).

I'll stop there and suggest someone else tells us how to use NOTATION 
because I haven't implemented it yet!!

> 	- otherwise, will XML support special attributes for a tag to specify
> 	  that a tag with these attributes can include something?

You can add XML-LINK attributes to ANY element, so you don't have to use
a single one like <TAG>

> 	- or will this feature be hardcoded in a browser, making the same
> 	  mistake as HTML?

Nothing is hardcoded in JUMBO, which is the first XML browser that I know
of :-).  If a browser manufacturer wishes to limit their browser to 
one particular XML application then good luck to them - maybe their
market is well-defined.  For example, if someone writes an XML browser
specifically for mobile phones, they may well hardcode their application.
I am strongly urging the scientific/technical/medical community to develop
interoperable components and with CML and MathML we are off to a good start.

A generic browser (like JUMBO) has to be prepared to implement XML-LINK and
XML-STYLE independently of the DTD.  It also has to be able to switch
DOCTYPES for different namespaces.  In principle it also has to be able
to find tools to deal with a number of common NOTATIONS (GIF, CGM, etc.)
and I hope that people will produce self-installing tools for those to
save the browser m'facturers having to reinvent it every time. 

For the major horizontal browser m'facturers, we shall have to wait and see.
I'm very much hoping there is a good API into XML browsers so that developers
can avoid having to render HTML, interface with mail, etc.

Let's have your postings...but keep them targeted to the development of XML
tools, resources, documents, tutorials, etc.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From eliot at isogen.com  Thu May 22 14:01:35 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:57:51 2004
Subject: XML & Entities inclusion against Inline Tag facilities.
Message-ID: <3.0.32.19970522065649.00c4f168@swbell.net>

At 09:52 AM 5/22/97 GMT, Peter Murray-Rust wrote:
>In message <199705220709.JAA28224@ifhamy.insa-lyon.fr> Alexandre Mutel
writes:

>No, you can use XML-LINK to refer to part of the current document, as well
>as to external documents.  If the external documents are XML then it is
>often straightforward to include them, but only if they have the same DOCTYPE
>If they have different DOCTYPEs we have a namespace problem and we are still
>wrestling with that one (e.g.
>
><CML>
>The rate of this reaction is given by 
><A HREF="eqn1.xml">equation 1</A>
></CML>
>where eqn1.xml might be written in MathML.  
>)

There is *NOT* a name space problem in this case.  The document "eqn1.xml"
is *parsed* outside the scope of the document that references it (it is
semantically and functionally identical to a SUBDOC reference in normal
SGML).  Once the document is parsed, the result of that parsing is
combined, by application-specific means, with the document tree of the
referencing document.  At that point, things like content model constraints
are irrelevant and there are *NO* name space problems. 

In a typical implementation, the parsed result of the A element would
include a *pointer* to the parsed result ("grove") of the eqn1.xml
document, rather than literally including that document tree as a direct
child of the CML element or the A element (depending on how you decided to
represent the reference).  Because each document is in its own grove, the
name spaces for the documents are kept separate and there is no conflict.
Applications are free to follow the reference from one grove to another and
behave as if the second document was literally included at the point of
reference.
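
A minimal sketch of the pointer arrangement, in Python, with each parsed
tree standing in (loosely) for a grove.  The DOCUMENTS store, the file name
main.xml and the content of eqn1.xml are made up for the example; the
referenced document deliberately reuses the element type name A.

```python
import xml.etree.ElementTree as ET

# Two independent documents; both happen to use an element type named A.
DOCUMENTS = {
    "main.xml": '<CML>The rate is given by '
                '<A HREF="eqn1.xml">equation 1</A></CML>',
    "eqn1.xml": "<EXPR><A>x</A><PLUS/><A>2</A></EXPR>",
}

def parse_with_references(name, store):
    """Parse one document on its own and record pointers to referenced ones."""
    root = ET.fromstring(store[name])
    links = {}
    for elem in root.iter():
        target = elem.get("HREF")
        if target in store:
            # The target is parsed separately; only a pointer is kept.
            links[elem] = ET.fromstring(store[target])
    return root, links

ROOT, LINKS = parse_with_references("main.xml", DOCUMENTS)
ANCHOR = ROOT.find("A")
TARGET = LINKS[ANCHOR]

# The referencing A element has no children of its own, and the A elements
# of the equation never appear in the referencing tree.
print(TARGET.tag, len(ANCHOR), len(ROOT.findall(".//A")))
```

An application that wants the effect of inclusion simply follows the entry
in LINKS when it reaches the A element.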

IT IS VITALLY IMPORTANT to remember the distinction between external text
entities referenced by inline entity reference, which are fragments of the
document string and are always parsed as part of it (when parsed at all),
and references to document entities using addressing from attributes
(either by URL or by attributes with a value prescription of ENTITY or
ENTITIES).

In the latter case, the referenced document is NOT parsed as part of the
referencing document.

Thus, there is a clear semantic difference between the use-by-reference of
text entity references and the use-by-value of document entity references.
[Do I have these two confused? It's early in the morning and I'm still
suffering jet lag.  By use-by-value, I mean you get the thing's value, not
the thing itself.]

The HyTime standard formalizes this notion of use-by-value through the
"value reference" facility, which simply makes explicit the semantic
intended by the A element in the above (that the effective value of the A
element is really the document it refers to).  But it is made very clear
that a value reference is a *semantic* distinction--it doesn't change the
way the source data is parsed.

One confusion factor here is that, unlike SGML today (but not in the near
future), if an XML file has no DOCTYPE declaration it can be used as either
an external text entity (parsed in the context of its reference) or as a
document entity (parsed in isolation), and you can't tell by looking at the
entity which it was intended to be.  In a very real sense, XML is saying
that all external entities are either subdocuments or documents, even
though XML doesn't include the formal notion of subdocument as in SGML.

>If the external entity is BINARY (i.e. not XML - it may still be ASCII) then
>a NOTATION is required (e.g. for GIF).
>
>I'll stop there and suggest someone else tells us how to use NOTATION 
>because I haven't implemented it yet!!

Notations serve two primary purposes:

1. To clearly document the data type of an entity
2. To enable the association of processors with data types.

The external identifier of a notation is intended to refer to the
documentation for the notation (e.g., the CGM standard, the GIF spec,
etc.).  It may also be used to associate the notation with a notation
processor.  In a general SGML or XML processing system, you would expect to
find a facility for mapping notations (by name or external ID) to
processors or entries in function libraries, e.g., through some form of
mapping catalog.  An obvious implementation technique on Windows would be
to use OLE facilities to integrate the processors for data entities with the
base browser.  Part of the notation mapping would be the information needed
to configure the OLE communication.  I think at least one SGML editor is
implemented in this way.

Notations are somewhat redundant with MIME types, in that you may be able
to determine the data type of an entity by examining the entity or applying
whatever entrail reading gives you the MIME type.  However, notations have
the advantage that they're part of the document.  One way to use notations,
of course, is to map them to MIME types, e.g.:

<!NOTATION gif  SYSTEM "<mime>image/gif" > 
<!-- Here using the syntax of "formal system identifiers" defined in the
     Formal System Identifier Requirements annex of the HyTime standard
     to indicate that the system identifier is in fact a MIME type, which
     we need because it could just as easily be a relative path name to a 
     file named "gif". -->

Or whatever the MIME type for GIF is.  If this mapping is done in a catalog
(rather than in the document), the same notation can be mapped to different
things on different systems (MIME types are not universal).

Notations must be used for data ("binary") entities.  They can also be
associated with elements by using attributes with a value prescription of
"NOTATION".  The notation named by the attribute then governs the
interpretation of the element and its content (after parsing, of course).
For example, you might do something like this:

<!DOCTYPE ProgramListing [
<!NOTATION C  PUBLIC "Kernighan and Ritchie" >
<!NOTATION Cpp SYSTEM >
<!NOTATION Perl SYSTEM >
<!NOTATION Scheme SYSTEM >
<!ELEMENT ProgramListing - - (#PCDATA) >
<!ATTLIST ProgramListing
          language NOTATION (C | Cpp | Perl | Scheme) #IMPLIED
>
]>
<programlisting language=perl>
<![CDATA[
sub do_nothing {
    return(0)
}
]]>
</programlisting>

Depending on the notation, you might provide different formatting of the
source or even automatically extract the content and test it or compile it
or something.
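
To make the idea concrete, here is a small Python sketch of an application
choosing a formatter from the notation named by the attribute.  It assumes
an XML rendering of the listing above (quoted attribute value, no DTD), and
the formatter functions and the FORMATTERS table are invented for the
example, not part of any real tool.

```python
import xml.etree.ElementTree as ET

# XML rendering of the listing above; the DTD is omitted for brevity.
DOC = """<ProgramListing language="Perl"><![CDATA[
sub do_nothing {
    return(0)
}
]]></ProgramListing>"""


def format_perl(source):
    """Hypothetical Perl formatter: just tags the listing."""
    return "[perl]\n" + source.strip("\n")


def format_plain(source):
    """Fallback used when no formatter is registered for the notation."""
    return source.strip("\n")


# Hypothetical notation-name-to-formatter table.
FORMATTERS = {"Perl": format_perl}


def render(element):
    """Pick a formatter from the element's notation attribute and apply it."""
    notation = element.get("language")
    formatter = FORMATTERS.get(notation, format_plain)
    return formatter(element.text or "")


if __name__ == "__main__":
    print(render(ET.fromstring(DOC)))
```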

In full SGML, notations can have attributes defined for them, which can be
specified as part of the entity declarations.  Notation attributes are
intended to act as parameters to the processor of the notation.  A typical
example is attributes that describe the nature of a graphic, e.g.:

<!NOTATION TIFF SYSTEM >
<!ATTLIST #NOTATION TIFF
       compression (NONE |CCITTG4) NONE
>
<!ENTITY mytiff SYSTEM "mytiff.tiff" NDATA TIFF
   [ compression=ccittg4 ]
>

Notations and notation attributes are also used for declaring the use of
architectures and configuring their use within a document.  This makes
sense because a document type or architecture defines the rules for a
particular data type, namely documents that conform to that document type
or architecture, and so it is part of the formal definition of a notation.
For example, to derive a document from an architecture, you would do
something like this (in this example, the architecture is one I made up for
representing bibliography entries):

<!DOCTYPE MyDoc [
<!-- Declare that this document is derived, in part, from the
     Bibliography Entry architecture. -->

<?IS10744 ArcBase BibCat>
<!-- Names following "ArcBase" are names of notations that declare
     architectures. Architecture engine will expect to find those
     notations declared in the document: if it doesn't, it's an error.
  -->

<!NOTATION SGML PUBLIC "ISO 8879:1986//NOTATION Standard Generalized
                        Markup Language//EN" >
<!NOTATION BibCat PUBLIC "-//Kimber//NOTATION Bibliography Entry
                          Architecture//EN"
 -- A document architecture conforming to the 
    Architectural Form Definition Requirements of
    International Standard ISO/IEC 10744.         --
 >
<!-- The following notation attributes configure the use of the
     architecture and control how architectural recognition and
     processing is done by a general architecture engine such
     as SP. The attribute names are defined in the Architectural
     Forms Definition Requirements annex of the HyTime standard. 
  -->
<!ATTLIST #NOTATION BibCat
       ArcFormA NAME  #FIXED "BibCat" 
       ArcNamrA NAME  #FIXED "BibNames"
       ArcBridF NAME  #FIXED "BibBrid"
       ArcDocF  NAME  #FIXED "BibDoc"
       ArcOptSA NAMES #FIXED "options"
       ArcDTD   CDATA #FIXED "BibCat"
       options  CDATA #FIXED "" -- Specify "marc" to turn on MARC options --
 >
 <!-- The following entity is the "meta-DTD" for the BibCat architecture.
      It is referenced by the ArcDTD notation attribute.  An architecture
      processor uses this meta-DTD to validate the document against the
      meta-DTD. -->
 <!ENTITY BibCat SYSTEM "bibcat11.mdt" CDATA SGML
 >

 <!ELEMENT MyDoc - - (List-of-Books) >
 <!ATTLIST MyDoc
           BibCat  NAME #FIXED "BibCat" -- MyDoc derived from BibCat form --
 >
 ...

Personally, I think it is a serious mistake for XML to not have notation
attributes, in large part because of their use with architectures, which
are of critical importance to the use of XML.

Notations are also used in XML (and, in the future, in SGML) to create
"formal processing instructions", where the notation name is the first
keyword of the processing instruction, e.g.:

<!NOTATION MyBrowser PUBLIC "my cool XML browser" >
<?MyBrowser something unique to my browser to control its processing>

This mechanism allows general processors to associate processing
instructions with processors (using the notation-to-processor mapping they
must already provide for entities and elements).  It also enables better
error reporting, because the processor can say 'Cannot find processor or
definition for processing instruction notation "MyBrowser", public ID "my
cool XML browser"', rather than either silently ignoring the PI or issuing
an "Unknown PI ..." message.
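
A minimal sketch of that check in Python, again with xml.parsers.expat: the
PI target is looked up among the declared notations, and the report
distinguishes "declared but no processor" from "unknown PI".  The sample
document, the check_pis function and the set of installed processors are
assumptions made up for the example.

```python
import xml.parsers.expat

# Hypothetical document: one PI whose target is a declared notation,
# and one PI whose target is not declared at all.
DOC = """<!DOCTYPE doc [
<!NOTATION MyBrowser PUBLIC "my cool XML browser">
]>
<doc><?MyBrowser scroll=smooth?><?Other x?></doc>
"""


def check_pis(data, processors):
    """Return one report line per PI; `processors` holds the notation
    names for which this (hypothetical) system has a processor."""
    notations = {}  # notation name -> external ID
    reports = []
    parser = xml.parsers.expat.ParserCreate()

    def on_notation(name, base, system_id, public_id):
        notations[name] = public_id or system_id

    def on_pi(target, pi_data):
        if target not in notations:
            reports.append('Unknown PI "%s"' % target)
        elif target not in processors:
            reports.append(
                'Cannot find processor or definition for processing '
                'instruction notation "%s", public ID "%s"'
                % (target, notations[target]))
        else:
            reports.append("dispatch %s: %s" % (target, pi_data))

    parser.NotationDeclHandler = on_notation
    parser.ProcessingInstructionHandler = on_pi
    parser.Parse(data, True)
    return reports


if __name__ == "__main__":
    for line in check_pis(DOC, processors=set()):
        print(line)
```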

Cheers,

Eliot



From Peter at ursus.demon.co.uk  Thu May 22 17:00:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:51 2004
Subject: XML & Entities inclusion against Inline Tag facilities.
Message-ID: <7081@ursus.demon.co.uk>

In message <3.0.32.19970522065649.00c4f168@swbell.net> "W. Eliot Kimber" writes:
> At 09:52 AM 5/22/97 GMT, Peter Murray-Rust wrote:
[...]
> ><CML>
> >The rate of this reaction is given by 
> ><A HREF="eqn1.xml">equation 1</A>
> ></CML>
> >where eqn1.xml might be written in MathML.  
> >)
> 
> There is *NOT* a name space problem in this case.  The document "eqn1.xml"
> is *parsed* outside the scope of the document that references it (it is
> semantically and functionally identical to a SUBDOC reference in normal
> SGML).  Once the document is parsed, the result of that parsing is
> combined, by application-specific means, with the document tree of the
> referencing document.  At that point, things like content model constraints
> are irrelevant and there are *NO* name space problems. 

Thanks for clarifying this.  Please treat me as the archetypal newcomer
who means well.

Understood.  This is in fact what I do, but I was slightly misled
in the draft by the phrase under 'EMBED':

the 'designated resource should be embedded for the purposes of display or 
processing in the body of the resource and at the location where the traversal
started'.  I (mis)read that to mean that the spec required the remote
resource to be embedded and then processed (i.e. parsed).

I also share your concern with the likelihood of linking to a document
without a DOCTYPE which may have tags in common and where there is a 
possibility of confusion.  Since you point out that 'embedding' is really a 
pointer, then the application can keep the namespaces separate, though it
could be easy to make mistakes.
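
For what it's worth, here is a small Python sketch of the "parse
separately, then combine" approach.  Each resource is parsed in isolation
and only the resulting tree is grafted in at the link, so an element named
A in the equation never meets the A of the referencing document during
parsing.  The content of eqn1.xml, the RESOURCES table standing in for
retrieval, and the embed_links function are all invented for the example.

```python
import xml.etree.ElementTree as ET

# Stand-in for fetching remote resources; the content is hypothetical and
# deliberately reuses the element name A with a different meaning.
RESOURCES = {"eqn1.xml": "<math><A>k</A></math>"}

MAIN = ('<CML>The rate of this reaction is given by '
        '<A HREF="eqn1.xml">equation 1</A></CML>')


def embed_links(root, resources):
    """Parse each linked resource on its own, then graft its tree
    under the linking element."""
    # Take a snapshot first so grafted A elements are not revisited.
    for link in list(root.iter("A")):
        href = link.get("HREF")
        if href in resources:
            subtree = ET.fromstring(resources[href])  # parsed in isolation
            link.append(subtree)                      # combined afterwards
    return root


if __name__ == "__main__":
    tree = embed_links(ET.fromstring(MAIN), RESOURCES)
    print(ET.tostring(tree, encoding="unicode"))
```

The application still has to remember which subtree came from where, which
is exactly where the mistakes mentioned above could creep in.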

[...]
> One confusion factor here is that, unlike SGML today (but not in the near
> future), if an XML file has no DOCTYPE declaration it can be used as either
> an external text entity (parsed in the context of its reference) or as a
> document entity (parsed in isolation), and you can't tell by looking at the
> entity which it was intended to be.  In a very real sense, XML is saying
> that all external entities are either subdocuments or documents, even
> though XML doesn't include the formal notion of subdocument as in SGML.

Exactly.  And it is possible to see cases where a given file is used in
both ways (a) included through entities and (b) pointed to by LINK.

[... thanks for the explanation of notation ...]  I had not appreciated
the use of the NOTATION to flag PI-types and will adopt this. 

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From bosak at atlantic-83.Eng.Sun.COM  Thu May 22 22:46:42 1997
From: bosak at atlantic-83.Eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:57:51 2004
Subject: [Fwd:] XS discussion begins
Message-ID: <199705222044.NAA19450@boethius.eng.sun.com>

[The following message was just posted to the w3c-sgml-wg list.
Please note that the draft document referred to in the message is
intended for DSSSL implementors already familiar with ISO/IEC 10179,
but readers of this list may find the flow object taxonomy and tables
of characteristics interesting.  All follow-ups should be made to the
DSSSList.]

In the SGML ERB meeting of May 14, it was agreed that preliminary
discussion of xml-style (Part 3 of the XML specification suite) should
take place in parallel with our current task of finishing drafts of
xml-lang and xml-link, but in a different forum in order to prevent
that discussion from interfering with our deadlines for Parts 1 and 2.
Since xml-style has always been defined as based on a subset of DSSSL,
it was agreed that the discussion could begin with a draft that puts
the existing DSSSL Online (dsssl-o) specification in a form that can
easily be made into a Working Draft for XML Part 3 when the time
arrives for us to officially turn our attention to that part of the
activity.

I have now completed such a draft, which can be found at

 http://sunsite.unc.edu/pub/sun-info/standards/dsssl/xs/xs970522.ps.zip

When unzipped, this file should print out with no trouble on most
PostScript printers, and it displays well in the Solaris Image Tool.
If you wish, you can use an RTF viewer by downloading

 http://sunsite.unc.edu/pub/sun-info/standards/dsssl/xs/xs970522.rtf.zip

but in this case, you may get formatting somewhat different from what
I intended.  The RTF file was prepared by taking Jade output into
Microsoft Word and manually inserting page breaks in certain places;
no other hand work was performed.  The PS version was generated
directly from the massaged RTF using Word's LaserWriter II print
driver.  I hope to have an HTML version of the document available in
the next week or so.

In preparing this draft, I have incorporated a number of corrections
to do960816.htm kindly provided by Tony Graham (although I cannot
guarantee that all of his corrections were correctly performed) and
done major surgery on the prose descriptions of the flow object
classes, entirely eliminating the problematic language inherited from
the DSSSL committee draft of September 1995 and starting over with
language from the final committee draft, lightly edited for
consistency with its new context.  This should have fixed a lot of
problems that were formerly caused by forking the version tree but has
no doubt introduced some new ones.  I am relying on reviewers to help
with the work of correcting errors in this new version.

In anticipation of its merger into the XML suite in a couple of
months, the former dsssl-o application profile is now referred to
throughout the document as "xml-style," or more frequently "XS" for
short.  Our unofficial motto (arrived at over drinks in Barcelona) is

                      Nothing exceeds like XS.

Please note, however, that despite its name and its look, this
document is *not* in any way, shape, or form a W3C Working Draft.  In
fact, it is not a W3C document of any kind.  It is just a revision of
the dsssl-o application profile that has been circulating in one form
or another for over a year and a half.  Consequently, discussion of
the draft should take place on the list devoted to DSSSL, which you
can find out about from the http://www.mulberrytech.com/dsssl/dssslist
page.  Since this is not a W3C draft, and since the whole point of
this exercise is to move forward on the groundwork for XML Part 3
without interfering with the work on XML Parts 1 and 2, it is
essential that you DO NOT DISCUSS THIS DRAFT ON THE W3C-SGML-WG LIST.
Please confine all discussions to DSSSList or to other appropriate
public lists (though offhand I can't think of any others that would be
appropriate).

I will follow up on this message with another one to the DSSSList
setting forth some suggested guidelines for the XS discussion.

Jon

----------------------------------------------------------------------
 Jon Bosak, Online Information Technology Architect, Sun Microsystems
----------------------------------------------------------------------
 2550 Garcia Ave., MPK17-101,           |  Best is he that inuents,
 Mountain View, California 94043        |  the next he that followes
 Davenport Group::SGML Open::ANSI X3V1  |  forth and eekes out a good
 ::ISO/IEC JTC1/SC18/WG8::W3C SGML ERB  |  inuention.
----------------------------------------------------------------------




From Ingo.Macherius at tu-clausthal.de  Fri May 23 19:02:27 1997
From: Ingo.Macherius at tu-clausthal.de (Ingo Macherius)
Date: Mon Jun  7 16:57:51 2004
Subject: New XML article
Message-ID: <199705231702.TAA15827@majestix.rz.tu-clausthal.de>

Ladies and Gentlemen,

there is a new magazine article on XML available. It appeared in the
German iX-magazine (http://www.heise.de/ix/) on 14th of May. I consider
it the first German-language article aimed at the general public.
iX-magazine generously did the English translation and put online
versions on their server.

http://www.heise.de/ix/artikel/E/1997/06/106/   (English)
http://www.heise.de/ix/artikel/1997/06/106/     (German)

I have to thank Jon Bosak, who agreed to use his 10_mail example, and
Norbert Mikula for proofreading. All errors left are surely mine.
Any comment, correction etc. is welcome.

	++im
-- 
Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
Mail  : Ingo.Macherius@tu-clausthal.de WWW: http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)



From Peter at ursus.demon.co.uk  Fri May 23 20:18:38 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:51 2004
Subject: New XML article
Message-ID: <7166@ursus.demon.co.uk>

In message <199705231702.TAA15827@majestix.rz.tu-clausthal.de> Ingo Macherius writes:
> Ladies and Gentlemen,
> 
> there is a new magazine article on XML available. It appeared in the
> German iX-magazine (http://www.heise.de/ix/) on 14th of May. I consider
> it the first German-language article aimed at the general public.
> iX-magazine generously did the English translation and put online
> versions on their server.

Ingo,
	This is a first class article and I particularly like the diagrams.
Could you ask IX magazine to make sure it stays on the WWW for as long as 
possible?  I would like to be able to point people at it.  (If so, is that 
the permanent URL?).

	If you ever think of drawing diagrams for XML-LINK that would be 
a great help to me :-).

	P.

I hope this serves as a catalyst for other readers of this list - there is
a LOT of work that needs to be done in providing introductions to XML, and
in particular LINK will benefit from having clear expositions.  More examples
of all sorts are needed.

(one small correction - JUMBO is not specific to chemistry.  It will read
any XML document and display it as a tree.  Semantics can then be added at
the element level, by adding (say) MATRIX.class - which could have an
invert() method.  In this sense it is somewhat complementary to stylesheets.)

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From housel at ms7.hinet.net  Sat May 24 00:43:41 1997
From: housel at ms7.hinet.net (Peter S. Housel)
Date: Mon Jun  7 16:57:51 2004
Subject: MathML DTD
Message-ID: <199705232235.GAA25619@ms7.hinet.net>

With the help of SP-1.1.4 (included with the latest snapshot
of Jade) and the new -wxml option to nsgmls, I determined that I
was being a bit lax with my mixed content declarations (too
many levels of parentheses), and eventually managed to clean
up my DTD to remove the problem.

The new MathML DTD (which I just started trying to include into my
own DTD) has the same problem.  In fact it's a lot worse.  Has anyone
(with more DTD experience than I have) tried cleaning up the MathML
preliminary DTD so that it follows XML's stricter rules about when
#PCDATA can be declared?

-Peter S. Housel-	housel@ms7.hinet.net




From Ingo.Macherius at tu-clausthal.de  Sat May 24 01:06:59 1997
From: Ingo.Macherius at tu-clausthal.de (Ingo Macherius)
Date: Mon Jun  7 16:57:51 2004
Subject: New XML article
In-Reply-To: <7166@ursus.demon.co.uk> from "Peter Murray-Rust" at May 23, 97 07:08:40 pm
Message-ID: <199705232306.BAA13916@talentix.rz.tu-clausthal.de>

Peter Murray-Rust said:

| > there is a new magazine article on XML available.
|
| Could you ask IX magazine to make sure it stays on the WWW for as long as 
| possible?  (If so, is that the permanent URL?).

Yes, the URL is intended to be permanent.

| 	If you ever think of drawing diagrams for XML-LINK that would be 
| a great help to me :-).

All diagrams were taken from my master's thesis. It deals with approaches to
industrial-strength HTML publishing. My professor advised me not to publish
it until I have a "quotable publication" on the topic, which I think I have
now. Sad but true, it's German only. But maybe some of the diagrams are of
use. They depict all of ISO/IEC 10179, SGML basics and Jigsaw. A second
drawback is that they were done with a non-mainstream design program.
But if anyone wants to use any of them for non-commercial things, just ask.

The (also permanent) URL for the PostScript version of the complete thesis
(107 single-sided DIN A4 pages) is:

	http://www.tu-clausthal.de/~inim/thesis/thesis_im.zip

There are about 30 pictures throughout the text. Please note that the thesis
was finished before the advent of full XML, so some things are outdated.
Others, such as the DSSSL overview and the SGML introduction, may still be
useful.

| (one small correction - 

Thanks to you and others who sent comments and errata via private mail. 
Obviously the translator didn't use the final German article as a basis, so
errors that had already been fixed were re-introduced. Some are new in the
English version (e.g. TEI is not HyTime). All known errors will be removed
ASAP.

Just a note:
I went through the writer/editor/writer cycle 8 times for this article. The
main problem was the XML terminology. Formally correct sentences are often
not understandable by non-expert readers, while understandable versions are
often formally incorrect. So the cycle often was me writing formally correct
words, my editor "improving" them, me making the sentence correct again etc.
XML/SGML language could turn out to be a major problem for XML. E.g. just
try to explain the differences between markup/element/tag to an HTML person.
In HTML it's all the same ... (at least in common understanding).

	++im
-- 
Snail : Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
Mail  : Ingo.Macherius@tu-clausthal.de WWW: http://www.tu-clausthal.de/~inim/
Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa)



From jenglish at crl.com  Sat May 24 02:04:51 1997
From: jenglish at crl.com (Joe English)
Date: Mon Jun  7 16:57:51 2004
Subject: MathML DTD
In-Reply-To: <199705232235.GAA25619@ms7.hinet.net>
References: <199705232235.GAA25619@ms7.hinet.net>
Message-ID: <199705240001.AA08521@mail.crl.com>


Peter S. Housel <housel@ms7.hinet.net> wrote:

> The new MathML DTD (which I just started trying to include into my
> own DTD) has the same problem.  In fact it's a lot worse.  Has anyone
> (with more DTD experience than I have) tried cleaning up the MathML
> preliminary DTD so that it follows XML's stricter rules about when
> #PCDATA can be declared?

I wouldn't bother just yet.

The MathML DTD does not accurately describe the MathML
language as specified in the rest of the TR.  Since the
prose of the working draft is normative (according to the
warning at the top of appendix A), the DTD fragment is
somewhat less than useful.


--Joe English

  jenglish@crl.com




From andrewl at microsoft.com  Sat May 24 03:01:39 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:57:51 2004
Subject: Architectural Forms instead of Namespaces
Message-ID: <7BB61B44F197D011892800805FD4F792A4BFFD@RED-03-MSG.dns.microsoft.com>

Several writers have suggested that architectural forms could be used to
solve the namespaces problem.  Could someone who understands AFs rewrite
the example below to use AFs?

<XML>
	<xml-namespace>
	  <ref>http://www.bigbookstore.com/schema</ref>
	  <as>bk</as>
	</xml-namespace>

	<xml-namespace>
	  <ref>http://www.w3.org</ref>
	  <as>w3</as>
	</xml-namespace>

	<bk:ORDERS>
		<xml-namespace>
		  <ref>http://purl.org/dublincore</ref>
		  <as>dc</as>
		</xml-namespace>

		<xml-namespace>
		  <ref>http://www.shipping.com</ref>
		  <as>sh</as>
		</xml-namespace>

		 <LINEITEM>
			 <dc:NAME>Number, the Language of
				Science</dc:NAME>
			 <dc:AUTHOR>Dantzig</dc:AUTHOR>
			 <PRICE>5.95</PRICE>
			 <sh:ZONE>9</sh:ZONE>
			 <w3:DSIG> 
				<DIGEST>1234567890</DIGEST>
				<SIGNER>AndrewL@microsoft.com</SIGNER>
			</w3:DSIG>
		</LINEITEM >
	</bk:ORDERS>
</XML>

Thanks.

--Andrew Layman
   AndrewL@microsoft.com




From andrewl at microsoft.com  Sat May 24 04:45:51 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:57:51 2004
Subject: CONCUR instead of Namespaces
Message-ID: <7BB61B44F197D011892800805FD4F792A4C00A@RED-03-MSG.dns.microsoft.com>

Following up on my earlier request re Architectural Forms, since several
posts have suggested that CONCUR could be used to solve the namespaces
problem, could someone who understands CONCUR rewrite the example below
to use CONCUR?

> <XML>
> 	<xml-namespace>
> 	  <ref>http://www.bigbookstore.com/schema</ref>
> 	  <as>bk</as>
> 	</xml-namespace>
> 
> 	<xml-namespace>
> 	  <ref>http://www.w3.org</ref>
> 	  <as>w3</as>
> 	</xml-namespace>
> 
> 	<bk:ORDERS>
> 		<xml-namespace>
> 		  <ref>http://purl.org/dublincore</ref>
> 		  <as>dc</as>
> 		</xml-namespace>
> 
> 		<xml-namespace>
> 		  <ref>http://www.shipping.com</ref>
> 		  <as>sh</as>
> 		</xml-namespace>
> 
> 		 <LINEITEM>
> 			 <dc:NAME>Number, the Language of
> 				Science</dc:NAME>
> 			 <dc:AUTHOR>Dantzig</dc:AUTHOR>
> 			 <PRICE>5.95</PRICE>
> 			 <sh:ZONE>9</sh:ZONE>
> 			 <w3:DSIG> 
> 				<DIGEST>1234567890</DIGEST>
> 				<SIGNER>AndrewL@microsoft.com</SIGNER>
> 			</w3:DSIG>
> 		</LINEITEM >
> 	</bk:ORDERS>
> </XML>
> 
> Thanks.
> 
> --Andrew Layman
>    AndrewL@microsoft.com
> 



From jjc at jclark.com  Sat May 24 07:58:29 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:57:52 2004
Subject: Architectural Forms instead of Namespaces
Message-ID: <2.2.32.19970524054205.00ddc9fc@jclark.com>

At 18:00 23/05/97 -0700, Andrew Layman wrote:
>Several writers have suggested that architectural forms could be used to
>solve the namespaces problem.  Could someone who understands AFs rewrite
>the example below to use AFs?

If I was designing an architectural form mechanism that could work with just
instances, I would probably do it something like:

<XML>
 <?xml-arch
  arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
  form-att="bk"
 ?>
 <?xml-arch
  arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
  form-att="w3"
 ?>
  <BOOK-ORDERS BK="ORDERS">
   <?xml-arch
    arch="IDN//purl.org//ARCH Dublin Core//EN"
    form-att="dc"
   ?>
   <?xml-arch
    arch="IDN//www.shipping.com//ARCH Shipping//EN"
    form-att="sh"
   ?>
   <LINEITEM BK="LINEITEM">
    <NAME DC="NAME">Number, the Language of Science</NAME>
    <AUTHOR DC="AUTHOR">Dantzig</AUTHOR>
    <PRICE BK="PRICE">5.95</PRICE>
    <SHIPPING-ZONE SH="ZONE">9</SHIPPING-ZONE>
    <DIGITAL-SIGNATURE W3="DSIG"> 
     <DIGEST W3="DIGEST">1234567890</DIGEST>
     <SIGNER W3="SIGNER">AndrewL@microsoft.com</SIGNER>
    </DIGITAL-SIGNATURE>
   </LINEITEM>
  </BOOK-ORDERS>
</XML>

In fact I would always use a DTD subset to get something like this:

<!DOCTYPE XML [
<?xml-arch
 arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
 form-att="bk"
?>
<?xml-arch
 arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
 form-att="w3"
?>
<?xml-arch
 arch="IDN//purl.org//ARCH Dublin Core//EN"
 form-att="dc"
?>
<?xml-arch
 arch="IDN//www.shipping.com//ARCH Shipping//EN"
 form-att="sh"
?>
<!ATTLIST BOOK-ORDERS BK NAME #FIXED "ORDERS">
<!ATTLIST LINEITEM BK NAME #FIXED "LINEITEM">
<!ATTLIST NAME DC NAME #FIXED "NAME">
<!ATTLIST AUTHOR DC NAME #FIXED "AUTHOR">
<!ATTLIST PRICE BK NAME #FIXED "PRICE">
<!ATTLIST SHIPPING-ZONE SH NAME #FIXED "ZONE">
<!ATTLIST DIGITAL-SIGNATURE W3 NAME #FIXED "DSIG">
<!ATTLIST DIGEST W3 NAME #FIXED "DIGEST">
<!ATTLIST SIGNER W3 NAME #FIXED "SIGNER">
]>
<XML>
  <BOOK-ORDERS>
   <LINEITEM>
    <NAME>Number, the Language of Science</NAME>
    <AUTHOR>Dantzig</AUTHOR>
    <PRICE>5.95</PRICE>
    <SHIPPING-ZONE>9</SHIPPING-ZONE>
    <DIGITAL-SIGNATURE> 
     <DIGEST>1234567890</DIGEST>
     <SIGNER>AndrewL@microsoft.com</SIGNER>
    </DIGITAL-SIGNATURE>
   </LINEITEM>
  </BOOK-ORDERS>
</XML>

and I would also probably make use of the rules for defaulting the form
attribute so I could instead do:

<!DOCTYPE XML [
<?xml-arch
 arch="IDN//www.bigbookstore.com//ARCH Book Orders//EN"
 form-att="bk"
?>
<?xml-arch
 arch="IDN//www.w3.org//ARCH Digital Signatures 1.0//EN"
 form-att="w3"
?>
<?xml-arch
 arch="IDN//purl.org//ARCH Dublin Core//EN"
 form-att="dc"
?>
<?xml-arch
 arch="IDN//www.shipping.com//ARCH Shipping//EN"
 form-att="sh"
?>
<!ATTLIST SHIPPING-ZONE SH NAME #FIXED "ZONE">
<!ATTLIST DIGITAL-SIGNATURE W3 NAME #FIXED "DSIG">
]>
<XML>
  <BOOK-ORDERS>
   <LINEITEM>
    <NAME>Number, the Language of Science</NAME>
    <AUTHOR>Dantzig</AUTHOR>
    <PRICE>5.95</PRICE>
    <SHIPPING-ZONE>9</SHIPPING-ZONE>
    <DIGITAL-SIGNATURE> 
     <DIGEST>1234567890</DIGEST>
     <SIGNER>AndrewL@microsoft.com</SIGNER>
    </DIGITAL-SIGNATURE>
   </LINEITEM>
  </BOOK-ORDERS>
</XML>

Finally I would probably put the DTD in a separate file:

<!DOCTYPE XML SYSTEM "http://www.jclark.com/dtds/book-order.dtd">
<XML>
  <BOOK-ORDERS>
   <LINEITEM>
    <NAME>Number, the Language of Science</NAME>
    <AUTHOR>Dantzig</AUTHOR>
    <PRICE>5.95</PRICE>
    <SHIPPING-ZONE>9</SHIPPING-ZONE>
    <DIGITAL-SIGNATURE> 
     <DIGEST>1234567890</DIGEST>
     <SIGNER>AndrewL@microsoft.com</SIGNER>
    </DIGITAL-SIGNATURE>
   </LINEITEM>
  </BOOK-ORDERS>
</XML>
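
To show what an application gets out of the form attributes, here is a
rough Python sketch that derives the view of the instance seen by one
architecture.  It is a deliberate simplification and not the ISO/IEC 10744
processing model: an element carrying the form attribute is renamed to the
attribute's value, and an element without it is dropped with its
architectural descendants promoted.  The function name architectural_view
is made up, the xml-arch PIs are left out, and the form attribute name is
passed in by hand instead.

```python
import xml.etree.ElementTree as ET

# Part of the first instance above, with the form attributes spelled out.
DOC = """<XML>
  <BOOK-ORDERS BK="ORDERS">
   <LINEITEM BK="LINEITEM">
    <NAME DC="NAME">Number, the Language of Science</NAME>
    <AUTHOR DC="AUTHOR">Dantzig</AUTHOR>
    <PRICE BK="PRICE">5.95</PRICE>
    <SHIPPING-ZONE SH="ZONE">9</SHIPPING-ZONE>
   </LINEITEM>
  </BOOK-ORDERS>
</XML>"""


def architectural_view(element, form_att):
    """Return a list of elements showing the tree as one architecture
    sees it (simplified: text of skipped elements is discarded)."""
    children = []
    for child in element:
        children.extend(architectural_view(child, form_att))
    form = element.get(form_att)
    if form is None:
        # Not part of this architecture: promote architectural descendants.
        return children
    node = ET.Element(form)
    node.text = element.text
    node.extend(children)
    return [node]


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    for form_att in ("BK", "DC", "SH"):
        tags = [e.tag for top in architectural_view(root, form_att)
                for e in top.iter()]
        print(form_att, tags)
```

Each architecture sees only its own element names, which is why the client
element names (BOOK-ORDERS, SHIPPING-ZONE and so on) can be chosen freely.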

James




From Peter at ursus.demon.co.uk  Sat May 24 08:26:48 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: New XML article
Message-ID: <7198@ursus.demon.co.uk>

In message <199705232306.BAA13916@talentix.rz.tu-clausthal.de> Ingo Macherius writes:
Thanks,
[...]
> 
> Just a note:
> I went through the writer/editor/writer cycle 8 times for this article. The
> main problem was the XML terminology. Formally correct sentences are often
> not understandable by non-expert readers, while understandable versions are
> often formally incorrect. So the cycle often was me writing formally correct

I think this is extremely important.  I have been trying to interpret 
and implement the XML-LINK spec and getting some of it wrong :-).  XML is 
difficult in places unless you are quite familiar with SGML - I think XML-LINK
will be a major challenge to the drafters.  (I'm sure they'll manage it :-).

This is why I'm keen about diagrams.  XML-link is described in words, and 
it's incredibly easy to read the wrong meaning into them.  For example,
the 'locators' in 'links' locate 'resources' and it's not easy to write
programs until it's absolutely clear what each of these means.  The reverse
is also true - when something is described clearly and precisely it makes it
enormously easier to write code.

> words, my editor "improving" them, me making the sentence correct again etc.
> XML/SGML language could turn out to be a major problem for XML. E.g. just
> try to explain the differences between markup/element/tag to an HTML person.
> In HTML it's all the same ... (at least in common understanding).

I agree.  There are several aspects.  One is to protect the newcomer from too
much terminology to start with, though of course it must always be precise.
Another is to relate abstract terminology to concrete examples where possible.

I have developed an XML application to manage terminology and I'm just about
to start collecting XML terminology for it (unless someone is already doing 
this).  It's loosely based on ISO 12620 and displays a (hierarchical) glossary 
as an XML tree.  

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Mon May 26 21:58:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: LT XML toolset, parser, developers (fwd)
Message-ID: <7273@ursus.demon.co.uk>

Posted by Henry Thompson and intended for xml-dev - apologies if it's a 
duplicate.

	P.

Forwarded message follows:

> From w3c-sgml-wg-request@w3.org Mon May 26 13:53:09 1997
> Message-Id: <1296.199705261013@grogan.cogsci.ed.ac.uk>
> From: "Henry S. Thompson" <ht@cogsci.ed.ac.uk>
> Date: Mon, 26 May 97 11:13:42 BST
> To: w3c-sgml-wg@w3.org, salt@uk.ac.ed.cstr, xml-dev@ic.ac.uk,
>         elsnet-list@uk.ac.ed.cogsci, corpora@no.uib.hd,
>         empiricists@EDU.Stanford.CSLI
> Subject: LT XML toolset, parser, developers' API released
> X-List-URL: http://www.w3.org/pub/WWW/Archives/Public/w3c-sgml-wg/
> X-See-Also: http://www.w3.org/pub/WWW/MarkUp/SGML/Activity
> Resent-From: w3c-sgml-wg@w3.org
> X-Mailing-List: <w3c-sgml-wg@w3.org> archive/latest/4795
> X-Loop: w3c-sgml-wg@w3.org
> Sender: w3c-sgml-wg-request@w3.org
> Resent-Sender: w3c-sgml-wg-request@w3.org
> Precedence: list
> Status: R
> 
> The Language Technology Group is pleased to announce the beta release
> of LT XML, the first publicly available XML toolset written in C.
> 
> For further information and access to the software distribution, see
> 
>   http://www.ltg.ed.ac.uk/software/xml/
> 
> The LT XML tool-kit includes stand-alone tools for a wide range of
> processing of well-formed XML documents, including searching and
> extracting, down-translation (e.g. report generation, formatting),
> tokenising and sorting.
> 
> LT XML is an integrated set of XML tools and a developers' tool-kit,
> including a C-based API. The beta release now available is UNIX-only,
> but a WIN16 version will be available in the near future.
> 
> Sequences of tool applications can be pipelined together to achieve
> complex results.
> 
> For special purposes beyond what the pre-constructed tools can
> achieve, extending their functionality and/or creating new tools is
> easy using the LT XML API, which provides both event-oriented and
> tree-fragment oriented access to the input document stream. Minimal
> applications require less than one-half page of C code to express.
> 
> LT XML is available to anyone free of charge for non-commercial purposes.
> 
> ----------------------
> Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
>       2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
>                Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk  
>                       URL: http://www.ltg.ed.ac.uk/~ht/
> 
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Tue May 27 15:56:29 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:52 2004
Subject: Commercial job postings?
Message-ID: <3.0.32.19970527065333.00a1fe80@pop.intergate.bc.ca>

Meta-question:

I was talking to some people yesterday who want to do some big bold
things with XML, and were wondering where a good place might be to
look for people.  I told them that XML was easy enough that they
ought to hire people with application expertise and they'll pick up
XML in no time, but they weren't convinced.

Anyhow, would people consider it an abuse of this mailing list if 
the odd job posting started showing up?  I personally think it would
be good, simply as a service to the industry we're trying to create,
and as a useful barometer of what's going on out there. - Tim




From Peter at ursus.demon.co.uk  Wed May 28 08:15:48 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: Commercial job postings?
Message-ID: <7301@ursus.demon.co.uk>

In message <3.0.32.19970527065333.00a1fe80@pop.intergate.bc.ca> Tim Bray writes:
> Meta-question:
[...]
> 
> Anyhow, would people consider it an abuse of this mailing list if 
> the odd job posting started showing up?  I personally think it would
> be good, simply as a service to the industry we're trying to create,
> and as a useful barometer of what's going on out there. - Tim

Personally I wouldn't have a problem, but it's Henry who does the hard
work behind the scenes.  Job listings would certainly give the impression
that XML is going places.  [Just give me a private copy as well :-)].

While I'm posting, this is probably a trivial XML/SGML question, but I am
worried about EMPTY content in WF documents.  As far as I can see, in
validated SGML

<!Doctype A [
<!ELEMENT A - o EMPTY>
]>
<A> 

and 

<!DOCTYPE A [
<!element a - - ANY>
]>
<A></A>

both return the same result, and that seems to be the case with NXP in
both validating and non-validating modes (the first example uses <A/> of course).
Is there any way that the content could be returned as a null string ("")?

Can attributes have null values?  is

B=""    the same as omitting B when #IMPLIED?

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From lee at sq.com  Wed May 28 16:17:29 1997
From: lee at sq.com (lee@sq.com)
Date: Mon Jun  7 16:57:52 2004
Subject: Commercial job postings?
Message-ID: <9705281416.AA11586@sqrex.sq.com>

Peter Murray-Rust wrote:
> As far as I can see, in
> validated SGML
> 
> <!Doctype A [
> <!ELEMENT A - o EMPTY>
> ]>
> <A> 
> 
> and 
> 
> <!DOCTYPE A [
> <!element a - - ANY>
> ]>
> <A></A>
> 
> both return the same result,
That depends on what you mean by "return".  ISO 8879 doesn't say anything
about anyone returning things at all.  They are not distinguished in ESIS,
but that's just because ESIS is broken in this regard.

> Can attributes have null values?  is
> 
> B=""    the same as omitting B when #IMPLIED?

No, they are not the same -- an attribute value can be zero characters,
as can an element's content.
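
The distinction can be checked with any parser that reports attributes as a map. A minimal sketch, assuming Python's standard xml.etree.ElementTree module (a present-day, non-validating parser, not one of the tools discussed in this thread) and no DTD, so nothing supplies a default for B:

```python
import xml.etree.ElementTree as ET

# B is present with a zero-length value in the first document
# and absent altogether in the second.
with_empty = ET.fromstring('<A B=""/>')
without = ET.fromstring('<A/>')

empty_value = with_empty.get("B")    # "" : the attribute exists, value has zero characters
missing_value = without.get("B")     # None: the attribute was never specified

print(repr(empty_value), repr(missing_value))
```

An application that treats both cases alike has to do so by its own choice; this parser keeps them apart.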

Lee




From rrseibel at att.com  Wed May 28 16:52:01 1997
From: rrseibel at att.com (Seibel, Robert R)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
Message-ID: <c=US%a=_%p=ATT%l=NJ8102PO01-970528145135Z-4179@nj-mailnet.ho.att.com>

XMLers:

I'm a new member to this list so please excuse my ignorance
of what has gone on in the past.

I'm surveying the market for XML editors for my project.
I know the market is in its infancy but does anyone know who
is farthest along? The editor should:

1) let me add my own markup tags into a pull down menu
2) use a predefined template of tags (elements) to start the
    document off
3) let me format those tags using a style sheet
4) permit editing in a WYSIWYG mode according to the
    style sheet
5) be simple to use, hiding detail from the authors

Does anyone have any recommendations???

Thanks.
Bob Seibel



From Peter at ursus.demon.co.uk  Thu May 29 01:12:42 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: Commercial job postings?
Message-ID: <7327@ursus.demon.co.uk>

In message <9705281416.AA11586@sqrex.sq.com> lee@sq.com writes:
> Peter Murray-Rust wrote:
> > As far as I can see, in
> > validated SGML
> > 
> > <!Doctype A [
> > <!ELEMENT A - o EMPTY>
> > ]>
> > <A> 
> > 
> > and 
> > 
> > <!DOCTYPE A [
> > <!element a - - ANY>
> > ]>
> > <A></A>
> > 
> > both return the same result,
> That depends on what you mean by "return".  ISO 8879 doesn't say anything
> about anyone returning things at all.  They are not distinguished in ESIS,
> but that's just because ESIS is broken in this regard.

Understood.  In that case the question might be:
'For WF XML documents with no Element declaration, are
<A/>
and
<A></A>
identical?'

> 
> > Can attributes have null values?  is
> > 
> > B=""    the same as omitting B when #IMPLIED?
> 
> No, they are not the same -- an attribute value can be zero characters,
> as can an element's content.
            ^^^^^^^^^
What I infer from this is that in my second example, A has either a child of 
unknown type with null content OR a #PCDATA child which has a content
of "".  In which case it could have an effect on (say) counting #PCDATA
children.  If so, it might need flagging in the draft...
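
One way to see what a given parser does with the two forms is to count what it reports. A minimal sketch, assuming Python's standard xml.etree.ElementTree and xml.dom.minidom modules (present-day parsers, used here only as an illustration; the helper name describe is hypothetical). It shows how these two implementations behave, not what the draft requires:

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

def describe(source):
    """Report what two stock parsers expose for the root element of `source`."""
    el = ET.fromstring(source)
    dom_root = minidom.parseString(source).documentElement
    return {
        "text": el.text,                            # character data directly inside the element
        "children": len(el),                        # number of child elements
        "dom_child_nodes": len(dom_root.childNodes) # child nodes of any kind, text included
    }

empty_tag = describe("<A/>")
start_end = describe("<A></A>")
print(empty_tag == start_end)
```

With these parsers neither form yields a zero-length text child, so a count of #PCDATA children comes out the same for both. Another parser could legitimately report the two differently unless the spec rules it out.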

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Thu May 29 01:12:59 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
Message-ID: <7328@ursus.demon.co.uk>

In message <c=US%a=_%p=ATT%l=NJ8102PO01-970528145135Z-4179@nj-mailnet.ho.att.com> "Seibel, Robert R" writes:
> XMLers:
> 
> I'm a new member to this list so please excuse my ignorance
> of what has gone on in the past.

Don't worry!  This is an important topic and one that hasn't been discussed.
IMO editors are going to be key for certain aspects of XML.
> 
> I'm surveying the market for XML editors for my project.

I think that there are two extremes to the spectrum: (A) the 'traditional' 
one, which is the one that I think you allude to - writing and editing text,
formatting, spellchecking, etc. - and (B) the new opportunities, such as 
bringing in graphics, adding an image map, adding some maths, creating a
link database, importing and converting legacy files on the fly.  (B) is
where I am aiming JUMBO - at present it will edit the structure tree,
import new legacy data and convert on the fly but it doesn't edit text.
It will also be aimed at using NXP to validate against the DTD.

> I know the market is in its infancy but does anyone know who
> is farthest along? The editor should:
> 
> 1) let me add my own markup tags into a pull down menu
> 2) use a predefined template of tags (elements) to start the
>     document off
> 3) let me format those tags using a style sheet
> 4) permit editing in a WYSIWYG mode according to the
>     style sheet
> 5) be simple to use, hiding detail from the authors

Can't help in detail, but there were several promising prototypes at SGML97,
Stilo, Balise, Frame,  etc. Maybe these vendors would like to say something?

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From michael at textscience.com  Thu May 29 05:53:10 1997
From: michael at textscience.com (Michael Leventhal)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
In-Reply-To: <7328@ursus.demon.co.uk> from "Peter Murray-Rust" at May 28, 97 11:05:35 pm
Message-ID: <199705290352.UAA08842@shell1.aimnet.com>

Peter Murray-Rust wrote:
> > I'm surveying the market for XML editors for my project.
> 
> I think that there are two extremes to the spectrum  (A) the 'traditional' 
> which is the one that I think you allude to - writing and editing text,
> sformatting, spellchecking, etc. and (B) the new opportunities, so 
> bringing in a graphics, adding an image map, adding some maths, creating a
> link database, importing and converting legacy files on the fly.  (B) is
> where I am aiming JUMBO at - at present it will edit the structure tree,
> import new legacy data and convert on the fly but it doesn't edit text.
> It will also be aimed at using NXP to validate vs the DTD.
> 
> Can't help in detail, but there were several promising prototypes at SGML97,
> Stilo, Balise, Frame,  etc. Maybe these vendors would like to say something?

I'd like to, but I am very concerned about misusing this list for commercial 
purposes, despite the invitation.  I think I can mention that Grif did demo
_two_ XML editors at SGML '97 Europe and WWW6.

I also think I can pursue Peter's point about there being two types
of editors, A and B above, from a technological/philosophical/cultural 
perspective.  Grif also has an A and B which are not exactly what Peter 
describes but sort of close.  The origin was not intended to delineate a 
philosophical distinction although the currents of history may have in fact 
made it so.

Grif's XML editor A is a knock-off of its traditional SGML
"WYSIWYG to the max" product.  It requires a DTD, enforces structure,
and controls the presentation through a high-end style sheet mechanism.  
XML editor B is a knock-off of Grif's HTML editor, Symposia, and does 
not enforce structure, allows you to add tags at will, is CSS-based
and does the usual HTML-related stuff like allow you to create
(XML) links and image maps, add math, etc.

I initially found the idea of having two XML editors to be possibly 
schizophrenic so I am intrigued by Peter being already in possession of
a two editor world-view, essentially the SGML and the HTML
approaches, DTD-required vs well-formed.  I guess I always assumed
that you'd combine the two, change modes at the flick of a switch,
but somehow encourage more rather than less structure by always
having the capability of showing the user his or her structural
failings.  Of course, the code bases have, by now, diverged greatly
though companies like Grif certainly leveraged their SGML experience
in entering the HTML fray.  But I thought the perspectives were coalescing.  
Is this two editor approach a transitional stage on the way to a more 
glorious evolutionary stage or have we, in fact, distinguished different
types of tasks, each with a tool precisely tailored
to the exact nature of the task?

Michael Leventhal

==================================================================
Michael Leventhal  1800 Lake Shore Ave, Ste 14  V (510) 444-2962
VP Technology      Oakland, CA  94606           F (510) 444-1672
GRIF, SA           michael@textscience.com      http://www.grif.fr




From Peter at ursus.demon.co.uk  Thu May 29 10:45:44 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
Message-ID: <7353@ursus.demon.co.uk>

In message <199705290352.UAA08842@shell1.aimnet.com> Michael Leventhal writes:
> Peter Murray-Rust wrote:
[...]
> 
> I'd like to, but I am very concerned about misusing this list for commercial 
> purposes, despite the invitation.  I think I can mention that Grif did demo
> _two_ XML editors at SGML '97 Europe and WWW6.

I appreciate this restraint, thanks - but would like to suggest that we can
relax it a bit *at this stage in XML development*.  My reasoning is as 
follows.  [BTW there is absolutely no *pressure* for any m'facturer to say 
anything here in advance of public release - and no inferences should be
drawn from any apparent silences.  So if there is no response - for
good commercial reasons - fine.  I take it as axiomatic that all major
current SGML m'facturers are *interested* in XML, so silence carries little
information :-)].

Implementations tend to define de facto procedures.  For example when C++ came
out it was an almighty mess.  There were several different compilers, all from
different manufacturers and working to different levels and by different 
mechanisms.  Some used a preprocessor, some were native, some had templates
and all on a varying timescale.  You very soon got not only m'facturer
lockin, but version lockin :-(

The XML-spec is not yet frozen, but people are (rightly IMO) creating
tools in advance of the final spec.  Suppose those tools suddenly
emerge on July 2 (the spec is announced July 1, right?) and take
fundamentally different approaches to the language; that *may* have some 
bearing on language revisions.  We are concerned that XML should not have
multiple conformance levels, and a comparison of editor/parser features may 
help to approach that problem.

Many *document* developers may be wishing to create trial XML documents
or prototype legacy conversion.  It would be reasonable for them to
ask where they could find a (prototype) editor to start with.  They might
then discover that there were significant problems/advantages in XML.

<HINT>
[Some of these problems may also be dealt with if people compile XML
resource pages.]
</HINT>

> 
> I also think I can pursue Peter's point about there being two types
> of editors, A and B above, from a technological/philosphical/cultural 
> perspective.  Grif also has an A and B which are not exactly what Peter 
> describes but sort of close.  The origin was not intended to delineate a 
> philosophical distinction although the currents of history may have in fact 
> made it so.

My motivation here is that I see editing as one of the key steps to getting
XML universally accepted.  Yes, the current text-oriented SGML tools will
be modified/rewritten to give XML editors, but they won't address the
applications that no-one has thought of.  What does a CML editor want?

> 
> Grif's XML editor A is a knock-off from its traditional SGML with
> "WYSIWYG to the max" product.  It requires a DTD, enforces structure,
> and controls the presentation through a high-end style sheet mechanism.  
> XML editor B is a knock-off of Grif's HTML editor, Symposia, and does 
> not enforce structure, allows you to add tags at will, is CSS-based
> and does the usual HTML-related stuff like allow you to create
> (XML) links and image maps, add math, etc.

This is very exciting news.  I would be interested to know more.

> 
> I initially found the idea of having two XML editors to be possibly 
> schizophrenic so I am intrigued by Peter being already in possession of
> a two editor world-view, essentially the SGML and the HTML
> approaches, DTD-required vs well-formed.  I guess I always assumed
> that you'd combine the two, change modes at the flick of a switch,
> but somehow encourage more rather than less structure by always
> having the capability of showing the user his or her structural
> failings.  Of course, the code bases have, by now, divurged greatly
> though companies like Grif certainly leveraged their SGML experience
> in entering the HTML fray.  But I thought the perspectives were coalescing.  
> Is this two editor approach a transitional stage on the way to a more 
> glorious evolutionary stage or have we, in fact, distinguished different
> types of tasks to which different types of tools have been precisely tailored
> to exact nature of the task?

What I want for an editor for the chemical community is, I think, 
generalisable to many other applications.

(a) no discipline-specific tools, but good hooks to link them in
(b) full support for XML-LINK
(c) tree-based editing
(d) attribute editing, controlled by DTD
(e) import of legacy data and conversion on the fly by user-written add-ons
(f) support for whatever solutions XML comes up with for XML-TYPE, XML-LINK,
XML-STYLE, XML-MONEY,...
(g) WYSIWYG HTML editing with XML-LINKing to imported subdocuments.
(h) Cunning chemical editing that I think of and develop.

*I* can do the *chemical* bit.  I'd prefer to do it once and not once for A's
tool, once for B's tool, etc.  My current preference for several reasons 
would be Java beans - e.g. there will be an HTML bean, Word bean, Molecule
bean, etc.

I have always felt that posters to comp.text.sgml have been very responsible
in the use of commercial postings.  I think that a listing of current
capabilities of editors would be valuable to readers of this list.  
However if people don't generally share this view, please post - either to the
list or me personally - and I will then suggest revised etiquette.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From cbullard at hiwaay.net  Fri May 30 03:43:14 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
References: <199705290352.UAA08842@shell1.aimnet.com>
Message-ID: <338E3096.32F3@hiwaay.net>

Michael Leventhal wrote:
> 
> I thought the perspectives were coalescing.
> Is this two editor approach a transitional stage on the way to a more
> glorious evolutionary stage or have we, in fact, distinguished different
> types of tasks to which different types of tools have been precisely tailored
> to exact nature of the task?
> 
> Michael Leventhal

Possible.  Even in the past, we have seen considerable differences 
between SGML-complete editors that were very powerful and came 
with attendant setup complexity, and editors that just let you 
point to a DTD and get a configured editing interface.

Along the way, some systems whose design parameters did not 
include the complexities of *faithful to the pica* print 
requirements have been used successfully.  At least two 
of these were based on laissez-faire (well-formed 
input/batch validation on request) systems.  These fared 
well in production environments and are still deployed.

Here is another perspective.  What if DTDs came into 
being as a result of measurement of frequency and 
occurrence rather than from design and imposition?
Note I am not talking about DTDs generated by inducing 
markup, but DTDs created as tags are generated by  
users in the course of natural tagging.  Consider 
the habits born of the HTML users who began to 
unwittingly use content tagging styles almost as 
jokes to delineate thoughts in emails, etc.
It is interesting to speculate what place 
genetic DTDs such as could be created from these 
would have, since in some ways they resemble a 
natural language emerging from an artificial language 
environment.

Len Bullard
Intergraph Corporation



From fahrner at pobox.com  Fri May 30 07:08:37 1997
From: fahrner at pobox.com (Todd Fahrner)
Date: Mon Jun  7 16:57:52 2004
Subject: Comercial XML editor recommendations
In-Reply-To: <338E3096.32F3@hiwaay.net>
References: <199705290352.UAA08842@shell1.aimnet.com>
Message-ID: <v03102800afb406b68bdd@[206.245.203.103]>

At 20:42 -0500 5.29.97, len bullard wrote:

> Here is another perspective.  What if DTDs came into
> being as a result of measurement of frequency and
> occurrence rather than from design and imposition?
> Note I am not talking about DTDs generated by inducing
> markup, but DTDs created as tags are generated by
> users in the course of natural tagging.  Consider
> the habits borne of the HTML users who began to
> unwittingly use content tagging styles almost as
> jokes to delineate thoughts in emails, etc.
> It is interesting to speculate what the place of
> genetic DTDs such as could be created from these
> would have since in some ways the resemble a
> natural language emerging from an artificial language
> environment.

Fascinating. A year from now, you could set a spider to identify patterns
of class markup on HTML elements on the Web, and upon semantic analysis
either fold these as elements into an XML version of HTML, or create new
DTDs based on aggregations within narrow subject areas.
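
As a rough illustration of the aggregation step, here is a minimal sketch in Python (standard library only). Everything in it is an assumption made for illustration: the function names observe and to_dtd, the sample documents, and the very loose content models it emits. It records which child elements and text were actually seen under each tag and writes one declaration per tag; it makes no attempt at the semantic analysis mentioned above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def observe(documents):
    """Collect, per element name, the child names seen and whether text was seen."""
    children = defaultdict(set)
    has_text = defaultdict(bool)
    for source in documents:
        for el in ET.fromstring(source).iter():
            children[el.tag].update(child.tag for child in el)
            texts = [el.text] + [child.tail for child in el]
            if any((t or "").strip() for t in texts):
                has_text[el.tag] = True
    return children, has_text

def to_dtd(children, has_text):
    """Emit one permissive ELEMENT declaration per observed element name."""
    lines = []
    for tag in sorted(children):
        kids = sorted(children[tag])
        if has_text[tag]:
            model = "(#PCDATA" + "".join("|" + k for k in kids) + ")*"
        elif kids:
            model = "(" + "|".join(kids) + ")*"
        else:
            model = "EMPTY"
        lines.append(f"<!ELEMENT {tag} {model}>")
    return "\n".join(lines)

# Hypothetical "naturally tagged" fragments.
SAMPLES = [
    "<note><joke>ha</joke></note>",
    "<note><aside/><joke>ho</joke></note>",
]
dtd = to_dtd(*observe(SAMPLES))
print(dtd)
```

Weighting by frequency of occurrence, as suggested in the quoted message, would need counts rather than sets; the sets keep the sketch short.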

Apple's proposal for an extensible metadata format, Meta Content Format
(MCF), expressly passes up SGML with the following explanation:

	The main reason for introducing yet another file format
	is so that we have an interchange format that is not
	beholden to legacy applications that can['t] track the
	changes in the expressiveness of MCF. [1]

If I'm reading correctly, this reasoning may have been valid before XML -
and possibly to some extent even now - but if XML tools develop as a kind
of nursery for "genetic" DTDs, then it will have been rebutted.

[1] http://mcf.research.apple.com/mcf.html

________________________________________
Todd Fahrner
mailto:fahrner@pobox.com
http://www.verso.com/

The printed page transcends space and time. The printed page, the
infinitude of books, must be transcended. THE ELECTRO-LIBRARY.

--El Lissitzky, 1923




From Peter at ursus.demon.co.uk  Sat May 31 01:12:08 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:57:52 2004
Subject: XML-LINK
Message-ID: <7391@ursus.demon.co.uk>

I am trying to understand how XML-LINK might be used and would be
grateful for some gentle hints.  The motivation is to develop a
set of routines in JUMBO that are generic and will support a reasonable
variety of ways in which links might be used.  I am confident that there are
readers of this list who have clear ideas of how links might be used and
I hope they can spend a few minutes to give some *very* simple guidance.

<DISCLAIMER>
As we are all aware the XML-LINK spec is in early draft and is scheduled for
revision before July 1.  It is also widely agreed that some of the
terminology needs tightening and that some details of the syntax and the 
semantics need addressing.  So only a general approach is required.
</DISCLAIMER>

I hope it will be seen as helpful if I put forward my current understanding
of what XML-LINK is intended to do, and XML-DEVers can annotate my ramblings.
[They can use XML-LINK to do it :-), accepting that we have no means of 
addressing into my content.]  So here goes...

A link has ends which are called resources.  My current understanding is
that these can be thought of as points in the structure of a document, and
will often coincide with Elements.  I am as yet unclear about the total 
number of possible topologies of a link, and ask some questions here.

Structure and Behaviour.

My understanding is that a hyperdocument can have a link structure which is
independent of behaviour - it simply represents the structure of the 
information.  I'm happy with this - what I'm less clear about is whether
there are *commonly agreed semantics* for this, or whether it's all
application-dependent.  [If the answer to all my concerns is 'application-
dependent' then it will be a pity because everyone will write individual
link processors and there will be no reusability.]  I'm aware that all these
concerns are catered for by HyTime, but since I am ignorant of HyTime,
answers which refer to that won't be much use to me - ideally they should
be in the context of the current spec.

Thus I assume we can transmit structures like DAGs, linked lists, relational
tables, etc. by the use of XML-LINK without being concerned how they
are going to behave.  At this stage I'd like simply to address structure.

SIMPLE
The simplest link is XML-LINK="SIMPLE" and is an analogue of HTML's <A>
or <IMG>.  My view of it is exemplified by this fictitious XML
document:

<P>This is <A HREF="#foo" ID="A">resource A</A> which points to
<FOO ID="foo">the foo bird</FOO> (see picture 
<IMG HREF="foo.gif" TITLE="foo bird" ACTUATE="AUTO" SHOW="EMBED" ID="gif"/>)
</P>

Here there are two links, both being unidirectional.  I understand that the 
ends of the first link are the 'point' described by 'ID=A', and the point
described by ID=foo (though this is still being discussed).  If this is true,
then in a **tree-based** tool like JUMBO the ends of the link correspond
to nodes in the tree (labelled by ID=A and ID=foo).  The second link is harder
because the resource in foo.gif is not clear (perhaps it is the inode in
the UNIX system?).  

I have (I believe) implemented SIMPLE links in JUMBO.  Each Node has a method
isLink() which says whether it's the start of a SIMPLE link.  (I may have to
change this nomenclature when the other links become clearer.)  So, for
example, when process()ing a Node, JUMBO looks to see if it isLink() and if so
what it points at (value of HREF).  It seems to work.
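
The lookup described above can be sketched as follows. This is a hedged illustration in Python using the standard xml.etree.ElementTree module, not JUMBO's Java code: is_link and resolve are hypothetical stand-ins for isLink() and the HREF lookup, the document is the fictitious one from the SIMPLE example (with the IMG element written as an empty-element tag), and only same-document "#id" references are resolved.

```python
import xml.etree.ElementTree as ET

DOC = """<P>This is <A HREF="#foo" ID="A">resource A</A> which points to
<FOO ID="foo">the foo bird</FOO> (see picture
<IMG HREF="foo.gif" TITLE="foo bird" ACTUATE="AUTO" SHOW="EMBED" ID="gif"/>)
</P>"""

def is_link(node):
    """A node starts a SIMPLE link if it carries an HREF attribute (assumption)."""
    return node.get("HREF") is not None

def resolve(root, node):
    """Return the target element for a '#id' HREF, else the raw HREF string."""
    href = node.get("HREF")
    if href.startswith("#"):
        ids = {el.get("ID"): el for el in root.iter() if el.get("ID")}
        return ids.get(href[1:])
    return href  # external resource: left unresolved here

root = ET.fromstring(DOC)
links = [el for el in root.iter() if is_link(el)]
targets = {el.get("ID"): resolve(root, el) for el in links}

print(targets["A"].tag, targets["gif"])
```

Note that the FOO element is found by the link but holds no reference back to it, which is the one-way knowledge discussed in the next paragraph.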

Note that in this model, the resource which is pointed to (ID=foo, or foo.gif)
is not required by XML-LINK to know anything about the link.  I assume it could
be argued both ways that the pointedAt should/should_not know what is 
pointing at it.  [SHOW and ACTUATE are deliberately not discussed, although I
think they are straightforward (at least compared to EXTENDED).]


EXTENDED

EXTENDED is a container for an indefinite number of LOCATOR links.  [LOCATOR
has exactly the same syntax as SIMPLE but has presumably different
semantics.]  EXTENDED does not by itself define a resource and is normally
remote from the resources.  

I can see how a bi-directional link might be constructed from EXTENDED. 
[It's other multiplicities I don't feel so happy with.]  Does this 
example capture it?  

<P> Friends, Romans, Countrymen, <WORD ID="W1">lend</WORD> me your 
<WORD ID="W2">ears</WORD>.</P>
...
<ANNOTATION XML-LINK="EXTENDED" ID="link1">
<POINTER XML-LINK="LOCATOR" HREF="#W1" ROLE="verb"/>
<POINTER XML-LINK="LOCATOR" HREF="#W2" ROLE="noun"/>
</ANNOTATION>
...
We therefore have a bidirectional link between the verb and the noun, so
that each of them can locate the other.  Therefore, in JUMBO, there
has to be a pointer which is available to each Node.  My temptation would be
for each node to carry a hashtable of links to other nodes so that (say)
when W1 was asked what it linked to it would come up with a list of the
Nodes at the other end of its links.  W2 would be such a node.  On the other
hand it might point to the LINK (i.e. link1), and it might be clear from the
'contents' of link1 what the other end was.  Is this too restricted?
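
The per-node table of links described above might look like this. It is a minimal sketch in Python (standard library only) rather than JUMBO's Java, and it rests on assumptions: the wrapper element DOC and the function name build_link_table are hypothetical, HREFs are all same-document "#id" references, and every locator in an EXTENDED link is taken to be linked to every other locator in it.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

DOC = """<DOC>
<P>Friends, Romans, Countrymen, <WORD ID="W1">lend</WORD> me your
<WORD ID="W2">ears</WORD>.</P>
<ANNOTATION XML-LINK="EXTENDED" ID="link1">
<POINTER XML-LINK="LOCATOR" HREF="#W1" ROLE="verb"/>
<POINTER XML-LINK="LOCATOR" HREF="#W2" ROLE="noun"/>
</ANNOTATION>
</DOC>"""

def build_link_table(root):
    """Map each located ID to (link ID, other end) pairs for every EXTENDED link."""
    table = defaultdict(list)
    for ext in root.iter():
        if ext.get("XML-LINK") != "EXTENDED":
            continue
        ends = [loc.get("HREF").lstrip("#")
                for loc in ext if loc.get("XML-LINK") == "LOCATOR"]
        for end in ends:
            for other in ends:
                if other != end:
                    table[end].append((ext.get("ID"), other))
    return dict(table)

table = build_link_table(ET.fromstring(DOC))
print(table)
```

Storing the link ID beside the other end covers both options raised above: a node can go straight to its partner, or go to link1 and inspect its contents.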

I am not clear how this extends to 'multidirectional links'  Here is a typical
problem.

to <WORD ID="W3">bear</WORD> the <WORD ID="W4"> slings</WORD> and 
<WORD ID="W5">arrows</WORD> of
...
<ANNOTATION XML-LINK="EXTENDED" ID="link2">
<POINTER XML-LINK="LOCATOR" HREF="#W3" ROLE="verb">
<POINTER XML-LINK="LOCATOR" HREF="#W4" ROLE="noun">
<POINTER XML-LINK="LOCATOR" HREF="#W5" ROLE="noun">
</ANNOTATION>
...
Here I want to indicate that the verb 'bear' links to two nouns at the
same time and that each noun points to 'bear'.  But it isn't obvious that
this is the case (unless perhaps ROLE is used for that, and that doesn't
seem general).  The topology can be seen as a multidirectional link, with
a single 'end' and a double 'end'  (W3<-->(W4,W5)).  Alternatively it can 
be seen as two bidirectional links grouped together ((W3<-->W4),(W3<-->W5)).  
In either case I don't think I have captured this sufficiently well that it 
is capable of being automatically or semi-automatically processed.
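
[A sketch of the second reading, two bidirectional links grouped together,
derived mechanically from ROLE.  This is only one possible interpretation and
the draft does not confirm it; as noted above, relying on ROLE may not be
general.  The names RoleArcs, Locator, pairs and link2 are hypothetical and
the code is present-day Java.]

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: pair up the locators of one EXTENDED link by ROLE.
class RoleArcs {
    // One LOCATOR inside an EXTENDED link: the target ID and its ROLE value.
    static final class Locator {
        final String target;
        final String role;

        Locator(String target, String role) {
            this.target = target;
            this.role = role;
        }
    }

    // Pair every locator having roleA with every locator having roleB.
    // Each returned pair {a, b} is meant to be traversable in both directions.
    static List<String[]> pairs(List<Locator> locators, String roleA, String roleB) {
        List<String[]> out = new ArrayList<>();
        for (Locator a : locators) {
            if (!a.role.equals(roleA)) {
                continue;
            }
            for (Locator b : locators) {
                if (b.role.equals(roleB)) {
                    out.add(new String[] { a.target, b.target });
                }
            }
        }
        return out;
    }

    // The three locators of link2 from the example above.
    static List<Locator> link2() {
        List<Locator> l = new ArrayList<>();
        l.add(new Locator("W3", "verb"));
        l.add(new Locator("W4", "noun"));
        l.add(new Locator("W5", "noun"));
        return l;
    }
}
```

[Calling pairs(link2(), "verb", "noun") gives (W3,W4) and (W3,W5) and no pair
between W4 and W5, which is the topology intended.  Treating every locator
as linked to every other would also link the two nouns to each other.]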

Guidance would be gratefully received, particularly if it makes it clear 
whether there is a generic way of supporting this in code.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From davidsch at microsoft.com  Sat May 31 03:16:57 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun  7 16:57:52 2004
Subject: XML Spec Questions
Message-ID: <011290D45A8ACF119B8B00805FD471D6032C449D@RED-24-MSG.dns.microsoft.com>

In section 4.5 of the latest XML spec it says that if the predefined
entities are declared they must be declared as follows:

<!ENTITY lt     "<">
<!ENTITY gt     ">">
<!ENTITY amp    "&">
<!ENTITY apos   "'">
<!ENTITY quot   '"'>

Is the definition for amp valid XML?  The definition of EntityValue is
given in section 1.5 as:

EntityValue ::= '"' ([^%&"] | PEReference | Reference)* '"'
	| "'" ([^%&'] | PEReference | Reference)* "'"

Does this mean that "&" is not a valid entity value?
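
The question can be checked mechanically.  The following is a rough sketch in
present-day Java that transcribes the EntityValue production into a regular
expression.  The class and method names are hypothetical, and Name,
PEReference and Reference are simplified ASCII-only approximations assumed
for illustration, not the draft's full definitions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: test a quoted literal against the EntityValue production.
class EntityValueCheck {
    // Assumption: an ASCII-only approximation of Name.
    private static final String NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";
    // Assumption: PEReference is '%' Name ';'.
    private static final String PEREF = "%" + NAME + ";";
    // Assumption: Reference is '&' Name ';' or a decimal/hex character reference.
    private static final String REF =
        "&" + NAME + ";|&#[0-9]+;|&#x[0-9A-Fa-f]+;";

    // '"' ([^%&"] | PEReference | Reference)* '"'
    private static final String DOUBLE_QUOTED =
        "\"(?:[^%&\"]|" + PEREF + "|" + REF + ")*\"";
    // "'" ([^%&'] | PEReference | Reference)* "'"
    private static final String SINGLE_QUOTED =
        "'(?:[^%&']|" + PEREF + "|" + REF + ")*'";

    private static final Pattern ENTITY_VALUE =
        Pattern.compile(DOUBLE_QUOTED + "|" + SINGLE_QUOTED);

    // True if the whole literal, including its quotes, matches EntityValue.
    static boolean matches(String literal) {
        return ENTITY_VALUE.matcher(literal).matches();
    }
}
```

Under these assumptions the literal "<" matches, because < is not among the
excluded characters, while "&" does not, because a bare ampersand neither is
an ordinary character nor begins a complete reference.  A literal containing
a character reference, such as "&#38;", does match.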




From tbray at textuality.com  Sat May 31 05:31:57 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:57:52 2004
Subject: XML Spec Questions
Message-ID: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>

At 06:16 PM 5/30/97 -0700, David Schach wrote:
>Is the definition for amp valid XML?  The definition of EntityValue is
>given in section 1.5 as:

No.  It's a bug in the spec.  On the list to fix. -T.	



From jduggan at magma.ca  Sat May 31 06:39:19 1997
From: jduggan at magma.ca (Josh Duggan)
Date: Mon Jun  7 16:57:53 2004
Subject: XML Spec Questions
In-Reply-To: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>
Message-ID: <3.0.1.32.19970531003840.00692dbc@mail.magma.ca>

Hi All,

As long as we're pondering the spec, why does ElementDecl's Seq have ", "
as a separating string?  Is this a typo in the spec, or do we need to enforce
a space after ','?

Best regards.


Josh Duggan           | Gralen Digitext Inc.
jduggan@magma.ca      | josh@gralen.com
www.magma.ca/~jduggan | www.gralen.com

"Work damn you or I'll beat you
 with your own toner cartridge!" - High
                                   Commander



From phj at teleport.com  Sat May 31 08:22:21 1997
From: phj at teleport.com (P. Ju)
Date: Mon Jun  7 16:57:53 2004
Subject: XML Spec - timeline?
In-Reply-To: <3.0.32.19970530202559.00a3cdb0@pop.intergate.bc.ca>
Message-ID: <Pine.GSO.3.96.970530231948.29386K-100000@linda.teleport.com>


Hi all.

I am new to this list and am starting work on an XML book.  I've checked
out the XML pages at W3C but have so far been unable to find a definitive
schedule for the first release (not draft) of the XML spec, the XML-link
spec, and the stylesheet standards.

Can you help with this?

Thank you.


Patricia Ju
phj@teleport.com

