From Jon.Bosak at eng.Sun.COM  Sat Aug  2 07:08:47 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:11 2004
Subject: XML Dev Day schedule
Message-ID: <199708020506.WAA20598@boethius.eng.sun.com>

A fine assortment of technical presentations is in store for
participants in XML Developers Day (Le Centre Sheraton Hotel,
Montreal, Thursday, August 21).  In fact, the "day" has had to be
extended into the evening to accommodate a wealth of reports from
early implementors of the new Web technology.  This is going to be a
can't-miss event for anyone hoping to play a significant role in the
coming revolution.

Registration for XML Developers Day can be made through the page for
the 4th International HyTime Conference:

   http://www.gca.org/conf/hytime/hytime97.htm

Participants new to XML should note that in addition to the many
interesting presentations scheduled for the HyTime Conference (August
19-20), a tutorial on XML will be given on Monday, August 18 in the
same location.

Jon Bosak
Dev Day Chair

=========================================================
PRELIMINARY SCHEDULE: XML DEVELOPERS DAY, AUGUST 21, 1997
=========================================================

9:00-9:05 Jon Bosak, Sun Microsystems
   Welcome

9:05-9:30 David Megginson, Microstar
   Java Beans and Architectural Forms

9:30-10:00 Lloyd Harding, Information Automation Assembly
   The Kona Proposal for Electronic Health Care Records

10:00-10:30 Henry Thompson, University of Edinburgh
   A Motivation for the Schema Component of XML-Data

10:30-11:00 ------------------------------ BREAK

11:00-11:30 Daniel Rivers-Moore, RivCom
   XML in the Delivery of Corporate Information

11:30-12:00 Patrick Gannon, CommerceNet
   XML in Component-based Commerce

12:00-1:30 ------------------------------ LUNCH

1:30-2:00 John Tigue, Datachannel
   XAPI-J in Theory and Practice

2:00-2:30 Jeffrey Olson, School of EECS, Washington State University
   Conceptual Knowledge Markup Language, an XML Application

2:30-3:00 Henry Thompson, University of Edinburgh
   The Win95/NT Version of LT XML

3:00-3:30 ------------------------------ BREAK

3:30-4:00 Paul Trevithick, Bitstream
   Highly Designed Pages and Cross-Media Authoring with XML

4:00-4:30 Sarah Slocombe, Apropos Toy & Tool Development
   A Java-based QuarkXpress-to-XML Converter

4:30-5:00 David Slocombe and Rajiv Thanawala, Tata Infotech
   A Visual Recognition Approach to Legacy Document Conversion

5:00-5:30 ------------------------------ BREAK

5:30-6:00 Murray Maloney, Grif
   XML Editing: Well-formed Documents, CSS, and Namespaces

6:00-6:30 Paul Grosso, ArborText
   Some Ideas for XML Editing Interfaces

6:30-7:00 Jonathan Robie, POET Software
   An XML Document Component Database

7:00-7:30 Jeff Eby, Chrystal Software
   XML and a Generic Repository Architecture

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From neil at bradley.co.uk  Sat Aug  2 10:38:48 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:11 2004
Subject: Specification Questions
Message-ID: <199708020838.JAA11135@andromeda.ndirect.co.uk>


Thanks for the feedback, it was very helpful. However, I STILL do not
understand the need for the brackets in the latter half of Mixed:

> <!ELEMENT FOO %-.O; (#PCDATA)>
>  
> > The second line of the rule for [50]Mixed is:
> > 
> >    |  '(' S? %( '#PCDATA' ) S? ')'
> > 
> > I cannot understand the purpose of the inner brackets in this part
> > of the rule.
> 
> I believe it is to allow parameter entity replacement at that spot:
> 
> <!ENTITY % foobar (#PCDATA)>
> <!ELEMENT FOO (%foobar;)>

I understand the explanation, but the first half of the same rule is
as follows:

  '('  S?  %( %'#PCDATA' ( ..........

If   %'#PCDATA'  can appear here, why can't the second part of the
rule be similarly formulated:

  |   '('  S?  % '#PCDATA'   S?  ')'

Am I wrong in thinking this would allow a content of " ( %xyz; ) "?

> > There is also little written about interpretation of line-ending
> > codes. Although the standard states that white space and
> > line-ending codes are ignored in element content, nothing is said
> > regarding the age old problem of line-ending codes in mixed
> > content. 
> 
> The spec makes no special provision for whitespace at the beginning
> and end of elements. I believe that this is intended to be one of
> its simplifications over "regular" SGML. This seeming
> incompatibility is mitigated by an SGML TC which will allow XML
> to remain compatible with (post-TC) SGML.
> 
>  Paul Prescod

Is it up to the application to decide what to do with any leading line
ending code in these positions then?

I am pleased to be rid of the 'record' concept (using RS and RE)
defined for SGML, particularly as I have tended to use Mac and UNIX
systems which use a single character to end a line (albeit different
ones!). However, I still think there is too little information on the
effect of line ending codes in mixed content. Obviously the safe thing
to do is to make the content of all elements with a mixed content
model fit on a single line, as in:

<p>This is a <b>long</b> paragraph.........................</p>

But with large text blocks, created using text editors, people will
continue to use line ending codes to make it readable on-screen.
Normally, a break between words would be interpreted as a space when
the block is paginated:

<p>This is a <b>long</b> paragraph that is broken over two
lines, with an implied space between 'two' and 'lines'.</p>

Yet what happens when a comment or processing instruction
appears on its own line?

<p>This is a long paragraph that is broken over two
<!-- comment -->
lines, with an implied space between 'two' and 'lines'.</p>

Is this interpreted as "two <!-- comment --> lines...", which reduces
to "two   lines"?


Neil.


-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From Peter at ursus.demon.co.uk  Sat Aug  2 11:56:33 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:12 2004
Subject: Specification Questions
Message-ID: <9075@ursus.demon.co.uk>

In message <199708020838.JAA11135@andromeda.ndirect.co.uk> "Neil Bradley" writes:
[...]
[Paul Prescod]
> > The spec makes no special provision for whitespace at the beginning
> > and end of elements. I believe that this is intended to be one of
> > its simplifications over "regular" SGML. This seeming
> > incompatibility is mitigated by an SGML TC which will allow XML
> > to remain compatible with (post-TC) SGML.

The spec is consistent over this, I think, and says that all characters that 
are not markup should be passed to the application.  This includes whitespace.
My personal view is that without some central guidance at least, the
XML treatment of whitespace will cause problems and incompatibility for
two groups of people:
	- those who are familiar with SGML
	- those who are not familiar with SGML.

The first group are accustomed to SGML parsers (primarily James Clark's) 
carrying out consistent operations on whitespace.  This includes:
	- removing line-ends immediately after and before markup
	- translating markup into a small number of platform-independent codes
		(e.g. ' ' and '\n').

The second group will be familiar with HTML where all whitespace is normalised
according to various rules of varying consistency between useragents/browsers.
Apart from characters within <PRE> and related markup, all whitespace is 
normalised to single spaces, and line-ends are inserted according to
the user-agent software, not the document's content. Treatment of 'special'
characters (e.g. &nbsp; &#32; and other escaped characters or entities) is
probably inconsistent.  However, in general, whitespace is not a current 
concern of the second group.

***Both groups are in for a serious problem with XML unless there is some 
central guidance.  Otherwise we are at the mercy of any software implementor.
***

<QUESTION>
What whitespace characters can be passed to the application? Regardless of 
what is done with it, is CR+LF treated in the same way as LF or CR alone
in a document?  
</QUESTION>

If not, we shall appear to be in for variations according to what platforms 
the document is created on.  It will be no use telling people that this is 
what the spec says - I had always assumed that one of the attractions of
SGML was that it removed platform-dependent documents.  But reading 
XML-lang [2] suggests that CR and CR+LF produce different results.

The result of parsing, therefore, passes original whitespace to the 
application.  Thus:

<P>two  spaces</P>

and

<P>two spaces</P>

are different documents.

So are:

<P>no line feeds</P>

and

<P>
no line feeds
</P>

The first will confuse anyone accustomed to HTML only.  The second will also
confuse them, and in addition will confuse some current users of SGML.

> > 
> >  Paul Prescod
> 
> Is it up to the application to decide what to do with any leading line
> ending code in these positions then?
> 
> I am pleased to be rid of the 'record' concept (using RS and RE)
> defined for SGML, particularly as I have tended to use Mac and UNIX
> systems which use a single character to end a line (albeit different
> ones!). However, I still think there is too little information on the
> effect of line ending codes in mixed content. Obviously the safe thing
> to do is to make the content of all elements with a mixed content
> model fit on a single line, as in:
> 
> <p>This is a <b>long</b> paragraph.........................</p>
> 
> But with large text blocks, created using text editors, people will
> continue to use line ending codes to make it readable on-screen.
> Normally, a break between words would be interpreted as a space when
> the block is paginated:
> 
> <p>This is a <b>long</b> paragraph that is broken over two
> lines, with an implied space between 'two' and 'lines'.</p>

Yes.  Most people will want to work this way.  Very long lines are a menace
for many types of software.  We must assume (and in many cases encourage)
people will read and even edit XML documents with non-XML tools.

> 
> Yet what happens when a comment or processing instruction
> appears on its own line?
> 
> <p>This is a long paragraph that is broken over two
> <!-- comment -->
> lines, with an implied space between 'two' and 'lines'.</p>
> 
> Is this interpreted as "two <!-- comment --> lines...", which reduces
> to "two   lines"?

No.  It reduces (I think) to:

"...two

lines..."
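
A minimal sketch of this reduction, assuming a present-day Java DOM parser
(javax.xml.parsers, which did not exist when this was written). The class
name CommentLineEnds and the helper textOf are made up for the
illustration. The comment drops out of the character data, but the
line-end on either side of it is kept:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CommentLineEnds {
    // Returns the character data of the root element; comments are left out.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String xml = "<p>broken over two\n<!-- comment -->\nlines.</p>";
        // Escape the line-ends so that they are visible in the output.
        System.out.println(textOf(xml).replace("\n", "\\n"));
    }
}
```

Whether the two line-ends then become one space, two spaces, or a blank
line is left to the application.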

If there is one single 'obvious' issue which will prevent the take-up of XML 
by 'ordinary' people (like myself) it is whitespace.  The present position
on whitespace is:
	- the rules are clear but not prescriptive
	- the rules are non-intuitive to most people
	- the rules allow many different ways of processing a given document
	- the role of whitespace in a given document will depend on the
		software used to process it

The philosophy of the XML-lang authors is consistently:
	- whitespace is a problem for the application, not the spec.
	- there is no generic way of treating whitespace
[I should make it clear that this issue has been debated at great length,
and that the present position is the considered opinion of many experts.
I accept it, although I think it will be difficult to work with in practice.]

Without consistent treatment, a document author has to ask

	'which application is going to process my document?'

It means, for example, that the way that whitespace is treated in MathML 
may be different from that in CML and FooML and ... It effectively
destroys the possibility of (sub)document re-use, without a generally agreed
convention.

	I know that XML-lang authors read this group and may therefore
take some of these points on board.

	P.
> 
> 
> Neil.
> 
> 
> -----------------------------------------------
> Neil Bradley - Author of The Concise SGML Companion.
> neil@bradley.co.uk
> www.bradley.co.uk

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Sat Aug  2 21:03:46 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:12 2004
Subject: Specification Questions
Message-ID: <3.0.32.19970802115011.0089c5e0@pop.intergate.bc.ca>

At 09:51 AM 02/08/97 GMT, Peter Murray-Rust wrote:
><QUESTION>
>What whitespace characters can be passed to the application? Regardless of 
>what is done with it, is CR+LF treated in the same way as LF or CR alone
>in a document?  
></QUESTION>

All bytes that are not markup are data, and passed to the application.

Yes, this will be surprising to people who are used to HTML.  Too bad -
HTML's behavior is unacceptable for many classes of applications.  It
would be surprising to those who understand the 8879 rules, but 
experience shows that this group includes only about a dozen people,
and they disagree.  The rule given above has the virtue that it is
short, simple, and easily understood by everyone.  We spent a lot
of time on this, and it's the only sane way to go. -Tim




From john at datachannel.com  Sun Aug  3 06:21:34 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail (long)
Message-ID: <33E407E3.37404793@datachannel.com>

This note explains some of the internal implementation details of Xapi-J
compliant processors. If all you want to do is use an Xapi-J processor,
you do not need to concern yourself with these details. This note is
intended for people who are actually writing Xapi-J processors.

One of the nice features of Java is the clear distinction between
inheritance and interfaces. Xapi-J tries to leverage Java interfaces to
provide processor users a simple object model and processor implementors
wide latitude in regard to the processor internals.

Folks don't really have a problem grasping the XML object model:

To get a new XML processor object instance:
     xml.XMLProcessor xmler = new xml.XMLProcessor();

To have a processor read a document:
     xml.IDocument aDocument = xmler.readXML( someInfoSource );

To get the root of a document:
    xml.IElement anElement = aDocument.getRoot();

To get an element's attributes:
     java.util.Enumeration someAttributes = anElement.getAttributes();

That's easy to understand. It is a very simple object model
which can be mapped onto many of the current XML processors without
requiring major rewrites.

The greatest confusion centers around the mechanism used to hide the
specifics of the underlying implementation of the XML processor. This is
the only part of Xapi-J which actually involves real classes as opposed
to simply interfaces. Xapi-J does not contain an XML processor; it
simply says what one could look like. It is up to others to actually
supply the working code which is accessed through the Xapi-J interfaces.

One of the goals of Xapi-J is to create an architecture which (although
powerful and flexible) makes simple things simple. Navigating an XML
document object should be simple and, as can be seen from the above code
fragments, it is. And getting an XML processor should be simple. That it
is. If a JVM comes with an installed XML processor, in the interests of
making things easy for developers, that processor should be used by
default. So a developer could simply do a "new xml.XMLProcessor();" and
expect that an XMLProcessor will be instantiated and usable. So if, say,
Microsoft wished to package their JVM with an XMLProcessor, they could
tweak the default constructor for xml.XMLProcessor so that it would
instantiate a com.ms.xml.Parser by default.

(I have tested that Xapi-J can be implemented on top of msxml. I have
the code on my hard drive. If anyone is interested drop me an email and
I'll give you the classes. With this interface adaptor, a developer
could write Java applets which save some download time by using the MS
parser which will be on the IE4 client and only downloading the
lightweight adaptor. I would not suggest this. As I have mentioned in an
earlier posting, I feel that the msxml object model is seriously flawed.
Correcting for it required some less-than-optimal code.)

A good architecture makes simple things simple but it doesn't limit a
developer. Say a developer wanted to use an XML processor which was
tweaked for parsing MathML documents (call it MathMLProcessor). Perhaps
a MathMLProcessor could only understand that particular XML application
but via this specialization was able to obtain greater performance than
a general purpose XML processor. It would be great if the developer
could specify that when a "new xml.XMLProcessor()" call occurs a
MathMLProcessor should be instantiated. Xapi-J allows for this via the
following method in the xml.XMLProcessor class:

public static synchronized void setIXMLProcessorFactory(
         IXMLProcessorFactory factorySettee ) throws XMLException

The method signature is that way because:
public:
      accessible from other packages
static:
      applies to the class in general, not a particular instance
synchronized:
      thread-safe access to a static method is usually advisable
void:
     standard JavaBean accessor method signature design pattern is:
           TypeOfX getX() AND void setX( TypeOfX xToBe )
setIXMLProcessorFactory:
     this method sets the class's IXMLProcessorFactory
IXMLProcessorFactory:
     an Xapi-J interface for objects which can be asked to create
     objects which implement the interface IXMLProcessor. During "new
     xml.XMLProcessor()" the factory will be asked to instantiate an
     object which implements IXMLProcessor
factorySettee:
     The object which is to be assigned as the factory
throws XMLException:
     a general XML exception object; might be thrown if the
     factory had already been set (a security concern expressed in the
     regular JDK fashion)

So the developer could do the following:
    XMLProcessor.setIXMLProcessorFactory( new MathMLProcFactory() );
    XMLProcessor xmler = new XMLProcessor();

Here the developer using an Xapi-J compliant processor needs to do just
one special line of code (tell the XMLProcessor class that it should
ask the specified MathMLProcFactory object to create IXMLProcessor's).
After that all the implementation specific details of the
MathMLProcessor are hidden behind the Xapi-J interfaces i.e. just do a
"new XMLProcessor()" and access the document through Xapi-J interfaces.
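
The "might be thrown if the factory had already been set" case can be
sketched as a set-once guard, in the manner of
java.net.Socket.setSocketImplFactory. This is an assumption about how an
implementor might write it, not code taken from Xapi-J: FactoryHolder and
isFactorySet are names made up for the illustration, and the interfaces
are reduced to stubs.

```java
class XMLException extends Exception {
    XMLException(String message) {
        super(message);
    }
}

// Stub of the Xapi-J interface; the real one declares readXML() and more.
interface IXMLProcessor {
}

interface IXMLProcessorFactory {
    IXMLProcessor createIXMLProcessor();
}

// Hypothetical holder standing in for the static part of xml.XMLProcessor.
class FactoryHolder {
    private static IXMLProcessorFactory processorFactory;

    // Accepts the first factory and rejects any later attempt to replace it.
    static synchronized void setIXMLProcessorFactory(
            IXMLProcessorFactory factorySettee) throws XMLException {
        if (processorFactory != null) {
            throw new XMLException("factory already defined");
        }
        processorFactory = factorySettee;
    }

    static synchronized boolean isFactorySet() {
        return processorFactory != null;
    }

    public static void main(String[] args) {
        IXMLProcessorFactory factory = () -> new IXMLProcessor() { };
        try {
            setIXMLProcessorFactory(factory);
            setIXMLProcessorFactory(factory); // second call is refused
        } catch (XMLException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With a guard of this kind, code loaded later cannot silently swap the
processor out from under an application that has already chosen one.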

This is possible because even though the class XMLProcessor is the only
real class in Xapi-J, it is essentially hollow. An XMLProcessor instance
is not really an XML processor. Xapi-J does not include an XML
processor, just the interface to one. All an XMLProcessor does is
act as a proxy to an object which implements IXMLProcessor. The
IXMLProcessor object is instantiated by the above mentioned factory. So
in the source code for the XMLProcessor class we see something like
the following code fragments:

//  Class static factory code:
private static IXMLProcessorFactory processorFactory;

public static synchronized void setIXMLProcessorFactory(
       IXMLProcessorFactory factorySettee ) throws XMLException
{
    processorFactory = factorySettee;
}

//  Instance constructor code:
private IXMLProcessor implementation;

public XMLProcessor ()
{
    this.implementation = processorFactory.createIXMLProcessor();
}

//  Instance action code:
public IDocument readXML( Object xmlSource ) throws XMLException
{
    return implementation.readXML( xmlSource );
}

The execution sequence looks like:
1. The factory is set via XMLProcessor.setIXMLProcessorFactory().
2. Later, a "new XMLProcessor()" happens.
3. In the constructor the factory is asked to return an IXMLProcessor.
4. The IXMLProcessor object is assigned to the field "implementation".
5. Later, a "readXML()" call happens.
6. In readXML(), the XMLProcessor object, acting as a proxy, passes
          the request onto its IXMLProcessor and then,
7. The XMLProcessor object returns whatever is returned to it from its
          IXMLProcessor. I.e. the object implementing IXMLProcessor is
          the real worker.
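
The sequence can be run end to end with a stub in place of a real
processor. In this sketch StubMathMLProcessor, ProxyDemo, and the
describe() method on IDocument are made up for the illustration;
XMLException is left out to keep it short, and the null check in the
constructor is an addition that is not part of the fragments above.

```java
// Reduced to one hypothetical method so the result can be printed.
interface IDocument {
    String describe();
}

interface IXMLProcessor {
    IDocument readXML(Object xmlSource);
}

interface IXMLProcessorFactory {
    IXMLProcessor createIXMLProcessor();
}

// The one real class: a hollow proxy in front of whatever the factory makes.
class XMLProcessor {
    private static IXMLProcessorFactory processorFactory;

    static synchronized void setIXMLProcessorFactory(
            IXMLProcessorFactory factorySettee) {
        processorFactory = factorySettee;                              // step 1
    }

    private final IXMLProcessor implementation;

    XMLProcessor() {
        if (processorFactory == null) {
            throw new IllegalStateException("no factory has been set");
        }
        this.implementation = processorFactory.createIXMLProcessor(); // steps 3, 4
    }

    IDocument readXML(Object xmlSource) {
        return implementation.readXML(xmlSource);                     // steps 6, 7
    }
}

// Hypothetical stand-in for a real worker such as a MathMLProcessor.
class StubMathMLProcessor implements IXMLProcessor {
    public IDocument readXML(Object xmlSource) {
        return () -> "parsed by StubMathMLProcessor: " + xmlSource;
    }
}

class ProxyDemo {
    public static void main(String[] args) {
        XMLProcessor.setIXMLProcessorFactory(StubMathMLProcessor::new); // step 1
        XMLProcessor xmler = new XMLProcessor();                        // step 2
        IDocument aDocument = xmler.readXML("<math/>");                 // step 5
        System.out.println(aDocument.describe());
    }
}
```

The calling code in ProxyDemo names the stub only once, when it sets the
factory; everything after that goes through XMLProcessor and the
interfaces.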

So the phrase "Xapi-J contains no XML processor" could more precisely be
stated as: Xapi-J does contain a class XMLProcessor but it does not
contain an implementation of the interface IXMLProcessor which is the
real worker/processor in the Xapi-J architecture.

The above is a convoluted dance but to the developer who is simply using
an Xapi-J compliant XML processor it looks really simple on the outside.
(For a very similar "design pattern" see java.net.Socket et al.) And only
one API has to be learned to work with any Xapi-J compliant processor.


--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999


From Peter at ursus.demon.co.uk  Sun Aug  3 15:58:14 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:12 2004
Subject: Specification Questions
Message-ID: <9091@ursus.demon.co.uk>


In message <199708020838.JAA11135@andromeda.ndirect.co.uk> "Neil Bradley" writes:
[...]

> <p>This is a long paragraph that is broken over two
> <!-- comment -->
> lines, with an implied space between 'two' and 'lines'.</p>
> 
> Is this interpreted as "two <!-- comment --> lines...", which reduces
> to "two   lines"?

Some additional - hopefully constructive - thoughts on whitespace.

The XML-lang spec does not ( and I suspect will not) give detailed guidance
on how whitespace will be managed.  My impression is that it is up to 
implementers and/or groups like this to come up with particular solutions.
My worry is that these will be inconsistent and not inter-operable.

***
Therefore I propose that those on XML-DEV who care about this problem come
up with some guidelines for implementers. 
***

XML does NOT treat whitespace like SGML and does NOT behave like HTML 
(although it can be configured to do so).  As far as I see them, the rules
are:

'All characters that are not markup are passed to the application'.  (This
is independent of any value of XML-SPACE (see below), processing instructions,
stylesheets, etc.)  These characters include HT, CR, LF, SP, and probably
a number of other Unicode 'whitespace' characters.  What the application
does with them is *undefined* in XML-lang.

Note that this means that CR and LF are passed as separate characters. No
normalisation takes place.  Therefore

Line one\r\nline two

is different from

Line one\nline two

even if they are visually similar on various text editors/displays, etc.
(My impression was that SGML normalised these two strings to the same 
ESIS output - is that right?).

This means that the author/processor 'contract' has to be aware of this.
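
In Java terms the two strings are simply unequal, and any folding of
CR+LF into LF is something the application has to choose to do. A small
sketch, in which the class LineEnds and the method normaliseLineEnds are
names made up for the illustration and the normalisation is one possible
application policy, not something taken from XML-lang:

```java
class LineEnds {
    // Application-level choice: fold CR+LF and a lone CR into LF.
    static String normaliseLineEnds(String data) {
        return data.replace("\r\n", "\n").replace('\r', '\n');
    }

    public static void main(String[] args) {
        String withCrLf = "Line one\r\nline two";
        String withLf = "Line one\nline two";

        // The raw character data differs by one character.
        System.out.println(withCrLf.equals(withLf));                    // false

        // After the application's own normalisation they agree.
        System.out.println(normaliseLineEnds(withCrLf).equals(withLf)); // true
    }
}
```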

Note also that *all* line-ends are passed (even immediately before/after
markup) unlike SGML.  Therefore:
<FOO>
line one
</FOO>

and
<FOO>line one</FOO>
are different.

Note also that:
<FOO><BAR>baz</BAR></FOO>
is different from
<FOO>
<BAR>baz</BAR>
</FOO>

The latter contains two pseudo-elements which contain only whitespace
(line-end characters) and FOO therefore has three children.

[Note that to make documents readable, the following trick can be used:
<FOO
><BAR
>baz</BAR
></FOO
>
since whitespace within the tag is ignored.  I do not think newcomers will
adopt this easily, and I suspect it can lead to errors in document editing.]
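
The child counts can be checked with a short program. This sketch assumes
a present-day Java DOM parser (javax.xml.parsers, which postdates this
discussion) used with its default settings and no DTD; the class
ChildCount and the method childrenOfRoot are names made up for the
illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ChildCount {
    // Counts the direct children (elements and text runs) of the root element.
    static int childrenOfRoot(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse: " + xml, e);
        }
    }

    public static void main(String[] args) {
        String compact = "<FOO><BAR>baz</BAR></FOO>";
        String spread = "<FOO>\n<BAR>baz</BAR>\n</FOO>";
        String insideTags = "<FOO\n><BAR\n>baz</BAR\n></FOO\n>";

        System.out.println(childrenOfRoot(compact));    // BAR only
        System.out.println(childrenOfRoot(spread));     // line-end, BAR, line-end
        System.out.println(childrenOfRoot(insideTags)); // BAR only
    }
}
```

The third string is the line-ends-inside-tags trick: it is spread over
several lines in the file but gives the same single child as the compact
form.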

*** In some cases the document author and the application author are both
aware of this problem and so the whitespace characters inserted by the
author will be processed in the way that they expect.  However, in most cases
I suspect this will NOT be true and that authors will inadvertently create
documents that are processed differently ***

XML provides an attribute XML-SPACE (local to an element BUT inherited by
its children) which can have three values:
	- #IMPLIED (no signals about whitespace handling)
	- PRESERVE (applications preserve all the whitespace)
	- DEFAULT (the *application's* default white-space processing modes
		are acceptable for this element).

PRESERVE seems clear.  All whitespace is passed to the application.  The 
others seem to be dangerous unless there are some general conventions. 

[Note also that XML parsers or processors have to ensure that children
inherit the XML-SPACE attributes of their parents.  Where does this get
done? In the parser? (It's part of XML-lang), in the processor - in which
case there is ample scope for inconsistent treatment...

Inheritance is already required in two places - XML-SPACE and XML-ATTRIBUTES
(XML-link). This is a generic mechanism and presumably should be implemented
in some package independently of the application.  Comments?]
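
One way such a generic mechanism might look is a lookup that walks from
an element towards the root until some element states a value. This is
only a sketch under that assumption: the class Elem, its fields, and the
method effectiveXmlSpace are made up for the illustration, and null
stands for "no value given on this element".

```java
class Elem {
    final Elem parent;      // null for the root element
    final String xmlSpace;  // "PRESERVE", "DEFAULT", or null if not given here

    Elem(Elem parent, String xmlSpace) {
        this.parent = parent;
        this.xmlSpace = xmlSpace;
    }

    // Returns the nearest value stated on this element or an ancestor,
    // or null if nothing is signalled anywhere on the path to the root.
    String effectiveXmlSpace() {
        for (Elem e = this; e != null; e = e.parent) {
            if (e.xmlSpace != null) {
                return e.xmlSpace;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Elem root = new Elem(null, "PRESERVE");
        Elem child = new Elem(root, null);            // inherits from root
        Elem grandchild = new Elem(child, "DEFAULT"); // overrides locally

        System.out.println(child.effectiveXmlSpace());
        System.out.println(grandchild.effectiveXmlSpace());
    }
}
```

The same walk would serve for any other inherited attribute, which is
the argument for keeping it outside the individual application.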

If possible, we should propose a *general* default mechanism for whitespace
handling for XML-SPACE="DEFAULT".  If everyone adopts this, it will greatly
reduce this problem.  Is this a reasonable strategy?

If so, we can propose that the DEFAULT mode for any whitespace processing is
something along these lines (similar to HTML?).  Within an element with
XML-SPACE="DEFAULT"

All whitespace sequences are mapped into a single space character.
All whitespace pseudo-elements are ignored (i.e. whitespace between markup)
All leading and trailing whitespace in #PCDATA is ignored.

Does this cover everything? Is it workable?

Example:
<FOO XML-SPACE="DEFAULT">
<BAR> this
<!-- comment -->
is<!-- comment -->a 
bar
</BAR></FOO>

folds to:
<FOO XML-SPACE="DEFAULT"><BAR>this is a bar</BAR></FOO>
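
For a single run of character data, the first and third of the proposed
rules can be sketched as below. The class DefaultSpaceFolder and the
method fold are names made up for the illustration, and whitespace is
assumed to mean SP, HT, CR and LF only. How runs on either side of a
comment are joined is not settled by the rules as listed, so the sketch
does not attempt it.

```java
class DefaultSpaceFolder {
    // Maps every whitespace sequence to one space, then drops a leading
    // and a trailing space if present.
    static String fold(String pcdata) {
        String collapsed = pcdata.replaceAll("[ \\t\\r\\n]+", " ");
        int start = collapsed.startsWith(" ") ? 1 : 0;
        int end = collapsed.length();
        if (end > start && collapsed.endsWith(" ")) {
            end--;
        }
        return collapsed.substring(start, end);
    }

    public static void main(String[] args) {
        // Brackets make the absence of leading and trailing space visible.
        System.out.println("[" + fold(" this\n is a \nbar\n") + "]");
        System.out.println("[" + fold("\n") + "]");
    }
}
```

A run that consists only of whitespace folds to the empty string, which
an application could then drop in line with the second rule.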

[Note that the Xpointer STRING syntax and the use of pseudo-elements
works on the *raw* data  (i.e. all non-markup characters).  Therefore the
application has to have access to this - it has to maintain a PRESERVEd
version of the document as well as (say) displaying or transforming a
DEFAULTed document.]

I think it's important to address this, since otherwise I predict we shall
have considerable confusion, especially when implementors of authoring or
processing software have not thought this through completely.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Sun Aug  3 15:58:25 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail (long)
Message-ID: <9092@ursus.demon.co.uk>

John,
Thanks very much for this - including keeping the momentum of this effort.
I encourage other members of this list to react to this posting - John
has obviously worked very hard at this.

In message <33E407E3.37404793@datachannel.com> john@datachannel.com (John Tigue) writes:
[...]
> 
> Folks don't really have a problem grasping the XML object model:
> 
> To get a new XML processor object instance:
>      xml.XMLProcessor xmler = new xml.XMLProcessor();
> 
> To have a processor read a document:
>      xml.IDocument aDocument = xmler.readXML( someInfoSource );
> 
> To get the root of a document:
>     xml.IElement anElement = aDocument.getRoot();
> 
> To get an element's attributes:
>      java.util.Enumeration someAttributes = anElement.getAttributes();
> 
> That's easy to understand. It is a very simple object model
> which can be mapped onto many of the current XML processors without
> requiring major rewrites.

I follow all this.  Can we also go one step further and say how we get
the children of an Element.  I am assuming also that (say) the DTD is 
not a child of root in this model - do you have proposals for all this?
If so, please post them :-) so we can get it finished - we keep going round
and round on this ...

> 
[...]
> 
> (I have tested that Xapi-J can be implemented on top of msxml. I have

Excellent!  What are your thoughts about NXP and Lark?

[...]
> 
> The execution sequence looks like:
> 1. The factory is set via XMLProcessor.setIXMLProcessorFactory().
> 2. Later, a "new XMLProcessor()" happens.
> 3. In the constructor the factory is asked to return an IXMLProcessor.
> 4. The IXMLProcessor object is assigned to the field "implementation".
> 5. Later, a "readXML()" call happens.
> 6. In readXML(), the XMLProcessor object, acting as a proxy, passes
>           the request onto its IXMLProcessor and then,
> 7. The XMLProcessor object returns whatever is returned to it from its
>           IXMLProcessor. I.e. the object implementing IXMLProcessor is
>           the real worker.
> 
> So the phrase "Xapi-J contains no XML processor" could more precisely be
> stated as: Xapi-J does contain a class XMLProcessor but it does not
> contain an implementation of the interface IXMLProcessor which is the
> real worker/processor in the Xapi-J architecture.
> 
> The above is a convoluted dance but to the developer who is simply using
> an Xapi-J compliant XML processor it looks really simple on the outside.
> (For a very similar "design pattern" see java.net.Socket et al.) And only
> one API has to be learned to work with any Xapi-J compliant processor.

I think I have followed John's logic and proposal, and suggest that we take
this as a concrete proposal.  Since it's only likely to be used by a smallish
number of people, its apparent complexity is acceptable.  For example, JUMBO
is able to use more than one parser, but I have to delve into each one to
see how to extract the correct parts.  This would make it easier overall.

Assuming we accept this I'd like us also to tackle the question of Nodes, 
Elements, etc. Until this is done it's difficult to build application 
software with interchangeable parts. For example, there is a lot of generic
stuff (see my posting on whitespace) that an XML application (?processor)
has to implement, and hopefully we can isolate and standardise on that.

Once again, thanks John.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From john at datachannel.com  Mon Aug  4 00:11:29 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail
References: <9092@ursus.demon.co.uk>
Message-ID: <33E502A7.80A66A68@datachannel.com>

Peter Murray-Rust wrote:

> <snip/>
> > To get an element's attributes:
> >      java.util.Enumeration someAttributes = anElement.getAttributes();
> > <snip/>
> I follow all this.  Can we also go one step further and say how we get
> the children of an Element?

To get an element's children:

     java.util.Enumeration children = anElement.getContents();

This method returns an Enumeration, each object of which implements
IContent. The paragraphs below explain IContent et al.

An XML document can be represented as a tree. In an XML document object
model there are things which are containers (e.g. a document is a
container and so is an element) and also things which are the content of
a container (e.g. a chunk of text is content, and so is an element when
one element is nested within another). To model these there are the
IContainer and IContent interfaces. The full source follows:

public interface IContainer
     {
     public Enumeration getContents();
     public void insertContent( IContent aContent, IContent preceedingContent );
     public void appendContent( IContent aContent );
     public void removeContent( IContent aContent );
     }

public interface IContent
     {
     public void setParent( IContainer aContainer );
     public IContainer getParent();
     public String getData();
     }

These interfaces only express the methods for navigating a tree. A
particular class of objects would need to have some more methods to be
interesting. For example, the interface for an element is IElement. The
full source follows:

public interface IElement extends IContent, IContainer
    {
     public String getType();
     public void setType( String aType );
     public void addAttribute( String name, String value );
     public void removeAttribute( String name );
     public IAttribute getAttribute( String attributeName );
     public java.util.Enumeration getAttributes();
     }

The above states that an IElement can be a container and/or a content
and also has some other methods particular to being an element. So
although IElement does not directly have a method called getContents(),
it gets the method from its superinterface IContainer.
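
As a rough illustration of how the interfaces above might be implemented, here
is a sketch that builds the tree for <color>red</color>. TextChunk and
SimpleElement are hypothetical classes, the attribute methods of IElement are
left out for brevity, and two behaviours are assumptions of this sketch:
insertContent() places the new content immediately after the given one, and an
element's getData() concatenates the data of its children. The typed
Enumeration<IContent> is a modern convenience; the posted interface returns a
raw Enumeration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Interfaces as posted (package-private so the sketch fits in one file).
interface IContainer {
    Enumeration<IContent> getContents();
    void insertContent(IContent aContent, IContent preceedingContent);
    void appendContent(IContent aContent);
    void removeContent(IContent aContent);
}

interface IContent {
    void setParent(IContainer aContainer);
    IContainer getParent();
    String getData();
}

// Hypothetical text node: content only, never a container.
class TextChunk implements IContent {
    private IContainer parent;
    private final String text;
    TextChunk(String text) { this.text = text; }
    public void setParent(IContainer aContainer) { parent = aContainer; }
    public IContainer getParent() { return parent; }
    public String getData() { return text; }
}

// Hypothetical element: both content and container.
class SimpleElement implements IContent, IContainer {
    private IContainer parent;
    private final String type;
    private final List<IContent> children = new ArrayList<>();

    SimpleElement(String type) { this.type = type; }

    // Returns the element type name as a String, e.g. "color".
    public String getType() { return type; }

    public Enumeration<IContent> getContents() {
        return Collections.enumeration(children);
    }
    public void appendContent(IContent aContent) {
        children.add(aContent);
        aContent.setParent(this);
    }
    public void insertContent(IContent aContent, IContent preceedingContent) {
        // Assumption: the new content goes immediately after preceedingContent.
        int index = children.indexOf(preceedingContent);
        if (index < 0) {
            throw new IllegalArgumentException("preceding content is not a child");
        }
        children.add(index + 1, aContent);
        aContent.setParent(this);
    }
    public void removeContent(IContent aContent) {
        if (children.remove(aContent)) {
            aContent.setParent(null);
        }
    }
    public void setParent(IContainer aContainer) { parent = aContainer; }
    public IContainer getParent() { return parent; }
    public String getData() {
        // Assumption: an element's data is the concatenated data of its children.
        StringBuilder data = new StringBuilder();
        for (IContent child : children) {
            data.append(child.getData());
        }
        return data.toString();
    }
}
```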

(Note that the Xapi-J method getType() follows the terminology of
XML-LANG and as such it implies completely different semantics than
com.ms.xml.Element.getType(). Xapi-J's getType() returns a String which
is the "Name" from production [33] of the spec. For example, in the
following:
<color>red</color>
The spec clearly says "The Name in the start- and end-tags gives the
element's type" so for the above example in Xapi-J getType() would
return a String with the value "color", not an int with the value 1 (i.e.
MS's ELEMENT constant). Microsoft has chosen an independent model in
which most objects in a document are com.ms.xml.Element and the
particular flavor of "Element" is determined through the getType()
method. In that model all of the following are "Element" types:
DOCUMENT, ELEMENT, PCDATA, PI, BETA, COMMENT, and CDATA.)

> <snip/>

> > (I have tested that Xapi-J can be implemented on top of msxml. I have
>
> Excellent!  What are your thoughts about NXP and Lark?
>

Lark maps very easily to Xapi-J. Xapi-J was designed by taking all the
best ideas from the existing processors so the mappings are
straight-forward. NXP is pretty much the standard when it comes to ESIS
output so it defines that part of Xapi-J making the mapping essentially
direct. The only new part is the stuff mentioned in the posting which
started this thread: how does a developer instantiate a processor
through the Xapi-J interfaces. After that it's the regular old NXP
stuff.

Note that since Xapi-J is pretty much just a bunch of interfaces, this
work can easily be fit into a full grove model. The objects in the grove
could implement their grove interfaces and if desirable also implement
the earlier Xapi-J interfaces. A full grove model is being worked on by
others so making Xapi-J a full grove model would be a duplication of
effort. The main goal of Xapi-J is simply to make things easier for
developers using the current crop of processors.

<snip/>
--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From Peter at ursus.demon.co.uk  Mon Aug  4 09:01:09 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail
Message-ID: <9106@ursus.demon.co.uk>

In message <33E502A7.80A66A68@datachannel.com> john@datachannel.com (John Tigue) writes:
[...]
> Lark maps very easily to Xapi-J. Xapi-J was designed by taking all the
> best ideas from the existing processors so the mappings are
> straight-forward. NXP is pretty much the standard when it comes to ESIS
> output so it defines that part of Xapi-J making the mapping essentially
> direct. The only new part is the stuff mentioned in the posting which
> started this thread: how does a developer instantiate a processor
> through the Xapi-J interfaces. After that it's the regular old NXP
> stuff.

Sounds good to me.  I am particularly impressed by the fact that you can
make it work with the various parsers, even if they take different approaches
with different terms.

What is your timescale for putting it all together?  Are there any places where 
you need more feedback from the list?  FWIW it gets my vote :-)

	P.

 
-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From richard at light.demon.co.uk  Mon Aug  4 12:24:12 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail
In-Reply-To: <33E502A7.80A66A68@datachannel.com>
Message-ID: <1r1HGKAKwa5zEwZY@light.demon.co.uk>

In message <33E502A7.80A66A68@datachannel.com>, John Tigue
<john@datachannel.com> writes
>
>An XML document can be represented as a tree. In an XML document object
>model there are things which are containers (e.g. a document is a
>container and so is an element) and also things which are the content of
>a container (e.g. a chunk of text is a content or even an element can be,
>in the case of one element within another). To model these there are the
>IContainer and IContent interfaces. The full source follows:
>
>public interface IContainer
>     {
>     public Enumeration getContents();
>     public void insertContent( IContent aContent, IContent preceedingContent );
>     public void appendContent( IContent aContent );
>     public void removeContent( IContent aContent );
>     }
>
>public interface IContent
>     {
>     public void setParent( IContainer aContainer );
>     public IContainer getParent();
>     public String getData();
>     }

These interfaces are mirrored in the SGML/XML Property Set.  In that,
everything is a 'node', each with its own name and a set of properties.
One of those properties is 'subnode' - having a subnode property makes a
node, de facto, into a Container in your terminology.  The complete XML
document can be represented as a 'grove' (tree structure) of these
nodes.

The parent-child relationship between elements of the XML document is
more specific than this.  The full grove includes things like the DTD
and processing instructions, which are nodes in the grove structure but
do not exhibit 'parent-child' relationships to anything else.

Nodes have some 'intrinsic properties', which apply whatever their
particular type might be.  (Again, this mirrors your thinking very
closely.)  These intrinsic properties are:

object Node
  property ClassNm  ; the name of the node's class
  property GrovRoot ; the root of the grove of which the node forms a part
  property SubPNs   ; the names of all the subnode properties exhibited
                      by the node
  property AllPNs   ; the names of all the properties exhibited by the node
  property ChildPN  ; the name of the children property, when this class
                      of node has children
  property DataPN   ; the data property name (i.e. 'char' or 'string'),
                      when this class of node contains data
  property DSepPN   ; the data separator property name
  property Parent   ; the node's parent
  property TreeRoot ; the root of the parent-children tree [not the same
                      as the 'grove root']
  property Origin   ; the node that has this node as one of its subnode
                      properties
  property OTSRelPN ; the origin-to-subnode relationship property name

I've given the full set of intrinsic node properties, really just to
point out that all of this modeling has already been done before.  Much
of it is too detailed (and perhaps one level too abstract) to apply to
Xapi-J.  However, I'm concerned that Xapi-J developers shouldn't just
ignore the SGML property set and invent their own version.

Expressing the only intrinsic property (parent) that is relevant to this
discussion leads to:

public interface XMLnode
        {
        public XMLnode parent();
        }

We could add in a couple of extra intrinsic properties, so you can get
to the grove root and its origin from any node:

public interface XMLnode
        {
        public XMLnode parent();
        public XMLnode grovroot();
        public XMLnode origin();
        }

I don't think we need separate IContainer and IContent interfaces -
what's wrong with just INode (or XMLnode, as I have it)?

>These interfaces only express the methods for navigating a tree. A
>particular class of objects would need to have some more methods to be
>interesting. For example, the interface for an element is IElement. The
>full source follows:
>
>public interface IElement extends IContent, IContainer
>    {
>     public String getType();
>     public void setType( String aType );
>     public void addAttribute( String name, String value );
>     public void removeAttribute( String name );
>     public IAttribute getAttribute( String attributeName );
>     public java.util.Enumeration getAttributes();
>     }
>
>The above states that an IElement can be a container and/or a content
>and also has some other methods particular to being an element. So
>although IElement does not directly have a method called getContents(),
>it gets the method from its superinterface IContainer.

We can do the same thing here:

public interface XMLelement extends XMLnode
        {
        public String gi();
        public void setType( String aType );
        public void addAttribute( String name, String value );
        public void removeAttribute( String name );
        public XMLattribute getAttribute( String attributeName );
        public XMLattlist atts();
        }

Notice that I've left the middle four declarations more or less
unchanged, for the following reason:  

There is definitely a useful distinction here, between those things
which are _properties_ of a node within an XML document, like the GI of
an element or its list of declared attributes, and _operations_ which
the API lets you carry out on that node.

The SGML/XML property set is entirely about the properties of an
existing instance.  It provides no framework or precedent for API
commands which _alter_ that instance, like SetType (which assigns or
changes the GI of an element).  There, we are rather more on our own!

I'm not sure if the Java API provides for a more elegant way of
specifying a property than the one I've dreamt up - if it does, we
should use it.

Hope this helps.

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk 
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067



From john at datachannel.com  Mon Aug  4 18:54:11 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:12 2004
Subject: Xapi-J: an architectural detail
References: <1r1HGKAKwa5zEwZY@light.demon.co.uk>
Message-ID: <33E609C6.6EEE0CB9@datachannel.com>

Richard Light wrote:

> In message <33E502A7.80A66A68@datachannel.com>, John Tigue
> <john@datachannel.com> writes
> >
> >An XML document can be represented as a tree. In an XML document object
> >model there are things which are containers (e.g. a document is a
> >container and so is an element) and also things which are the content of
> >a container (e.g. a chunk of text is a content or even an element can be,
> >in the case of one element within another). To model these there are the
> >IContainer and IContent interfaces. The full source follows:
> >
> >public interface IContainer
> >     {
> >     public Enumeration getContents();
> >     public void insertContent( IContent aContent, IContent preceedingContent );
> >     public void appendContent( IContent aContent );
> >     public void removeContent( IContent aContent );
> >     }
> >
> >public interface IContent
> >     {
> >     public void setParent( IContainer aContainer );
> >     public IContainer getParent();
> >     public String getData();
> >     }
>
> These interfaces are mirrored in the SGML/XML Property Set.  In that,
> everything is a 'node', each with its own name and a set of properties.
> One of those properties is 'subnode' - having a subnode property makes a
> node, de facto, into a Container in your terminology.  The complete XML
> document can be represented as a 'grove' (tree structure) of these
> nodes.
>

I agree that grove is the way to go. I'm just trying to get all the
current processors on the same track before we move towards the grove
work.

> The parent-child relationship between elements of the XML document is
> more specific than this.  The full grove includes things like the DTD
> and processing instructions, which are nodes in the grove structure but
> do not exhibit 'parent-child' relationships to anything else.
>

How will we represent the DTD in order to reflect the effects of the
Bray Namespace Proposal?

> Nodes have some 'intrinsic properties', which apply whatever their
> particular type might be.  (Again, this mirrors your thinking very
> closely.)  These intrinsic properties are:
>
> object Node
>   property ClassNm  ; the name of the node's class
>   property GrovRoot ; the root of the grove of which the node forms a part
>   property SubPNs   ; the names of all the subnode properties exhibited
>                       by the node
>   property AllPNs   ; the names of all the properties exhibited by the node
>   property ChildPN  ; the name of the children property, when this class
>                       of node has children
>   property DataPN   ; the data property name (i.e. 'char' or 'string'),
>                       when this class of node contains data
>   property DSepPN   ; the data separator property name
>   property Parent   ; the node's parent
>   property TreeRoot ; the root of the parent-children tree [not the same
>                       as the 'grove root']
>   property Origin   ; the node that has this node as one of its subnode
>                       properties
>   property OTSRelPN ; the origin-to-subnode relationship property name
>
> I've given the full set of intrinsic node properties, really just to
> point out that all of this modeling has already been done before.  Much
> of it is too detailed (and perhaps one level too abstract) to apply to
> Xapi-J.  However, I'm concerned that Xapi-J developers shouldn't just
> ignore the SGML property set and invent their own version.
>

Ignoring the SGML property set would be just plain stupid. I like to
drive cars not re-invent wheels.

> Expressing the only intrinsic property (parent) that is relevant to this
> discussion leads to:
>
> public interface XMLnode
>         {
>         public XMLnode parent();
>         }
>
> We could add in a couple of extra intrinsic properties, so you can get
> to the grove root and its origin from any node:
>
> public interface XMLnode
>         {
>         public XMLnode parent();
>         public XMLnode grovroot();
>         public XMLnode origin();
>         }
>

I absolutely agree that the Xapi-J interfaces are not done. I have tried
to bring the current processors together while mapping out the basics of
the object model. We will need to add more properties as you point out.
One thing I would like to see is that we return appropriate objects as
much as possible. One particular processor out there does a
getAttribute() where you pass in a String and get back a String. I think
an IAttribute should be returned. This way other convenience methods of
the returned class can be used, for example something like isPercent()
or isNumeric() for an attribute, not to mention all the properties of,
say, a character.
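
To make the idea concrete, here is a sketch of what returning an object
instead of a String buys the caller. The thread names IAttribute and floats
isPercent() and isNumeric(), but does not define the interface, so its shape
here, along with SimpleAttribute and the exact matching rules, is an
assumption.

```java
// Hypothetical shape for IAttribute; the thread does not define it.
interface IAttribute {
    String getName();
    String getValue();
    boolean isNumeric();
    boolean isPercent();
}

class SimpleAttribute implements IAttribute {
    // Assumed notion of "numeric": optional sign, digits, optional fraction.
    private static final String NUMBER = "[+-]?\\d+(\\.\\d+)?";

    private final String name;
    private final String value;

    SimpleAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }

    public String getName() { return name; }
    public String getValue() { return value; }

    // True for values such as "2" or "-0.5".
    public boolean isNumeric() {
        return value.trim().matches(NUMBER);
    }

    // True for values such as "50%".
    public boolean isPercent() {
        return value.trim().matches(NUMBER + "%");
    }
}
```

A caller that receives an IAttribute can ask attribute.isPercent() directly,
where a String-returning getAttribute() would leave every caller to parse the
value again.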

> I don't think we need separate IContainer and IContent interfaces -
> what's wrong with just INode (or XMLnode, as I have it)?
>

We could do that. Or maybe both with something like the following:

public interface XMLNode extends IContainer, IContent
    {
    ...
    }

I went with IContainer and IContent because I can do more precise
polymorphic message handling such that the receiving method can make
more assumptions about what the passed object can do without casting to
the exact class. Casting in Java carries a runtime cost (because of the
security checks), so it is more expensive.
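
A small sketch of the difference, using cut-down versions of the two
interfaces (only the methods needed here). Leaf, Box and Dispatch are
hypothetical names. The first method takes an IContainer and needs no cast;
the second shows what a single-node design pushes onto the receiver.

```java
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Reduced versions of the posted interfaces, just enough for this sketch.
interface IContent { String getData(); }
interface IContainer { Enumeration<IContent> getContents(); }

class Leaf implements IContent {
    private final String data;
    Leaf(String data) { this.data = data; }
    public String getData() { return data; }
}

class Box implements IContainer, IContent {
    private final List<IContent> items;
    Box(List<IContent> items) { this.items = items; }
    public Enumeration<IContent> getContents() { return Collections.enumeration(items); }
    public String getData() { return ""; }
}

class Dispatch {
    // Typed parameter: the compiler guarantees getContents() exists, so no cast.
    static int countChildren(IContainer container) {
        int count = 0;
        for (Enumeration<IContent> e = container.getContents(); e.hasMoreElements(); e.nextElement()) {
            count++;
        }
        return count;
    }

    // Untyped parameter: the receiver must test and cast at run time.
    static int countChildrenOf(Object node) {
        if (node instanceof IContainer) {
            return countChildren((IContainer) node);
        }
        return 0;
    }
}
```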

> >These interfaces only express the methods for navigating a tree. A
> >particular class of objects would need to have some more methods to be
> >interesting. For example, the interface for an element is IElement. The
> >full source follows:
> >
> >public interface IElement extends IContent, IContainer
> >    {
> >     public String getType();
> >     public void setType( String aType );
> >     public void addAttribute( String name, String value );
> >     public void removeAttribute( String name );
> >     public IAttribute getAttribute( String attributeName );
> >     public java.util.Enumeration getAttributes();
> >     }
> >
> >The above states that an IElement can be a container and/or a content
> >and also has some other methods particular to being an element. So
> >although IElement does not directly have a method called getContents(),
> >it gets the method from its superinterface IContainer.
>
> We can do the same thing here:
>
> public interface XMLelement extends XMLnode
>         {
>         public String gi();
>         public void setType( String aType );
>         public void addAttribute( String name, String value );
>         public void removeAttribute( String name );
>         public XMLattribute getAttribute( String attributeName );
>         public XMLattlist atts();
>         }
>

I generally agree. I went for getGI() and setGI() at one point but the
spec forced getType() and setType(). Plus I believe that the work we
produce here will filter down to folks who are far less preoccupied
with XML. For them the term "generic identifier" or even "gi" would be
less readily grasped than "type". Either way, by following the get/set
naming convention we map to JavaBeans. Slightly more wordy than X() and
setX() but the builder tools are geared for recognising getX() and
setX().
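
A sketch of why the naming matters to bean tools, using the standard
java.beans.Introspector. BeanStyleElement and GiStyleElement are hypothetical
classes standing in for the two naming styles under discussion; the helper
simply reports whether the introspector sees a readable and writable property
of a given name.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;

class BeanNaming {
    // Follows the getX()/setX() convention.
    public static class BeanStyleElement {
        private String type;
        public String getType() { return type; }
        public void setType(String aType) { type = aType; }
    }

    // Uses gi() as the accessor, as in the XMLelement proposal.
    public static class GiStyleElement {
        private String gi;
        public String gi() { return gi; }
        public void setType(String aType) { gi = aType; }
    }

    // True if bean introspection finds a property with both getter and setter.
    static boolean isReadWriteProperty(Class<?> beanClass, String propertyName) {
        try {
            PropertyDescriptor[] descriptors =
                    Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors();
            for (PropertyDescriptor descriptor : descriptors) {
                if (descriptor.getName().equals(propertyName)) {
                    return descriptor.getReadMethod() != null
                            && descriptor.getWriteMethod() != null;
                }
            }
            return false;
        } catch (IntrospectionException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the get/set pair the introspector reports a read-write "type" property;
with gi() it finds no "gi" property at all, and "type" is write-only.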

> Notice that I've left the middle four declarations more or less
> unchanged, for the following reason:
>
> There is definitely a useful distinction here, between those things
> which are _properties_ of a node within an XML document, like the GI of
> an element or its list of declared attributes, and _operations_ which
> the API lets you carry out on that node.
>
> The SGML/XML property set is entirely about the properties of an
> existing instance.  It provides no framework or precedent for API
> commands which _alter_ that instance, like SetType (which assigns or
> changes the GI of an element).  There, we are rather more on our own!
>

At first setType() might seem less than useful. And perhaps type should
be a parameter to the constructor and not modifiable (more on that
later). I got caught in a Java specific detail related to the
following:

     Class.forName("SomeClass").newInstance()

With this code Java objects can be instantiated from a String of the
class' name. That's handy for object serialization amongst other things;
for example, say you had a repository of classes for specific element
types and you want to instantiate one during a parse. The point is that
in Java newInstance() only works with the default constructor;
parameters cannot be passed in. So there is need for a separate method
for setting the type of the element. If we wanted to make the type
immutable then perhaps we could specify that the member field "type" can
only be set once. This type of behavior shows up a lot in the JDK.
Inside the property setter the field is checked for null; if it is not
null, an exception is thrown. Also in the JDK we see String and
StringBuffer where String is immutable and StringBuffer is where strings
can be dynamically built up. Perhaps something like that for Xapi-J.
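
A sketch of the set-once idea combined with creation by class name.
SetOnceElement and ElementFactory are hypothetical. Current JDKs deprecate
Class.newInstance(), so the sketch goes through the no-argument constructor
explicitly, which has the same restriction: no parameters can be passed in.

```java
// Hypothetical element whose type can be assigned exactly once.
class SetOnceElement {
    private String type;

    // A no-argument constructor is required for creation by class name.
    public SetOnceElement() { }

    public String getType() { return type; }

    public void setType(String aType) {
        if (aType == null) {
            throw new IllegalArgumentException("type must not be null");
        }
        if (type != null) {
            throw new IllegalStateException("type already set to " + type);
        }
        type = aType;
    }
}

class ElementFactory {
    // Instantiates the named class, then assigns the type through the setter.
    static SetOnceElement create(String className, String type) {
        try {
            Object created = Class.forName(className).getDeclaredConstructor().newInstance();
            SetOnceElement element = (SetOnceElement) created;
            element.setType(type);
            return element;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        SetOnceElement element = create(SetOnceElement.class.getName(), "color");
        System.out.println(element.getType());
    }
}
```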

> I'm not sure if the Java API provides for a more elegant way of
> specifying a property than the one I've dreamt up - if it does, we
> should use it.
>

The only point I'm sure of is the getX() and setX() "design pattern".
Most Java devs casually consuming XML will use a JavaBean and we should
plan for that architecture.

> Hope this helps.
>

Definitely. Thanks.

> Richard Light
> SGML and Museum Information Consultancy
> richard@light.demon.co.uk
> 3 Midfields Walk
> Burgess Hill
> West Sussex RH15 8JA
> U.K.
> tel. (44) 1444 232067
>


--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From andrewl at microsoft.com  Tue Aug  5 00:03:26 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:13 2004
Subject: Process: Subjects of Messages (was "A  question and a proposal")
Message-ID: <7BB61B44F197D011892800805FD4F7920133B7A8@RED-03-MSG.dns.microsoft.com>

It would be helpful if authors would give their messages titles that are
meaningful descriptions of the substantive contents of the message.
Something like "A question and a proposal" is cute, but useless for
filing.

--Andrew Layman
   AndrewL@microsoft.com




From akirkpatrick at ims-global.com  Tue Aug  5 10:14:39 1997
From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com)
Date: Mon Jun  7 16:58:13 2004
Subject: Xapi-J: an architectural detail
Message-ID: <E0wvelF-0007bb-00@punch.ic.ac.uk>

I really like the combination of IContent and IContainer.
The only question I have is how an element can query
its context in an efficient way? For example, how can
I find the previous element without referring to the parent
container. Presumably then the parent would have to
enumerate all its children to find the previous content
to the element in question. Obviously a particular
application can record the previous element in a variable
but then you get to more complex contexts, like "what
is the previous of my parent".
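
For reference, the lookup described above can be sketched as a linear scan of
the parent's contents. The interfaces are cut down to the methods needed, and
DemoNode and Siblings are hypothetical names. The cost is proportional to the
number of siblings, which is the inefficiency being asked about.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

// Reduced versions of the posted interfaces, just enough for this sketch.
interface IContent {
    IContainer getParent();
    void setParent(IContainer aContainer);
}
interface IContainer {
    Enumeration<IContent> getContents();
}

// Hypothetical node that is both content and container.
class DemoNode implements IContent, IContainer {
    final String name;
    private IContainer parent;
    private final List<IContent> children = new ArrayList<>();

    DemoNode(String name) { this.name = name; }

    DemoNode add(DemoNode child) {
        children.add(child);
        child.setParent(this);
        return this;
    }
    public IContainer getParent() { return parent; }
    public void setParent(IContainer aContainer) { parent = aContainer; }
    public Enumeration<IContent> getContents() { return Collections.enumeration(children); }
}

class Siblings {
    // Returns the content just before target, or null if it is first or has no parent.
    static IContent previousOf(IContent target) {
        IContainer parent = target.getParent();
        if (parent == null) {
            return null;
        }
        IContent previous = null;
        for (Enumeration<IContent> e = parent.getContents(); e.hasMoreElements();) {
            IContent current = e.nextElement();
            if (current == target) {
                return previous;
            }
            previous = current;
        }
        return null;
    }
}
```

"The previous of my parent" then needs the parent to be an IContent as well,
for example Siblings.previousOf((IContent) node.getParent()) after an
instanceof check.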

Any thoughts?
Alfie.



From neil at bradley.co.uk  Tue Aug  5 11:49:56 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:13 2004
Subject: Specification Questions
Message-ID: <199708050949.KAA07792@andromeda.ndirect.co.uk>


Reply-to:      Peter@ursus.demon.co.uk (Peter Murray-Rust)

> Some additional - hopefully constructive - thoughts on whitespace.
> 
> The XML-lang spec does not ( and I suspect will not) give detailed guidance
> on how whitespace will be managed.  My impression is that it is up to 
> implementers and/or groups like this to come up with particular solutions.
> My worry is that these will be inconsistent and not inter-operable.

I agree totally. This was my original concern.

> ***
> Therefore I propose that those on XML-DEV who care about this problem come
> up with some guidelines for implementers. 
> ***

I very much hope this happens.

> XML does NOT treat whitespace like SGML and does NOT behave like HTML 
> (although it can be configured to do so).  As far as I see them, the rules
> are:
> 
> 'All characters that are not markup are passed to the application'.  (This
> is independent of any value of XML-SPACE (see below), processing instructions,
> stylesheets, etc.)  These characters include HT, CR, LF, SP, and probably
> a number of other Unicode 'whitespace' characters.  What the application
> does with them is *undefined* in XML-lang.
> 
> Note that this means that CR and LF are passed as separate characters. No
> normalisation takes place.  Therefore
> 
> Line one\n\rline two
> 
> is different from
> 
> Line one\nline two
> 
> even if they are visually similar on various text editors/displays, etc.
> (My impression was that SGML normalised these two strings to the same 
> ESIS output - is that right?).
> 
> This means that the author/processor 'contract' has to be aware of this.

I think all applications should be expected to treat either or both
characters in sequence as a line end signal, so that platform
dependencies can be eliminated. If there is no good reason to omit
this task from the XML-processor itself, I think it should be done
there.
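
A minimal sketch of that normalisation, assuming that CR, LF, CR LF and LF CR
should each count as one line end. LineEnds is a hypothetical helper, and
treating LF CR as a single line end is an assumption taken from the quoted
"\n\r" example, not from the XML-lang spec.

```java
class LineEnds {
    // Maps CR LF, LF CR, lone CR and lone LF to a single '\n' each.
    static String normalize(String text) {
        return text.replaceAll("\\r\\n|\\n\\r|\\r|\\n", "\n");
    }

    public static void main(String[] args) {
        String first = normalize("Line one\n\rline two");
        String second = normalize("Line one\nline two");
        // With normalisation the two quoted strings compare equal.
        System.out.println(first.equals(second));
    }
}
```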


> *** In some cases the document author and the application author are both
> aware of this problem and so the whitespace characters inserted by the
> author will be processed in the way that they expect.  However, in most cases
> I suspect this will NOT be true and that authors will inadvertently create
> documents that are processed differently ***
> 
> XML provides an attribute XML-SPACE (local to an element BUT inherited by
> its children) which can have three values:
> 	- #IMPLIED (no signals about whitespace handling)
> 	- PRESERVE (applications preserve all the whitespace)
> 	- DEFAULT (the *application's* default white-space processing modes
> 		are acceptable for this element).
> 
> PRESERVE seems clear.  All whitespace is passed to the application.  The 
> others seem to be dangerous unless there are some general conventions. 

> If possible, we should propose a *general* default mechanism for whitespace
> handling for XML-SPACE="DEFAULT".  If everyone adopts this, it will greatly
> reduce this problem.  Is this a reasonable strategy?

I believe so. In addition, can we not put 'XML-SPACE
(PRESERVE|IMPLIED) "PRESERVE"' in an attribute declaration for an
element which will always have preserved content? It is common
practice for a DTD to have some kind of pre-formatted element, such
as HTML's '<pre>'.


> If so, we can propose that the DEFAULT mode for any whitespace processing is
> something along the lines (similar to HTML?).  Within an element with
> XML-SPACE="DEFAULT"
> 

> All whitespace sequences are mapped into a single space character.
Agreed.

> All whitespace pseudo-elements are ignored (i.e. whitespace between markup)

Ummm. what about 'the <b>bold</b>  <i>italic</i> styles...'?

> All leading and trailing whitespace in #PCDATA is ignored.

I think all applications should remove leading and trailing CR and LF
characters in a mixed content element. But not SP or HT, as this would
be undesirable in the following fragment:

A<emph>  bold  </emph>word.

Although an unusual layout, some people may use it, and it would be
unfortunate if it resulted in 'Aboldword'.
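
The rule suggested here can be sketched as two small steps: strip only CR and
LF from the ends of a text chunk, then collapse each remaining run of
whitespace to one space. MixedContentSpace is a hypothetical helper, and
treating SP, HT, CR and LF as the whole whitespace set is an assumption.

```java
class MixedContentSpace {
    private static boolean isLineEnd(char c) {
        return c == '\r' || c == '\n';
    }

    // Removes leading and trailing CR/LF only; SP and HT survive.
    static String trimLineEnds(String text) {
        int start = 0;
        int end = text.length();
        while (start < end && isLineEnd(text.charAt(start))) {
            start++;
        }
        while (end > start && isLineEnd(text.charAt(end - 1))) {
            end--;
        }
        return text.substring(start, end);
    }

    // Collapses every run of SP, HT, CR and LF to a single space.
    static String collapse(String text) {
        return text.replaceAll("[ \\t\\r\\n]+", " ");
    }

    public static void main(String[] args) {
        String emph = collapse(trimLineEnds("\n  bold  \n"));
        System.out.println("A" + emph + "word.");
    }
}
```

Applied to the fragment above, the spaces around "bold" survive as single
spaces, so the words do not run together.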


> Example:
> <FOO XML-SPACE="DEFAULT">
> <BAR> this
> <!-- comment -->
> is<!-- comment -->a 
DID YOU INTEND A SPACE SOMEWHERE BETWEEN 'is' AND 'a'?
> bar
> </BAR></FOO>
> 
> folds to:
> <FOO XML-SPACE="DEFAULT"><BAR>this is a bar</BAR></FOO>
> 
> I think it's important to address this, since otherwise I predict we shall
> have considerable confusion, especially when implementors of authoring or
> processing software have not thought this through completely.

Again, I agree, and I think it will be possible to achieve this with 
a bit more discussion in this forum.

> Peter Murray-Rust, domestic net connection

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From ricko at allette.com.au  Tue Aug  5 15:12:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:13 2004
Subject: XML and whitespace: lets just dump CR and LF!
Message-ID: <199708051317.XAA23619@jawa.chilli.net.au>

> From: Neil Bradley <neil@bradley.co.uk>
> Reply-to:      Peter@ursus.demon.co.uk (Peter Murray-Rust)
 
> > Therefore I propose that those on XML-DEV who care about this problem come
> > up with some guidelines for implementers. 
 
> I very much hope this happens.
  
> > This means that the author/processor 'contract' has to be aware of this.
 

Can I suggest a very different tack?

The problem with CR/LF is one of overloading not of translation or contracts. 
They have too many meanings.  In particular they function both as 
record-start/-end characters and as new-lines.

I suggest that the following approach should be taken. (I think it is the only
realistic solution, especially if we assume that 1) data is usually generated
by applications, 2) humans only check and tweak data; 3) we want operating
system and character set independence, 4) line-breaking is generally done by
clients ...so CR/LF is basically a convenience for fitting data into editors,
not for the purposes of output.)

**A) XML applications should ignore *ALL* CR and LF as a bad joke.  They should
be entirely there for formatting the raw text into nice, eye-sized records.
So CR and LF should never be converted to spaces. (This approach was the
one taken by Interleaf, and I have come to appreciate it.) If you need a 
space, then start the new line with it!  (A space ending the previous line
is difficult to see.)

**B) XML applications should mandate the use of the unambiguous Unicode characters
	-- LINE SEPARATOR  &#x2028;
	-- PARAGRAPH SEPARATOR &#x2029;

So if I want to do the equivalent of HTML 
<pre>  X<br>
</pre> 

XML can have:

<pre>&nbsp;&nbsp;X&#x2028;
</pre>

or even

<pre>
&nbsp;
&nbsp;
X


&#x2028;</pre>

And it can do this with the text conventions of any operating system.

I certainly think that CR/LF should not be of interest to XML-lang. And I think
they should be of marginal interest to XML applications too. Let's dump them!
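
A rough sketch of what an application following (A) and (B) would do with
character data: throw away every CR and LF, then break lines only at U+2028.
RecordEnds is a hypothetical helper and paragraph separators (U+2029) are left
out to keep it short.

```java
class RecordEnds {
    // Unicode LINE SEPARATOR, the unambiguous line break proposed in (B).
    static final String LINE_SEPARATOR = "\u2028";

    // (A): CR and LF only format the raw text, so drop them without a space.
    static String dropRecordEnds(String text) {
        return text.replace("\r", "").replace("\n", "");
    }

    // (B): real line breaks come only from LINE SEPARATOR characters.
    static String[] displayLines(String text) {
        return dropRecordEnds(text).split(LINE_SEPARATOR, -1);
    }

    public static void main(String[] args) {
        for (String line : displayLines("first rec\nord\u2028second\r\n line")) {
            System.out.println("[" + line + "]");
        }
    }
}
```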


Rick Jelliffe 



From tbray at textuality.com  Tue Aug  5 16:10:39 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:13 2004
Subject: XML and whitespace: lets just dump CR and LF!
Message-ID: <3.0.32.19970805070023.0081f720@pop.intergate.bc.ca>

At 11:13 PM 05/08/97 +1000, Rick Jelliffe wrote:
>**A) XML applications should ignore *ALL* CR and LF as a bad joke......
>
>I certainly think that CR/LF should be not of interest to XML-lang...
>Lets dump them!

Heh-heh.  If you go look in the proceedings of the 1988 Usenix conference,
you'll find a paper I wrote, on the Oxford English Dictionary project,
which has a section entitled

 '\n' Considered Harmful

I'd love to lose the record-end silliness.  Trouble is, we're stuck with
it until we have better editing tools. -T.



From clloyd at gorge.net  Tue Aug  5 16:49:14 1997
From: clloyd at gorge.net (Chris Lloyd)
Date: Mon Jun  7 16:58:13 2004
Subject: Xapi-J: an architectural detail
In-Reply-To: <E0wvelF-0007bb-00@punch.ic.ac.uk>
Message-ID: <3.0.1.32.19970805074603.006bcb18@gorge.net>

akirkpatrick wrote:
>I really like the combination of IContent and IContainer.
>The only question I have is how an element can query
>its context in an efficient way? For example, how can
>I find the previous element without referring to the parent
>container. Presumably then the parent would have to
>enumerate all its children to find the previous content
>to the element in question. Obviously a particular
>application can record the previous element in a variable
>but then you get to more complex contexts, like "what
>is the previous of my parent".
>
This is where the next step is needed. Tree Iterators can provide efficient
and well abstracted mechanisms for walking the XML tree. Everyone is still
stuck on the schema part of Xapi-J and that is fine. After that is done
then it's time to add classes specifically for navigation.

Keep the schema simple. Don't add members for the previous child, etc. It
is unnecessary and complex to maintain.

Over the past 2 years, we have been developing an object database system
for SGML. We have gone through the same thought processes as are going on
with xapi-j right now. I think there are a few design considerations to
keep in mind if you want to use iterator classes with the xapi-j schema and
I think eventually you will.

The idea of inheriting from IContainer is a good one. Polymorphism is very
useful when it comes time to write navigation classes. A base class for all
objects in the tree is very important!! We'll call this INode.

It then becomes useful to break the types of nodes into two classes:
IContainer and IProperty. An IProperty is always a leaf node of the tree
and an IContainer is not. After that you add your concrete classes such as
IElement.
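
A minimal Java sketch of that split, assuming an empty INode base interface.
The children() accessor and the two concrete classes are hypothetical names
for illustration, not part of the Xapi-J proposal.

```java
import java.util.ArrayList;
import java.util.List;

// Common base type; deliberately empty.
interface INode { }

// Leaf nodes (for example attributes) never have children.
interface IProperty extends INode { }

// Non-leaf nodes expose their children. children() is a hypothetical accessor.
interface IContainer extends INode {
    List<INode> children();
}

final class SketchAttribute implements IProperty { }

final class SketchElement implements IContainer {
    private final List<INode> children = new ArrayList<>();

    void add(INode child) { children.add(child); }

    @Override public List<INode> children() { return children; }
}

final class LeafCounter {
    // Walks any subtree knowing only the three interfaces above.
    static int countLeaves(INode node) {
        if (node instanceof IContainer) {
            int total = 0;
            for (INode child : ((IContainer) node).children()) {
                total += countLeaves(child);
            }
            return total;
        }
        return 1; // an IProperty, or any other non-container node
    }
}
```

The walker never mentions a concrete class, which is the point of having a
single base type for everything in the tree.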

John Tigue wrote:

>These interfaces only express the methods for navigating a tree. A
>particular class of objects would need to have some more methods to be
>interesting. For example, the interface for an element is IElement. The
>full source follows:


>public interface IElement extends IContent, IContainer
>    {
>     public String getType();
>     public void setType( String aType );
>     public void addAttribute( String name, String value );
>     public void removeAttribute( String name );
>     public IAttribute getAttribute( String attributeName );
>     public java.util.Enumeration getAttributes();
>     }

In the above example the returned interface IAttribute would inherit from
IProperty because it is a leaf node.

A Tree Iterator would already know the structure of an element when it walks
over it and would know how to retrieve the attributes. When it walks on to
an attribute, it knows it's a leaf node because it inherits from IProperty.

Again I stress that every XML object in the tree should inherit from a
single base class even if the base class does not provide any common
interfaces to its concrete classes. In this way, any XML object can be
passed via a base class reference (Whoops, I almost said pointer). It is
trivial to implement a fast, safe-casting mechanism that uses polymorphism
for casting.
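
One possible shape for such a safe-casting mechanism, sketched in Java. The
asContainer() and asProperty() methods are hypothetical, and default methods
assume Java 8 or later; an abstract base class would do the same job on older
versions.

```java
interface INode {
    // Safe "casts": a node answers null when it is not of the requested kind.
    default IContainer asContainer() { return null; }
    default IProperty asProperty() { return null; }
}

interface IContainer extends INode {
    @Override default IContainer asContainer() { return this; }
}

interface IProperty extends INode {
    @Override default IProperty asProperty() { return this; }
}

// Hypothetical concrete nodes used only to exercise the casts.
final class SketchLeaf implements IProperty { }
final class SketchBranch implements IContainer { }
```

A caller holding only an INode reference asks the node what it is and never
performs an unchecked downcast.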

This way, we can later add navigation classes that leverage the polymorphic
nature of the XML tree.

Chris Lloyd
POET Software


>Any thoughts?
>Alfie.
>



From andrewl at microsoft.com  Tue Aug  5 18:55:19 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:13 2004
Subject: Linking and Query question
Message-ID: <7BB61B44F197D011892800805FD4F7920133B7BF@RED-03-MSG.dns.microsoft.com>

Can I create a link that, in effect, contains a query so that it
references one document among a set? For example, if I know that several
versions of a document exist, and I want to reference the latest
version, but I'm willing to accept either of the two prior versions, can
I express that?  If so, how?  Thanks.

--Andrew Layman
   AndrewL@microsoft.com




From tbray at textuality.com  Tue Aug  5 19:10:53 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:13 2004
Subject: Linking and Query question
Message-ID: <3.0.32.19970805100443.008bd6e0@pop.intergate.bc.ca>

At 09:54 AM 05/08/97 -0700, Andrew Layman wrote:
>Can I create a link that, in effect, contains a query so that it
>references one document among a set? For example, if I know that several
>versions of a document exist, and I want to reference the latest
>version, but I'm willing to accept either of the two prior versions, can
>I express that?  If so, how?  Thanks.

XML-link has no versioning machinery built in... this would be in the
territory of the WebDAV work, if anywhere.  I think (but am not sure)
that there is some machinery for this in the URN work.  Note that versioning
in the general case is a horribly complex problem and tends to have all
sorts of application-specific requirements, so I wouldn't bet too much
on finding a good general solution. -T.



From john at datachannel.com  Tue Aug  5 19:11:44 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:13 2004
Subject: Xapi-J: an architectural detail
References: <3.0.1.32.19970805074603.006bcb18@gorge.net>
Message-ID: <33E75F5F.CC7EC38E@datachannel.com>

Chris Lloyd wrote:

> akirkpatrick wrote:
> >I really like the combination of IContent and IContainer.
> >The only question I have is how an element can query
> >its context in an efficient way? For example, how can
> >I find the previous element without referring to the parent
> >container. Presumably then the parent would have to
> >enumerate all its children to find the previous content
> >to the element in question. Obviously a particular
> >application can record the previous element in a variable
> >but then you get to more complex contexts, like "what
> >is the previous of my parent".
> >
> This is where the next step is needed. Tree Iterators can provide
> efficient and well abstracted mechanisms for walking the XML tree.
> Everyone is still stuck on the schema part of Xapi-J and that is fine.
> After that is done then it's time to add classes specifically for
> navigation.

> Keep the schema simple. Don't add members for the previous child,
> etc. It is unnecessary and complex to maintain.

I agree. I think we should follow the Visitor design pattern. Quoting
from Gamma's _Design_Patterns_: "Intent: Represent an operation to be
performed on the elements of an object structure. Visitor lets you
define a new operation without changing the classes of the elements on
which it operates." Here the operation is tree iteration.
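
A minimal Java sketch of the Visitor pattern applied to a document tree. All
type names here (INodeVisitor, ElementNode, TextNode, TextCollector) are
hypothetical and are not taken from the Xapi-J interfaces; the sketch assumes
each node type offers an accept() method.

```java
import java.util.ArrayList;
import java.util.List;

interface INodeVisitor {
    void visitElement(ElementNode element);
    void visitText(TextNode text);
}

interface INode {
    void accept(INodeVisitor visitor);
}

final class TextNode implements INode {
    final String value;

    TextNode(String value) { this.value = value; }

    @Override public void accept(INodeVisitor visitor) { visitor.visitText(this); }
}

final class ElementNode implements INode {
    final String type;
    private final List<INode> children = new ArrayList<>();

    ElementNode(String type) { this.type = type; }

    ElementNode add(INode child) { children.add(child); return this; }

    // The element drives the walk: itself first, then children in order.
    @Override public void accept(INodeVisitor visitor) {
        visitor.visitElement(this);
        for (INode child : children) {
            child.accept(visitor);
        }
    }
}

// A new operation added without touching the node classes.
final class TextCollector implements INodeVisitor {
    final StringBuilder text = new StringBuilder();

    @Override public void visitElement(ElementNode element) { /* nothing to collect */ }

    @Override public void visitText(TextNode t) { text.append(t.value); }
}
```

Another operation, such as formatting output, would be another INodeVisitor
implementation; the node classes stay unchanged.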

>
>
> Over the past 2 years, we have been developing an object database
> system for SGML. We have gone through the same thought processes as
> are going on with xapi-j right now. I think there are a few design
> considerations to keep in mind if you want to use iterator classes
> with the xapi-j schema and I think eventually you will.
>
> The idea of inheriting from IContainer is a good one. Polymorphism is
> very useful when it comes time to write navigation classes. A base
> class for all objects in the tree is very important!! We'll call this
> INode.
>

public interface INode    {
    // What do we put in here?
    }

> <snip/>
> Again I stress that every XML object in the tree should inherit from a
> single base class even if the base class does not provide any common
> interfaces to its concrete classes. In this way, any XML object can
> be passed via a base class reference (Whoops, I almost said pointer).
> It is trivial to implement a fast, safe-casting mechanism that uses
> polymorphism for casting.
>

So the interfaces in Xapi-J would extend INode like this?

public interface INode {...}

public interface IContainer extends INode{...}

public interface IElement extends IContainer {...}

This way an IElement is also an INode so passing via base interface can
be done for any object in the model. We're still dealing purely with
interfaces so vendors are still free to implement their own base
classes. This also could be mapped to CORBA, DCOM, and others.

> This way, we can later add navigation classes that leverage the
> polymorphic nature of the XML tree.
> <snip/>

--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From eliot at isogen.com  Tue Aug  5 20:10:29 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:13 2004
Subject: Linking and Query question
Message-ID: <3.0.32.19970805130657.00b1692c@swbell.net>

At 09:54 AM 8/5/97 -0700, Andrew Layman wrote:
>Can I create a link that, in effect, contains a query so that it
>references one document among a set? For example, if I know that several
>versions of a document exist, and I want to reference the latest
>version, but I'm willing to accept either of the two prior versions, can
>I express that?  If so, how?  Thanks.

If the reference to a document is via an entity reference, the query can be
part of the system ID for the document.  As system IDs in XML are always
URLs, if you have a way of expressing the query in a URL, you can do it
that way.  If not, then the short answer is "no" (unless there's some
aspect of URLs or TEI extended pointers I've overlooked, which is quite
possible).

In a general SGML system, there are three basic approaches:

1. Define your own application-specific addressing syntax and semantics and
use it, hoping tools will support it or providing your own support (because
the scope of use is totally within your control).

2. Use Formal System Identifiers and make the query part of an entity's
system ID.

3. Use query addressing and make the query part of a direct or indirect
address  (that does not use a declared entity).

The only difference between these three approaches is that two and three
are done within the framework of standardized definitional mechanisms
defined by ISO/IEC 10744:1997 while one is not.  In all three cases you
still have to implement support for the query and provide the necessary
integration with the tools you're using (browser to repository, editor to
repository, etc.).

The Formal System Identifier Definition Requirements (FSIDR) facility of
ISO/IEC 10744:1997 (Annex A.6, reviewable at
http://www.drmacro.com/hythtml/clause-A.6.html) provides a syntax for
associating repository-specific attributes with system IDs.

For example, say you have a repository with a "version" property for
storage objects.  You can refer to this property by declaring the
repository as a "storage manager" and providing an attribute (or
attributes) for specifying the version you want, something like this:

<!-- Use "FSISM" PI to identify the names of storage manager notations: -->
<?IS10744 FSISM MyDocManager>

<!-- Declare notation for storage manager.  Serves to provide local name
     for repository so generic system can call repository's API or 
     human observer can tell what the repository is. -->
<!NOTATION MyDocManager PUBLIC "-//ME//NOTATION FSISM My Document
Manager//EN" >

<!-- Declare attributes for passing parameters to the repository: -->
<!ATTLIST #NOTATION MyDocManager
   version -- The required version.  Syntax is "([<>][=]?)?[0-9]+(\.[0-9]+)?"
              Prefixes for version number:
              <    Anything less than specified version
              >    Anything greater than specified version
              <=   Anything less than or equal to specified version
              >=   Anything greater than or equal to specified version 
              If no prefix specified, only specified version is used.
           --
     CDATA #IMPLIED  -- Default: latest version --
>

Obviously, these declarations can be provided by the storage manager
provider and used by reference from documents--you wouldn't expect authors
to type these things themselves (or even necessarily be aware of their
presence or use).

You then invoke the storage manager by treating the notation name as an
element type name within the system ID:

<!ENTITY A-Doc SYSTEM "<MyDocManager version='>1.2'>mydoc.xml" CDATA SGML >

As the semantics of the tags within a system ID are well defined by the
FSIDR, it is probably reasonable for XML systems to treat the tag name as a
repository notation name even when the formal declarations are not present.
 If the storage manager name is well understood (e.g., "URL"), there's no
problem.  It's probably also reasonable to assume that storage manager
names are generally unique and therefore processing can be associated with
the names directly (rather than by requiring a notation declaration with a
public ID).  This is analogous to being able to map entities by entity name
within an SGML Open catalog.

A processor would provide a way to associate the storage manager notation
MyDocManager with that storage manager's API (i.e., the integrator of the
storage manager would register a DLL or DLL entry point with the notation's
public identifier).  The processor would then pass the value of the version
attribute and the data following the MyDocManager start tag to the API.
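
As an illustration of what a storage manager might do with the version
attribute once it receives it, here is a minimal Java sketch that parses the
prefix syntax given in the attribute comment above. The VersionConstraint
class is hypothetical. Two assumptions are mine and not in the declaration: a
missing minor number is treated as 0, and major and minor are compared as
integers (so 1.10 is later than 1.2).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class VersionConstraint {
    // Same language as "([<>][=]?)?[0-9]+(\.[0-9]+)?", with capture groups added.
    private static final Pattern SYNTAX =
        Pattern.compile("([<>]=?)?([0-9]+)(?:\\.([0-9]+))?");

    private final String op;
    private final int major;
    private final int minor;

    private VersionConstraint(String op, int major, int minor) {
        this.op = op;
        this.major = major;
        this.minor = minor;
    }

    static VersionConstraint parse(String spec) {
        Matcher m = SYNTAX.matcher(spec);
        if (!m.matches()) {
            throw new IllegalArgumentException("bad version spec: " + spec);
        }
        String op = m.group(1) == null ? "" : m.group(1);
        int major = Integer.parseInt(m.group(2));
        int minor = m.group(3) == null ? 0 : Integer.parseInt(m.group(3)); // assumption
        return new VersionConstraint(op, major, minor);
    }

    /** True when the candidate version satisfies the constraint. */
    boolean accepts(int candidateMajor, int candidateMinor) {
        int cmp = candidateMajor != major
            ? Integer.compare(candidateMajor, major)
            : Integer.compare(candidateMinor, minor);
        switch (op) {
            case "<":  return cmp < 0;
            case ">":  return cmp > 0;
            case "<=": return cmp <= 0;
            case ">=": return cmp >= 0;
            default:   return cmp == 0; // no prefix: only the specified version
        }
    }
}
```

For the entity declared above, VersionConstraint.parse(">1.2") would accept
version 1.3 and reject version 1.2.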

If you're not addressing the document as an entity but using some other
query, I don't think XML Link provides a way to do this (because it doesn't
generalize the notion of addressing by query).

The HyTime architecture does generalize addressing by query such that you
can declare a query notation with whatever semantics you want and then use
that query.  The only requirement is that the result of the query be a list
of nodes in groves.  In DOM terms this would mean you get back objects
conforming to the DOM model, rather than the unparsed data of the document
addressed. (All addressing is in terms of the results of parsing, not the
unparsed source.)

For example, to create and use such a query, you could do something like this:

<!-- Declare a notation for my query.  The syntax and semantics of 
     this query are presumably documented somewhere.  The public ID
     of the notation should get an observer to this documentation. -->
<!NOTATION MyDocQuery  PUBLIC "-//ME//NOTATION My Document Query//EN" >

<!-- Now declare an element type that uses this query notation for 
     addressing: -->

<!ELEMENT DocLink  -- A hyperlink to another document using a query --
  - - (#PCDATA) -- Content is title of document linked to --
>
<!ATTLIST DocLink
    document CDATA #REQUIRED -- Contains query of document to link to --
    loctype  CDATA #FIXED "document QUERYLOC MyDocQuery" 
      -- Associate referential 'document' attribute with query
         notation 'MyDocQuery' (uses "reference location address" facility)
      --
    HyTime   NAME  #FIXED "hylink" -- This is a HyTime hyperlink --
    anchrole CDATA #FIXED "refmark document"
      -- Roles of the anchors of this link.  DocLink element is reference
         mark. --
    anchcstr CDATA #FIXED "self required" 
      -- Indicate that the first anchor role (refmark) is a "self anchor",
         that is played by the link element itself. --
>

...
<p>See <doclink document="mydoc.xml[version 1.2+]">My document</doclink>...

A HyTime aware processor interprets the above as follows:

1. Sees that Doclink is a hyperlink.  Looks for the required (by HyTime)
"anchrole" attribute, from which it will determine the names of the
attributes used to address the anchors (they are the same as the anchor
role names).

2. Sees that "refmark" is a self anchor, so no addressing attribute is
needed for it.  Sees that second role is "document".  Looks for attribute
named "document".

3. Finds attribute named "document".  Looks for attribute named "loctype"
(location type) to see if a location type has been associated with this
attribute (without location type, the HyTime engine has no way of knowing
what form of addressing is being used [unless the attribute is declared as
IDREF(s) or ENTITY/ENTITIES]).

4. Finds a loctype attribute and sees that the document attribute is a
query location that uses the notation named "MyDocQuery"

5. Looks to see if a notation named MyDocQuery has been declared.  It has.

6. Passes the value of the document attribute to the MyDocQuery API (again,
registered using whatever integration API the browser provides).  The
processor (my document manager in this case), interprets the query and
provides a response.

7. Waits until it gets a response, which had better be a list of objects in
an object model it understands (e.g., grove nodes, DOM objects, etc.).

8. Assuming it gets a response, enables traversal to the returned objects.
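
Steps 5 through 8 amount to a lookup table from notation names to query
processors. A minimal Java sketch under that reading follows; the
QueryNotationRegistry class is hypothetical, and plain strings stand in for
grove nodes or DOM objects.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class QueryNotationRegistry {
    // Notation name -> processor that turns a query string into a node list.
    private final Map<String, Function<String, List<String>>> processors = new HashMap<>();

    /** Integration step: associate a declared notation with its processor. */
    void register(String notationName, Function<String, List<String>> processor) {
        processors.put(notationName, processor);
    }

    /** Steps 5 to 8: find the notation, hand over the query, return the nodes. */
    List<String> resolve(String notationName, String query) {
        Function<String, List<String>> processor = processors.get(notationName);
        if (processor == null) {
            throw new IllegalStateException(
                "no processor registered for notation " + notationName);
        }
        return processor.apply(query);
    }
}
```

A browser integrating a document manager would register a processor under
"MyDocQuery" once, then call resolve() with the value of each "document"
attribute it encounters.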

XML Link removes the need for the above general processing by providing a
fixed set of query notations that XML Link recognizes (URLs and TEI
extended pointers).  However, this limits your ability to do things these
two query notations don't provide for.  Note also that the XML Link
specification can be defined in terms of the HyTime generalizations such
that any general-purpose HyTime engine can process XML Link documents (and
you would expect HyTime engines to have built-in support for XML Link so
that there would be no additional integration required to process XML Link
documents).

The HyTime mechanism has no "magic"--it just provides a framework within
which the integration you'd have to do in any case can be done.  It simply
provides a way to name things (queries, storage managers) with
universally-unique names (public IDs) and associate these universal names
with local names (notation names).  This framework standardizes the formal
declaration of what you're doing and (hopefully) makes the integration
mechanism consistent across tools, which should make integration easier.
It doesn't remove the need for tools to be plugged together by humans
(either directly or through the definition of API standards like the DOM or
CORBA or ODBC).

Cheers,

Eliot




From andrewl at microsoft.com  Tue Aug  5 21:12:28 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:13 2004
Subject: Linking and Query question
Message-ID: <7BB61B44F197D011892800805FD4F7920133B7D4@RED-03-MSG.dns.microsoft.com>

Thanks.  I like the power and exactness of the example you showed using
FSID.  Now I need to find a way to integrate that with a URI scheme.

--Andrew Layman
   AndrewL@microsoft.com




From eliot at isogen.com  Tue Aug  5 21:35:39 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:13 2004
Subject: Linking and Query question
Message-ID: <3.0.32.19970805142954.00b09c38@swbell.net>

At 12:11 PM 8/5/97 -0700, Andrew Layman wrote:
>Thanks.  I like the power and exactness of the example you showed using
>FSID.  Now I need to find a way to integrate that with a URI scheme.

Cool. Let me know if I can be of assistance in any way.

Cheers,

Eliot



From ricko at allette.com.au  Tue Aug  5 22:39:34 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:13 2004
Subject: XML and whitespace: lets just dump CR and LF!
Message-ID: <199708052044.GAA01690@jawa.chilli.net.au>


> From: Tim Bray <tbray@textuality.com>
 
> Heh-heh.  If you go look in the proceedings of the 1988 Usenix conference,
> you'll find a paper I wrote, on the Oxford English Dictionary project,
> which has a section entitled
> 
>  '\n' Considered Harmful
> 
> I'd love to lose the record-end silliness.  Trouble is, we're stuck with
> it until we have better editing tools. -T.
 
I'm not saying to ban the characters, merely to say give them no significance
for an application.  So we can still use our existing editing tools.

For example, using vi or sed to add the unambiguous newline to every line of an
existing file that will be placed in an HTML-like <PRE> takes merely a rule like
   1,$s/$/\&#x2028;/
which is trivial.  
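
The same rule can be sketched in Java for tools that are not line editors.
The LineSeparatorMarker class is hypothetical. One difference from the
substitution above is an assumption of this sketch: every output line ends
with LF, including the last one, even if the input had no final newline.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

final class LineSeparatorMarker {
    /** Appends the character reference for U+2028 to the end of every line. */
    static String markLineEnds(String text) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = reader.readLine()) != null) {
                out.append(line).append("&#x2028;").append('\n');
            }
        } catch (IOException e) {
            // Not expected when reading from an in-memory string.
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```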

We can do this only because we are using ISO 10646 as the document character
set: since we have the chance to clear up the mess with a simple convention,
why not take it?


Rick Jelliffe



From clloyd at gorge.net  Wed Aug  6 00:07:44 1997
From: clloyd at gorge.net (Chris Lloyd)
Date: Mon Jun  7 16:58:13 2004
Subject: Xapi-J: an architectural detail
Message-ID: <3.0.1.32.19970805150432.006ba560@gorge.net>

>John Tigue wrote:
>>
>>I agree. I think we should follow the Visitor design pattern. Quoting
>>from Gamma's _Design_Patterns_: "Intent: Represent an operation to be
>>performed on the elements of an object structure. Visitor lets you
>>define a new operation without changing the classes of the elements on
>>which it operates." Here the operation is tree iteration.
>

Yes, our whole system is based on "Design Patterns". We use the visitor
pattern for formatting output and for operations where we walk a subtree
from stem to stern. They are useful and easy to implement. Visitors are
good for moving an operation outside a class. They are not so good for
defining and extending complex walking tasks.

We find it necessary to have iterators for complex walking tasks. You can
let an iterator drive a visitor as well. We are doing very complex queries
right now using iterators, algorithms, functions, and operators. The last
three patterns are borrowed from STL. The problem with the last three
patterns for Java is that they are template driven. A very complex tree
walking query/algorithm can be formulated in a single line of C++. I'm
sure they could be adapted to Java.

We are dealing with tree versioning as well, so we have trees within trees.
You just can't expect to walk this stuff without some well abstracted
navigation patterns.

>I wrote:
>> The idea of inheriting from IContainer is a good one. Polymorphism is
>> very
>> useful when it comes time to write navigation classes. A base class
>> for all
>> objects in the tree is very important!! We'll call this INode.
>>

>
>public interface INode    {
>    // What do we put in here?
>    }
>

It is not necessary for anything to be in the base class. It is just
necessary for there to be one.

For example: An Iterator has a very simple interface

public interface ITreeIterator
{
	// A Java interface cannot declare a constructor; an implementing class
	// would take (INode, ICursor iCursor, INodeIterFactory iFactory).
	boolean next(); // walk to the next node
	INode current(); // return the current node
	long GetIterLevel(); // return how many tags deep we are from starting position
}

We construct the iterator with a current node which can be the document
root or any object in the tree. We use a cursor which externalizes the
walking algorithm (forward, backward, follow links) and we use a Factory
which provides the algorithms for walking each object type in the tree.

This iterator will return different objects in the tree or maybe even walk
the tree differently depending on the factory and cursor that it is
constructed with. 

This iterator will not work without a common base class because the
iterator knows NOTHING about the types of objects in the tree. It only
knows what an INode is.
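
A minimal Java sketch of such an iterator, under simplifying assumptions: a
single function from a node to its children stands in for both the cursor and
the factory, and the walk is always forward, pre-order. PreOrderIterator and
NamedNode are hypothetical names. The iterator itself refers only to INode.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Function;

interface INode { }

// Hypothetical concrete node, unknown to the iterator.
final class NamedNode implements INode {
    final String name;
    final List<INode> kids = new ArrayList<>();

    NamedNode(String name) { this.name = name; }
}

final class PreOrderIterator {
    private final Function<INode, List<INode>> childrenOf; // stands in for the factory
    private final Deque<INode> pendingNodes = new ArrayDeque<>();
    private final Deque<Integer> pendingLevels = new ArrayDeque<>();
    private INode current;
    private int level;

    PreOrderIterator(INode start, Function<INode, List<INode>> childrenOf) {
        this.childrenOf = childrenOf;
        pendingNodes.push(start);
        pendingLevels.push(0);
    }

    /** Walks to the next node; the first call lands on the start node. */
    boolean next() {
        if (pendingNodes.isEmpty()) {
            return false;
        }
        current = pendingNodes.pop();
        level = pendingLevels.pop();
        List<INode> kids = childrenOf.apply(current);
        // Push in reverse so the first child is visited first.
        for (int i = kids.size() - 1; i >= 0; i--) {
            pendingNodes.push(kids.get(i));
            pendingLevels.push(level + 1);
        }
        return true;
    }

    INode current() { return current; }

    /** Depth of the current node relative to the start node. */
    int level() { return level; }
}
```

Swapping in a different childrenOf function changes which nodes are visited
without any change to the iterator.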

>So the interfaces in Xapi-J would extend INode like this?
>
>public interface INode {...}
>
>public interface IContainer extends INode{...}
>
>public interface IElement extends IContainer {...}
>
>This way an IElement is also an INode so passing via base interface can
>be done for any object in the model. We're still dealing purely with
>interfaces so vendors are still free to implement their own base
>classes. This also could be mapped to CORBA, DCOM, and others.

YES!

Chris Lloyd
POET Software



From h.rzepa at ic.ac.uk  Wed Aug  6 09:46:07 1997
From: h.rzepa at ic.ac.uk (Rzepa, Henry)
Date: Mon Jun  7 16:58:13 2004
Subject: XML-DEV Digest
Message-ID: <v03110700b00dd82abdeb@[155.198.224.86]>

 A number of people have asked for a digest of this list. I forwarded this
request to our postmaster. As soon as it is actioned, I will let this list know
the details.

Dr Henry Rzepa,  Dept. Chemistry,  Imperial College,  LONDON SW7 2AY;
mailto:rzepa@ic.ac.uk; Tel  (44) 171 594 5774; Fax: (44) 171 594 5804.
URL: http://www.ch.ic.ac.uk/rzepa/ 




From Peter at ursus.demon.co.uk  Wed Aug  6 17:19:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:13 2004
Subject: Xapi-J: an architectural detail
Message-ID: <9160@ursus.demon.co.uk>

In message <33E75F5F.CC7EC38E@datachannel.com> john@datachannel.com (John Tigue) writes:

Firstly many thanks to John for driving this forward and for the positive replies
from several others. Let's make sure that we get closure on this fairly shortly
as we don't want to fall back to where we were 4-5 months ago with a lot of
enthusiasm and no final outcome.

I know we are all doing this on a voluntary basis, but if we get it right this 
time we save a lot of problems later.  I have put my JUMBO development on hold
because I really want to get it on top of a decent architecture.  We need to
know precisely what an Element, Node, etc. are :-)

I get the impression from John and others that it is possible to create an
API which does not necessarily support the property set today, but is capable
of doing it in the future without rewriting.  If so, then perhaps John and
others could suggest where they intend to freeze the current API. If we don't
set some limits now, there is the danger that we try to be too ambitious.

As soon as an API is established, a benefit will be that we can start to think
about what other features of XML processing need to be covered in a generic 
manner.  I found that quite a lot of the JUMBO implementation was generic 
(e.g. checking semantic validity, 'inheritance/default' implementation, etc.),
which is not trivial and should be isolated as far as possible from applications.


> 
> Chris Lloyd wrote:
[...]
> > This is where the next step is needed. Tree Iterators can provide
> > efficient and well abstracted mechanisms for walking the XML tree.
> > Everyone is still stuck on the schema part of Xapi-J and that is fine.
> > After that is done then it's time to add classes specifically for
> > navigation.
> 
> > Keep the schema simple. Don't add members for the previous child,
> > etc. It is unnecessary and complex to maintain.

I agree with this - it should be possible to add these in at a later stage
(e.g. by subclassing in Java).  I have a lot of stuff in JUMBO that implements
treewalking (e.g. TEI Xptrs) and even tree-editing, and my Tree/Node classes 
can have up to 100 methods each. We want to avoid this at this stage :-)

> 
> I agree. I think we should follow the Visitor design pattern. Quoting
> from Gamma's _Design_Patterns_: "Intent: Represent an operation to be
> performed on the elements of an object structure. Visitor lets you
> define a new operation without changing the classes of the elements on
> which it operates." Here the operation is tree iteration.
> 
> >
> >
> > Over the past 2 years, we have been developing an object database
> > system for SGML. We have gone through the same thought processes as
> > are going on with xapi-j right now. I think there are a few design
> > considerations to keep in mind if you want to use iterator classes
> > with the xapi-j schema and I think eventually you will.
> >
> > The idea of inheriting from IContainer is a good one. Polymorphism is
> > very useful when it comes time to write navigation classes. A base
> > class for all objects in the tree is very important!! We'll call this
> > INode.
> >

I tend to support this. It makes general management such as editing and display
easier, even if the Node objects are not of the same class.


	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed Aug  6 17:19:57 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:14 2004
Subject: Specification Questions
Message-ID: <9162@ursus.demon.co.uk>

In message <199708050949.KAA07792@andromeda.ndirect.co.uk> "Neil Bradley" writes:
> 
> 
> Reply-to:      Peter@ursus.demon.co.uk (Peter Murray-Rust)
> 
> > Some additional - hopefully constructive - thoughts on whitespace.
> > 
> > The XML-lang spec does not ( and I suspect will not) give detailed guidance
> > on how whitespace will be managed.  My impression is that it is up to 
> > implementers and/or groups like this to come up with particular solutions.
> > My worry is that these will be inconsistent and not inter-operable.
> 
> I agree totally. This was my original concern.
> 
> > ***
> > Therefore I propose that those on XML-DEV who care about this problem come
> > up with some guidelines for implementers. 
> > ***
> 
> I very much hope this happens.
> 
[...]
> 
> I think all applications should be expected to treat either or both 
> characters in sequence as a line end signal, so that platform 
> dependencies can be eliminated. If there is no good reason to omit 
> this task from the XML-processor itself, I think it should be done 
> there.
> 
> 
[...]
> 
> I believe so. In addition, can we not put 'XML-SPACE 
> (PRESERVE|IMPLIED) "PRESERVE"' in an attribute declaration for an 
            ^^^^^^^
I think you meant DEFAULT - #IMPLIED is when no value is given.

> element which will always have preserved content. It is common 
> practice for a DTD to have some kind of pre-formatted element, such 
> as HTML's '<pre>'.
> 
> 
> > If so, we can propose that the DEFAULT mode for any whitespace processing is
> > something along the lines (similar to HTML?).  Within an element with
> > XML-SPACE="DEFAULT"
> > 
> 
> > All whitespace sequences are mapped into a single space character.
> Agreed.
> 
> > All whitespace pseudo-elements are ignored (i.e. whitespace between markup)
> 
> Ummm. what about 'the <b>bold</b>  <i>italic</i> styles...'?
> 
> > All leading and trailing whitespace in #PCDATA is ignored.
> 
> I think all applications should remove leading and trailing CR and LF
> characters in a mixed content element. But not SP or HT, as this would
> be undesirable in the following fragment:
> 
> A<emph>  bold  </emph>word.
> 
> Although an unusual layout, some people may use it, and it would be
> unfortunate if it resulted in 'Aboldword'.
> 
OK - I had overlooked this.

Taking account of other posts on this subject here and elsewhere, there seems to
be a positive view that a set of Guidelines/Best Practice/Generally Agreed 
Conventions should be developed, and that XML-DEV is probably the right place.

It's also clear that the more of this that can be done before the XMLProcessor
output gets to the *specific* application - e.g. a browser or transformer - the
better.  We seem to be looking at a filter or layer immediately after/on_top_of
the XMLProcessor.  At the ESIS stream level we could have:

Document ->[Parser] -> ESIS -> [XMLWhitespace] -> NewESIS -> [Application]

and at the API level something that either sits on top of the EventStream or
the  final TreeFactory (or whatever it's called).
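
As a rough sketch of such a layer, assuming a simple push-style event
interface: the filter below forwards every event to the application except
character data that is entirely whitespace. EventHandler, WhitespaceFilter
and RecordingApplication are hypothetical names, not part of Xapi-J or any
real parser, and dropping whitespace-only data is only one possible policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical event interface standing in for an ESIS-like stream.
interface EventHandler {
    void startElement(String type);
    void endElement(String type);
    void characters(String data);
}

// Sits between the parser and the application. Markup events pass through
// unchanged; character data made up only of whitespace is dropped.
final class WhitespaceFilter implements EventHandler {
    private final EventHandler next;
    WhitespaceFilter(EventHandler next) { this.next = next; }
    public void startElement(String type) { next.startElement(type); }
    public void endElement(String type) { next.endElement(type); }
    public void characters(String data) {
        if (!isAllWhitespace(data)) next.characters(data);
    }
    // True when the string contains nothing but SP, HT, CR and LF.
    static boolean isAllWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c != ' ' && c != '\t' && c != '\r' && c != '\n') return false;
        }
        return true;
    }
}

// Stand-in application that records events in ESIS-style notation.
final class RecordingApplication implements EventHandler {
    final List<String> log = new ArrayList<>();
    public void startElement(String type) { log.add("(" + type); }
    public void endElement(String type) { log.add(")" + type); }
    public void characters(String data) { log.add("-" + data); }
}
```

The application is written against the same interface whether or not the
filter is present, so the filter can be added or removed without changing it.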

(There is a difficulty in filtering any document, in that XPtrs in XML-LINK
would appear to have to operate on the unfiltered document (although this is
not specifically stated, it's implied).  So it might have to be that the 
stream or tree contained 'significant' and 'non-significant' whitespace, and 
that the application would have to be able to recognise the flag.  All XPtr
activity has to take place on *all* whitespace (although I don't think this
is pretty).)

The current switch PRESERVE is clear (everything goes through).  It would go
against the spec if it didn't do this. That means (I suppose) that CR+LF is 
different from LF - that's the price paid for PRESERVE. The other option DEFAULT
cannot map onto a set of actions that we all agree for all documents. Therefore
we have to give DEFAULT some hints at the *document* level - presumably through
PIs.

Can we propose, therefore, a set of PIs that would control whitespace 
processing? I would hope that we could keep this to a very small number 
(ca. 3-4).  Is it too simple to suggest that there are two types of markup
(STRUCTURE and TEXT) that need to normalise whitespace?  The former would
deal with things like:
<PRETTY>
  <PRINT>
  </PRINT>
</PRETTY>
where the author did not intend there to be any whitespace, and the second
would deal with
<P>
This is a
long         space in a <B>paragraph</B>.
</P>
where all whitespace would be normalised to a single space as in HTML?

Where a document contained both, the author could use a PI to switch between 
them.
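
A minimal sketch of the two modes, assuming they are applied to one run of
character data at a time. The names WhitespaceModes, Mode and normalise are
hypothetical, and how a PI would select the mode is left open.

```java
// Hypothetical helper illustrating the STRUCTURE and TEXT modes.
final class WhitespaceModes {
    enum Mode { STRUCTURE, TEXT }

    // STRUCTURE: whitespace-only data between markup is discarded and
    //            anything else is passed through untouched.
    // TEXT:      every run of whitespace collapses to a single space.
    static String normalise(String data, Mode mode) {
        if (mode == Mode.STRUCTURE) {
            return isAllWhitespace(data) ? "" : data;
        }
        StringBuilder out = new StringBuilder(data.length());
        boolean inRun = false;
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (isWhitespace(c)) {
                if (!inRun) out.append(' ');   // first character of a run
                inRun = true;
            } else {
                out.append(c);
                inRun = false;
            }
        }
        return out.toString();
    }

    static boolean isWhitespace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    static boolean isAllWhitespace(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (!isWhitespace(s.charAt(i))) return false;
        }
        return true;
    }
}
```

Note that TEXT mode as sketched keeps a single leading and trailing space,
so that 'A<emph>  bold  </emph>word.' does not collapse into 'Aboldword'.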

If we could come up with a very simple set of options, it might make things 
simple enough that a standard filter could be devised, or that the application
programmer would have a much simpler strategy.  Is consensus possible?

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Wed Aug  6 17:20:08 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:14 2004
Subject: XML and whitespace: lets just dump CR and LF!
Message-ID: <9163@ursus.demon.co.uk>

In message <199708051317.XAA23619@jawa.chilli.net.au> "Rick Jelliffe" writes:
[...]
> 
> I suggest that the following approach should be taken. (I think it is the only
> realistic solution, especially if we assume that 1) 
> data is usually generated by applications, 
    Although this will be partly true, I think we still have to expect people
to use text editors for a year or two yet :-). [It's how I create most of
my XML at present :-)].

> 2) humans only check and tweak data;
Yes.  XML must certainly be tweakable. So it mustn't have to have lines 1000 
chars long :-)

> 3) we want operating system 
> and character set independence, 

critical :-)

> 4) line-breaking is generally done by clients
> ...so CR/LF is basically a convenience for fitting data into editors, 
> not for the purposes of output.)

Yes.

> 
> **A) XML applications should ignore *ALL* CR and LF as a bad joke.  They should
> be entirely there for formatting the raw text into nice, eye-sized records.
> So CR and LF should never be converted to spaces. (This approach was the
> one taken by Interleaf, and I have come to appreciate it.) If you need a 
> space, then start the new line with it!  (Ending the previous line is difficult
> to see.)

Appeals to me :-)

> 
> **B) XML applications should mandate the use of the unambiguous Unicode characters
> 	-- LINE SEPARATOR  &#x2028;
> 	-- PARAGRAPH SEPARATOR &#x2029;
> 
This makes sense unless someone finds a flaw in it...
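
A rough sketch of what proposals A and B would mean for a consumer of
character data: CR and LF are thrown away outright (not turned into spaces),
and only U+2028 and U+2029 end a line. The class and method names are
hypothetical, and paragraph separators are treated the same as line
separators purely to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating proposals A and B.
final class LineSeparators {
    static final char LINE_SEPARATOR = 0x2028;       // Unicode LINE SEPARATOR
    static final char PARAGRAPH_SEPARATOR = 0x2029;  // Unicode PARAGRAPH SEPARATOR

    // Proposal A: CR and LF are source formatting only and are removed.
    static String dropCrLf(String data) {
        StringBuilder out = new StringBuilder(data.length());
        for (int i = 0; i < data.length(); i++) {
            char c = data.charAt(i);
            if (c != '\r' && c != '\n') out.append(c);
        }
        return out.toString();
    }

    // Proposal B: only the explicit Unicode separators break lines.
    static List<String> hardLines(String data) {
        List<String> lines = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        String cleaned = dropCrLf(data);
        for (int i = 0; i < cleaned.length(); i++) {
            char c = cleaned.charAt(i);
            if (c == LINE_SEPARATOR || c == PARAGRAPH_SEPARATOR) {
                lines.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        lines.add(current.toString());   // text after the last separator
        return lines;
    }
}
```

The space that separates words across a source line break has to be typed
explicitly, which is why the proposal suggests starting the new line with it.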

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From sarah at attd.com  Wed Aug  6 21:55:13 1997
From: sarah at attd.com (Sarah Slocombe)
Date: Mon Jun  7 16:58:14 2004
Subject: Xapi-J: an architectural detail
Message-ID: <3.0.32.19970806155554.006b6f00@mail.lglobal.com>

Greetings!

I've been following this thread with great interest. 
I'm trying to piece together the suggestions so far but
I wonder if I've muddled it already. Perhaps I should just 
wait a bit longer but things are really starting to get 
exciting now!

As I understand it, we've got:

public interface INode{
    public INode getParent();
    public void setParent(INode aContainer);
}

(Or is INode ONLY so things have a common base class/
interface, and shouldn't have any methods? Or does an 
IContainer never need to deal with parents? Or ought even 
parent stuff to be handled by iterators?)

public interface IContainer extends INode{
    public Enumeration getContents();
    public void insertContent(IContent aContent, 
        IContent preceedingContent);
    public void appendContent(IContent aContent);
    public void removeContent(IContent aContent);
}

public interface IContent extends INode{
     public String getData();
}

public interface IElement extends IContent, IContainer{
    public String getType();
    public void setType(String aType);
    public void addAttribute(String name, String value);
    public void removeAttribute(String name);
    public IAttribute getAttribute(String attributeName);
    public java.util.Enumeration getAttributes();
}

So far so good? Now what about IAttribute? John Tigue's
shown:

public interface IAttribute{
    public String getName();
    public void setName(String aName);
    public String getValue();
    public void setValue(String aValue);
}

Ought this to inherit from IContent? Chris Lloyd spoke of
IContainer vs. IProperty -- are IContent and IProperty the
same thing?

Thanks for any help.


Sarah Slocombe
sarah@attd.com



From john at datachannel.com  Wed Aug  6 22:49:28 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:14 2004
Subject: Xapi-J: an architectural detail
References: <3.0.32.19970806155554.006b6f00@mail.lglobal.com>
Message-ID: <33E8E3E4.AAA99E5@datachannel.com>

Sarah Slocombe wrote:

> <snip/>

> So far so good? Now what about IAttribute? John Tigue's
> shown:
>
> public interface IAttribute{
>     public String getName();
>     public void setName(String aName);
>     public String getValue();
>     public void setValue(String aValue);
> }
>
> Ought this to inherit from IContent? Chris Lloyd spoke of
> IContainer vs. IProperty -- are IContent and IProperty the
> same thing?
> <snip/>

IContent is for things in things so I think IAttribute would extend
INode and maybe an IProperty but not IContent as it was initially
designed. We haven't nailed down which interfaces are in Xapi-J. I think
Chris was saying that IProperty is for leaves in the parse tree. I've
been maintaining a site which discusses the Xapi-J interfaces at
http://www.datachannel.com/ChannelWorld/xml/dev. You can find the other
Xapi-J interfaces there.
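
One way to picture that arrangement, as a sketch only: IAttribute sits under
INode by way of an IProperty, and has no relationship to IContent. The
methods given to IProperty here are an assumption (the thread has not defined
that interface), setValue is assumed to take a String, and SimpleNode and
SimpleAttribute are hypothetical implementation classes.

```java
// Base type for every object in the tree, as discussed in the thread.
interface INode {
    INode getParent();
    void setParent(INode aParent);
}

// Assumed shape for a leaf-level name/value pair; not defined in the thread.
interface IProperty extends INode {
    String getName();
    String getValue();
}

// An attribute is a property that can also be modified.
interface IAttribute extends IProperty {
    void setName(String aName);
    void setValue(String aValue);
}

// Hypothetical bare node, used here only as something to attach to.
final class SimpleNode implements INode {
    private INode parent;
    public INode getParent() { return parent; }
    public void setParent(INode aParent) { parent = aParent; }
}

// Hypothetical attribute implementation.
final class SimpleAttribute implements IAttribute {
    private INode parent;
    private String name;
    private String value;
    SimpleAttribute(String name, String value) {
        this.name = name;
        this.value = value;
    }
    public INode getParent() { return parent; }
    public void setParent(INode aParent) { parent = aParent; }
    public String getName() { return name; }
    public void setName(String aName) { name = aName; }
    public String getValue() { return value; }
    public void setValue(String aValue) { value = aValue; }
}
```

With this shape an attribute can be handled by any code written against
INode, but it cannot be passed where content is expected.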

As for stopping at some point and labeling what we have as Xapi-J 1.0, I
think we are very close to a point where we can do that. The real work
is having XML processor providers implement Xapi-J. I have a
functional XML processor which complies with Xapi-J which I've been using
to test the concepts. It doesn't reflect the latest stuff like INode.
I'll rev the site and the example processor this weekend.


--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From andrewl at microsoft.com  Thu Aug  7 00:16:20 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:14 2004
Subject: XML-Link: Relative URL expansion
Message-ID: <7BB61B44F197D011892800805FD4F7920133B7FF@RED-03-MSG.dns.microsoft.com>

If an XML-Link element has a relative URL in its href attribute, what is
used as the base for resolving the URL?  I presume the URL of the
containing document. Is this correct?

--Andrew Layman
   AndrewL@microsoft.com




From tbray at textuality.com  Thu Aug  7 00:26:59 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:14 2004
Subject: XML-Link: Relative URL expansion
Message-ID: <3.0.32.19970806152015.00830410@pop.intergate.bc.ca>

At 03:15 PM 06/08/97 -0700, Andrew Layman wrote:
>If an XML-Link element has a relative URL in its href attribute, what is
>used as the base for resolving the URL?  I presume the URL of the
>containing document. Is this correct?

More properly "containing resource", but yes.  Check the XML spec, section
4.3.2, for some more details.  In this connection I'd also recommend a look 
at RFC 1808. -T.
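
As an illustration of resolving against the containing resource, here is a
small sketch using the standard java.net.URI class from present-day Java
(which follows RFC 2396, a later revision of the RFC 1808 rules). The helper
name and the example.org addresses are made up for the example.

```java
import java.net.URI;

// Hypothetical helper: resolves an href against the URL of the resource
// that contains the link.
final class RelativeLinks {
    static URI resolve(String containingResource, String href) {
        return URI.create(containingResource).resolve(href);
    }
}
```

For a link inside http://example.org/docs/index.xml, an href of
figures/a.gif resolves to http://example.org/docs/figures/a.gif, and an href
of ../b.xml resolves to http://example.org/b.xml.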



From andrewl at microsoft.com  Thu Aug  7 00:54:02 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:14 2004
Subject: FW: First Draft of RDF, differences from my notes.
Message-ID: <7BB61B44F197D011892800805FD4F7920133B80B@RED-03-MSG.dns.microsoft.com>

After reading the RDF paper, I posted the following message to the RDF
working group.  Since the RDF paper is now posted to the XML dev mailing
list, these comments are relevant in the new context.

--Andrew Layman
   AndrewL@microsoft.com

> -----Original Message-----
> From:	Andrew Layman 
> Sent:	Friday, August 01, 1997 4:24 PM
> To:	w3c-labels-wg@w3.org; w3c-dsig-collect@w3.org
> Subject:	First Draft of RDF, differences from my notes.
> 
> Thank you for the early draft of the paper. In reading it over, I've
> found a number of points that differ from my recollection of our
> Boston meeting. Perhaps my notes and memory are wrong on some of these
> points (in which case I welcome correction) but it also appears that
> some new features have crept into the document:
> 
> 2.	We only agreed on ablocks describing single resources. I
> remember discussing having an RDF assertion block describe
> characteristics of more than one resource, but concluding that this is
> a difficult problem with great risk of user confusion. (I'm not
> opposed to solving this problem; just want to note that we did not
> solve it but left it for the future.)
> 
> 2.4	I don't remember us ever finding a satisfactory way for the
> ablock to actually contain its target resource (because the
> subelements of an ablock are interpreted as properties of the ablock's
> target).
> 
> 2.	We discussed the need for a small set of base data types, which
> I believe were strings, numbers and dates/times.  We also talked at
> length about the need to distinguish between a base semantic type such
> as date and a particular format such as ISO 8601. The sentence
> beginning "The domain of property values..." does not reflect dates or
> the semantic/format distinction.
> 
> 3.	I don't remember agreement on refTypeAttr.  Did we agree on it,
> and I just don't have it in my notes?
> 
> 3.	We most definitely did not agree that the first namespace
> element sets a default namespace!  We did agree, tentatively, that we
> might make the "as" attribute optional, where its omission could
> signal that it was to be the default namespace for its containing
> element (with the caveat that this needs more thought).  We also
> discussed that a namespace attribute on the containing element might
> be a better way to achieve the same effect.
> 
> 3.	I remember discussing listItem, but don't remember ever nailing
> it down precisely or agreeing on it.
> 
> Example 5.1.1.	This simply needs to be clarified. I think what
> is meant is that an ablock with no href has as its implied target the
> entirety of the enclosing document.
> 
> 5.2.3	The note at the bottom makes the assertion that a downlevel
> application can blindly concatenate together elements it does not
> understand. My recollection is that we discussed this, concluded that
> such a policy is dangerous and presumes to dictate processing. We did
> agree to investigate adding some standard attribute that might signal
> when such a policy is reasonable. We identified three values for such
> an attribute: (a) ignore the unknown element, (b) ignore the unknown
> tag, (c) application cannot process this element or any peer.
> 
> I don't mean these comments to be interpreted as disagreements with
> any aspect of the RDF design, but rather as a report on differences
> between my notes and the current paper.
> 
> --Andrew Layman
>    AndrewL@microsoft.com
> 
> -----Original Message-----
> From:	Ralph R. Swick [SMTP:swick@w3.org]
> Sent:	Friday, August 01, 1997 9:49 AM
> To:	w3c-labels-wg@w3.org; w3c-dsig-collect@w3.org
> Subject:	First draft of RDF specification for review
> 
> The first draft of the Resource Description Framework Model and Syntax
> specification (Lassila & Swick, eds.) is now ready for your review and
> comment.
> 
>   http://www.w3.org/Member/9708/WD-rdf-syntax-970801.html
> 
> I would like to ask this working group's permission to distribute
> this draft to w3c-xml-sig.  xml-sig is the forum where technical
> discussions of XML are occurring and they particularly need to see
> our requirements for the namespace tag.  The only reason I ask your
> consent is that while xml-sig is a W3C Members forum, it has quite
> a few non-Member invited experts.  I will distribute this draft to
> that list at 1600UTC on Monday, August 5 unless I hear serious
> objections before then.
> 
> Thanks to all who have contributed thus far, and to each of you who
> will take the time to review and make suggestions for improvement.
> 
> -Ralph and Ora



From andrewl at microsoft.com  Thu Aug  7 00:54:41 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:14 2004
Subject: RDF Specification: Ambiguity of the ABLOCK
Message-ID: <7BB61B44F197D011892800805FD4F7920133B80C@RED-03-MSG.dns.microsoft.com>


--Andrew Layman
   AndrewL@microsoft.com

> -----Original Message-----
> From:	Andrew Layman 
> Sent:	Friday, August 01, 1997 4:55 PM
> To:	w3c-labels-wg@w3.org; w3c-dsig-collect@w3.org
> Subject:	RE: First draft of RDF specification for review
> 
> The example shown in 5.1.4 shows an interesting aspect of RDF:
> 
> <color>
>   <ablock>
>     <hue>1</hue>
>     <lightness>45</lightness>
>     <saturation>70</saturation>
>   </ablock>
> </color>
> 
> Color is a property with three sub-elements. However, it is not
> written that way. Instead it is shown containing an ablock, which then
> has three sub-elements.
> 
> What is the target of this ablock?  Section 5.1.1 implies that an
> ablock without an href has as its target the containing document.
> Here, the rule seems to be that the target is the immediate parent.
> 
> Why do we need this ablock?  Why do we not just have a color that
> itself has three sub-elements, as in
> 
> <color>
>   <hue>1</hue>
>   <lightness>45</lightness>
>   <saturation>70</saturation>
> </color>
> 
> I think the reason we don't is that the RDF rule about properties is
> that they must be binary. That is, the target of the color property
> must be a single object. In actuality here, we have what amounts to a
> quaternary relation, so we have interposed this "ablock" element in
> order to reify the quaternary relation.
> 
> I don't think this is the same kind of ablock at all as used in 5.1.1.
> In fact, I don't think that "ablock" is the right element. The literal
> interpretation of 5.1.4 is that the target of the color relation is a
> typeless thing with three properties. Should not the target be a
> color?  As in
> 
> <color>
>     <colorHSV>
>         <hue>1</hue>
>         <lightness>45</lightness>
>         <saturation>70</saturation>
>     </colorHSV>
> </color> 
> 
> We could also reach this conclusion by thinking about the colorHSV as
> a datatype describing how to interpret its subelements to produce a
> color. (This point has implications for general thinking about data
> types.)
> 
> --Andrew Layman
>    AndrewL@microsoft.com
> 



From andrewl at microsoft.com  Thu Aug  7 00:57:41 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:14 2004
Subject: FW: content: sequence?
Message-ID: <7BB61B44F197D011892800805FD4F7920133B80D@RED-03-MSG.dns.microsoft.com>

The following is a message to the RDF working group regarding sequence
in RDF.  This led to some subsequent discussion in which I argued that
if sequence is a generally useful concept, 3a is the best answer.  We
also discussed the relative merits of indicating sequence on the
containing element vs. the contained.

--Andrew Layman
   AndrewL@microsoft.com

> -----Original Message-----
> From:	Andrew Layman 
> Sent:	Monday, August 04, 1997 2:55 PM
> To:	w3c-labels-wg@w3.org; w3c-dsig-collect@w3.org
> Subject:	RE: content: sequence?
> 
> We did not reach agreement on how best to handle sequence in Boston,
> though we did agree that there are times in RDF when sequence is
> significant and other times when it is not. We discussed the
> possibility of having an attribute on an element signalling to an
> application when it could ignore sequence. This was generally agreed
> to as a direction, but we did not agree on what the appropriate
> default should be.
> 
> There were three approaches discussed:
> 
> 1.	a.	Sequences are always important on some (tbd) elements
> (e.g. "list") and never on others.
> 	b.	Sequences are not important on some (tbd) elements (e.g.
> "ablock"), but are significant on all others.
> 
> 2.	Sequence-significance could be indicated by an attribute,
> required on elements defined by RDF, and presumably unavailable on
> other elements. 
> 
> 3.	Sequence-significance could be indicated by an attribute that
> can be used on any element. If omitted, and if no default was given in
> a schema, then 
> 		a.	The application should follow the XML precedent
> of treating sequence as significant (after all, it might be).
> 		b.	The application should treat sequence as
> insignificant (after all, that takes less processing).
> 
> Separately, we briefly discussed whether sequence-significance should
> be lexically inherited, but this dissolved into the general difficulty
> of lexical inheritance.
> 
> By my calculation, the only options fully compatible with XML without
> implying any sort of contextual processing or lexical inheritance are
> 1a, 2 and 3a.
> 
> --Andrew Layman
>    AndrewL@microsoft.com
> 
> -----Original Message-----
> From:	Tim Bray [SMTP:tbray@textuality.com]
> Sent:	Saturday, August 02, 1997 12:15 PM
> To:	w3c-labels-wg@w3.org; w3c-dsig-collect@w3.org
> Subject:	content: sequence?
> 
> The draft does not, unless I missed it, allow for sequence in the RDF
> model.  This is going to be widely required in all sorts of classes of
> metadata (examples on request).  I don't think RDF 1.0 is worthwhile
> without sequence.  
> 
> Suggestion: RDF already has a list primitive.  If I say
> 
> <list id="l001">
> 
> <item>Panorama</item><item>Navigator</item><item>Notepad</item></list>
> 
> <ablock href="http://...somewhere...">
>  <AppToOpenWith href="#l001" reftype="indirect"/>
>  </ablock>
> 
> then I think we have a sequenced property value.  Does this work?
> 
> Cheers, Tim Bray
> tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
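
To make option 3a from the list of approaches concrete, here is a small
sketch in which an application compares two item lists and treats order as
significant unless told otherwise. The attribute values "significant" and
"insignificant" and all the names are hypothetical; a null value stands for
an omitted attribute with no schema default.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper illustrating option 3a (sequence matters by default).
final class SequenceSignificance {
    static boolean sameContent(List<String> a, List<String> b, String sequenceAttr) {
        // Option 3a: an omitted attribute means order is significant.
        boolean significant = sequenceAttr == null || sequenceAttr.equals("significant");
        if (significant) {
            return a.equals(b);          // element-by-element, in order
        }
        // Order declared insignificant: compare as sorted copies.
        List<String> x = new ArrayList<>(a);
        List<String> y = new ArrayList<>(b);
        Collections.sort(x);
        Collections.sort(y);
        return x.equals(y);
    }
}
```

Under this default, an application that sees the items Panorama, Navigator,
Notepad in a different order must assume the value has changed unless the
attribute says otherwise.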



From ricko at allette.com.au  Thu Aug  7 18:46:54 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:14 2004
Subject: XML and whitespace: lets just dump CR and LF!
Message-ID: <199708071652.CAA06410@jawa.chilli.net.au>

 
> From: Eric Baatz <eric.baatz@East.Sun.COM>
 
> > XML applications should ignore *ALL* CR and LF as a bad joke.
> 
> That doesn't seem reasonable from my point of view, although an option to do 
> so might be reasonable.  For example, my XML application, which reads text 
> and speaks it, is likely to be fed existing text that is only lightly marked 
> up with XML and that uses CR/LF (or newlines) and whitespace to convey 
> important information.  My application needs to see that information to 
> operate in an acceptable manner.  For example, input could be narrative 
> paragraphs denoted by adjacent newlines (or CR/LF's), poetry (lots of 
> prosodic information is in the breaks and whitespace), or columns of 
> text (such as newspapers) and numbers (such as spreadsheets) that have not 
> been reduced to a single logical flow of characters.

Under the current proposals, white-space is preserved or defaulted. (This 
relates to labelling data for applications, not on how the application
presents it.) So there is no way to indicate whether newlines are hard returns 
or soft returns.

I think this hearkens back to XML last year, when the idea was around that 
XML without declarations would be mainly used for closed-systems, where the
receiving end had been built with a specific DTD in mind. 

Now it seems that this is not a big factor in the WG's mind, as the
XML-ATTRIBUTE discussion shows: the WG wants to support systems that work
with many DTDs, even if they are not declared.  (I, of course, think this
is a mistaken change in direction for XML, but I bow to collective wisdom.)

Under a closed-system approach, it made sense to say "default" or "preserve",
since "default" and "preserve" might have some determinate meaning.  Under
the new all-singing-all-dancing direction for XML, I think they make little sense.

If XML-SPACE is just "preserve" or "default", then a document instance's
newline conventions must be tailored for each application.  But what if we
are processing against an architectural form? Then every instance must
use the newline conventions belonging to the meta-Document Type Definition.
And what if you have different AFs active at different parts of the document,
or even applicable concurrently on some elements? Then all the meta-DTDs' 
newline conventions must match, or you must adopt different conventions
at different parts of the document.  

A hard return should be explicitly marked up: whether it is an attribute or
a PI or a <BR/> element or &#x2028;, it should not be stuck outside the
element in CSS or DSSSL--it is part of the data, not an artifact of formatting.

(I suppose that the Remappers will think it desirable to define a new standard
XML attribute that specifies which convention you use (PI, attribute, <BR>,
character reference, entity reference) to signify hard returns, and then
provide other attributes to let us cope with existing DTDs that have churlishly
adopted their own, prior, conventions.  But I think it is simpler to merely
say "The only way to signify hard returns in XML is  &#x2028;" )

If you have gotten rid of hard returns, then next we need to sort out
newlines that are soft returns in data from newlines that are in 
(or "attributable to") markup or element content.  For this distinction,
XML-SPACE may be good enough, in a brutish way.  But I think that the
Interleaf option, of making newlines not significant for presentation, is
superior, for the reasons given before.  I would also add another: it
may simplify indexing into character strings--if you decide "CR and LF
are not significant for presentation or indexing" then you get rid of 
the problem of documents needing to tell you which newline conventions they
have adopted: you don't care, and the users are free to translate between
different conventions without impacting indexes into documents (all other 
things being equal).


Rick Jelliffe


P.S. An Omnimark program to mark up an existing well-formed HTML-in-XML 
document would be merely to add to an XML normaliser:

TRANSLATE "%n" WHEN ANCESTOR IS PRE        
	OUTPUT "&#x2028;%n"

TRANSLATE "%n" 
	OUTPUT "%n "

This does not seem too complex at all. 
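
For readers who do not use Omnimark, roughly the same two rules can be
sketched in Java, reading "%n" as a newline. The class and method names are
hypothetical, and the caller is assumed to know whether the character data
has a PRE ancestor.

```java
// Hypothetical Java rendering of the two translation rules.
final class NewlineMarkup {
    static String translate(String data, boolean insidePre) {
        // Inside PRE a newline is a hard return: mark it with a character
        // reference to U+2028 and keep the source newline for readability.
        // Elsewhere it is a soft return: keep the newline and start the
        // next line with the space that separates the words.
        String replacement = insidePre ? "&#x2028;\n" : "\n ";
        return data.replace("\n", replacement);
    }
}
```

After this pass every remaining CR or LF can be ignored by a consumer without
losing either hard returns or word boundaries.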



From shawnhsu at ARC.unm.EDU  Thu Aug  7 21:53:23 1997
From: shawnhsu at ARC.unm.EDU (Xu, Xiang)
Date: Mon Jun  7 16:58:14 2004
Subject: Publisher Seeking XML Authors
Message-ID: <3.0.1.32.19970807135137.006adf58@arc.unm.edu>

Hi:

We are a computer book publishing company by the name of Bigi International
USA. We would like to know who would be interested in writing books about
XML. Please reply to us as soon as possible.
Thanks

Best Regards
-Xiang Xu
=================================
Xu, Xiang
Bigi International USA Inc.
email: shawnhsu@arc.unm.edu
http://www.bigiintl.com
Tel:(505)830-1443(O), (505)232-8223(H)
FAX:(505)830-1448
2501 San Pedro Blvd., NE, Suite 208
Albuquerque, NM 87110, USA




From john at datachannel.com  Fri Aug  8 02:32:08 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:15 2004
Subject: DOM and Xapi-J
Message-ID: <33EA6992.B6615F6D@datachannel.com>

Some questions have arisen as to the possibility of conflicting overlap
between the DOM and Xapi-J. For those areas where they do overlap, I see
Xapi-J as eventually being a proper subset of the DOM. For example:

The DOM is language independent. Xapi-J is Java only.

The DOM is platform independent. Xapi-J is for the Java platform only.
Xapi-J is designed to be a stylistically consistent extension to the JDK
which embeds it even further into Java e.g. see the recent thread
entitled "Xapi-J: an architectural detail"

The DOM covers HTML and XML. Xapi-J only covers XML.

Eventually, I would think that Xapi-J compliant processors would be seen
as having a DOM-compliant object model of an XML document because they
will eventually use the DOM's Java language bindings exactly. There are
also many other features of the DOM requirements which are not reflected
in Xapi-J. The parts of Xapi-J related to how a developer instantiates a
processor and optionally gets ESIS parse events out of one of these
JavaBeans do not overlap with the DOM work.

I think we can declare Xapi-J 1.0 complete at any time now. When the DOM
is done I think Xapi-J should be revved to be a direct subset of the
DOM's object model using the DOM's object model and method signatures
exactly. That is the only part I see where there is overlap and it would
be a shame to have two very similar but different object models of an
XML document. The original goal of Xapi-J was to come up with a unified
model/api for Java developers who are using/writing XML processors. To
not reflect the work of the DOM WG would defeat the whole idea.

--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From Peter at ursus.demon.co.uk  Fri Aug  8 10:06:49 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:15 2004
Subject: DOM and Xapi-J
Message-ID: <9260@ursus.demon.co.uk>

In message <33EA6992.B6615F6D@datachannel.com> john@datachannel.com (John Tigue) writes:
[...]
> 
> I think we can declare Xapi-J 1.0 complete at any time now. When the DOM

I think this is a great achievement, and I'd like to thank John both for the
API and for continuing the momentum. Also thanks to everyone who has 
contributed ideas.

This group is not, of course, part of the formal process of XML under W3C, but 
I believe that anyone involved in XML development will take Xapi-J as a central
reference. I would suggest that those who have pages publicising XML resources 
should include this.  John - is there now a definitive URL that should be used?


> is done I think Xapi-J should be revved to be a direct subset of the
> DOM's object model using the DOM's object model and method signatures
> exactly. That is the only part I see where there is overlap and it would
> be a shame to have two very similar but different object models of an
> XML document. The original goal of Xapi-J was to come up with a unified
> model/api for Java developers who are using/writing XML processors. To
> not reflect the work of the DOM WG would defeat the whole idea.

It seems clear to me that there will continue to be revisions to many parts
of XML (we do not yet have a definitive version). So revision of Xapi-J to 
be consistent with DOM will be one of several such adjustments or extensions.

I hope that other reference documents will come out of public debate on XML-DEV.
I think there are going to be a large number of problems which are not defined
by the spec and which are not felt appropriate for discussion by the WG (the 
formal W3C body) or the SIG (now not public).   From time to time it is 
suggested that 'this is an implementation problem - perhaps XML-DEV would be
appropriate?'  

Some of us are concerned that uncoordinated (though well-meant) implementation
of XML applications and tools will create a range of inconsistent approaches.
At present XML-DEV is the only forum for discussing these and I think we have
a critical role here. Obviously any contributions are voluntary, not part of the
W3C process, but if we continue to come up with well-thought-out documents or
proposals they should have an important role. 

Some areas where I think guidance for implementers is critically needed NOW 
(in rough priority) are:
	- whitespace processing
	- error processing
	- treatment of defaults and inheritance
	- interpretation of XML-LINK constructs
 
Volunteers? Should we adopt any sort of informal process?  Thoughts? :-) Perhaps
those who are able to be present at XML-DEV day might wish to discuss this?

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Alain.Michard at inria.fr  Fri Aug  8 13:57:44 1997
From: Alain.Michard at inria.fr (Alain Michard)
Date: Mon Jun  7 16:58:15 2004
Subject: FPIs as locators in XML-links ?
Message-ID: <199708081157.NAA21705@yana.inria.fr>

I may be missing something (and in that case thanks for the help!), but I
feel that the current specification of locators in XML-LINK doesn't offer an
easy way to avoid including URLs in XML documents.
It is true that a URL can be considered a unique id of one physical copy
of a resource, but this id is transient: machines may be changed, and
"publishers" (any entity putting content on the Web) may decide to migrate a
document repository from one place to another.
If locators in XML-LINKs are URLs, this implies that when a URL changes,
many authors
-a- should ideally be notified in some way of the change if it is relevant
to them (i.e. if their own documents contain links pointing to
resources whose URL has changed);
-b- have to retrieve all the documents they have published which contain a
link to the modified URL;
-c- have to edit all these documents.
That's in fact exactly the situation with the HTML-based Web.

I feel that the SGML practice of using Public Identifiers and storing
mappings of PUBLIC identifiers to SYSTEM identifiers in a Catalogue file
greatly facilitates the management of large collections of documents:
- a "publisher" may distribute updates of his public catalogue to the
community with which he shares a number of resources;
- the catalogue is the only file you need to edit when any Public ID gets
associated with a new physical resource;
- in the case of mirror copies of Web sites, the catalogue may be an easy
means of making your browser look for a given document at a given site,
without having to specify it at each traversal of a link.
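
The catalogue indirection described above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
PUBLIC "public-id" "system-id" entry form follows the usual SGML catalogue
convention, while the public identifiers, the example.org URLs and the helper
names load_catalogue and resolve are invented for the example.

```python
import shlex

# Hypothetical catalogue: each entry maps a PUBLIC identifier to a SYSTEM
# identifier (here a URL). Identifiers and URLs are made up for illustration.
CATALOGUE = '''
PUBLIC "-//Example Org//DOCUMENT Annual Report 1996//EN" "http://www.example.org/reports/1996.xml"
PUBLIC "-//Example Org//DOCUMENT Style Guide//EN" "http://www.example.org/guides/style.xml"
'''


def load_catalogue(text):
    """Parse PUBLIC entries into a {public_id: system_id} dictionary."""
    mapping = {}
    for line in text.splitlines():
        fields = shlex.split(line)  # copes with the double-quoted literals
        if len(fields) == 3 and fields[0] == "PUBLIC":
            mapping[fields[1]] = fields[2]
    return mapping


def resolve(public_id, catalogue):
    """Return the current location for a public identifier, or None."""
    return catalogue.get(public_id)


catalogue = load_catalogue(CATALOGUE)
print(resolve("-//Example Org//DOCUMENT Style Guide//EN", catalogue))

# When the repository moves, only the catalogue changes; documents that link
# by public identifier are left untouched.
moved = load_catalogue(CATALOGUE.replace("www.example.org", "mirror.example.org"))
print(resolve("-//Example Org//DOCUMENT Style Guide//EN", moved))
```

The point of the design is that the document carries only the stable name,
and the one file that knows about physical locations is the catalogue.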

Moreover, including URLs in XML documents appears contrary to the
general philosophy of SGML, which I guess could be summarized as "Ensure
long-term life of documents".

I would be very interested to read some comments from SGML experts on the
list, to help me understand the reasons why the XML draft specs
exclude -so far- using Public IDs in links.

Best Regards

Alain Michard
Mediaculture - Direction du Développement
INRIA  -  Domaine de Voluceau
BP 105
F-78153 Le Chesnay Cedex - France
Tel: +33 1 3963 5472   Fax: +33 1 3963 5114




From tbray at textuality.com  Fri Aug  8 18:02:27 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:15 2004
Subject: FPIs as locators in XML-links ?
Message-ID: <3.0.32.19970808084641.008d9320@pop.intergate.bc.ca>

At 01:58 PM 08/08/97 +0200, Alain Michard wrote:
>I would be very interested to read some comments from SGML experts on the
>list, to help me understand the reasons why the XML draft specs
>exclude -so far- using Public IDs in links.

Two reasons, really.  XML-Link is designed specifically for use in 
the context of the Web, and on the Web, things exist if they
can be addressed by URI's, otherwise not.

Secondly, whereas PUBLIC identifiers are very interesting and useful,
it is not the case that virtually every server and desktop in the
world comes with excellent free machinery to use them across the network,
which is the case with URLs.  -Tim



From tbray at textuality.com  Fri Aug  8 19:18:37 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:15 2004
Subject: FPIs as locators in XML-links ?
Message-ID: <3.0.32.19970808101550.008dfaf0@pop.intergate.bc.ca>

At 01:08 PM 08/08/97 -0400, Paul Prescod wrote:
>> Two reasons, really.  XML-Link is designed specifically for use in
>> the context of the Web, and on the Web, things exist if they
>> can be addressed by URI's, otherwise not.
>> 
>> Secondly, whereas PUBLIC identifiers are very interesting and useful,
>> it is not the case that virtually every server and desktop in the
>> world comes with excellent free machinery to use them across the network,
>> which is the case with URLs.  -Tim
>
>1. Don't these arguments apply equally to XML-Lang? 

No.  Links on the Web are based on URI's; that's a fact of life.  If 
you want another kind of link that isn't, go ahead and build it, but
our mandate was to build a Web-oriented hyperlinking facility.  There is
no Web machinery that knows anything about FPI's.

>2. You've argued why PUBLIC identifiers will sometimes not be useful.
>You haven't argued why they will *never* be useful. They were put into
>XML Lang because some argued that they will sometimes need them. That
>applies to XML-Link equally.

Yes, you've said this many times.  So far, the WG membership is
unconvinced.

>3. What about entities declared through system identifiers? Why can't I
>link to them through their entity names?

Because that's not how things are done on the Web.  Of course, in
XML you *can* say 
<a href="http://&host;/&path;/&base-file;#&xpointer;">

> What is the point of "binary"
>entities if the "standard" linking and transclusion mechanism can't use
>them? Or to go the other way, why wouldn't the standard linking and
>transclusion mechanism be able to use the standard mechanism for mapping
>external resources into document names?

The key point is the use of the word "standard".  The use of entities
and PUBLIC identifiers is standard only in the world of SGML.  For
interoperation with the universe of Web documents, the only standard
way to do things is the URI mechanism.

To summarize, we were not trying to extend the SGML entity mechanism
to do network hypertext; we were trying to extend the existing Web
hypertext mechanism to be usable in XML.

Anyhow, this argument is over.  Sorry. -T.



From neil at bradley.co.uk  Sat Aug  9 12:15:01 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:15 2004
Subject: 5 Whitespace Rules
Message-ID: <199708091014.LAA11574@andromeda.ndirect.co.uk>


I think it's time to pin down some rules or guidelines regarding 
the use of whitespace. I am not suggesting that the following is 
exhaustive or totally unambiguous, but maybe it is a starting point 
for discussion. I would really like to see a small list of rules such 
as the following being defined, as I am sure it will help avoid potentially 
damaging confusion arising when products arrive and prove to be 
incompatible.

One of the problems of defining rules for XML has been the grouping 
of line-end codes with space separating characters under the 'S' 
rule. By separating these concepts, it is quite easy to define rules 
which are both backward compatible with SGML and HTML (very important 
in its own right) and also intuitive.

While the idea of ignoring all line-end codes and manually inserting spaces at 
the start of each line to compensate is at first sight attractive, 
it is certainly not intuitive, and there are plenty of text files in existence 
(including SGML and HTML files, of course), which do not follow this 
convention.

--------------------
An application should remove or transform whitespace characters 
received from the XML-processor according to the following 5 rules:

RULE 1. Every CR and LF code is regarded as a line-end signal, except 
when it immediately follows the other code ([CR][LF] or [LF][CR]), in which 
case it is discarded (and is also ignored, so has no effect on 
calculations for the next character). This rule applies even in 'preserved' content.

/*
This rule standardizes input from documents prepared on Mac, Unix and 
MS-DOS/Windows platforms.

[CR] ---> line-end
[LF] ---> line-end
[CR][LF] ---> line-end
[LF][LF] ---> line-end, line-end
[CR][CR] ---> line-end, line-end
[CR][LF][CR][LF] ---> line-end, line-end (because both LF's are 
ignored)

By including this rule in preserved content, we avoid alternate blank 
lines appearing in documents prepared on an MS-DOS system but viewed 
on another system.
*/
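
A minimal Python sketch of Rule 1, offered as an illustration rather than a
reference implementation. It assumes the text arrives as one string and that
"\n" is used to represent the normalized line-end; the function name
normalize_line_ends is invented for the example.

```python
def normalize_line_ends(text):
    """Turn CR, LF, CR LF and LF CR into a single "\\n" line-end each."""
    out = []
    prev = None  # the CR or LF that last counted as a line-end, else None
    for ch in text:
        if ch in "\r\n":
            if prev is not None and prev != ch:
                # Second half of a CR LF or LF CR pair: discard it, and
                # forget it so it has no effect on the next character.
                prev = None
                continue
            out.append("\n")
            prev = ch
        else:
            out.append(ch)
            prev = None
    return "".join(out)


for sample in ["\r", "\n", "\r\n", "\n\n", "\r\r", "\r\n\r\n"]:
    print(repr(sample), "--->", repr(normalize_line_ends(sample)))
```

Forgetting the discarded code is what makes [CR][LF][CR][LF] come out as two
line-ends rather than three.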

RULE 2. A line-end code (or codes) immediately following a start-tag, PI or 
declaration, or immediately preceding an end-tag, is discarded (except in 
preserved content).

/*
 <note>[CR][CR]<p>[CR]This is a para in a note.[CR]</p>

becomes:

 <note><p>This is a para in a note.</p>

But the CRs below are not removed (they are later converted to a space - see rule 
4):

 <p>Here is an[CR]
 <em>emphasised</em>[CR]
 word.</p>

becomes:

 <p>Here is an <em>emphasised</em> word.</p>  
*/

RULE 3. All other whitespace in element content  is  discarded.

/*
 <note>[SP][TAB]<p>This is a para in a note...

becomes (in validated input):

 <note><p>This is a para in a note...

Note that only the presence of spaces and tabs in element content, 
which is not common, will cause discrepancies between validated and 
non-validated processing.
*/

RULE 4.  Line-end codes are discarded when preceded by a hard 
or soft ('&#173;') hyphen (and a soft hyphen is also discarded).
Remaining line-end codes are treated as spaces.

/*
 A[CR]
 line-[CR]
 end code sep&#173;[CR]
 arates lines.

becomes:

 A line-end code separates lines.
*/
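
Rule 4 can be sketched as three string replacements. The sketch assumes that
line-ends have already been normalized to "\n" by Rule 1 and that the soft
hyphen is the character U+00AD (decimal 173); the name join_lines is
invented for the example.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: soft hyphen is U+00AD (decimal 173)


def join_lines(text):
    """Apply Rule 4 to text whose line-ends are already plain "\\n"."""
    # Soft hyphen before a line-end: drop the hyphen and the line-end.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen before a line-end: keep the hyphen, drop the line-end.
    text = text.replace("-\n", "-")
    # Every remaining line-end is treated as a space.
    return text.replace("\n", " ")


sample = "A\nline-\nend code sep" + SOFT_HYPHEN + "\narates lines."
print(join_lines(sample))
```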

RULE 5. Consecutive whitespace characters (including translated 
line-end codes) are reduced to a single space, except in preserved mode.

/*
 These lines are divided by a space[SP][CR]
 and carriage[SP][TAB][SP]return.

becomes:

 These lines are divided by a space and carriage return.
*/
------------------------------

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From jamesr at steptwo.com.au  Sat Aug  9 12:29:43 1997
From: jamesr at steptwo.com.au (James Robertson)
Date: Mon Jun  7 16:58:15 2004
Subject: 5 Whitespace Rules
In-Reply-To: <199708091014.LAA11574@andromeda.ndirect.co.uk>
Message-ID: <3.0.2.32.19970809202726.00a9abe0@magna.com.au>

At 23:13 8/08/97 +0000, you wrote:
  | 
  | I think it's time to pin down some rules or guidelines regarding 
  | the use of whitespace. I am not suggesting that the following is 
  | exhaustive or totally unambiguous, but maybe it is a starting point 
  | for discussion. I would really like to see a small list of rules such 
  | as the following being defined, as I am sure it will help avoid
  | potentially 
  | damaging confusion arising when products arrive and prove to be 
  | incompatible.

  | An application should remove or transform whitespace characters 
  | received from the XML-processor according to the following 5 rules:

    [snip]

Hear hear. These are practical, useful rules, and I can find no
fault with them. They are certainly much more backwards-compatible than
the suggested solution of ignoring all line-end characters.

My vote: make it so ...

J 

-------------------------
James Robertson
Step Two Designs
Newton & SGML Consultancy
jamesr@steptwo.com.au

"Beyond the Idea"



From paul at arbortext.com  Sat Aug  9 15:10:17 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun  7 16:58:15 2004
Subject: 5 Whitespace Rules
Message-ID: <3.0.32.19970809060751.00698840@pophost.arbortext.com>

At 23:13 1997 08 08 +0000, Neil Bradley wrote:
>RULE 3. All other whitespace in element content  is  discarded.

>
>Note that only the presence of spaces and tabs in element content, 
>which is not common, will cause discrepancies between validated and 
>non-validated processing.

This is the crux of the problem.  As soon as you say something about
element content, you get different results from the document when you
process the DTD and when you don't.  

You don't say explicitly what happens when you don't process the DTD,
but I assume your Rule 3 doesn't do anything in that case.  Therefore,
your Rule 5 will turn all line-end codes into a space, and it is
extremely common to have line-end codes in element content.  So your
Rule 3 will cause you to end up with lots of spaces when you process
in the absence of  a DTD that you wouldn't get when you process in the
presence of the DTD.

>
>RULE 4.  Line-end codes are discarded when preceded by a hard 
>or soft ('&#173;') hyphen (and a soft hyphen is also discarded).
>Remaining line-end codes are treated as spaces.

This might be a nice heuristic for incoming WP files, but it doesn't
agree with SGML.  If I had "a - b" in my document and a line-end
happened to occur after the -, you'd turn my file into "a -b".

paul



From neil at bradley.co.uk  Sat Aug  9 15:59:40 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:15 2004
Subject: 5 Whitespace Rules
Message-ID: <199708091359.OAA22023@andromeda.ndirect.co.uk>


> Reply-to:      Paul Grosso <paul@arbortext.com>

> At 23:13 1997 08 08 +0000, Neil Bradley wrote:
> >RULE 3. All other whitespace in element content  is  discarded.
> 
> >
> >Note that only the presence of spaces and tabs in element content,
> >which is not common, will cause discrepancies between validated and
> > non-validated processing.
> 
> This is the crux of the problem.  As soon as you say something about
> element content, you get different results from the document when
> you process the DTD and when you don't.  

Yes, but as I say, the problem only arises if people put spaces or
tabs in element content, which in my experience is very unusual.

> You don't say explicitly what happens when you don't process the
> DTD, but I assume your Rule 3 doesn't do anything in that case. 
> Therefore, your Rule 5 will turn all line-end codes into a space,
> and it is extremely common to have line-end codes in element
> content.  So your Rule 3 will cause you to end up with lots of
> spaces when you process in the absence of  a DTD that you wouldn't
> get when you process in the presence of the DTD.

No, Rule 2 has already dispensed with these CR and LF codes. I 
should have made it clear that this rule applies to non-validated
input.  So...

 <chapter>[CR]
 <note>[CR]
 <p>[CR]
 This is a para in a note[CR]
 </p>[CR]
 </note>[CR]
 ...

becomes

 <chapter><note><p>This is a para in a note</p></note>...

...before Rules 3 and 5 are applied.

This was my whole point about separating line-end code processing from
spacing character processing.

> >
> >RULE 4.  Line-end codes are discarded when preceded by a hard or
> >soft ('&#173;') hyphen (and a soft hyphen is also discarded).
> >Remaining line-end codes are treated as spaces.
> 
> This might be a nice heuristic for incoming WP files, but it doesn't
> agree with SGML.  If I had "a - b" in my document and a line-end
> happened to occur after the -, you'd turn my file into "a -b".

Yes, well, I can only suggest this is unlikely to happen, and in any
case Rule 4 is only a suggestion for paginating applications. I am
open to suggestions here, but for now I am far more concerned about
the Rules 1 to 3.

> paul

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From ak117 at freenet.carleton.ca  Sat Aug  9 16:02:51 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun  7 16:58:15 2004
Subject: PSGML-XML
Message-ID: <199708091359.JAA00319@localhost>

A couple of weeks ago, I patched PSGML to add an XML mode that enables
XML-specific delimiters, parsing, and error-reporting (in other words,
it's a real, native XML DTD-driven editor).

(QUERY: IS THIS THE FIRST NATIVE XML EDITOR AVAILABLE?)

I'm waiting to hear back from Lennart Staflin on integrating this into
the main distribution; in the mean time, I'm looking for some alpha
testers who meet the following criteria:

1) You are familiar with both SGML and XML.
2) You are an intermediate to advanced Emacs user (as a minimum, you
   should know how to byte-compile modules, modify the load path, and
   set start-up variables).
3) You are currently using PSGML 1.0.1 and are familiar with its
   commands.

If you're interested, please send me a message, and I'll send you the
patches against PSGML 1.0.1 next week.

*** I am _not_ prepared to provide help on Emacs configuration (etc.)
    at the alpha stage, so please don't reply unless you are either an
    experienced Emacs user or you have easy access to one.

For your information, here are the current features:

************************************************************************
XML FEATURES CURRENTLY SUPPORTED
************************************************************************

- understands "/>" TAGC for empty elements, and inserts it by default
- requires "?>" PIC for processing instructions
- always quotes attribute value literals
- Reports the following DTD errors:
   * use of AND-connector in content model in element declaration
   * use of name group for element type in element declaration
   * use of omitted tag minimization in element declaration
   * use of CDATA or RCDATA declared content
   * use of inclusion or exclusion exceptions
   * declaration of external CDATA, SDATA, or SUBDOC entities
   * declaration of internal CDATA, SDATA, PI, STARTTAG, ENDTAG, MS,
     or MD entities
   * declaration of data attributes
   * use of name group for associated element type in ATTLIST
   * declaration of NAME, NAMES, NUMBER, NUMBERS, NUTOKEN, or NUTOKENS
     attributes
   * declaration of #CURRENT or #CONREF attributes
   * a public identifier that is not accompanied by a system identifier
- Reports the following general errors:
   * data entity references in data
   * nested comments (enforces XML-style comments)
   * use of tag minimization


************************************************************************
XML FEATURES NOT YET SUPPORTED
************************************************************************

- allow SYSIDs to be URLs
- validate that mixed content follows XML restrictions
- validate that marked sections in DTD are either INCLUDE or IGNORE
- validate that marked sections in content are CDATA (no parameter entities)
- validate that XML declaration is present
- probably many others that I've missed


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/



From john at datachannel.com  Sun Aug 10 06:46:53 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:15 2004
Subject: XML sample application
References: <852564DB.0067F3F4.00@bna-03.bna.com>
Message-ID: <33ED4848.E15DDF93@datachannel.com>

sdarya@bna.com wrote:

> I tried the demo on MS IE 3 (Windows95). I get a program exception
> error and IE crashes. Do I have to have Netscape?
>
> <snip/>

If you are referring to the Java applet at
http://www.datachannel.com/xml/viewer, it has been shown to work on all
major browsers on all major platforms. This XML viewer has been
thoroughly tested and tech-supported past all serious problems. Please
do not post tech support questions about DataChannel demo code to
xml-dev; I do not believe that they are interested in such matters. If
anyone has any questions, please e-mail me directly at
john@datachannel.com. It will be my pleasure to help get the viewer
running on your machine.


--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From murata at apsdc.ksp.fujixerox.co.jp  Mon Aug 11 06:11:23 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun  7 16:58:15 2004
Subject: XML-Link: Relative URL expansion
In-Reply-To: <3.0.32.19970806152015.00830410@pop.intergate.bc.ca>
Message-ID: <9708110411.AA01143@lute.apsdc.ksp.fujixerox.co.jp>

Tim Bray writes:
>More properly "containing resource", but yes.  Check the XML spec, section
>4.3.2, for some more details.

Let me point out a minor issue.  When the XML document is not stored 
anywhere but appears directly in the stream given to the XML parser, we 
do not know what the "containing resource" is.  Presumably, relative URLs 
in such XML documents are errors?
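
The situation can be illustrated with a short Python sketch. Treating a
relative URL without a known containing resource as an error is only one
possible policy, taken here for the sake of the example; the function name
resolve_locator and the example.org URLs are invented.

```python
from urllib.parse import urljoin, urlsplit


def resolve_locator(href, base=None):
    """Resolve a link locator against the containing resource, if known."""
    if base is not None:
        return urljoin(base, href)
    if urlsplit(href).scheme:
        return href  # already absolute, so no base is needed
    # Document came from a bare stream: nothing to resolve against.
    raise ValueError("relative URL %r but no containing resource" % href)


# Document fetched from a known URL: the relative locator can be expanded.
print(resolve_locator("figs/a.xml", "http://www.example.org/docs/main.xml"))

# Document read from a bare stream: only absolute locators are usable.
try:
    resolve_locator("figs/a.xml")
except ValueError as err:
    print("error:", err)
```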

MURATA Makoto (FAMILY Given)
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp



From neil at bradley.co.uk  Mon Aug 11 11:48:30 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:15 2004
Subject: Whitespace rules (v2)
Message-ID: <199708110948.KAA20836@andromeda.ndirect.co.uk>

Due to some useful feedback, and further thoughts of my own, I would 
like to amend my list of 5 whitespace rules in a few respects.

For people who read the previous set of rules, the corrections are:

a) block-enclosing elements must be identified via list or style 
sheet
b) PI, Comment and empty element processing has totally changed
c) all rules explicitly apply to both validating and non-validating applications
d) the rules are explicitly to be applied in sequence

The new rules can be summarized as:

1. Normalize line-end codes
2. Remove block-surrounding whitespace
3. Remove leading/trailing block line-ends
4. Join lines and de-hyphenate
5. Remove surplus spaces in text

------WHITESPACE RULES------

A formatting application should remove or transform whitespace characters 
received from the XML-processor according to the following 5
rules. These rules are to be applied in sequence, by both validating and 
non-validating applications.

Note 1: PI's, comments and empty elements may be removed at 
any point in the process. 

Note 2: in some cases, 'line-end' codes (CR and LF) are distinguished 
from 'spacing' characters (SP and TAB), but the term 'whitespace' 
continues to indicate all these characters.


----------
RULE 1. Every line-end code is regarded as a line terminator, except
when it immediately follows the other code ([CR] following [LF] or 
[LF] following [CR]), in which case it is discarded (and is also
ignored, so has no effect on calculations for the next character).
This rule also applies in 'preserved' content.
---
Note: this rule standardizes input from documents prepared on Mac, Unix and
MS-DOS/Windows platforms.

[CR] ---> line-end
[LF] ---> line-end
[CR][LF] ---> line-end
[LF][CR] ---> line-end
[LF][LF] ---> line-end, line-end
[CR][CR] ---> line-end, line-end
[CR][LF][CR][LF] ---> line-end, line-end (because both LF's are 
ignored)

Note: by including this rule in preserved content, we avoid alternate blank
lines appearing in documents prepared on an MS-DOS system but viewed
on another system.


----------
RULE 2. All whitespace preceding the start-tag and following the end-tag 
of a 'block enclosing' element is discarded.
---
Note: a non-validating application must refer to a style sheet or
configuration file to identify 'block enclosing' elements (perhaps by 
applying this rule to all elements not specified as in-line elements).
As a validating application cannot easily determine this from the
content model (the first mixed content element in the hierarchy is 
block enclosing, as are all outer layers), it may choose the same 
approach. 


Note:

 <chapter>[SP]<note>[SP][TAB]<p>This is a[SP]<em>para</em>...

becomes:

 <chapter><note><p>This is a[SP]<em>para</em>

and:

 <p>Para 1.</p>[CR]
 <p>Para 2.</p>

becomes:

 <p>Para 1.</p><p>Para 2.</p>

Note: If PI's, comments or empty elements remain in the data stream,
they are deemed transparent to this process, so:

 [SP]<!--comment--><p>Some text...

becomes:

 <!--comment--><p>Some text...


----------
RULE 3. A sequence of one or more line-end codes immediately
following a start-tag, or immediately preceding an end-tag, is
discarded (except in preserved content).
---
Note:

 <note>[CR]
 <p>[CR]
 This is a para in a note.[CR]
 </p>

becomes:

 <note><p>This is a para in a note.</p>

Note: If PI's, comments or empty-elements remain in the data stream, 
they are deemed transparent to this process, so:

 <p><!-- a comment -->[CR]
 some text...

becomes:

 <p><!-- a comment -->some text...
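
Rule 3 can be approximated with two regular expressions. This is a rough
sketch, not a parser: it assumes line-ends are already normalized to "\n",
that no attribute value contains ">", and it handles neither preserved
content nor the transparency of PI's, comments and empty elements described
above. The function name is invented for the example.

```python
import re

# A start-tag such as <p> or <note id="n1">, but not an empty-element tag
# like <br/>, followed by one or more line-ends.
_AFTER_START_TAG = re.compile(r"(<[^/!?<>][^<>]*(?<!/)>)\n+")
# One or more line-ends followed by an end-tag such as </p>.
_BEFORE_END_TAG = re.compile(r"\n+(</[^<>]+>)")


def strip_tag_adjacent_line_ends(text):
    """Drop line-ends right after a start-tag or right before an end-tag."""
    text = _AFTER_START_TAG.sub(r"\1", text)
    return _BEFORE_END_TAG.sub(r"\1", text)


print(strip_tag_adjacent_line_ends("<note>\n<p>\nThis is a para in a note.\n</p>"))
```

Line-ends that sit between text and an in-line element are left alone, so
they survive to be turned into spaces by Rule 4.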


----------
RULE 4.  A remaining line-end code is converted into a space, except when it is 
preceded by a normal (hard) hyphen, or by a soft hyphen ('&#173;'), 
in which case it is removed (a soft hyphen is also then removed). 
---
Note:

 A[CR]
 line-[CR]
 end code sep&#173;[CR]
 arates lines.

becomes:

 A line-end code separates lines.

Note: PI's, comments and empty elements are treated as text, so:

 <p>Some[CR]
 <!-- comment -->[CR]
 text.

becomes:

 <p>Some[SP]<!-- comment -->[SP]text.

Note: if a space is required after the hyphen, it must be inserted before the 
line-end:

 4 -[SP][CR]
 3 = 1

becomes:

 4 -[SP][SP]3 = 1 


----------
RULE 5. Consecutive whitespace characters (including translated 
line-end codes) are reduced to a single space, except in preserved
mode.
---
Note:

 4 -[SP][SP]3 = 1 

becomes:

 4 -[SP]3 = 1 

Note: if PI's, comments or empty elements are removed after rule 5:

 <p>Some[SP]<!-- comment -->[SP]text.

has already become:

 <p>Some[SP][SP]text.

but now becomes:

 <p>Some[SP]text.

Note: Multiple spaces can be preserved using the non-break space
character ('&#160;').

 <p>Some&#160;&#160;&#160;spaces.
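
A minimal Python sketch of Rule 5, assuming the non-break space has already
been expanded to the character U+00A0 and ignoring preserved mode. The
character class is spelled out on purpose: Python's "\s" also matches
U+00A0 in text patterns, which would defeat the non-break space note above.
The function name is invented for the example.

```python
import re

# Only SP, TAB and the normalized line-end count as collapsible whitespace.
_RUN_OF_WHITESPACE = re.compile(r"[ \t\n]+")


def collapse_whitespace(text):
    """Reduce each run of SP, TAB and line-end characters to one space."""
    return _RUN_OF_WHITESPACE.sub(" ", text)


print(collapse_whitespace("4 -  3 = 1"))
print(repr(collapse_whitespace("Some\u00a0\u00a0\u00a0spaces.")))
```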
------------------------------

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From agreene at bitstream.com  Mon Aug 11 14:40:38 1997
From: agreene at bitstream.com (Andrew Greene)
Date: Mon Jun  7 16:58:15 2004
Subject: Whitespace rules (v2)
In-Reply-To: <199708110948.KAA20836@andromeda.ndirect.co.uk>
	(neil@bradley.co.uk)
Message-ID: <19970811123638.AAA6033@AGREENE-PC.bitstream.com>

I'm troubled by one aspect of that suggestion:

> RULE 4.  A remaining line-end code is converted into a space, except
> when it is preceded by a normal (hard) hyphen, or by a soft hyphen
> ('&#173;'), in which case it is removed (a soft hyphen is also then
> removed).                                ^^^^^^^^^^^^^^^^^^^^^^^^^^
  ^^^^^^^

That could alter the semantics of the data stream. The incoming data
stream may have been broken at that point, but we don't want to lose
the fact that such a break is legal -- it may be required again
downstream.

So, using your example, I think that

> A[CR]
> line-[CR]
> end code sep&#176;[CR]
> arates lines.

should become

 A line-end code sep&#176;arates lines.

and not, as you suggest,

> A line-end code separates lines.

An individual application may choose to ignore soft hyphens when it
displays (or otherwise handles) the data. 
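
A rough sketch of the difference, under some assumptions: the input is already split into lines, and the soft hyphen is taken to be U+00AD (the thread writes '&#176;', but code 176 is the degree sign in Latin 1 and Unicode). The function name and the `keep_soft_hyphen` flag are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: U+00AD, not the 176 quoted in the thread

def join_lines(lines, keep_soft_hyphen=True):
    """Join lines; a line-end after a hyphen is dropped, otherwise it becomes a space."""
    out = []
    for index, line in enumerate(lines):
        is_last = index == len(lines) - 1
        if line.endswith(SOFT_HYPHEN):
            # Line-end removed; the soft hyphen survives only if requested.
            out.append(line if keep_soft_hyphen else line[:-1])
        elif line.endswith("-"):
            # Hard hyphen: line-end removed, hyphen kept.
            out.append(line)
        else:
            out.append(line if is_last else line + " ")
    return "".join(out)

lines = ["A", "line-", "end code sep\u00ad", "arates lines."]
kept = join_lines(lines)
dropped = join_lines(lines, keep_soft_hyphen=False)
```

With `keep_soft_hyphen=True` the legal break point stays in the data for downstream use; with `False` the result matches the original rule 4.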

Does that make sense?

- Andrew Greene




From m.hampson at ic.ac.uk  Mon Aug 11 15:18:39 1997
From: m.hampson at ic.ac.uk (m.hampson@ic.ac.uk)
Date: Mon Jun  7 16:58:16 2004
Subject: Testing digest - please ignore
Message-ID: <E0wxuMR-0000CJ-00@sphinx.cc.ic.ac.uk>

Testing digest - please ignore
-- 
   +--------------------------------------------------------------------+
   | Martyn Hampson          |    Tel:    0171 594 6973                 |
   | Imperial College        |    Fax:    0171 594 6958                 |
   | Computer Centre         |    E-Mail: M.Hampson@ic.ac.uk            |
   | London SW7 2BP, ENGLAND |    "Don't just do something, sit there!" |
   +--------------------------------------------------------------------+




From paul at arbortext.com  Mon Aug 11 16:54:14 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <3.0.32.19970811093956.006e1d78@pophost.arbortext.com>

At 22:48 1997 08 10 +0000, Neil Bradley wrote:
>----------
>RULE 2. All whitespace preceding the start-tag and following the end-tag 
>of a 'block enclosing' element is discarded.
>---
>Note: a non-validating applications must refer to a style sheet or
>configuration file to identify 'block enclosing' elements (perhaps by 
>applying this rule to elements not specified as in-line elements).
>As a validating application cannot easily determine this rule from the
>content model (the first mixed content element in the hierarchy is 
>block enclosing, as well as all outer layers), it may choose the same approach. 
>
>Note:
>
> <chapter>[SP]<note>[SP][TAB]<p>This is a[SP]<em>para</em>...
>
>becomes:
>
> <chapter><note><p>This is a[SP]<em>para</em>
>
>and:
>
> <p>Para 1.</p>[CR]
> <p>Para 2.</p>
>
>becomes:
>
> <p>Para 1.</p><p>Para 2.</p>

What if a block enclosing element is contained within a block enclosing
element?  You appear to be trying to use different terms to describe
what is effectively the issue of element content versus mixed content.

How is requiring a style sheet or configuration file to indicate which
elements are "block enclosing" different from having a DTD or partial
set of declarations to indicate which elements have element content?

paul



From capt at augusta.inf.elte.hu  Mon Aug 11 18:35:42 1997
From: capt at augusta.inf.elte.hu (Miskovics Gabor)
Date: Mon Jun  7 16:58:16 2004
Subject: XML browser, stylesheet
Message-ID: <33EF3F9A.6E5C5BD8@augusta.inf.elte.hu>

Hi!

I'm looking for XML browsers, XML stylesheet DTDs, and XML stylesheets.
Can anyone help me?

Bye,
	Capt
-- 
Miskovics Gabor
    E-mail: capt@augusta.inf.elte.hu
    Web:    http://augusta.inf.elte.hu/~capt



From neil at bradley.co.uk  Mon Aug 11 18:42:00 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <199708111641.RAA18361@andromeda.ndirect.co.uk>

Paul Grosso wrote:

> At 22:48 1997 08 10 +0000, Neil Bradley wrote:
> >----------
> >RULE 2. All whitespace preceding the start-tag and following the end-tag 
> >of a 'block enclosing' element is discarded.
> >---
> >Note: a non-validating applications must refer to a style sheet or
> >configuration file to identify 'block enclosing' elements (perhaps by 
> >applying this rule to elements not specified as in-line elements).
> >As a validating application cannot easily determine this rule from the
> >content model (the first mixed content element in the hierarchy is 
> >block enclosing, as well as all outer layers), it may choose the
> >same approach.
> 
> What if a block enclosing element is contained within a block enclosing
> element?  You appear to be trying to use different terms to describe
> what is effectively the issue of element content versus mixed content.
> 
> How is requiring a style sheet or configuration file to indicate which
> elements are "block enclosing" different from having a DTD or partial
> set of declarations to indicate which elements have element content?

The point about style-sheets etc is that even a non-validating 
formatting application will require one, and it can get its 
information from that source. A validating formatter can do the same 
thing, and it is arguably easier than referring to the DTD, which does not 
directly identify block enclosing elements. A Paragraph element with mixed 
content is a block enclosing element, but an embedded Emphasis 
element, also with mixed content, is not! Of course, block enclosing 
elements CAN be identified from the DTD, it is *just* a matter of finding 
the outer-most element with mixed content, and I am not ruling out 
this approach, just saying a validating processor "may choose the 
same approach" as a non-validating processor for convenience.
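
A sketch of the DTD-based approach, assuming the content models have already been reduced to a simple table. The `models` dictionary, its layout and the function name are all hypothetical, and the sketch ignores an element that is reachable both inside and outside mixed content.

```python
def block_enclosing(models, root):
    """Return the outermost mixed-content elements plus all enclosing layers.

    models maps an element name to (has_mixed_content, child_names).
    """
    found = set()

    def visit(name):
        if name in found:
            return
        found.add(name)
        mixed, children = models[name]
        if mixed:
            # Everything below the first mixed-content element is in-line.
            return
        for child in children:
            visit(child)

    visit(root)
    return found

models = {
    "chapter": (False, ["note", "p"]),
    "note": (False, ["p"]),
    "p": (True, ["em"]),
    "em": (True, ["em"]),
}
blocks = block_enclosing(models, "chapter")
```

Here Paragraph is reported as block enclosing while the embedded Emphasis is not, even though both have mixed content.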

I know this is far from ideal, and I hope someone can suggest 
something better. If not, I would still prefer this rule to nothing, 
or to ignoring all line-end codes.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From h.rzepa at ic.ac.uk  Tue Aug 12 17:57:10 1997
From: h.rzepa at ic.ac.uk (Rzepa, Henry)
Date: Mon Jun  7 16:58:16 2004
Subject: Digests for xml-dev
Message-ID: <v03110715b0163801ce02@[155.198.224.86]>

Anyone wishing to receive weekly digests (on Monday) of the xml-dev list
should subscribe as follows

mailto:majordomo@ic.ac.uk  the request

subscribe xml-dev-digest

(if possible, do  NOT use the form
subscribe xml-dev-digest yourothermailaddress, since I have to moderate
such requests, and this may not happen instantly!)

If you wish to STOP receiving daily postings, you should
mailto:majordomo@ic.ac.uk  the request
unsubscribe xml-dev.

Members of either list will be able to post messages to xml-dev@ic.ac.uk

Dr Henry Rzepa,  Dept. Chemistry,  Imperial College,  LONDON SW7 2AY;
mailto:rzepa@ic.ac.uk; Tel  (44) 171 594 5774; Fax: (44) 171 594 5804.
URL: http://www.ch.ic.ac.uk/rzepa/ 




From jimg at digitalthink.com  Wed Aug 13 21:01:51 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun  7 16:58:16 2004
Subject: Proceedings for the 4th International HyTime Conference?
Message-ID: <01BCA7E0.487807B0.jimg@digitalthink.com>

Hi all,

Does anybody know if proceedings for the 4th International HyTime Conference 
(especially XML Developer's Day) can be obtained by us poor souls who are 
unable to attend?

Thanks in advance.


Jim Gindling
DigitalThink
Software Engineer



From srn at techno.com  Wed Aug 13 23:35:20 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:16 2004
Subject: Proceedings for the 4th International HyTime Conference?
In-Reply-To: <01BCA7E0.487807B0.jimg@digitalthink.com> (message from Jim
	Gindling on Wed, 13 Aug 1997 11:58:57 -0700)
Message-ID: <199708132130.RAA00829@bruno.techno.com>

> Does anybody know if proceedings for the 4th International HyTime Conference 
> (especially XML Developer's Day) can be obtained by us poor souls who are 
> unable to attend?

I can't speak for XML Developers' Day.  Jon?

As for the HyTime Conference, what we have done in the past is to
accept anything any speaker wishes to provide to the public and place
it on the Web, subject to some editing and added value if resources
permit.  You must realize, though, that getting such materials off the
Web is a poor substitute for attending a conference, and not every
speaker is able (for a variety of reasons) to publish everything.

There is another issue here, too.  The GCA can't function without
revenue, and it's not clear that the practice of giving away HyTime
conference proceedings can be continued indefinitely.  In general,
sales of conference proceedings represent a revenue stream for the
GCA.  It is possible that access to such things as HyTime and XML
conference proceedings on the Web may eventually become a "GCA members
only" (or even a pay-per-view!)  privilege.  But please note that I do
not speak for the GCA on this or any other matter, nor have I received
any indication that such a plan is under consideration.  I'm just
pointing out that Adam Smith's invisible hand can be expected to have
its effect here at the appropriate time and in the appropriate way,
once the XML and HyTime conferences have sufficient momentum.

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From Jon.Bosak at eng.Sun.COM  Wed Aug 13 23:58:18 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:16 2004
Subject: Proceedings for the 4th International HyTime Conference?
In-Reply-To: <01BCA7E0.487807B0.jimg@digitalthink.com> (message from Jim Gindling on Wed, 13 Aug 1997 11:58:57 -0700)
Message-ID: <199708132156.OAA26522@boethius.eng.sun.com>

[Jim Gindling:]

| Does anybody know if proceedings for the 4th International HyTime
| Conference (especially XML Developer's Day) can be obtained by us poor
| souls who are unable to attend?

I don't know about the HyTime Conference, but the Dev Day
presentations are specifically intended to be up-to-the-second
reports, so there are no proceedings in the ordinary sense.

Jon




From pshams at hotmail.com  Thu Aug 14 04:10:56 1997
From: pshams at hotmail.com (Parvez Shams)
Date: Mon Jun  7 16:58:16 2004
Subject: XML parsers,browsers comparisn
Message-ID: <19970814020943.12586.qmail@hotmail.com>

Hello,

I am working on a project with XML. We will be using Symposia for our 
"proof of concept" phase. I am curious to know if anyone has done a 
comparison of the other available XML browsers, parsers, and processors. 
If such a resource is available, please let me know.

Thank you for your help.

Cheers,
Parvez Shams



From nmikula at edu.uni-klu.ac.at  Fri Aug 15 11:55:59 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun  7 16:58:16 2004
Subject: Yet Another XML Article
Message-ID: <Pine.OSF.3.93.970815114635.22596A-100000@edusrv.edu.uni-klu.ac.at>


For those that are not on comp.text.sgml :

http://www.ifi.uio.no/~larsga/download/xml/xml_eng.html

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================




From Martin.Beet at ncl.ac.uk  Fri Aug 15 16:58:04 1997
From: Martin.Beet at ncl.ac.uk (Martin Beet)
Date: Mon Jun  7 16:58:16 2004
Subject: purpose CDATA sections
Message-ID: <33F46B94.2DF5@ncl.ac.uk>

Hi

I'm in the process of writing (yet) an(other) introduction to XML and
I'm currently plodding through the standard.

The only purpose of the CDATA section (CDSect) I can think of is for
showing code examples. Am I missing something?

Regards, Martin
---------------
University of Newcastle Dept. of Computing Science | Tel:+44 191 2226157
Claremont Tower, Newcastle upon Tyne, NE1 7RU, UK  | Fax:+44 191 2228232



From ak117 at freenet.carleton.ca  Fri Aug 15 17:13:11 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun  7 16:58:16 2004
Subject: purpose CDATA sections
In-Reply-To: <33F46B94.2DF5@ncl.ac.uk>
References: <33F46B94.2DF5@ncl.ac.uk>
Message-ID: <199708151507.LAA02872@localhost>

Martin Beet writes:

 > I'm in the process of writing (yet) an(other) introduction to XML and
 > I'm currently plodding through the standard.
 > 
 > The only purpose of the CDATA section (CDSect) I can think of is for
 > showing code examples. Am I missing something?

That's the general idea, but it's a little narrow.  Here are a few
uses of CDATA marked sections, off the top of my head:

- source code
- excerpts from system log files
- user sessions with a shell (like bash or command.com)
- sample XML markup
- ASCII art
- mathematical text and other special notations (such as embedded TeX)

Here's a non-source-code example:

<caution>
<para>If the teletype machine displays the following text, please
leave the building as quickly as possible:</para>
<output><![CDATA[
------------------------------------------------------------------------
			  Earthquake Warning
------------------------------------------------------------------------
]]></output>
</caution>


Good luck with the introduction,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/



From ebaatz at barbaresco.East.Sun.COM  Fri Aug 15 17:16:56 1997
From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS)
Date: Mon Jun  7 16:58:16 2004
Subject: purpose CDATA sections
Message-ID: <libSDtMail.199708151115.834.ebaatz@barbaresco>

>  The only purpose of the CDATA section (CDSect) I can think of is for
>  showing code examples. Am I missing something?

By "code" do you mean XML markup?

Text other than XML markup can contain characters that might be mistaken
for XML and therefore should be escaped.  It may be more convenient
to stick the entire text into a CDATA rather than individually escaping
each character that an XML processor is sensitive to.  For example:

<EMAIL-HEADER>
<![CDATA[From: Martin Beet <Martin.Beet@ncl.ac.uk>]]>
</EMAIL-HEADER>


Similarly for more specialized text, such as the native commands of a
speech synthesizer (where I don't have any control over the syntax
accepted by the synthesizer):

<SYNTHCMDS ID="Croaker"><![CDATA[<voice=bullfrog><ribbit=1>]]></SYNTHCMDS>

The CDATA method may be easier to generate programmatically and it may
be viewed as more readable than individually escaping characters.
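
A small sketch of that equivalence, using Python's standard `xml.etree.ElementTree` parser and `xml.sax.saxutils.escape`; the variable names are only illustrative. It assumes the text never contains the sequence `]]>`, which a CDATA section cannot hold.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

header = "From: Martin Beet <Martin.Beet@ncl.ac.uk>"

# Same character data written two ways: one CDATA section, or per-character escapes.
with_cdata = "<EMAIL-HEADER><![CDATA[" + header + "]]></EMAIL-HEADER>"
with_escapes = "<EMAIL-HEADER>" + escape(header) + "</EMAIL-HEADER>"

cdata_text = ET.fromstring(with_cdata).text
escaped_text = ET.fromstring(with_escapes).text

print(with_escapes)
print(cdata_text == escaped_text)
```

Both documents hand the application the same string; the CDATA form simply saves the generator from scanning for markup characters.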


Eric Baatz
Sun Microsystems Laboratories
2 Elizabeth Drive, MS UCHL03-207                 (508) 442-0257
Chelmsford, MA 01824                        fax: (508) 250-5067
USA                                    Internet: eric.baatz@east.sun.com




From liamquin at interlog.com  Sat Aug 16 07:27:37 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
In-Reply-To: <199708110948.KAA20836@andromeda.ndirect.co.uk>
Message-ID: <Pine.BSI.3.95.970816011323.12788A-100000@shell1.interlog.com>

On Sun, 10 Aug 1997, Neil Bradley wrote:

> [...]
> RULE 2. All whitespace preceding the start-tag and following the end-tag 
> of a 'block enclosing' element is discarded.
> ---
> Note: a non-validating applications must refer to a style sheet or
> configuration file to identify 'block enclosing' elements (perhaps by 
> applying this rule to elements not specified as in-line elements).

No -- "blockness" is not at all the same as element content.
For example, you have to allow for a run-in heading, which starts out
looking like an HTML H3 (say) except that the rest of the paragraph
follows on the same line.  So it isn't a block in the paragraph sense.

> As a validating application cannot easily determine this rule from the
> content model (the first mixed content element in the hierarchy is 
> block enclosing, as well as all outer layers), it may choose the same 
> approach. 

I think this is too complicated, as well as being not 100% right.
I don't think there's a single "right" solution.  This is why it's
best to allow the parser to pass _all_ whitespace back to the application,
although it is certainly useful if a DTD-aware parser, even if it isn't
validating, distinguishes element content whitespace from PCDATA whitespace
in some way.
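
As an illustration of a parser that hands everything back, here is a sketch with Python's `xml.etree.ElementTree`, which reads no DTD and so cannot itself tell element content whitespace from PCDATA whitespace. The helper `whitespace_only_chunks` is hypothetical; it shows the application making that call.

```python
import xml.etree.ElementTree as ET

doc = "<chapter>\n <p>Para 1.</p>\n <p>Para 2.</p>\n</chapter>"
root = ET.fromstring(doc)

def whitespace_only_chunks(element):
    """Collect the whitespace-only text the parser reported around child elements."""
    chunks = []
    if element.text is not None and element.text.strip() == "":
        chunks.append(element.text)
    for child in element:
        if child.tail is not None and child.tail.strip() == "":
            chunks.append(child.tail)
    return chunks

chunks = whitespace_only_chunks(root)
print([repr(chunk) for chunk in chunks])
```

Nothing is discarded by the parser; whether those chunks matter is left to the application.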

More than this is a bad idea, I think.


> Note: If PI's, comments or empty elements remain in the data stream,
> they are deemed transparent to this process, so:
>  [SP]<!--comment--><p>Some text...
> 
> becomes:
> 
>  <!--comment--><p>Some text...

Note that if you have a very large comment, you might need a lot of
lookahead here.

> RULE 3. A sequence of one or more line-end codes immediately
> following a start-tag, or immediately preceding an end-tag, are
> discarded (except in preserved content).

This means that
<Paragraph>This is<Emphasis>
very
</Emphasis>strange.</Paragraph>

becomes
<Paragraph>This is<Emphasis>very</Emphasis>strange.</Paragraph>

or, if you format without distinguishing emphasis,
<Paragraph>This isverystrange.</Paragraph>

which I don't think is what you want.
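
The effect can be reproduced with a rough regular-expression sketch of rule 3. It assumes well-formed tags with no '>' inside attribute values, it treats an empty-element tag like a start-tag, and the function names are hypothetical.

```python
import re

# A start-tag (not an end-tag, comment, or PI) followed by line-end codes.
START_TAG_THEN_EOL = re.compile(r"(<[^/!?][^>]*>)[\r\n]+")
# Line-end codes followed by an end-tag.
EOL_THEN_END_TAG = re.compile(r"[\r\n]+(</[^>]+>)")

def apply_rule_3(markup):
    """Discard line-ends right after a start-tag or right before an end-tag."""
    markup = START_TAG_THEN_EOL.sub(r"\1", markup)
    return EOL_THEN_END_TAG.sub(r"\1", markup)

def strip_tags(markup):
    """Format without distinguishing any element."""
    return re.sub(r"<[^>]+>", "", markup)

source = "<Paragraph>This is<Emphasis>\nvery\n</Emphasis>strange.</Paragraph>"
after_rule = apply_rule_3(source)
plain = strip_tags(after_rule)
```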

But SGML itself is broken in this regard.

> RULE 4.  A remaining line-end code is converted into a space, except when it is 
> preceded by a normal (hard) hyphen, or by a soft hyphen ('&#176;'), 
> in which case it is removed (a soft hyphen is also then removed). 
> ---
> Note:
> 
>  A[CR]
>  line-[CR]
>  end code sep&#176;[CR]
>  erates lines.
> 
> becomes:
> 
>  A line-end code seperates lines.

Well, note that there is no hyphen in that paragraph!!
The character "-" in ISO 8859-1 (Latin 1) and ASCII is _not_ a hyphen.
It is a minus sign.

The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
There is no soft hyphen in Latin 1.

I don't have the necessary copy of Unicode in front of me, but last time
I checked (Unicode 1.1) it was the same in this regard, and also in having
the ` character be a spacing grave accent, not a single quote.
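
For anyone wanting to check the code points under discussion, a short sketch using Python's standard `unicodedata` module. The names it reports come from the Unicode database bundled with the interpreter, which reflects a much later edition than Unicode 1.1, so they are not evidence of what was defined at the time of writing.

```python
import unicodedata

def describe(code_point):
    """Return the Unicode character name for a decimal code point."""
    return unicodedata.name(chr(code_point))

# 45 is ASCII "-", 96 is "`", and 160, 173 and 176 are the numeric
# references that come up in this thread; 0x2010 is shown for comparison.
for code_point in (45, 96, 160, 173, 176, 0x2010):
    print(code_point, describe(code_point))
```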

This should be done by applications.  I wouldn't want your message:
    ----------
    RULE 5. Consecutive whitespace characters (including translated 
turning into
    ----------RULE 5. Consecutive whitespace characters (including translated 
for example.

> Note: Multiple spaces can be preserved using the non-break space
> character ('&#160;').
> 
>  <p>Some&#160;&#160;&#160;spaces.
Er, is this defined in Unicode or in ISO 10646??

Lee

-- 
Liam Quin --  the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval

email address:
l i a m q u i n    at host:    i n t e r l o g   dot   c o m




From Peter at ursus.demon.co.uk  Sat Aug 16 17:18:21 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:16 2004
Subject: WD-xml-970807 (fwd)
Message-ID: <9499@ursus.demon.co.uk>

Forwarded message follows:

>From Dan Connolly (W3C):

> 
> Please distribute this announcement far and wide.
> 
> ============
> http://www.w3.org/TR/
> 
> Extensible Markup Language (XML) 
>      7 August 1997, Tim Bray, Jean Paoli, C.M. Sperberg-McQueen 
> ============
> 
> http://www.w3.org/TR/WD-xml-970807
> http://www.w3.org/TR/WD-xml-970807.html
> http://www.w3.org/TR/WD-xml-970807.xml
> http://www.w3.org/TR/WD-xml-970807.ps
> http://www.w3.org/TR/WD-xml-970807.ps.zip
> 
[...]
> 
> -- 
> Dan Connolly, W3C Architecture Domain Lead
> http://www.w3.org/People/Connolly/
> 
> 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From neil at bradley.co.uk  Sat Aug 16 19:52:00 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <199708161751.SAA28294@andromeda.ndirect.co.uk>

Dear Liam,

Thanks for the feedback.

> > [...]
> > RULE 2. All whitespace preceding the start-tag and following the end-tag 
> > of a 'block enclosing' element is discarded.
> > ---
> > Note: a non-validating applications must refer to a style sheet or
> > configuration file to identify 'block enclosing' elements (perhaps by 
> > applying this rule to elements not specified as in-line elements).
> 
> No -- "blockness" is not at all the same as element content.
> For example, you have to allow for a run-in heading, which starts out
> looking like an HTML H3 (say) except that the rest of the paragraph
> follow on on the same line.  So it isn't a block in the paragraph sense.
> 
> > As a validating application cannot easily determine this rule from the
> > content model (the first mixed content element in the hierarchy is 
> > block enclosing, as well as all outer layers), it may choose the same 
> > approach. 
> 
> I think this is too complicated, as well as being not 100% right.
> I don't think there's a single "right" solution.  This is why it's
> best to allow the parser to pass _all_ whitespace back to the application,
> although it is certainly useful if a DTD-aware parser, even if it isn't
> validating, distinguishes element content whitespace from PCDATA whitespace
> in some way.

Note that these rules are intended for the application, not the 
parser, or any other part of the XML processor. As I state at the top 
of the rules, "A formatting application should......according to the 
following 5 rules".

> > Note: If PI's, comments or empty elements remain in the data stream,
> > they are deemed transparent to this process, so:
> >  [SP]<!--comment--><p>Some text...
> > 
> > becomes:
> > 
> >  <!--comment--><p>Some text...
> 
> Note that if you have a very large comment, you might need a lot of
> lookahead here.

Actually no, because the application would already KNOW that it is 
currently in block content.

> > RULE 3. A sequence of one or more line-end codes immediately
> > following a start-tag, or immediately preceding an end-tag, are
> > discarded (except in preserved content).
> 
> This means that
> <Paragraph>This is<Emphasis>
> very
> </Emphasis>strange.</Paragraph>
> 
> becomes
> <Paragraph>This is<Emphasis>very</Emphasis>strange.</Paragraph>
> 
> or, if you format withut distinguishing emphasis,
> <Paragraph>This isverystrange.</Paragraph>
> 
> which I don't think is what you want.
> 
> But SGML itself is broken in this regard.

I know, and it is impossible to cover all angles. I think your 
example is one of the least likely things to happen in reality, and if 
necessary document authors must be educated to avoid it.

I am open to other suggestions, of course. I am only trying to get 
detailed discussions rolling. For example, we could get rid of both 
rules 2 and 3, and improve rule 5 to say that all surrounding white 
space is removed. 
 
> > RULE 4.  A remaining line-end code is converted into a space, except when it is 
> > preceded by a normal (hard) hyphen, or by a soft hyphen ('&#176;'), 
> > in which case it is removed (a soft hyphen is also then removed). 
> > ---
> > Note:
> > 
> >  A[CR]
> >  line-[CR]
> >  end code sep&#176;[CR]
> >  erates lines.
> > 
> > becomes:
> > 
> >  A line-end code seperates lines.
> 
> Well, note that there is no hyphen in that paragraph!!
> The character "-" in ISO 8859-1 (Latin 1) and ASCII is _not_ a hyphen.
> It is a minus sign.

Well, most people in the past have used it as a hyphen in text 
documents, which I think is the important point here.

Also, my source tells me that this character is the official ISO 
hyphen - but my source may be wrong.

> The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
> There is no soft hyphen in Latin 1

OK. I will take your word on this. Again, my source of information may be wrong.
 
> I don't have the necessary copy of Unicode in front of me, but last time
> I checked (Unicode 1.1) it was the same in this regard, and also in having
> the ` character be a spacing grave accent, not a single quote.
> 
> This should be done by applications.  I wouldn't want your mesage:

It is being done by the application.

What "wouldn't you want your message:"?

>     ----------
>     RULE 5. Consecutive whitespace characters (including translated 
> turrning into
>     ----------RULE 5. Consecutive whitespace characters (including translated 
> for example.
> 
> > Note: Multiple spaces can be preserved using the non-break space
> > character ('&#160;').
> > 
> >  <p>Some&#160;&#160;&#160;spaces.
> Er, is this defined in Unicode or in ISO 10646??

Don't know. I have it as a non-breaking space, which I am 'liberally' 
interpreting here as a required space (if it can't be broken over 
lines, it must be pretty important). If Unicode has a more explicit 
required space character, then fine, let's use that.

> Lee

Neil.


-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From Peter at ursus.demon.co.uk  Sat Aug 16 19:56:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <9502@ursus.demon.co.uk>

Firstly, many thanks to Neil for posting these proposed rules and to those
who have answered.  On balance (I am an optimist!) I think there is something
desirable and achievable here.  I think a lot of us feel there has to be
some guidance on whitespace and I think Neil has covered much of the ground.

I think what is achievable is a set of rules at the 80/20 level (80%
of XML-DEV'ers think they are 80% useful). There are certainly areas where there
will be disagreement - this was a voluminous topic on XML-WG last autumn.

XML-DEV has the advantage and disadvantage that it has no formal standing, so
those who don't like anything that comes out of it can ignore it :-).  So if
we can come up with a set of rules and a label for them, application developers
can use them (or not) as they wish.  An advantage is that because all discussion
is publicly archived, we can always point back and say 'that is why we 
suggested X'.  

If a set of rules *does* emerge, then how can we generally inform an application
that it should take them as DEFAULT?  I assume this is through a PI:

<?XML-SPACE-DEFAULT 
   HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/12345.html"?>
...
<FOO XML-SPACE="DEFAULT">
    The <!-- munge this according to XML-DEV whitespace -->whitespace[CR][LF]is
normalised</FOO>

So I think we need a mechanism from XML-WG to show the application where
it should get its DEFAULT processing mechanism from.
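
A sketch of how an application might pick up such a PI, using the standard `xml.sax` module. The handler class and its `instructions` list are hypothetical, the PI target is just the one proposed above, and nothing here implies that any parser or specification gives it special meaning.

```python
import xml.sax

class SpacePIHandler(xml.sax.ContentHandler):
    """Record any processing instruction whose target starts with XML-SPACE."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        if target.startswith("XML-SPACE"):
            self.instructions.append((target, data))

doc = (
    b'<?XML-SPACE-DEFAULT '
    b'HREF="http://www.lists.ic.ac.uk/hypermail/xml-dev/12345.html"?>'
    b'<FOO XML-SPACE="DEFAULT">The whitespace is normalised</FOO>'
)

handler = SpacePIHandler()
xml.sax.parseString(doc, handler)
print(handler.instructions)
```

The parser passes the PI data through as an uninterpreted string, so pulling the HREF value out of it would be the application's job.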


Specific points:

[Rule 1 - normalisation]
I think it's essential to have something like Neil's proposal for [CR][LF]

In message <Pine.BSI.3.95.970816011323.12788A-100000@shell1.interlog.com> 
Liam Quin writes:
> On Sun, 10 Aug 1997, Neil Bradley wrote:
> 
> > [...]
> > RULE 2. All whitespace preceding the start-tag and following the end-tag 
> > of a 'block enclosing' element is discarded.
> > ---
> > Note: a non-validating applications must refer to a style sheet or
> > configuration file to identify 'block enclosing' elements (perhaps by 
> > applying this rule to elements not specified as in-line elements).
> 
> No -- "blockness" is not at all the same as element content.
> For example, you have to allow for a run-in heading, which starts out
> looking like an HTML H3 (say) except that the rest of the paragraph
> follow on on the same line.  So it isn't a block in the paragraph sense.
> 
> > As a validating application cannot easily determine this rule from the
> > content model (the first mixed content element in the hierarchy is 
> > block enclosing, as well as all outer layers), it may choose the same 
> > approach. 
> 
> I think this is too complicated, as well as being not 100% right.
> I don't think there's a single "right" solution.  This is why it's
> best to allow the parser to pass _all_ whitespace back to the application,
> although it is certainly useful if a DTD-aware parser, even if it isn't
> validating, distinguishes element content whitespace from PCDATA whitespace
> in some way.

I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
is done here has to be independent of stylesheets and DTDs.  The average hacker
like me simply won't understand the subtleties.
> 
> More than this is a bad idea, I think.
> 
> 
> > Note: If PI's, comments or empty elements remain in the data stream,
> > they are deemed transparent to this process, so:
> >  [SP]<!--comment--><p>Some text...
> > 
> > becomes:
> > 
> >  <!--comment--><p>Some text...
> 
> Note that if you have a very large comment, you might need a lot of
> lookahead here.

I would assume that this processing takes place in the application, not the
parser.  How/whether comments are passed to the application is part of the
parser API.  I assume that at this stage the comment is recognised as a single
chunk which can be deleted with/out surrounding whitespace as required.

> 
> > RULE 3. A sequence of one or more line-end codes immediately
> > following a start-tag, or immediately preceding an end-tag, are
> > discarded (except in preserved content).
> 
> This means that
> <Paragraph>This is<Emphasis>
> very
> </Emphasis>strange.</Paragraph>
> 
> becomes
> <Paragraph>This is<Emphasis>very</Emphasis>strange.</Paragraph>
> 
> or, if you format without distinguishing emphasis,
> <Paragraph>This isverystrange.</Paragraph>
> 
> which I don't think is what you want.
> 
> But SGML itself is broken in this regard.

This one is tough.  Please criticise my current view :-).  SGML documents seem
to use markup as structure in some places (e.g. OL/LI in HTML) and as
event streams in others (e.g. EM, B in HTML). Authors/readers expect different
processing modes from these types. The example above is best treated as structuring
markup (P) containing an event stream (#PCDATA|EM)* [sorry for abbreviations].
So we have to indicate to the processor that P is structuring and that 
whitespace after <P> or before </P> is irrelevant, and that its content is an 
event stream where all whitespace is normalised to a single space (cf HTML.)
Therefore can we have something like this:
<?XML-SPACE STRUCTURE="YES"?>
<Paragraph>
<?XML-SPACE EVENT="YES"?>
This is<Emphasis>very</Emphasis>strange.
<?XML-SPACE STRUCTURE="YES"?>
</Paragraph>

(I am sure there are cleaner ways of doing this, especially declaring this
for all <Paragraph>s).  The question is whether a model like this meets the
80/20 rule.
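
A minimal sketch of the structure/event model described above, in Python
purely for illustration.  The function name normalise_event_stream is
hypothetical, and the sketch treats the serialised content of <Paragraph> as a
plain string, so it is an approximation rather than a proposal for how a real
application would do it:

```python
import re

def normalise_event_stream(text: str) -> str:
    """Collapse each whitespace run to one space, then trim both ends.

    Trimming models the idea that whitespace just after the start-tag or
    just before the end-tag of a structuring element is irrelevant;
    collapsing models 'event stream' content (cf. HTML).
    """
    return re.sub(r"[ \t\r\n]+", " ", text).strip(" ")

# Content of a <Paragraph>, including the line ends that follow the
# start-tag and precede the end-tag.
content = "\nThis is <Emphasis>very</Emphasis> strange.\n"
print(normalise_event_stream(content))
```

Because the markup is treated as opaque text, whitespace inside a tag would
be collapsed too; a real post-parser step would work on the parser's text
events instead.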


> 
> > RULE 4.  A remaining line-end code is converted into a space, except when it is 
> > preceded by a normal (hard) hyphen, or by a soft hyphen ('&#173;'), 
> > in which case it is removed (a soft hyphen is also then removed). 
> > ---

I have to argue against this :-(.  A hyphen is indistinguishable from a minus
to lots of people. There are also many cases where people may wish to end
a line with a minus:
<MOL>
<ATOMS>
CL-
H+
</ATOMS>
</MOL>

Since we are normalising whitespace, then lines can always be arranged so that
hyphens are unnecessary.
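
To make the objection concrete, here is a small Python sketch of Rule 4 as
quoted above.  The function name rule4_join is hypothetical and the input is
assumed to be already split into lines; it only illustrates how a trailing
minus gets glued to the next token:

```python
def rule4_join(lines):
    """Join lines as the quoted Rule 4 describes.

    A line end becomes a space, unless the line ends in a hard hyphen
    (line end removed, hyphen kept) or a soft hyphen, U+00AD (line end
    and soft hyphen both removed).
    """
    out = ""
    for i, line in enumerate(lines):
        if i == len(lines) - 1:
            out += line              # nothing to join after the last line
        elif line.endswith("\u00ad"):
            out += line[:-1]         # drop the soft hyphen and the line end
        elif line.endswith("-"):
            out += line              # keep the hyphen, drop the line end
        else:
            out += line + " "        # ordinary case: line end becomes a space
    return out

print(rule4_join(["CL-", "H+"]))      # the two atoms are fused into one token
print(rule4_join(["well", "known"]))  # ordinary text is joined with a space
```

Under this reading the ATOMS content above loses the delimiter between its
two entries, which is the case being argued against.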

Let's see if there is a solution which is simple, covers most of the common
problems and which is intuitively obvious to the webhackers who graduate from
HTML.  We clearly need something more than <PRE> and </PRE>, but it shouldn't
be more than, say, twice as complex.  I think we are a long way towards that.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From tbray at textuality.com  Sat Aug 16 19:59:48 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <3.0.32.19970816105650.008fda00@pop.intergate.bc.ca>

I gotta say that it's noble of you guys to take aim at this particular
problem, but you should bear in mind that it's really really really 
hard.  The original goal as stated in SGML was to ignore white
space "caused by markup" by which they meant "used to prettyprint
markup".  A worthy goal, but in fact most people would agree that
the rules you have to write to achieve this are horrendously complicated
and some would argue that SGML never actually did get it right.  

We spent a huge amount of time on this in the XML committee and 
eventually decided that if simple rules could be written, we weren't
smart enough to figure them out.

So good luck, don't expect it to be easy, but if you get it right
the world will be grateful. -Tim



From Peter at ursus.demon.co.uk  Sun Aug 17 00:08:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:16 2004
Subject: Whitespace rules (v2)
Message-ID: <9506@ursus.demon.co.uk>

In message <3.0.32.19970816105650.008fda00@pop.intergate.bc.ca> Tim Bray writes:

Thanks very much for your support, Tim.  We believe that XML-DEV has a role in
coming up with workable pragmatic solutions to 'parts' of the XML process. 
Getting those all right at once (i.e. for the spec) may be impossible; getting
a few of them mainly right may be a useful step.

> I gotta say that it's noble of you guys to take aim at this particular
> problem, but you should bear in mind that it's really really really 
> hard.  The original goal as stated in SGML was to ignore white
> space "caused by markup" by which they meant "used to prettyprint
> markup".  A worthy goal, but in fact most people would agree that
> the rules you have to write to achieve this are horrendously complicated
> and some would argue that SGML never actually did get it right.  

I'd agree with this. And XML does not work in precisely the same way as SGML 
here.  It's most useful IMO to proceed on the basis that most XML-DEV'ers
will not understand the niceties of SGML-whitespace but *will be prepared to
work to a (fairly) simple set of rules*.

If we go for an 80/20 solution (i.e. 80% of users/applications find it useful
80% of the time), that solves 64% - a reasonable starting point...
> 
> We spent a huge amount of time on this in the XML committee and 

Yes. And it's essential we don't go round this loop again. It will always be
possible to pick holes in a proposed set of rules - so we have to accept there
will be holes from the start. Just minimise their size and point them out.

> eventually decided that if simple rules could be written, we weren't
> smart enough to figure them out.

I don't think there *is* a solution in terms that a cast-iron spec could 
contemplate (any more than there is one universal DTD). We have to seek a 
compromise solution.  

> 
> So good luck, don't expect it to be easy, but if you get it right
> the world will be grateful. -Tim

Obviously there will be applications which come 'out-of-the-box' - the 
authoring and processing tools are already written and validated, and most
people won't need to see the intermediate XML text.  Maybe CDF is in this
category.  I think we are aiming at those documents which might be processed
by generic XML processors, or composed of cut-n-paste from a variety of
sources (or both). For example, in a combined MathML and CML document, it
is reasonable to expect the whitespace processing to be openly declared, easily
implementable and (hopefully) easy to understand.

I think we can aim for one (or possibly two) protocols that service 'most'
applications.  With those there would be simple guidelines for authors (of
documents and of processing software).

Firstly there are some 'gotchas'. I don't think anyone *wants* CR/LF problems
to be platform-dependent. So we have to address this independently of other
complications. 

IMO most XML documents will fall into the categories:
	(a) precise whitespace matters (PRESERVE or <(HTML)PRE>). The main
problem with using this is the CR/LF one.
	(b) text-like, where markup is for formatting (mixed content, 
event-stream processing).
	(c) structured, often with pretty-printing (i.e. redundant whitespace)
(element content).
	(d) mixtures of (b) and (c). This would be common in technical documents
with a mixture of 'text' and 'non-textual' structured information.

I believe we can come up with simple rules for b/c/d which are reasonably 
intuitive to the webhacker and also cover a wide enough range of applications.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From neil at bradley.co.uk  Sun Aug 17 09:43:33 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:17 2004
Subject: Whitespace rules (v2)
Message-ID: <199708170743.IAA28970@andromeda.ndirect.co.uk>


Peter Murray-Rust wrote:

> If a set of rules *does* emerge, then how can we generally inform an application
> that it should take them as DEFAULT?  I assume this is through a PI:

I was hoping that relevant applications (mainly browsers and 
typesetting systems) will ALWAYS assume the rules that are finally 
determined, except where preserved content (or some other set of 
rules) is explicitly actioned.
 
> I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
> is done here has to be independent of stylesheets and DTDs.  The average hacker
> like me simply won't understand the subtleties.

I am merely trying to distinguish in-line elements from other 
elements. An in-line element implies no line-breaks above or below 
it. A 'Block' element therefore DOES imply such a break. I do not use 
the terms element and mixed content here, because it is not quite the 
same thing. As I have said before, a Para element is a 'block' 
element, and has mixed content, but an Emph element is an 'in-line' 
element, yet also has mixed content. All style sheets, including 
CSS, understand the concept of in-line and block elements. Any 
whitespace surrounding a block element MUST be irrelevant.

Liam raised the issue of a half-way element type, such as a header 
which implies a line-break before it, but not after, so that 
following text will appear on the same line. This one is tricky. 
Suggestions anybody?

> I would assume that this processing takes place in the application, not the
> parser.  How/whether comments are passed to the application is part of the
> parser API.  I assume that at this stage the comment is recognised as a single
> chunk which can be deleted with/out surrounding whitespace as required.

As I say at the top of the rules, ALL these rules are applied by the 
application, not the XML processor.
 
> This one is tough.  Please criticise my current view :-).  SGML documents seem
> to use markup as structure in some places (e.g. OL/LI in HTML) or
> event streams (e.g. EM, B in HTML). Authors/readers expect different processing
> modes from these types. The example above is best treated as structuring
> markup (P) containing an event stream (#PCDATA|EM)* [sorry for abbreviations].
> So we have to indicate to the processor that P is structuring and that 
> whitespace after <P> or before </P> is irrelevant, and that its content is an 
> event stream where all whitespace is normalised to a single space (cf HTML.)
> Therefore can we have something like this:
> <?XML-SPACE STRUCTURE="YES"?>
> <Paragraph>
> <?XML-SPACE EVENT="YES"?>
> This is<Emphasis>very</Emphasis>strange.
> <?XML-SPACE STRUCTURE="YES"?>
> </Paragraph>

I think that, ultimately, some combinations of markup will always 
break whatever rules we come up with. We must ensure that only 
obscure, non-intuitive combinations do this, then just shout from 
the rooftops that these combinations are not to be used.
 
> > 
> > > RULE 4.  A remaining line-end code is converted into a space, except when it is 
> > > preceded by a normal (hard) hyphen, or by a soft hyphen ('&#173;'), 
> > > in which case it is removed (a soft hyphen is also then removed). 
> > > ---
> 
> I have to argue against this :-(.  A hyphen is indistinguishable from a minus
> to lots of people. There are also many cases where people may wish to end
> a line with a minus:
> <MOL>
> <ATOMS>
> CL-
> H+
> </ATOMS>
> </MOL>
> 
> Since we are normalising whitespace, then lines can always be arranged so that
> hyphens are unnecessary.

My concern was to address existing text files, where hyphens are 
often used in this way. Maybe I am over-estimating this problem.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From Peter at ursus.demon.co.uk  Sun Aug 17 13:50:28 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:17 2004
Subject: Whitespace rules (v2)
Message-ID: <9516@ursus.demon.co.uk>

In message <199708170743.IAA28970@andromeda.ndirect.co.uk> "Neil Bradley" writes:
> 
> Peter Murray-Rust wrote:
> 
> > If a set of rules *does* emerge, then how can we generally inform an application
> > that it should take them as DEFAULT?  I assume this is through a PI:
> 
> I was hoping that relevant applications (mainly browsers and 
> typesetting systems) will ALWAYS assume the rules that are finally 
> determined, except where preserved content (or some other set of 
> rules) is explicitly actioned.

I think - along with TimB - that it is unrealistic to come up with a single
set of rules that will serve every application.  There was an enormous amount 
of discussion on the XML group last year and I take it as axiomatic that we
cannot produce a set of rules which everyone agrees are:
	- simple to state
	- unambiguous
	- intuitive and easy to learn
	- universal (i.e. cover every situation)

I think that XML will include applications beyond 'browsers and typesetting 
systems' although these will be the commonest. MathML and CML will have 
chunks of material which contains whitespace not used primarily as part of
text.  Here's a simple example:
<MOL>
  <ATOMS>
[HT]C H N    Cl[CR][LF]
[HT]O P Br[CR][LF]
  </ATOMS>
</MOL>
where the whitespace is used (a) for visual effect and potential ease in 
editing and (b) as a delimiter (within ATOMS).  Here [HT] stands for a tab. 

What I am after here is a convention that I can state which instructs the 
processor how to treat this whitespace.  ***I do not wish to have to devise
a specific convention for CML***.  I want to be able to indicate that 
the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS content 
is normalisable and used only as a delimiter of tokens.
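
As an illustration only, treating the ATOMS whitespace purely as a token
delimiter could look like the Python sketch below.  The function name
atom_tokens is hypothetical, and the string is assumed to be the character
content handed over by the parser, with [HT], [CR] and [LF] written as real
control characters:

```python
def atom_tokens(content: str) -> list[str]:
    """Split element content on runs of whitespace (space, tab, CR, LF).

    Leading and trailing whitespace produces no empty tokens, so the
    pretty-printing around the data has no effect on the result.
    """
    return content.split()

# Content of <ATOMS> from the example above.
atoms = "\n\tC H N    Cl\r\n\tO P Br\r\n  "
print(atom_tokens(atoms))
```

The same list of tokens comes out whether the author used tabs, several
spaces or a different line-end convention, which is the property wanted here.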

I expect that many other applications will use a similar approach, so I want
to share the effort with them.  Examples of metadata in XML have often been 
portrayed as prettyprinted and I expect that CML could use the same conventions.
[BTW I think that there will be more human editing of XML files than is often
assumed - and metadata is a good example. Prettyprinting is a useful tool
in those cases.]

I think that we can aim for a set of options that could be used by a post-parser
processor. Different applications (**or document authors**) could choose between
them. Examples might be:
	- normaliseCRLF (Neil's Rule 1)
	- discardAllWS
	- normaliseToSingleSpace

An author or application could then state which of these it was using. 

It might be that in the first instance we can only agree on (say) Rule 1, but
this would be a useful start.
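
A rough Python sketch of such a set of named options follows.  The option
names are the ones listed above; everything else (the function names, the
OPTIONS table, and the assumption that Rule 1 maps both CR LF and a bare CR
to a single LF) is hypothetical and only meant to show the shape of a
post-parser step:

```python
import re

WS_RUN = re.compile(r"[ \t\r\n]+")

def normalise_crlf(text: str) -> str:
    # Assumed reading of Rule 1: CR LF and a bare CR both become LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")

def discard_all_ws(text: str) -> str:
    return WS_RUN.sub("", text)

def normalise_to_single_space(text: str) -> str:
    return WS_RUN.sub(" ", text)

# Lookup table so that an author or application can name the options it uses.
OPTIONS = {
    "normaliseCRLF": normalise_crlf,
    "discardAllWS": discard_all_ws,
    "normaliseToSingleSpace": normalise_to_single_space,
}

def apply_options(text: str, names: list[str]) -> str:
    """Apply the named options to the text, in the order given."""
    for name in names:
        text = OPTIONS[name](text)
    return text

print(repr(apply_options("C H\r\n\tO", ["normaliseCRLF", "normaliseToSingleSpace"])))
```

An application that only agrees with Rule 1 would simply pass
["normaliseCRLF"] and leave the rest of the whitespace alone.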

>  
> > I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
> > is done here has to be independent of stylesheets and DTDs.  The average hacker
> > like me simply won't understand the subtleties.
> 
> I am merely trying to distinguish in-line elements from other 
> elements. An in-line element implies no line-breaks above or below 
> it. A 'Block' element therefore DOES imply such a break. I do not use 
> the terms element and mixed content here, because it is not quite the 
> same thing. As I have said before, a Para element is a 'block' 
> element, and has mixed content, but an Emph element is an 'in-line' 
> element, yet also has mixed content. All style sheets, including 
> CSS, understand the concept of in-line and block elements. Any 
> whitespace surrounding a block element MUST be irrelevant.

It looks like the context, rather than the content is the significant
feature.

> 
> Liam raised the issue of a half-way element type, such as a header 
> which implies a line-break before it, but not after, so that 
> following text will appear on the same line. This one is tricky. 
> Suggestions anybody?


> 
> > I would assume that this processing takes place in the application, not the
> > parser.  How/whether comments are passed to the application is part of the
> > parser API.  I assume that at this stage the comment is recognised as a single
> > chunk which can be deleted with/out surrounding whitespace as required.
> 
> As I say at the top of the rules, ALL these rules are applied by the 
> application, not the XML processor.

Agreed.  This discussion is about how the application behaves.  The question
is whether we can give it some generic instructions.  I'd delete the word
'ALL' if it suggests that you either take all the rules or none.

>  
> > This one is tough.  Please criticise my current view :-).  SGML documents seem
> > to use markup as structure in some places (e.g. OL/LI in HTML) or
> > event streams (e.g. EM, B in HTML). Authors/readers expect different processing
> > modes from these types. The example above is best treated as structuring
> > markup (P) containing an event stream (#PCDATA|EM)* [sorry for abbreviations].
> > So we have to indicate to the processor that P is structuring and that 
> > whitespace after <P> or before </P> is irrelevant, and that its content is an 
> > event stream where all whitespace is normalised to a single space (cf HTML.)
> > Therefore can we have something like this:
> > <?XML-SPACE STRUCTURE="YES"?>
> > <Paragraph>
> > <?XML-SPACE EVENT="YES"?>
> > This is<Emphasis>very</Emphasis>strange.
> > <?XML-SPACE STRUCTURE="YES"?>
> > </Paragraph>
> 
> I think that, ultimately, some combinations of markup will always 
> break whatever rules we come up with. We must ensure that only 
> obscure, non-intuitive combinations do this, then just shout from 
> the rooftops that these combinations are not to be used.

It is clear that a set of guidelines and examples must accompany these rules.
If necessary we may have to educate people to write XML like:
  <TAG
  ><FOO
  ></FOO
  ></TAG
  >
(although I think if we have to go to this stage we have lost 95% of potential
XML webhackers).

[...]
> 
> My concern was to address existing text files, where hyphens are 
> often used in this way. Maybe I am over-estimating this problem.

I don't think we need to address the conversion of existing non-XML files to
XML in this discussion. The question is what the application does to the
output of the XML parser.

--------

WS is probably among the commonest problems that most newcomers to XML will 
face, so it's well worth trying to develop guidelines.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tikvas at agentsoft.com  Mon Aug 18 12:04:05 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun  7 16:58:17 2004
Subject: Where can I find CDF  dtd file?
Message-ID: <33F81E15.899@agentsoft.com>

I'd appreciate it if someone would tell me where to find the
CDF dtd file.

       Tikva Schmidt.

--------------------------------------------------------------------
Tikva Schmidt.
email: tikvas@agentsoft.co.il
corp:  Agentsoft Ltd.     http://www.agentsoft.co.il
Phone: 972-2-6480573
---------------------------------------------------------------------



From ak117 at freenet.carleton.ca  Mon Aug 18 12:39:32 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun  7 16:58:17 2004
Subject: Where can I find CDF  dtd file?
In-Reply-To: <33F81E15.899@agentsoft.com>
References: <33F81E15.899@agentsoft.com>
Message-ID: <199708181038.GAA00192@localhost>

Tikva Schmidt writes:

 > I'd appreciate it if someone would tell me where to find the
 > CDF dtd file.

You could try putting one together from the excerpts in Microsoft's
CDF white paper, but unfortunately, they contain many syntax errors.
I wonder if there _is_ actually a DTD yet.  I've done some pretty
elaborate AltaVista searches (for the likely content of the DTD) and
have turned up nothing so far.


All the best,


David

-- 
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/



From agreene at bitstream.com  Mon Aug 18 15:52:29 1997
From: agreene at bitstream.com (Andrew Greene)
Date: Mon Jun  7 16:58:17 2004
Subject: Conditional marked sections
Message-ID: <19970818134844.AAA2763@AGREENE-PC.bitstream.com>

Please forgive what I hope will turn out to be a foolish question, but
upon rereading the XML spec, I was left unclear on the question of
whether marked sections could be used in the document instance for
anything except CDATA.

That is, in full SGML, you can say:

    <!DOCTYPE example [
      <!Element example - - ANY>
    ]>
    <example>
    This is a <![include[marked]]> section.
    </example>

and when you run it through nsgmls, you get:

    (EXAMPLE
    -This is a marked section.
    )EXAMPLE
    C

But the XML spec implies that conditional inclusion of marked sections
is only approved for the DTD, and not for the document instance itself;
and that the only legal use of marked sections in the document instance
is for CDATA. It is also implied that parameter entities are only
valid within the DTD itself.

So, which is it? I'll admit that I'll be disappointed if conditional
marked sections are restricted to the DTD.

Thanks,
  Andrew Greene




From tbray at textuality.com  Mon Aug 18 17:04:41 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:17 2004
Subject: Conditional marked sections
Message-ID: <3.0.32.19970818080111.00908a70@pop.intergate.bc.ca>

At 09:48 AM 18/08/97 -0400, Andrew Greene wrote:
>Please forgive what I hope will turn out to be a foolish question, but
>upon rereading the XML spec, I was left unclear on the question of
>whether marked sections could be used in the document instance for
>anything except CDATA.

That's right; nothing except CDATA.
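
For anyone who wants to see the difference in practice, the following Python
sketch feeds both forms to the expat-based parser behind the standard
xml.etree.ElementTree module.  The two sample documents and the helper name
is_well_formed are made up for the illustration; the DOCTYPE from the SGML
example is left out to keep it short:

```python
import xml.etree.ElementTree as ET

# A CDATA marked section in the document instance is allowed in XML.
cdata_doc = "<example>This is a <![CDATA[marked]]> section.</example>"
# An INCLUDE marked section in the document instance is not.
include_doc = "<example>This is a <![INCLUDE[marked]]> section.</example>"

def is_well_formed(doc: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False

print(ET.fromstring(cdata_doc).text)  # the CDATA content is ordinary text
print(is_well_formed(include_doc))    # the INCLUDE form is rejected
```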

>So, which is it? I'll admit that I'll be disappointed if conditional
>marked sections are restricted to the DTD.

Sorry to disappoint. -Tim



From russc at watfac.org  Tue Aug 19 00:16:25 1997
From: russc at watfac.org (Russell Chamberlain)
Date: Mon Jun  7 16:58:17 2004
Subject: Whitespace rules (v2)
Message-ID: <3.0.1.32.19970818181729.0069be80@watfac.org>

<HI/>

In message <199708170743.IAA28970@andromeda.ndirect.co.uk> "Neil Bradley"
writes:
> 
> Peter Murray-Rust wrote:
> 
>I think - along with TimB - that it is unrealistic to come up with a single
>set of rules that will serve every application.  There was an enormous amount 
>of discussion on the XML group last year and I take it as axiomatic that we
>cannot produce a set of rules which everyone agrees are:
>	- simple to state
>	- unambiguous
>	- intuitive and easy to learn
>	- universal (i.e. cover every situation)

Axiomatic? Call me stubborn (you won't be the first), but I, for one,
retain some hope. :-)

>
>I think that XML will include applications beyond 'browsers and typesetting 
>systems' although these will be the commonest. MathML and CML will have 
>chunks of material which contains whitespace not used primarily as part of
>text.  Here's a simple example:
><MOL>
>  <ATOMS>
>[HT]C H N    Cl[CR][LF]
>[HT]O P Br[CR][LF]
>  </ATOMS>
></MOL>
>where the whitespace is used (a) for visual effect and potential ease in 
>editing (b) as a delimiter (within ATOMS) [HT]=tab, for example. 
>
>What I am after here is a convention that I can state which instructs the 
>processor how to treat this whitespace.  ***I do not wish to have to devise
>a specific convention for CML***.  I want to be able to indicate that 
>the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS content 
>is normalisable and used only as a delimiter of tokens.
>
>I expect that many other applications will use a similar approach, so I want
>to share the effort with them.  Examples of metadata in XML have often been 
>portrayed as prettyprinted and I expect that CML could use the same conventions.
>[BTW I think that there will be more human editing of XML files than is often
>assumed - and metadata is a good example. Prettyprinting is a useful tool
>in those cases.]
>
>I think that we can aim for a set of options that could be used by a post-parser
>processor. Different applications (**or document authors**) could choose between
>them. Examples might be:
>	- normaliseCRLF (Neil's Rule 1)
>	- discardAllWS
>	- normaliseToSingleSpace
>
>An author or application could then state which of these it was using. 
>
>It might be that in the first instance we can only agree on (say) Rule 1, but
>this would be a useful start.
>
>>  
>> > I agree with Liam - I didn't understand 'blockness'.  I also think that whatever
>> > is done here has to be independent of stylesheets and DTDs.  The average hacker
>> > like me simply won't understand the subtleties.
>> 
>> I am merely trying to distinguish in-line elements from other 
>> elements. An in-line element implies no line-breaks above or below 
>> it. A 'Block' element therefore DOES imply such a break. I do not use 
>> the terms element and mixed content here, because it is not quite the 
>> same thing. As I have said before, a Para element is a 'block' 
>> element, and has mixed content, but an Emph element is an 'in-line' 
>> element, yet also has mixed content. All style sheets, including 
>> CSS, understand the concept of in-line and block elements. Any 
>> whitespace surrounding a block element MUST be irrelevant.
>
>It looks like the context, rather than the content is the significant
>feature.
>
>> 
>> Liam raised the issue of a half-way element type, such as a header 
>> which implies a line-break before it, but not after, so that 
>> following text will appear on the same line. This one is tricky. 
>> Suggestions anybody?
>

<FormattingSpecificDiscussionOfWhitespace>

The idea of a "half-way" element type just highlights the fact that element
nesting does not necessarily map nicely to block/paragraph structure in
formatting applications. I like to say that block formatting _transcends_
element nesting -- there is no direct mapping.

In my experience, a pair of lower-level concepts (eg. "block start" and
"block end") has proven quite useful. In the current discussion, the
"blockness" of the elements might be described as follows:

           "block start"   "block end"
    -----------------------------------------
    Para       Yes            Yes
    Emph       No             No
    Hn         Yes            No

where:

  "block start" - means start a block at the start of the element
  "block end"   - means end a block at the end of the element
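
A small Python sketch of how a formatter might use the table above.  The
event representation, the BLOCKNESS dictionary and the function name
render_lines are all hypothetical; the point is only that the run-in
heading falls out of the two flags without a special case:

```python
# (block start, block end) flags, copied from the table above.
BLOCKNESS = {
    "Para": (True, True),
    "Emph": (False, False),
    "Hn": (True, False),
}

def render_lines(events):
    """Turn ("start", name) / ("end", name) / ("text", str) events into lines.

    A new line is begun only when a tag carries the matching flag and the
    current line already holds some text.
    """
    lines = [""]
    for kind, value in events:
        if kind == "text":
            lines[-1] += value
            continue
        block_start, block_end = BLOCKNESS[value]
        breaks = block_start if kind == "start" else block_end
        if breaks and lines[-1]:
            lines.append("")
    return [line for line in lines if line]

events = [
    ("start", "Para"), ("text", "First."), ("end", "Para"),
    ("start", "Hn"), ("text", "Heading."), ("end", "Hn"),
    ("text", " Run-in text."),
    ("start", "Para"), ("text", "Next."), ("end", "Para"),
]
print(render_lines(events))
```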

</FormattingSpecificDiscussionOfWhitespace>

<GeneralDiscussionOfWhitespace>

A notation for describing whitespace handling must communicate the notion
that whitespace processing is modal, and provide words for each mode and
phrases for the transitions. 

Let's consider Peter's tentative rules:

>	- normaliseCRLF (Neil's Rule 1)

Please correct me if I am wrong, but this looks like a document-wide
setting whose behaviour/interpretation isn't affected by the application
type. A simple on/off PI setting could be used to set this.

The rest of the rules, though, could be applied on a per-element basis:

>	- discardAllWS
>	- normaliseToSingleSpace

I would add:

    - keepAllWS

(I haven't read every word of every post in this thread. Has this third one
been discarded as a reasonable option? Even if it has, the rest of my
discussion here isn't affected)

Assuming that the three, mutually-exclusive rules (or _modes_) can be
applied to any element, how can we specify this?

Would being able to specify one of the three modes on a per-element basis
be powerful enough? If we used PIs to do this then some HTML tags, for
example, might be listed as follows (just a hypothetical notation example,
_not_ a final suggestion for notation):

    <?XML-SPACE-DISCARD  HTML, HEAD, BODY, ... ?>
    <?XML-SPACE-COLLAPSE TITLE, P, H1, H2, ... ?>
    <?XML-SPACE-KEEP     PRE, XMP, LISTING, ... ?>

Notes:

- HTML applications could just imply these rules.

- Any elements that aren't listed would just use the current mode, which
depends on the context.

- If the desired whitespace mode depends on something other than the
current element (an attribute, say) then this mechanism won't be powerful
enough.

- Specifying the whitespace mode on a per-element basis should make this
technique well-suited to architectural forms, though.
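
To make the per-element idea concrete, here is a Python sketch using the
standard xml.etree.ElementTree module.  The MODES table mirrors the three
hypothetical PIs above; the names MODES, apply_mode and process, the
default mode of "keep" at the root, and the sample document are all
assumptions made for the illustration:

```python
import re
import xml.etree.ElementTree as ET

WS_RUN = re.compile(r"[ \t\r\n]+")

# Element name -> whitespace mode; unlisted elements inherit the current mode.
MODES = {
    "HTML": "discard", "HEAD": "discard", "BODY": "discard",
    "TITLE": "collapse", "P": "collapse", "H1": "collapse", "H2": "collapse",
    "PRE": "keep", "XMP": "keep", "LISTING": "keep",
}

def apply_mode(text, mode):
    if text is None or mode == "keep":
        return text
    if mode == "collapse":
        return WS_RUN.sub(" ", text)
    return WS_RUN.sub("", text)  # "discard"

def process(elem, inherited="keep"):
    """Rewrite the text inside elem in place according to its mode."""
    mode = MODES.get(elem.tag, inherited)
    elem.text = apply_mode(elem.text, mode)
    for child in elem:
        process(child, mode)
        # Text following a child belongs to this element's content.
        child.tail = apply_mode(child.tail, mode)

doc = ("<HTML>\n <BODY>\n  <P>Some   <EM>odd</EM>\n text</P>\n"
       "  <PRE>a  b\n c</PRE>\n </BODY>\n</HTML>")
root = ET.fromstring(doc)
process(root)
print(ET.tostring(root, encoding="unicode"))
```

EM is not listed, so it picks up "collapse" from the enclosing P, which is
the context-dependent behaviour described in the second note.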

</GeneralDiscussionOfWhitespace>

 - Russ

PS - Should whitespace be blacklisted? ;-)




From tfj at apusapus.demon.co.uk  Tue Aug 19 00:39:17 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun  7 16:58:17 2004
Subject: Other  whitespace problems was Re: Whitespace rules (v2)
In-Reply-To: <3.0.32.19970816105650.008fda00@pop.intergate.bc.ca>
Message-ID: <199708182152.tfj.2174@apusapus.demon.co.uk>

> The original goal as stated in SGML was to ignore white
> space "caused by markup" by which they meant "used to prettyprint
> markup".  A worthy goal, but in fact most people would agree that
> the rules you have to write to achieve this are horrendously complicated
> and some would argue that SGML never actually did get it right.  

Whilst all the discussion upon "whitespace caused by markup" has been 
going-on I've had reason to look at whitespace within the various 
declarations. I have always been very wary of the separator rules for 
SGML declarations (as a computing scientist I find it odd that such 
separators have been hard-coded in the grammar rules themselves). I'm 
convinced that as they stand the separator rules in XML are 
ambiguous.

I have been looking at the element declaration in particular and its 
abundance of Ss leads to ambiguity. As I read the grammar the 
following is ambiguous:

<!ELEMENT trouble ( ( ...
                   ^
Is this space to be recognised by the first S? in the choice 
production, the first S? in the seq production or the first S? in the 
cps production that each of choice and seq uses? It cannot be 
recognised by them all in practice but each of those productions can 
match it. :-( Whether it is matched by cps or by choice/seq
depends upon whether you parse the declaration with an LL or LR
parser.

There is a further problem with the productions for the element
declaration in that the "elements" clause and its children require
more than 1 symbol look-ahead. This also affects the same fragment
because it is not clear until after several more tokens have been
parsed as to whether the elements clause is trying to match a choice
or seq. 

I'm working with a copy of WD-xml dated 970807, which when I looked 
late last week was the current version of the text available from 
www.w3.org.

Regards, Trevor.

--

"Real Men don't Read Instruction Manuals"
   Tim Allen, Home Improvement



From tbray at textuality.com  Tue Aug 19 01:26:08 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:17 2004
Subject: Other  whitespace problems was Re: Whitespace rules (v2)
Message-ID: <3.0.32.19970818162238.00902760@pop.intergate.bc.ca>

At 09:52 PM 18/08/97 +0000, Trevor Jenkins wrote:
> I'm 
>convinced that as they stand the separator rules in XML are 
>ambiguous.

Yes; Michael Sperberg-McQueen and I both agree that these need
some more work.  If it weren't for the $#*!@#%#!ing Parameter 
Entities, all this would be simple and straightforward - designing
a grammar for the SGML element declaration language is not exactly
rocket science.

But when you try to pollute the grammar by saying where you can
and can't replace chunks of it with PE references, it all of a
sudden gets hideously difficult.  SGML gets around this with the
clever device of the Ee (entity end) virtual token... which we in 
the XML gang thought was hopelessly unaesthetic; after some struggles 
with this particular problem, Ee is starting to look better.  

Mind you, of the 3 XML-lang co-editors, two (I and Jean Paoli) have
voted against the existence of PEs at every opportunity; these votes
are in some part self-serving.  However, there can be no doubt that
if you want to build and maintain 8879-style markup declarations, it's
basically just not possible to do this without PE's.  Sigh.  Mind you,
some of us have another solution for that... 

Another compromise would be to apply the internal-subset rule, i.e. 
you can have PE's but they have to replace whole declarations.  There 
are other interim measures, i.e. you can only replace a whole content 
model; all involve severe limitations on PE usefulness as the payment
for spec/grammar clarity.

Anyhow, further grammar engineering is in order.  One thing to 
think about is simply to drop the 'S' (space) nonterminal, write
a couple of simple tokenization rules, and take it that way.  CMSMcQ
has investigated this at length, but it has problems too.
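
One way to read the tokenization suggestion above, offered only as a rough sketch and not as anything the editors have proposed: let a lexer treat whitespace as a token separator, so that the content-model grammar never mentions S at all. The token names and the token set below are assumptions made for illustration, and the sketch ignores parameter entities entirely, which is where the real difficulty lies.

```python
import re

# Hypothetical token set for an element declaration; not taken from WD-xml.
TOKEN_RE = re.compile(r"""
    (?P<WS>      [ \t\r\n]+         ) |  # separator, dropped by the lexer
    (?P<DECL>    <!ELEMENT          ) |
    (?P<PCDATA>  \#PCDATA           ) |
    (?P<NAME>    [A-Za-z_:][\w.:-]* ) |
    (?P<PUNCT>   [()|,?*+>]         )
""", re.VERBOSE)


def tokenize(decl):
    """Return the non-whitespace tokens of an element declaration."""
    tokens = []
    pos = 0
    while pos < len(decl):
        match = TOKEN_RE.match(decl, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {decl[pos]!r}")
        if match.lastgroup != "WS":
            tokens.append(match.group())
        pos = match.end()
    return tokens


# Both spellings yield the same token stream, so no grammar rule has to
# decide which S? "owns" the space between the two parentheses.
spaced = tokenize("<!ELEMENT trouble ( ( a | b ) , c )>")
tight = tokenize("<!ELEMENT trouble ((a|b),c)>")
```

With whitespace gone before parsing, the remaining grammar for choice and seq can be written without any S? at all; what the sketch does not show is how a PE reference in the middle of a content model would be tokenized.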

Pardon me for whining; I'm sure we'll figure out something. -Tim



From digitome at iol.ie  Tue Aug 19 12:12:07 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:17 2004
Subject: Whitespace
Message-ID: <199708191011.LAA29289@GPO.iol.ie>

>> Peter Murray-Rust wrote:
>> 
>>I think - along with TimB - that it is unrealistic to come up with a single
>>set of rules that will serve every application.  There was an enormous amount
>>of discussion on the XML group last year and I take it as axiomatic that we
>>cannot produce a set of rules which everyone agrees are:
>>	- simple to state
>>	- unambiguous
>>	- intuitive and easy to learn
>>	- universal (i.e. cover every situation)
>

**Warning:** Rush of blood to the head follows. Get those flame throwers
ready...

I know this whole white space thing was thrashed out at length some time ago but
it worries me greatly that on XML-DEV the whole issue seems to be as problematic
as it was before XML-Lang's rulings on whitespace handling were decided upon.
It seems that the problem was not really solved - just pushed up a layer:-)

It just sounds wrong to me that white space handling is to be the subject of
application conventions rather than part of the core XML parsing activity.

Anyway, I think everyone should be allowed to over-simplify the "White Space
Problem" once in their lives! Here is my contribution:-


Ban mixed content. Mixed content is a markup minimization feature.

If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
reserved element name.

<foo>
   <pcdata>I am data 1</pcdata>
   <pcdata>I am data 2</pcdata>
</foo>

Becomes
<foo><pcdata>I am data 1</pcdata><pcdata>I am data 2</pcdata></foo>

If you need whitespace to be something other than whitespace- i.e. a
newline to be a real newline to be passed on to the application, use an
empty element type to represent it.

<foo>
   <pcdata>I am data 1</pcdata><newline/>
   <pcdata>I am data 2</pcdata>
</foo>


Give me five minutes to put on the asbestos suit and then you flame
away....




From bdonoghoe at spin.net.au  Tue Aug 19 15:46:42 1997
From: bdonoghoe at spin.net.au (Bill Donoghoe)
Date: Mon Jun  7 16:58:17 2004
Subject: Whitespace
Message-ID: <199708191344.XAA01627@spin.net.au>


>Sean Mc Grath wrote:
>>> Peter Murray-Rust's post removed to conserve space
>
>**Warning:** Rush of blood to the head follows. Get those flame throwers
>ready...
>
>I know this whole white space thing was thrashed out at length some time ago but
>it worries me greatly that on XML-DEV the whole issue seems to be as problematic
>as it was before XML-Lang's rulings on whitespace handling were decided upon.
>It seems that the problem was not really solved - just pushed up a layer:-)
>
>It just sounds wrong to me that white space handling is to be the subject of
>application conventions rather than part of the core XML parsing activity.
>
>Anyway, I think everyone should be allowed to over-simplify the "White Space
>Problem" once in their lives! Here is my contribution:-
>
>
>Ban mixed content. Mixed content is a markup minimization feature.
>
>If you want a chunk of PCDATA in an XML doc, use the <PCDATA>
>reserved element name.
>
><foo>
>   <pcdata>I am data 1</pcdata>
>   <pcdata>I am data 2</pcdata>
></foo>
>
>Becomes
><foo><pcdata>I am data 1</pcdata><pcdata>I am data 2</pcdata></foo>
>
>If you need whitespace to be something other than whitespace- i.e. a
>newline to be a real newline to be passed on to the application, use an
>empty element type to represent it.
>
><foo>
>   <pcdata>I am data 1</pcdata><newline/>
>   <pcdata>I am data 2</pcdata>
></foo>
>
>
>Give me five minutes to put on the asbestos suit and then you flame
>away....
>
Instead of flaming you I will hop onto the bandwagon (can I borrow the
asbestos suit for a while?).

Firstly to paraphrase some earlier comments, the "whitespace problem" has 
resulted from its dual personality.

Personality 1.  The programmer's whitespace ("pretty printing") is used as a 
layout tool for visual editing of the markup and content.  Besides, lots of 
editing applications won't allow lines over 250 characters.

Personality 2.  The whitespace is part of the content used because the 
author either wanted it that way or he/she could not see any other easy way 
to encode the information correctly.

SGML tried to cater for both personalities and it succeeded in a moderate 
fashion.  The downside was that it is not an easy task to maintain and 
process SGML documents.

Now for some personal opinion on what I thought XML was all about.  XML is 
an attempt to either simplify SGML (get rid of or change the bits which make 
it hard to understand/use/process) or extend HTML to deal with information 
content as well as presentation.  I lean towards the former view "SGML for 
the Web".  

IMHO the current XML "whitespace handling" has not simplified the SGML 
situation significantly.

Here are some comments and slight variations on Sean's suggestion.

I believe that Sean's suggestion has plenty of merit.

What is wrong with having some standard elements
(<PCDATA>, <CDATA>, <NEWLINE>) which are part of every XML DTD?

If you didn't want users to have to author these tags then "normalisation" 
applications could be developed which could convert "raw" XML into the 
"normalised" version.

Example:

<foo>
   I am data 1
   I am <emph>data</emph> 2
</foo>

could be normalised to:

<foo>
   <pcdata>I am data 1</pcdata><newline/>
   <pcdata>I am data 2</pcdata>
</foo>

or

<foo><pcdata>I am data 1 I am <emph>data</emph> 2</pcdata>
</foo>

depending on the DTD declarations for the elements or a style sheet (?!!)

However, normalisation is not needed if the authors can be given tools which
can produce the desired markup.

Thus, all whitespace in the "normalised" documents could be collapsed to a 
single space (because we removed personality 2 we are only left with pretty 
printing).
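
A "normalisation" application of the kind described above might look roughly like the following sketch. It is only an illustration under stated assumptions: the element names pcdata and newline come from the examples in this thread, the function name normalise is made up here, and the sketch handles only an element whose content is plain text lines (inline markup such as <emph> is ignored).

```python
import xml.etree.ElementTree as ET


def normalise(raw_xml):
    """Wrap each non-blank text line of the root element in <pcdata>,
    separating consecutive lines with an empty <newline/> element."""
    root = ET.fromstring(raw_xml)
    lines = [line.strip() for line in (root.text or "").splitlines()]
    lines = [line for line in lines if line]  # drop pretty-printing-only lines
    out = ET.Element(root.tag)
    for index, line in enumerate(lines):
        if index > 0:
            ET.SubElement(out, "newline")
        ET.SubElement(out, "pcdata").text = line
    return ET.tostring(out, encoding="unicode")


raw = "<foo>\n   I am data 1\n   I am data 2\n</foo>"
normalised = normalise(raw)
# Re-parse the result to inspect the structure that was produced.
child_tags = [child.tag for child in ET.fromstring(normalised)]
```

Whether the line break becomes a <newline/> or a single space is exactly the choice that, per the message above, would have to come from the DTD or a style sheet; the sketch hard-codes the first option.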

I will stop rambling now.

IMHO the solution lies in removing the dual personalities of whitespace at 
document authoring time (or at its interface to XML tools for documents 
tagged by human hand).

Regards,
Bill


Regards,
Bill Donoghoe              bdonoghoe@acslink.net.au
InfoTech (NSW) Pty Ltd     mobile: 014 625 397 (in Australia)
SGML/HyTime/DSSSL/XML Consultancy and Development




From tms at ansa.co.uk  Tue Aug 19 20:33:32 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
In-Reply-To: bdonoghoe@spin.net.au's message of Tue, 19 Aug 1997 23:44:08 +1000 (EST)
References: <199708191344.XAA01627@spin.net.au>
Message-ID: <s8lo1yaurv.fsf@plato.ansa.co.uk>

A non-text attachment was scrubbed...
Name: not available
Type: text/plain (pgp signed)
Size: 2803 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970819/ceb413c8/attachment.bin


From dgd at cs.bu.edu  Tue Aug 19 23:57:37 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
In-Reply-To: <3.0.1.32.19970818181729.0069be80@watfac.org>
Message-ID: <v03007800b01fa935a1f1@[205.181.197.116]>

I observed with dismay that the issue of whitespace has surfaced on this
list, after we finally gave it the wooden-stake-in-the-heart treatment on
the WG discussion lists. As a chief proponent of the current method, I'll
take a shot at explaining the rationale, as that is something that doesn't
really fit in a standard, but actually helps a great deal in understanding
one.

I'm taking some recent notes on this list as a starting point.

At 5:17 PM -0500 8/18/97, Russell Chamberlain wrote:

>> Peter Murray-Rust wrote:
>>
>>I think - along with TimB - that it is unrealistic to come up with a single
>>set of rules that will serve every application.  There was an enormous amount
>>of discussion on the XML group last year and I take it as axiomatic that we
>>cannot produce a set of rules which everyone agrees are:
>>	- simple to state
>>	- unambiguous
>>	- intuitive and easy to learn
>>	- universal (i.e. cover every situation)
>
>Axiomatic? Call me stubborn (you won't be the first), but I, for one,
>retain some hope. :-)

We all did at first. The problem is really the last point -- _universal_
and while I am tempted to agree with Peter, I do not, in fact, because I
think the current method actually does satisfy all four points -- but not
necessarily in the way that you would expect.

>>[Peter states in detail different policies on whitespace he might need in
>>different contexts.]
>>
>>What I am after here is a convention that I can state which instructs the
>>processor how to treat this whitespace.  ***I do not wish to have to devise
>>a specific convention for CML***.  I want to be able to indicate that
>>the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS
>>content is normalisable and used only as a delimiter of tokens.

The problem with this is that there are a large number of ways that
whitespace can be used: the "tokens" form mentioned at the end, for
example, has never been proposed for XML.

>>I expect that many other applications will use a similar approach, so I want
>>to share the effort with them.  Examples of metadata in XML have often been
>>portrayed as prettyprinted and I expect that CML could use the same
>>conventions.

This sharing makes sense only when the sharing of effort is not imposing
an unreasonable burden on others. The problem with whitespace is that the
different possible policies are all unneeded by many applications.

The typical browser/formatter may never need "token" style whitespace, and
may implement such things by passing data to applets or other external
processes that will handle them.

In fact, the need to write xml->xml transducers (SGML has taught us that
this need never goes away), argues that it must be _possible_ to see all
whitespace at least _some_ of the time, regardless of document. That's one
reason that the current "pass all whitespace" model works.

The other reason that it works is that you can always ignore data that
you're not interested in (whitespace) but you can never get access to data
that is hidden from you -- therefore the price of the convenience of
"automatic whitespace removal" is an inability to see that space without
using non-standard tools.

>>I think that we can aim for a set of options that could be used by a
>>post-parser processor. Different applications (**or document authors**)
>>could choose between them. Examples might be:
>>	- normaliseCRLF (Neil's Rule 1)
>>	- discardAllWS
>>	- normaliseToSingleSpace

I agree that this is the right place for such processing to happen (between
a parser and an application). I'm not yet sure whether these things are as
reusable as people think. I do know that without the use of #FIXED
attributes (so I could avoid markup in the instance) I would _not_ use
these, but rather make sure that my application (or stylesheet language)
had the ability to apply these policies on request, as needed.
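
To make the "between a parser and an application" idea concrete, here is a minimal sketch of the three options as functions an application could apply on request. The function names simply echo Peter's option names. What normaliseCRLF should do exactly is defined by Neil's Rule 1, which is not reproduced in this message, so the mapping of line ends to LF below is an assumption.

```python
import re

XML_WS = " \t\r\n"  # space, tab, CR, LF: the characters XML treats as white space


def normalise_crlf(text):
    """Map CR LF pairs and lone CRs to a single LF (assumed reading of the rule)."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def discard_all_ws(text):
    """Remove every XML white space character."""
    return re.sub(f"[{XML_WS}]+", "", text)


def normalise_to_single_space(text):
    """Collapse each run of white space to one space and trim both ends."""
    return re.sub(f"[{XML_WS}]+", " ", text).strip(XML_WS)


# The application picks a policy by name; the parser has already passed
# all white space through untouched.
POLICIES = {
    "normaliseCRLF": normalise_crlf,
    "discardAllWS": discard_all_ws,
    "normaliseToSingleSpace": normalise_to_single_space,
}


def apply_policy(name, text):
    return POLICIES[name](text)
```

Because each policy only throws information away, any of them can be layered on top of a parser that keeps all whitespace, but none of them could be undone by an application if the parser had applied it first.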

><GeneralDiscussionOfWhitespace>
>
>A notation for describing whitespace handling must communicate the notion
>that whitespace processing is modal, and provide words for each mode and
>phrases for the transitions.
>
>Let's consider Peter's tentative rules:
>
>>	- normaliseCRLF (Neil's Rule 1)
>
>Please correct me if I am wrong, but this looks like a document-wide
>setting whose behaviour/interpretation isn't affected by the application
>type. A simple on/off PI setting could be used to set this.

One might want to do this only in specific elements. Say I'm piping some
sub-elements to a stupid processor that requires a fixed line-end
convention, but none of my other processing cares.

>
>The rest of the rules, though, could be applied on a per-element basis:
>
>>	- discardAllWS
>>	- normaliseToSingleSpace
>
>I would add:
>
>    - keepAllWS
>
>(I haven't read every word of every post in this thread. Has this third one
>been discarded as a reasonable option? Even if it has, the rest of my
>discussion here isn't affected)

This is the option that XML universally adopts. That means  that any other
method can be implemented _by any processor that cares_. If one can imagine
destroying meaning of a document's content by the flattening of all
whitespace strings to a single space, then you may need more elements in
your content model, if you are not able to control the software that will
process the document.

In other words the parser guarantees all WS will be visible to applications
-- this makes designing and implementing WS dependent processing easy --
but since applications are _not_ constrained as to folding or other WS
processing behaviour, document authors will have to be cautious in using
significant whitespace. If you can't assume that the applications that process
your markup will do the right thing, then you should not play games with WS.

This actually is not much of an issue for CML, since it's a reasonable
assumption that any implementation of CML markup-display will have to do
lots of special things, of which whitespace is the least.

[[[Geek note: I think that authors might be a little safer if significant
WS is in a CDATA marked section. Since CDATA is essentially a quoting
mechanism, applications should be more careful about such content.]]]

>Would being able to specify one of the three modes on a per-element basis
>be powerful enough? If we used PIs to do this then some HTML tags, for
>example, might be listed as follows (just a hypothetical notation example,
>_not_ a final suggestion for notation):
>
>    <?XML-SPACE-DISCARD  HTML, HEAD, BODY, ... ?>
>    <?XML-SPACE-COLLAPSE TITLE, P, H1, H2, ... ?>
>    <?XML-SPACE-KEEP     PRE, XMP, LISTING, ... ?>
>
>Notes:
>
>- HTML applications could just imply these rules.
>
>- Any elements that aren't listed would just use the current mode, which
>depends on the context.
>
>- If the desired whitespace mode depends on something other than the
>current element (an attribute, say) then this mechanism won't be powerful
>enough.
>
>- Specifying the whitespace mode on a per-element basis should make this
>technique well-suited to architectural forms, though.
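
The quoted per-element scheme can be sketched as a post-parse pass over the tree. This is only an illustration: the MODES table stands in for the hypothetical PIs, "discard" is read literally as removing every white space character, and an element that is not listed inherits the mode of its parent, as the second note suggests.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical mode table mirroring the three PIs in the quoted example.
MODES = {
    "HTML": "discard", "HEAD": "discard", "BODY": "discard",
    "TITLE": "collapse", "P": "collapse",
    "PRE": "keep",
}

WS_RUN = re.compile(r"[ \t\r\n]+")


def fix_text(text, mode):
    """Apply one white space mode to a chunk of character data."""
    if text is None or mode == "keep":
        return text
    if mode == "discard":
        return WS_RUN.sub("", text)
    return WS_RUN.sub(" ", text)  # "collapse"


def apply_modes(elem, inherited="keep"):
    """Rewrite white space in place; unlisted elements use the current mode."""
    mode = MODES.get(elem.tag, inherited)
    elem.text = fix_text(elem.text, mode)
    for child in elem:
        apply_modes(child, mode)
        # Text after a child belongs to this element's content, so it is
        # governed by this element's mode, not the child's.
        child.tail = fix_text(child.tail, mode)
    return elem


doc = ("<HTML>\n <BODY>\n  <P>two   words <I>in  italics</I>\n here</P>\n"
       "  <PRE>a\n  b</PRE>\n </BODY>\n</HTML>")
root = apply_modes(ET.fromstring(doc))
```

Note that the table is keyed on element names alone, so the limitation in the third note stands: a mode that depends on an attribute value cannot be expressed this way.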

One way to see that this is inadequate is to think about typesetting, where
you may need to consider the whitespace and adjacent typefaces independent
of their placement with respect to markup, in order to correctly handle
italic corrections and the like. This is something that authors frequently
fail to get right, and that is probably best solved, 90% of the time, by
smart software. (Let's not even consider the problem of punctuation in the
same environments!)

I think XML's agnostic position is the correct one for the language.
Authors should probably assume (unless they anticipate absolutely no
re-use) that HTML-style draconian normalization might occur anywhere and
use markup rather than whitespace, or at least CDATA sections. This
position _may_ be moderated (a little) where a well-known DTD with
well-defined WS rules can be used (like the TEI or HTML).

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From digitome at iol.ie  Wed Aug 20 00:39:30 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
Message-ID: <199708192239.XAA01470@GPO.iol.ie>


Paul Prescod made the point that Charles Goldfarb made the
"ban mixed content" suggestion some time ago. In private correspondence,
a number of other XML'ers have said likewise.

Paul goes on to say that it was rejected as unwieldy at the time.
I was not involved in XML at the time but the more I think about it the more
"wieldy" Charles' idea seems.

I think it speaks volumes for the merit of Charles' idea that the best and
brightest brains in the SGML world have fought with this issue since the early
days of XML without achieving (IMHO) the hoped for breakthrough.

If it is more complex than "I before E except after C" or "the right hand
thumb rule", it is too complex IMHO.

The PCDATA element trick is sooooo easy to understand! Mixed content
SGML can be converted to this "mixed-content-free" format quite easily. 

XML started out aiming for simplicity. It has achieved this
wonderfully well in a whole variety of areas but "the white space" is not
one of them.

If it is too late to revisit this I will have to console myself with the
thought that the universe bifurcated when the white space decision was made.
In some parallel universe, Charles' suggestion is simplifying XML for many
people.

Anyway, perhaps it is too late to revisit the mixed content problem. I hope not
but will shut up when someone who knows what the position is tells me to.

Sean


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From tbray at textuality.com  Wed Aug 20 01:03:23 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
Message-ID: <3.0.32.19970819160004.00917aa0@pop.intergate.bc.ca>

At 11:12 PM 19/08/97 +0100, Sean Mc Grath wrote:
>The PCDATA element trick is sooooo easy to understand! Mixed content
>SGML can be converted to this "mixed-content-free" format quite easily. 

Hmm, let's say the GI is the null string.

<P><>Some text that is </><I>italicized</I><>.</></P>

Whitespace discussions cause brain damage. -T.



From ricko at allette.com.au  Sun Aug 24 16:44:56 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
Message-ID: <199708241452.AAA07286@jawa.chilli.net.au>

 
From: Sean Mc Grath <digitome@iol.ie>
 
> If you need whitespace to be something other than whitespace- i.e. a
> newline to be a real newline to be passed on to the application, use an
> empty element type to represent it.

> <foo>
>   <pcdata>I am data 1</pcdata><newline/>
>   <pcdata>I am data 2</pcdata>
> </foo>

Yes and no.  <newline/> is not needed in XML.  ISO10646 includes 
characters which unambiguously represent line-breaks and paragraph breaks:
U+2028 and U+2029.

<foo>I am data 1&#x2028;I am data 2</foo>

Any conventions for handling whitespace in XML do not need to address
"hard returns".  If someone wants a hard return, they can mark it up
explicitly just using what XML already provides (by adopting ISO 10646).

Similarly, XML-DEV does not need to make up any conventions to handle 
no-break spaces (&nbsp; or &#x00A0;) or "hard spaces" (ideographic space 
does not collapse: &#x3000;).
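
A small sketch of why these characters survive whitespace collapsing, assuming the collapsing step touches only the four characters XML itself treats as white space (space, tab, CR, LF). The helper name collapse_xml_ws is made up for this example. The character references are written numerically because &nbsp; would need an entity declaration.

```python
import re
import xml.etree.ElementTree as ET

# Only the four XML white space characters. A generic "split on whitespace"
# (for instance Python's str.split() with no arguments) would also treat
# U+2028 and U+00A0 as separators, hence the explicit character class.
XML_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_xml_ws(text):
    """Collapse runs of XML white space; leave other Unicode separators alone."""
    return XML_WS_RUN.sub(" ", text)


elem = ET.fromstring("<foo>I am   data 1&#x2028;I am\n data&#xA0;2</foo>")
collapsed = collapse_xml_ws(elem.text)
```

After collapsing, the LINE SEPARATOR and the no-break space are still present in the string, while the pretty-printing spaces and the newline have been folded to single spaces.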

Let's not make this more complicated than it is!


Rick Jelliffe

P.S. In the example quoted, I think probably <RCDATA> is a closer
description of the element rather than <PCDATA>.




From ricko at allette.com.au  Sun Aug 24 17:05:00 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
Message-ID: <199708241512.BAA07702@jawa.chilli.net.au>


> From: Liam Quin <liamquin@interlog.com>
  
> The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
> There is no soft hyphen in Latin 1.
> I don't have the necessary copy of Unicode in front of me, but last time
> I checked (Unicode 1.1) it was the same in this regard, and also in having
> the ` character be a spacing grave accent, not a single quote.




From ricko at allette.com.au  Sun Aug 24 17:10:37 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
Message-ID: <199708241518.BAA07768@jawa.chilli.net.au>


> From: Liam Quin <liamquin@interlog.com>
  
> The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
> There is no soft hyphen in Latin 1.
> I don't have the necessary copy of Unicode in front of me, 

In both Unicode 1.0 and Unicode 2.0   &#x00AD;  is called "soft hyphen"
or "discretionary hyphen", so it is available, but perhaps not reliably 
supported by 8859-1 applications.

Also available is the zero-width
space  &#x200B;  which can be used to provide non-hyphenating line-break
points inside long technical terms (this might be useful in chemical names,
where a dash of any kind might be misleading) and in languages in which 
words are not delimited by spaces.

For example,  supercali&#x200B;fragalistic&x200B;expialladocious.


Rick Jelliffe




From liamquin at interlog.com  Sun Aug 24 23:26:50 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
In-Reply-To: <199708241518.BAA07768@jawa.chilli.net.au>
Message-ID: <Pine.BSI.3.95.970824171827.29580C-100000@shell1.interlog.com>

On Mon, 25 Aug 1997, Rick Jelliffe wrote:

> > From: Liam Quin <liamquin@interlog.com>
> > The hyphen is 0255 octal (173 decimal).  It is a hyphen, not a soft hyphen.
> > There is no soft hyphen in Latin 1.
> > I don't have the necessary copy of Unicode in front of me, 
> 
> In both Unicode 1.0 and Unicode 2.0   &#x00AD;  is called "soft hyphen"
> or "discretionary hyphen", so it is available, but perhaps not reliably 
> supported by 8859-1 applications.

Not supported at all would be a fairer way to put it!
At any rate not by _conforming_ 8859-1 applications, as far as I
understand it... in the same way that most SGML applications don't
treat &x; as a syntax error even when it's illegal in ISO C or FORTRAN :-)

I don't have a copy of 8859 any more to check, but if the hyphen character
is to be treated as a soft hyphen, there's no way to type a hard hyphen...

> Also available is the zero-width space  &#x200B;
> For example,  supercali&#x200B;fragalistic&x200B;expialladocious.

Perhaps, but to claim that this is more readable to humans than
		supercali&softhy;fragalistic&softhy;expialladocious.
would be absurd.  If you hadn't omitted the # in the 2nd reference, the
length would have been the same too.  Using &hy; is even better.
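
The omitted # matters to a parser as well as to a reader. As a quick illustration, using Python's standard ElementTree parser and a made-up wrapper element <w>: the numeric reference is accepted, while &x200B; is read as a reference to an undeclared entity named x200B and rejected.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the document is accepted by the parser."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


with_hash = parses("<w>supercali&#x200B;fragalistic</w>")
without_hash = parses("<w>fragalistic&x200B;expialladocious</w>")
text = ET.fromstring("<w>supercali&#x200B;fragalistic</w>").text
```

Here with_hash is True and without_hash is False, and the parsed text contains the single character U+200B where the reference stood.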

You can always do
    <!--* hy: soft (discretionary) hyphenation point: *-->
    <!ENTITY hy '&#x200B;'>

Lee

-- 
Liam Quin --  the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval

email address: liamquin, at host: interlog dot com




From ricko at allette.com.au  Mon Aug 25 00:48:32 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
Message-ID: <199708242256.IAA14336@jawa.chilli.net.au>


> From: Liam Quin <liamquin@interlog.com>
 
> I don't have a copy of 8859 any more to check, but if the hyphen character
> is to be treated as a soft hyphen, there's no way to type a hard hyphen...
 
Yes. But why is this a surprise? A "hard hyphen" is a dash (copying whatever
kind of dash has been used by the application) followed by a hard
return.

> Perhaps, but to claim that this is more readable to humans than
> 		supercali&softhy;fragalistic&softhy;expialladocious.
> would be absurd.  If you hadn't omitted the # in the 2nd reference, the
> length would have been the same too.  Using &hy; is even better.

It might be more useful to include a hyphenation dictionary at
the top of the document that can be fed into the typesetting application's
hyphenation dictionary, rather than complicate the text with in-place
softhyphens. You can then use any character you like to signal the
soft hyphen, also, which may shorten things.

<hyph-dict>over^blown, under^done
</hyph-dict>
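
A sketch of how a processor might read such a dictionary before handing it to a typesetter. The <hyph-dict> element and the ^ marker are taken from the example above; the comma-separated layout is inferred from it, and the function name and the output shape (word mapped to break offsets) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def parse_hyph_dict(xml_text, marker="^"):
    """Map each word to the character offsets where a soft hyphen may fall."""
    elem = ET.fromstring(xml_text)
    table = {}
    for entry in (elem.text or "").split(","):
        entry = entry.strip()
        if not entry:
            continue
        parts = entry.split(marker)
        offsets = []
        pos = 0
        for part in parts[:-1]:  # a break point follows every part but the last
            pos += len(part)
            offsets.append(pos)
        table["".join(parts)] = offsets
    return table


table = parse_hyph_dict("<hyph-dict>over^blown, under^done\n</hyph-dict>")
```

For the example dictionary this gives a break after the fourth character of "overblown" and after the fifth of "underdone", and the running text itself stays free of hyphenation marks.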

> You can always do
>     <!--* hy: soft (discretionary) hyphenation point: *-->
>     <!ENTITY hy '&#x200B;'>

Yes. I think people use "&shy;"  for soft hyphen more than "&hy;".


Rick Jelliffe



From murata at apsdc.ksp.fujixerox.co.jp  Mon Aug 25 04:11:14 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
In-Reply-To: <3.0.32.19970819160004.00917aa0@pop.intergate.bc.ca>
Message-ID: <9708250211.AA01302@lute.apsdc.ksp.fujixerox.co.jp>

Tim Bray writes:
>
>Hmm, let's say the GI is the null string.
>
><P><>Some text that is </><I>italicized</I><>.</></P>

Suppose that we have different kinds of tags for mixed-content elements (e.g.,
<name:mixed> and </name:mixed>) and element-content elements (e.g.,
<name:element> and </name:element>).  Then, even non-validating parsers
can tell element content from mixed content.  Does this help?

>Whitespace discussions cause brain damage.

A fatal error.  I can not, and should not recover...

MURATA Makoto (FAMILY Given)
Fuji Xerox Information Systems
 
Tel: 044-812-7230   Fax: 044-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp



From liamquin at interlog.com  Mon Aug 25 06:30:29 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
In-Reply-To: <199708242256.IAA14336@jawa.chilli.net.au>
Message-ID: <Pine.BSI.3.95.970825002712.28792A-100000@shell1.interlog.com>

On Mon, 25 Aug 1997, Rick Jelliffe wrote:
> > From: Liam Quin <liamquin@interlog.com>
> > I don't have a copy of 8859 any more to check, but if the hyphen character
> > is to be treated as a soft hyphen, there's no way to type a hard hyphen...
>
> Yes. But why is this a surprise? A "hard hyphen" is a dash (copying whatever
> kind of dash has been used by the application) followed by a hard return.

So I can't type "Forbes-Hamilton" with a hyphen? (I have used a minus sign
here because I'm using 7-bit ASCII software right now!)

At any rate, unless hyphenation behaviour becomes part of XML-LANG, I don't
see that this discussion is relevant, although by all means mail me
privately if you want to prolong it :-)

Lee

-- 
Liam Quin --  the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval

email address: liamquin, at host: interlog dot com




From mrc at allette.com.au  Mon Aug 25 11:39:37 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace
References: <9708250211.AA01302@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <340152AF.F224A51C@allette.com.au>

Apologies in advance to all those who have thought and fought over this
issue for a long time, but as a self-confessed critic of the claim that
"XML is SGML", I feel compelled to throw my hat into the ring.

As far as I can see, there are only two circumstances when whitespace is
an issue - receiving an XML document or authoring one. Receiving, it
doesn't matter if you have a DTD or not - the application can determine
from a well formed document whether it should regard an element's
content as MIXED or ELEMENT. It does involve parsing it, but only until
it sees mixed content. If elements are assumed to be ELEMENT until
proven otherwise, surely this wouldn't be a massive overhead. Authoring
applications would be similar - the first time a tag contained mixed
content, the application would reset the status of the element. The onus
would from then on be on the application to assist the user in creating
semantically correct documents, by such mechanisms as not allowing hard
returns at element boundaries, in short, making significant whitespace
look like significant whitespace.

MURATA Makoto wrote:

> Suppose that we have different kinds of tags for mixed-content
> elements (e.g., <name:mixed> and </name:mixed>) and element-content
> elements (e.g., <name:element> and </name:element>).  Then, even
> non-validating parsers can tell element contents and mixed contents.
> Does this help?

It seems that the choices are either the current proposal that nobody
seems to feel is entirely satisfactory, or suggestions such as the
above, which would certainly work, but ultimately may involve as great
an overhead as sending the DTD. It seems to me that we're throwing the
baby out with the bathwater by ignoring a solution such as declaring at
the start of the document how whitespace in elements should be handled.

I would also like to see DTDs sent to non-validating parsers, just so
they could determine how to apply whitespace rules without necessarily
having to do any structural parsing. If need be, two new types of
declared content could be added, ELEMENT and MIXED. They might behave
the same way as ANY, or the DTD could be constructed even more loosely,
where only MIXED elements were declared and everything else was
defaulted to ELEMENT. This would result in a small DTD sent only for the
sake of making the application aware of how to deal with whitespace. If
desirable, no DTD need be sent, but the application's performance may
suffer marginally for it. This is in keeping with the idea that an
application need not know how to deal with a document as it comes in. As
far as I can see, much of the functionality in XML (such as linking)
relies on a DTD, so it's not going to be foreign to most XML
applications anyway.

The whitespace rules in SGML can be simplified - most people accept that
they should. Because inclusions and exclusions aren't valid in XML
anyway, the rules are already somewhat simpler. I would really like to
see XML and SGML stay in synch - I think anything else would be to
everyone's disadvantage. There really isn't a lot of point in flaming me
for this; the question is well intentioned and the current solution
seems to have satisfied few. The concept of declaring things at the
start is a tried and true methodology, yet we seem to be fleeing it in
favor of something nobody's quite sure about.


--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________




From Peter at ursus.demon.co.uk  Mon Aug 25 13:33:45 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:18 2004
Subject: Whitespace rules (v2)
Message-ID: <9623@ursus.demon.co.uk>

I have been away for a few days so maybe it's a useful time to try to summarise
the Whitespace debate and to ask a few questions. You don't need to read the 
rest of this unless you believe there is a problem to be addressed :-)

In message <v03007800b01fa935a1f1@[205.181.197.116]> dgd@cs.bu.edu (David G. Durand) writes:
> I observed with dismay that the issue of whitespace has surfaced on this
> list, after we finally gave it the wooden-stake-in-the-heart treatment on
> the WG discussion lists. As a chief proponent of the current method, I'll

:-) I am not sure what has been killed :-)

> take a shot at explaining the rationale, as that is something that doesn't
> really fit in a standard, but actually helps a great deal in understanding
> one.

I will take David's points first, because I *do* believe that many of those
who were involved in the development of the spec feel that there is no scope
for further discussion of this *IN THE SPEC*.  I agree with this.

Essentially the spec says:
	- This is a difficult problem.  [Actually it doesn't say this, but 
it might help if it did in a footnote.]
	- We have taken a minimalist approach where we do not give any support
to any whitespace philosophy [other than PRESERVE which passes everything and
can be platform-dependent], but leave this to the community. DEFAULT is simply
the absence of PRESERVE.
	
I believe this solves one species of problem, where the authoring tool/system
is closely coupled to the application. CDF might be such a system (e.g. I have
never seen a native CDF file).

*IF* this is the major use of XML - where there is a one-to-one communication
of this sort - then there is no real problem.  I do not believe this is the
case, and I think there are at least two areas where XML will run into this
general problem on numerous occasions:

(A) There is a defined DTD (e.g. TEI, HTML) but a variety of authoring tools
and a variety of applications from different providers. Traditionally these
will come from the SGML community. I believe that there will certainly be
initial problems where m'facturer X emits whitespace in a particular way
which is incompatible with Y's tools for rendering/transforming it. It may
also be platform dependent.  We've seen this in the development of HTML systems
although they are improving. 

Remember that most SGML systems are currently implemented within a single site
(the tools are chosen to be compatible throughout the process). Very little
SGML is delivered over the WWW to be consistent between different m'facturers.
XML is specifically designed to be delivered over the WWW in (I assume)
a platform and m'facturer-independent way.  Do we expect to see 'this XML
file best viewed with FOO software'??? If so, we might as well give up now.

IMO any developer needs to be able to say:
	(i) I support a wide range of XML DTDs.
	(ii) I can easily customise my software to support a range of commonly
used DTDs
	(iii) Documents authored by my software should be readable by software
from another m'facturer with whom I have had no formal discussions
	(iv) My system can support a range of applications which read documents
produced by other m'facturers' systems and with whom I have had no formal
discussions.

If all the manufacturers tell me this is a non-problem, I'll shut up (on this
issue!) If each DTD defines its own use of whitespace (or worse, doesn't 
define it) they may have a lot of work.

(B) There are generic XML applications. The XML community continues to discuss
documents which 'contain information from more than one DTD' or 'are WF but
not necessarily valid(atable)'. Examples of these are:
	(i) an XML document to which meta-data has been prepended.
	(ii) an XML document which includes chunks conforming to well-defined
DTDs such as MathML.

The possible combinations are indefinitely large.
It is impossible to write bespoke software to process these documents, and we
need generic mechanisms. Perhaps many will be dealt with by stylesheets, and
maybe the WS issue is a question of developing appropriate conventions in
stylesheets.  In documents of this sort there have to be conventions and flags
that indicate how to interpret the documents. The spec has indicated that it
shouldn't be in the XML markup - no problem.  Somehow conventions have to
evolve, either conveyed implicitly or explicitly (e.g. through PIs). 
[Remember that there are - as yet - no agreed conventions as to what a PI can
look like - you can put anything in after the target.]

> 
[...]
> >Axiomatic? Call me stubborn (you won't be the first), but I, for one,
> >retain some hope. :-)
> 
> We all did at first. The problem is really the last point -- _universal_
> and while I am tempted to agree with Peter, I do not, in fact, because I
> think the current method actually does satisfy all four points -- but not
> necessarily in the way that you would expect.

Note: I am NOT trying to find a universal solution here.  I am suggesting that
we develop some common, useful approaches which will solve a reasonable 
number of problems.

> 
> >>[Peter states in detail different policies on whitespace he might need in
> >>different contexts.]
> >>
> >>What I am after here is a convention that I can state which instructs the
> >>processor how to treat this whitespace.  ***I do not wish to have to devise
> >>a specific convention for CML***.  I want to be able to indicate that
> >>the W/S after <MOL> is irrelevant, and that the whitespace in the ATOMS
> >>content is normalisable and used only as a delimiter of tokens.
> 
> The problem with this is that there are a large number of ways that
> whitespace can be used: the "tokens" form mentioned at the end, for
> example, has never been proposed for XML.

I agree there are a large number of ways.  Some classification would be 
valuable and IMO the sort of thing that XML-DEV could usefully provide.
[The WS-separated tokens are no different from 'words' in HTML and I would
expect that a large number of people would welcome a convention on 
normalising whitespace between 'words'.]

> 
> >>I expect that many other applications will use a similar approach, so I want
> >>to share the effort with them.  Examples of metadata in XML have often been
> >>portrayed as prettyprinted and I expect that CML could use the same
> >>conventions.
> 
> This sharing makes sense only when the sharing of effort is not imposing
> an unreasonable burden on others. The problem with whitespace is that the
> different possible policies are all unneeded by many applications.

Then the application needn't implement them :-)  Applications have to do
*something* about whitespace.  This can be:
	- ignore the problem (or use PRESERVE)
	- their own thing
	- a set of choices which is understood by the community
	- refuse to process the document.

> 
> The typical browser/formatter may never need "token" style whitespace, and
> may implement such things by passing data to applets or other external
> processes that will handle them.
> 
> In fact, the need to write xml->xml transducers (SGML has taught us that
> this need never goes away), argues that it must be _possible_ to see all
> whitespace at least _some_ of the time, regardless of document. That's one
> reason that the current "pass all whitespace" model works.

It 'works' in that it shifts the problem to the application developer. I like
the idea of an XML->XML transducer - perhaps in front of the application, or
callable within it.  If David thinks that such tools could be built 
independently of applications that is exactly what I am suggesting :-)

> 
> The other reason that it works is that you can always ignore data that
> you're not interested in (whitespace) but you can never get access to data
> that is hidden from you -- therefore the convenience of "automatic
> whitespace removal" is an inability to see that space without using
> non-standard tools.

It's clear that an application *must* have access to all whitespace if it
wants it (this is made clear by, say, the requirement of XML-LINK to search
on pseudoelements).  However it should also be able to access a normalised
form of the document.

> 
> >>I think that we can aim for a set of options that could be used by a
> >>post-parser processor. Different applications (**or document authors**)
> >>could choose between them. Examples might be:
> >>	- normaliseCRLF (Neil's Rule 1)
> >>	- discardAllWS
> >>	- normaliseToSingleSpace
> 
> I agree that this is the right place for such processing to happen (between
> a parser and an application). I'm not yet sure whether these things are as
> reusable as people think. I do know that without the use of #FIXED
> attributes (so I could avoid markup in the instance) I would _not_ use
> these, but rather make sure that my application (or stylesheet language)
> had the ability to apply these policies on request, as needed.

But we do have #FIXED, right? In which case I generally agree.
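
A minimal sketch of the three post-parser options named above, in Python. The function bodies are assumptions: "Neil's Rule 1" is not restated in this message, so normalise_crlf simply maps CRLF and bare CR to LF, and "whitespace" is taken to mean space, tab, CR and LF.

```python
import re

# Runs of the four characters treated here as whitespace (an assumption).
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalise_crlf(text: str) -> str:
    """Map CRLF pairs and bare CR characters to a single LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def discard_all_ws(text: str) -> str:
    """Remove every whitespace character."""
    return _WS_RUN.sub("", text)


def normalise_to_single_space(text: str) -> str:
    """Collapse each run of whitespace to one space."""
    return _WS_RUN.sub(" ", text)
```

Each function takes and returns a string, so a post-parser stage could apply them to character data without the parser itself changing what it reports.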

> 
[...]
> This is the option that XML universally adopts. That means  that any other
> method can be implemented _by any processor that cares_. If one can imagine
> destroying meaning of a document's content by the flattening of all
> whitespace strings to a single space, then you may need more elements in
> your content model, if you are not able to control the software that will
> process the document.

This is a good point.

> 
> In other words the parser guarantees all WS will be visible to applications
> -- this makes designing and implementing WS dependent processing easy --
> but since applications are _not_ constrained as to folding or other WS
> processing behaviour, document authors will have to be cautious in using
> significant whitespace. If you can't assume that applications to process
> your markup will do the right thing, then you should not play games with WS.

Yes. But where is the rigour in authoring going to come from? This is where
I believe that XML-DEV has a role.

> 
> This actually is not much of an issue for CML, since it's a reasonable
> assumption that any implementation of CML markup-display will have to do
> lots of special things, of which whitespace is the least.

No, the point was that CML wishes to re-use HTML and MathML as additional
components in the document. And then meta-data, and ... So that the 
application will become bloated unless it can re-use the approaches from 
the rest of the community.

> 
[...]
> 
> 
> I think XML's agnostic position is the correct one for the language.
> Authors should probably assume (unless they anticipate absolutely no
> re-use) that HTML-style draconian normalization might occur anywhere and
> use markup rather than whitespace, or at least CDATA sections. This
> position _may_ be moderated (a little) where a well-known DTD with
> well-defined WS rules can be used (like the TEI or HTML).

I agree on this.  The point I have been trying to promote is that it should
be possible to collate the requirements of such systems and offer them
on a re-usable basis.

I know from experience that it's extremely easy to go round in circles here.
If this discussion is going to achieve something - and I think that a number
of people would welcome this - then perhaps a revised set of the rules
recently suggested, and addressed to HTML-like usage (with perhaps other
common current DTDs as well) would be beneficial.

An author could then say:
	- the content of FOO, BAR, FLIP can be expected to be treated by 
XML-DEV-HTML-like WS normalisation.
	- the content of BAZ, BLORT suffers WS stripping as described in
XML-DEV-HTML-like-stripping.  

and that's about it. If we can get something along those lines, then 
I think a reasonable number of people would take note. It doesn't just have 
to apply to HTML DTDs.
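
A minimal sketch of such a per-element declaration, in Python. The element names FOO, BAR, FLIP, BAZ and BLORT come from the example above; the two policy functions are assumptions about what "HTML-like normalisation" and "HTML-like stripping" could mean, since no such rules have been agreed.

```python
import re


def html_like_normalise(text: str) -> str:
    """Assumed meaning: collapse each whitespace run to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


def html_like_strip(text: str) -> str:
    """Assumed meaning: drop leading and trailing whitespace."""
    return text.strip(" \t\r\n")


# What an author might declare: element name -> named policy.
POLICY_BY_ELEMENT = {
    "FOO": html_like_normalise,
    "BAR": html_like_normalise,
    "FLIP": html_like_normalise,
    "BAZ": html_like_strip,
    "BLORT": html_like_strip,
}


def apply_policy(element: str, text: str) -> str:
    """Apply the declared policy; undeclared elements pass through unchanged."""
    policy = POLICY_BY_ELEMENT.get(element)
    return policy(text) if policy else text
```

Elements with no declared policy keep all their whitespace, which matches the parser's pass-everything behaviour.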


	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Mon Aug 25 14:02:00 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <9626@ursus.demon.co.uk>

Thanks Marcus,

In message <340152AF.F224A51C@allette.com.au> Marcus Carr writes:
> Apologies in advance to all those who have thought and fought over this
> issue for a long time, but as a self-confessed critic of the claim that
> "XML is SGML", I feel compelled to throw my hat into the ring.
> 
> As far as I can see, there are only two circumstances when whitespace is
> an issue - receiving an XML document or authoring one. Receiving, it
> doesn't matter if you have a DTD or not - the application can determine
> from a well formed document whether it should regard an element's
> content as MIXED or ELEMENT. It does involve parsing it, but only until
> it sees mixed content. If elements are assumed to be ELEMENT until

I may have misunderstood this, but the problem seems to be that we cannot
reliably determine this if authors use whitespace for pretty-printing. If
what you mean is 'non-whitespace MIXED content' (i.e. content which has at
least one non-WS character in) then I'm sympathetic. IOW it is possible
to say 'treat anything with only WS content or element content as having element
content'.  This is exactly the sort of convention that I have been suggesting
people might propose. Whether it's workable depends on the reaction you get :-)
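
A minimal sketch of that convention, in Python, using the standard xml.etree.ElementTree module. The function names are hypothetical. An element type is assumed to be ELEMENT until some instance of it contains a non-whitespace character, after which it stays MIXED.

```python
import xml.etree.ElementTree as ET


def has_text(s) -> bool:
    """True if s contains at least one non-whitespace character."""
    return bool(s and s.strip())


def classify(xml_text: str) -> dict:
    """Map each element name to "ELEMENT" or "MIXED"."""
    kinds = {}
    for elem in ET.fromstring(xml_text).iter():
        # elem.text is the text before the first child; each child's
        # .tail is text that follows the child inside elem.
        mixed = has_text(elem.text) or any(has_text(c.tail) for c in elem)
        if mixed:
            kinds[elem.tag] = "MIXED"  # proven mixed, never reset
        else:
            kinds.setdefault(elem.tag, "ELEMENT")
    return kinds
```

Note that this reads the whole document before answering, so it says nothing about the cost for a streaming parser.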

> proven otherwise, surely this wouldn't be a massive overhead. Authoring
> applications would be similar - the first time a tag contained mixed
> content, the application would reset the status of the element. The onus
> would from then on be on the application to assist the user in creating
> semantically correct documents, by such mechanisms as not allowing hard
> returns at element boundaries, in short, making significant whitespace
> look like significant whitespace.
> 
> MURATA Makoto wrote:
> 
> > Suppose that we have different kinds of tags for mixed-content
> > elements (e.g., <name:mixed> and </name:mixed>) and element-content
> > elements (e.g., <name:element> and </name:element>).  Then, even
> > non-validating parsers can tell element contents and mixed contents.
> > Does this help?

I think this approach does help, but might be implementable through PIs
(see below)

> 
> It seems that the choices are either the current proposal that nobody
                                           ^^^^^^^^^^^^^^^^
I assume you mean the current XML spec.  

> seems to feel is entirely satisfactory, or suggestions such as the
> above, which would certainly work, but ultimately may involve as great
> an overhead as sending the DTD. It seems to me that we're throwing the
> baby out with the bathwater by ignoring a solution such as declaring at
> the start of the document how whitespace in elements should be handled.

I think that this is exactly what some members of this list are striving
for.  The spec requires them to use one or more of:
	- a specific markup element (e.g. <NEWLINE/>)
	- a stylesheet 
	- a PI

> 
> I would also like to see DTDs sent to non-validating parsers, just so
> they could determine how to apply whitespace rules without necessarily
> having to do any structural parsing. If need be, two new types of

It seems axiomatic that there are already documents that do not conform
to any given DTD, so this isn't an option. It has been suggested that 
content could be defined on a per-element basis, but at present parsers are
expected to use this to validate the whole document.

> declared content could be added, ELEMENT and MIXED. They might behave
> the same way as ANY, or the DTD could be constructed even more loosely,
> where only MIXED elements were declared and everything else was
> defaulted to ELEMENT. This would result in a small DTD sent only for the
> sake of making the application aware of how to deal with whitespace. If
> desirable, no DTD need be sent, but the application's performance may
> suffer marginally for it. This is in keeping with the idea that an
> application need not know how to deal with a document as it comes in. As
> far as I can see, much of the functionality in XML (such as linking)
> relies on a DTD, so it's not going to be foreign to most XML
> applications anyway.

This seems possible, but it requires a change to the XML-spec.  XML WG
members read this list and if any of them think it's a good idea they might
take it up.  But my impression is that most take the view that David Durand
has posted - the spec is not capable of further refinement at this stage.

It may be possible to implement this through a PI. This could define which
elements had which type of content, e.g.
<?XML-WHITESPACE CONTENT="ELEMENT" ELEMENTS="UL OL"?>
<?XML-WHITESPACE CONTENT="MIXED" ELEMENTS="P EM B H1"?> <!-- etc. -->
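
A minimal sketch of how an application might read such PIs, in Python. The XML-WHITESPACE target and the CONTENT and ELEMENTS pseudo-attributes are the hypothetical ones from the example above; the regular expressions, the function name and the "MIXED" default are further assumptions for illustration.

```python
import re

# Matches the hypothetical PI and captures everything after the target.
PI_PATTERN = re.compile(r"<\?XML-WHITESPACE\s+(.*?)\?>", re.DOTALL)
# Matches NAME="value" pseudo-attributes inside the PI.
ATTR_PATTERN = re.compile(r'(\w+)="([^"]*)"')


def whitespace_declarations(document: str) -> dict:
    """Return {element name: content type} from XML-WHITESPACE PIs."""
    kinds = {}
    for pi in PI_PATTERN.finditer(document):
        attrs = dict(ATTR_PATTERN.findall(pi.group(1)))
        # A missing CONTENT falls back to MIXED, so no whitespace is lost.
        content = attrs.get("CONTENT", "MIXED")
        for name in attrs.get("ELEMENTS", "").split():
            kinds[name] = content
    return kinds
```

Because the declarations sit at the start of the document, an application can know each element's content type before it meets the first start tag.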

> 
> The whitespace rules in SGML can be simplified - most people accept that
> they should. Because inclusions and exclusions aren't valid in XML
> anyway, the rules are already somewhat simpler. I would really like to
> see XML and SGML stay in synch - I think anything else would be to
> everyone's disadvantage. There really isn't a lot of point in flaming me
> for this; the question is well intentioned and the current solution

There are no flames on xml-dev :-) We are all trying to solve a difficult
technical, perceptual and cultural problem. [The general standard of debate
and courtesy within the SGML community is impressive.]

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Mon Aug 25 14:02:02 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:19 2004
Subject: XML developers' day
Message-ID: <9627@ursus.demon.co.uk>

Like many other readers of this list I was not able to attend the 
XML-developers' day. I would find it extremely useful if anyone was able to 
report on that, highlighting the main problems people face. Any indications
as to how this list might serve the community would be valuable :-)

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From dgd at cs.bu.edu  Mon Aug 25 17:01:47 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
In-Reply-To: <340152AF.F224A51C@allette.com.au>
References: <9708250211.AA01302@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <v03007801b0274c0920b2@[205.181.197.109]>

At 4:38 AM -0500 8/25/97, Marcus Carr wrote:
>Apologies in advance to all those who have thought and fought over this
>issue for a long time, but as a self-confessed critic of the claim that
>"XML is SGML", I feel compelled to throw my hat into the ring.

I looked with interest for the criticism of the claim, since that would be
useful information -- we've gone so far as to hold off critical features
of XML in a few places to wait for the ISO to catch up in the current SGML
revision. One of the things they kindly agreed to update is the whitespace
rules, so that the XML rules can be turned on in the SGML declaration.

>As far as I can see, there are only two circumstances when whitespace is
>an issue - receiving an XML document or authoring one. Receiving, it
>doesn't matter if you have a DTD or not - the application can determine
>from a well formed document whether it should regard an element's
>content as MIXED or ELEMENT.

Since XML must deal with well formed documents (no DTD) the traditional
SGML whitespace rules _cannot_ be used, as element content and mixed
content are not distinguished in instances by _any_ dependable cues. The
limited DTD proposal pleased neither the DTD-haters, nor the DTD-lovers,
though it was in a draft for a long time.

> It does involve parsing it, but only until
>it sees mixed content. If elements are assumed to be ELEMENT until
>proven otherwise, surely this wouldn't be a massive overhead.

It might involve buffering large amounts of whitespace across an arbitrary
parser lookahead, since there is no limit on the size of an element, or
where the non-space PCDATA might show up.
One would have to buffer the entire document in the parser before one could
decide whether to emit any whitespace in the root element. This might be a
bit of a memory performance hit...
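
A minimal sketch of that buffering cost, in Python. It models the content of a single element as a stream of ("text", string) and ("child", name) pairs; this event model and the function name are assumptions for illustration, not any parser's real interface.

```python
def emit_content(items):
    """Yield one element's content, dropping whitespace-only text if the
    element turns out to have element content.

    Once a whitespace-only chunk is seen, it and everything after it must
    be held back until non-space character data appears or the element
    ends, so `pending` can grow to the size of the whole element.
    """
    pending = []
    mixed = False
    for kind, value in items:
        if mixed:
            yield kind, value
        elif kind == "text" and value.strip():
            mixed = True              # proven mixed: flush what was held
            yield from pending
            pending = []
            yield kind, value
        elif kind == "text" or pending:
            pending.append((kind, value))   # hold to preserve order
        else:
            yield kind, value         # nothing held yet, safe to emit
    # End of element without character data: element content after all.
    for kind, value in pending:
        if kind != "text":
            yield kind, value
```

For the root element, the end of the element is the end of the document, which is the worst case described above.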

> Authoring
>applications would be similar - the first time a tag contained mixed
>content, the application would reset the status of the element. The onus
>would from then on be on the application to assist the user in creating
>semantically correct documents, by such mechanisms as not allowing hard
>returns at element boundaries, in short, making significant whitespace
>look like significant whitespace.

Many people have claimed that they use editors incapable of functioning
without inserting line ends (of their local flavor) every 200 characters or
so. I (personally) wasn't very sympathetic to this argument, but it stood
in for the empirical observation that people are very loose with
whitespace and line ends, and that forcing tools not to emit whatever
line-ending codes they want could be a problem.

>MURATA Makoto wrote:
>
>> Suppose that we have different kinds of tags for mixed-content
>> elements (e.g., <name:mixed> and </name:mixed>) and element-content
>> elements (e.g., <name:element> and </name:element>).  Then, even
>> non-validating parsers can tell element contents and mixed contents.
>> Does this help?
>
>It seems that the choices are either the current proposal that nobody
>seems to feel is entirely satisfactory, or suggestions such as the
>above, which would certainly work, but ultimately may involve as great
>an overhead as sending the DTD. It seems to me that we're throwing the
>baby out with the bathwater by ignoring a solution such as declaring at
>the start of the document how whitespace in elements should be handled.

The real problem is that there's an assumption that a generic processor can
solve the "whitespace problem" -- and that is not really true. In a very
real sense the meaning of whitespace is a product of the document _and_ the
application. For instance, line breaks (as indicated by whitespace)
might be critical in a typesetting application for poetry (but _only_ in
<poem> elements). The same document, however, would be best processed with
some form of whitespace-collapsing everywhere, when indexed by a full-text
search engine. The same data may have different significance when processed
differently.
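
A minimal sketch of that point, in Python: the same <poem> content handled by two applications with different whitespace needs. The sample text and both function names are invented for illustration.

```python
import re

# Character data as a parser would pass it on, whitespace intact.
POEM = "Roses are red,\n  Violets are blue."


def typeset_lines(poem_text: str) -> list:
    """Typesetting view: every line break is significant."""
    return [line.strip() for line in poem_text.split("\n")]


def index_text(poem_text: str) -> str:
    """Full-text indexing view: whitespace only separates words."""
    return re.sub(r"\s+", " ", poem_text).strip()
```

Both functions start from the same unmodified character data, which is only possible if the parser has not normalized it first.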

The fact is that whitespace should be controlled by the application. For
typesetting and display, this means that practically, it's going to be part
of the "stylesheet" or other processing mechanism. The advantage of "parser
handled whitespace" would be the ability to create meaningful, error-free
applications that can work on arbitrary markup _without a stylesheet or
other processing specification_. The only small problem with that
convenience is that such processing is basically impossible, for many more
reasons than telling where words end, or if CR; is a line end or just part of
a CRLF sequence.
>
> .....
> As
>far as I can see, much of the functionality in XML (such as linking)
>relies on a DTD, so it's not going to be foreign to most XML
>applications anyway.

This is not necessarily the case. It's also harder to detect mixed content
from DTD declarations than simply to recognize #FIXED attributes.

>
>The whitespace rules in SGML can be simplified - most people accept that
>they should.

>I would really like to
>see XML and SGML stay in synch - I think anything else would be to
>everyone's disadvantage.

Yes, this is very true -- and this battle has been won by the compatibility
camp -- they are in synch. SGML has a new "pass all whitespace" option for
the declaration. This is not going to be a big problem for existing
implementations, since it's incredibly easy for parsers to implement --
most have had to anyway, if they attempt to support SGML->SGML
transformation tools. I think SP already can do the right thing.

> There really isn't a lot of point in flaming me
>for this; the question is well intentioned and the current solution
>seems to have satisfied few. The concept of declaring things at the
>start is a tried and true methodology, yet we seem to be fleeing it in
>favor of something nobody's quite sure about.

   No flameage required. I agree with the intent -- just not your proposed
solutions. We went through all these permutations -- any form of
normalization _before_ the application causes some kind of problem. And
since there is, in any case, no universal way to handle markup without an
external processing spec (that can include whitespace among its many other
factors) there's no reason to make the parser cause applications more
problems than they will have to solve already.

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From dgd at cs.bu.edu  Mon Aug 25 17:02:02 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace rules (v2)
In-Reply-To: <9623@ursus.demon.co.uk>
Message-ID: <v03007803b02753ddf76f@[205.181.197.109]>

At 6:36 AM -0500 8/25/97, Peter Murray-Rust wrote:
>I have been away for a few days so maybe it's a useful time to try to
>summarise
>the Whitespace debate and to ask a few questions. You don't need to read the
>rest of this unless you believe there is a problem to be addressed :-)

Afraid that I have to chime in when I see a non-problem consuming valuable
time...

>
>In message <v03007800b01fa935a1f1@[205.181.197.116]> dgd@cs.bu.edu (David
>G. Durand) writes:
>> I observed with dismay that the issue of whitespace has surfaced on this
>> list, after we finally gave it the wooden-stake-in-the-heart treatment on
>> the WG discussion lists. As a chief proponent of the current method, I'll
>
>:-) I am not sure what has been killed :-)

I hoped the discussion. Certainly I hoped the shibboleth of a parser
"normalizing" whitespace on behalf of the application.

>I will take David's points first, because I *do* believe that many of those
>who were involved in the development of the spec feel that there is no scope
>for further discussion of this *IN THE SPEC*.  I agree with this.

Actually, the only question remaining, in my mind, is how the XML
stylesheet language should allow whitespace to be processed. I disagree
that there is any need for a non-stylesheet, non-application convention for
whitespace. Note that in some sense, the Document type _description_ (i.e.
descriptive prose describing the intent of a DTD) and the "schema" notions
are application specifications, and are entitled to declare whitespace
handling rules.

>Essentially the spec says:
>	- This is a difficult problem.  [Actually it doesn't say this, but
>it might help if it did in a footnote.]
It's only difficult if you think that it's a parser problem. It's easy in
XML, because all whitespace is visible. I can think of no _simpler_ rule
that a _parser_ could implement.

>	- We have taken a minimalist approach where we do not give any support
>to any whitespace philosophy [other than PRESERVE which passes everything and
>can be platform-dependent], but leave this to the community. DEFAULT is simply
>the absence of PRESERVE.

Yes, since there is not a universal "whitespace philosophy" even for a
single document (see my response to Marcus for an example), there's no
reason to declare it in the instance.

>I believe this solves one species of problem, where the authoring tool/system
>is closely coupled to the application. CDF might be such a system (e.g. I have
>never seen a native CDF file).

No, it's a case where the "philosophy" is coupled to the application, not
to the "document" in the abstract -- except insofar as it is defined by a
"document type description" or "schema" -- which is essentially a set of
ideal constraints that applications are expected to follow.

>(A) There is a defined DTD (e.g. TEI, HTML) but a variety of authoring tools
>and a variety of applications from different providers. Traditionally these
>will come from the SGML community. I believe that there will certainly be
>initial problems where m'facturer X emits whitespace in a particular way
>which is incompatible with Y's tools for rendering/transforming it. It may
>also be platform dependent.  We've seen this in the development of HTML
>systems
>although they are improving.

TEI defines where whitespace is significant (almost nowhere if I remember
correctly).

>Remember that most SGML systems are current implemented within a single site
>(the tools are chosen to be compatible throughout the process). Very little
>SGML is delivered over the WWW to be consistent between different m'facturers.
>XML is specifically designed to be delivered over the WWW in (I assume)
>a platform and m'facturer-independent way.  Do we expect to see 'this XML
>file best viewed with FOO software'??? If so, we might as well give up now.

No, but every document will _have_ to either conform to a well-known DTD or
schema of some sort, or be delivered with a stylesheet, and those are
useful places where this behavior should be explained.

>IMO any developer needs to be able to say:
>	(i) I support a wide range of XML DTDs.
>	(ii) I can easily customise my software to support a range of commonly
>used DTDs
>	(iii) Documents authored by my software should be readable by software
>from another m'facturer with whom I have had no formal discussions
>	(iv) My system can support a range of applications which read documents
>produced by other m'facturers systems and with whom I have had no formal
>discussions

Nothing in a stylesheet based solution violates this to my mind.

>If all the manufacturers tell me this is a non-problem, I'll shut up (on this
>issue!) If each DTD defines its own use of whitespace (or worse, doesn't
>define it) they may have a lot of work.
>
>(B) There are generic XML applications. The XML community continues to discuss
>documents which 'contain information from more than one DTD' or 'are WF but
>not necessarily valid(atable)'. Examples of these are:
>	(i) an XML document to which meta-data has been prepended.
I'm probably not the best person to address this, as I think that the
mix-and-match proposals are ill-thought out, but since the data is supposed
to be recognizable, presumably it is also to be ignored by all applications
other than "meta-applications". So that's not a problem.

>	(ii) an XML document which includes chunks conforming to well-defined
>DTDs such as MathML.

In which case, they should have well-known stylesheets or descriptions that
explain any whitespace conventions in use.
>
>The possible combinations are indefinitely large.

But since each individual part must have defined behavior, this should not
be a problem.

>It is impossible to write bespoke software to process these documents, and we
>need generic mechanisms. Perhaps many will be dealt with by stylesheets, and
>maybe the WS issue is a question of developing appropriate conventions in
>stylesheets.  In documents of this sort there have to be conventions and flags
>that indicate how to interpret the documents. The spec has indicated that it
>shouldn't be in the XML markup - no problem.  Somehow conventions have to
>evolve, either conveyed implicitly or explicitly (e.g. through PIs).
>[Remember that there are - as yet - no agreed conventions as to what a PI can
>look like - you can put anything in after the target.]

I used to think this might be useful, but I can't actually think of any
application that could plausibly care about whitespace folding and also do
meaningful processing without knowledge of the DTD. A text-indexer can work
without a DTD, but also doesn't need any whitespace info (folding is always
good enough) -- and it needs to see every byte, because it may have to
track file offsets of hits.
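
A minimal sketch of the indexer case, in Python rather than anything from
this thread: every run of whitespace is treated as a separator (folding is
good enough), yet each hit keeps its byte offset into the unmodified input.
The function name and the sample bytes are assumptions made for
illustration, and markup handling is left out.

```python
import re

def index_tokens(data):
    """Return (token, byte_offset) pairs; any whitespace run is a separator."""
    # The input is never rewritten, so offsets refer to the original bytes.
    return [(m.group(0), m.start()) for m in re.finditer(rb"\S+", data)]

raw = b"organic\n   chem   reaction"
hits = index_tokens(raw)
```

Because nothing is normalized before indexing, the offsets stay valid for
the file as stored.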

Can you think of any other useful examples of "DTD-blind" applications that
might care about how the document _intended_ the whitespace to be
processed? I confess that I can't.


>Note; I am NOT trying to find a universal solution here.  I am suggesting that
>we develop some common, useful approaches which will solve a reasonable
>number of problems.

But I don't actually see what problems we can solve with such solutions,
that are not better addressed in either the stylesheet or DTD/schema
problems.

>> The problem with this is that there are a large number of ways that
>> whitespace can be used: the "tokens" form mentioned at the end, for
>> example, has never been proposed for XML.
>
>I agree there are a large number of ways.  Some classification would be
>valuable and IMO the sort of thing that XML-DEV could usefully provide.
>[The WS-separated tokens are no different from 'words' in HTML and I would
>expect that a large number of people would welcome a convention on
>normalising whitespace between 'words'.]

Enumerating these might have some pedagogical value, but I no longer see
the practical value of declaring the behaviors. I used to think it might be
useful, but I'm not so sure.

>Then the application needn't implement them :-)  Applications have to do
>*something* about whitespace.  This can be:
>	- ignore the problem (or use PRESERVE)
>	- their own thing
>	- a set of choices which is understood by the community
>	- refuse to process the document.

Only 2 (their own thing) makes any sense -- and is typically driven by
their knowledge of a DTD or possession and following of the dictates of a
stylesheet.

>It 'works' in that it shifts the problem to the application developer. I like
>the idea of an XML->XML transducer - perhaps in front of the application, or
>callable within it.  If David thinks that such tools could be built
>independently of applications that is exactly what I am suggesting :-)

They are close to a _null_ application, and require _no_ whitespace
normalization, since they need only pass any whitespace they see straight
through. This was my original point. Only if you insist on "normalizing" do
you _create_ problems with transduction.
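
A minimal sketch of such a near-null transducer, in Python and assuming the
standard-library SAX parser with its XMLGenerator writer (neither is
mentioned in this thread; the function name and sample document are made
up). Character data is forwarded exactly as the parser reports it, with no
whitespace handling of any kind.

```python
import io
import xml.sax
from xml.sax.saxutils import XMLGenerator

def passthrough(xml_text):
    """Re-emit a document, forwarding every character-data event untouched."""
    out = io.StringIO()
    # XMLGenerator writes each SAX event straight back out as markup.
    xml.sax.parseString(xml_text.encode("utf-8"), XMLGenerator(out, encoding="utf-8"))
    return out.getvalue()

source = "<doc>\n  <line>two  spaces</line>\n</doc>"
result = passthrough(source)
```

Apart from the XML declaration the writer adds at the top, the output is the
input, whitespace included.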

>it's clear that an application *must* have access to all whitespace if it
>wants it (this is made clear by, say, the requirement of XMl_LINK to search
>on pseudoelements).  However it should also be able to access a normalised
>form of the document.
Why? I think I've argued effectively that this is not useful without a
stylesheet or well-known DTD, and in those cases, it is not necessary (as
the DTD or stylesheet should declare the conventions in use).

>> This is the option that XML universally adopts. That means  that any other
>> method can be implemented _by any processor that cares_. If one can imagine
>> destroying meaning of a document's content by the flattening of all
>> whitespace strings to a single space, then you may need more elements in
>> your content model, if you are not able to control the software that will
>> process the document.
>
>This is a good point.
>
>>
>> In other words the parser guarantees all WS will be visible to applications
>> -- this makes designing and implementing WS dependent processing easy --
>> but since applications are _not_ constrained as folding or other WS
>> processing behaviour, document authors will have to be cautious in using
>> significant whitespace. If you can't assume that applications to process
>> your markup will do the right thing, then you should not play games with WS.
>
>Yes. But where is the rigour in authoring going to come from? This is where
>I believe that XML-DEV has a role.
I'm not sure what you mean here... If the application or DTD depend on
whitespace critically (a bad idea, probably, but a permissible one) -- then
it is the author's responsibility to use it properly (and select a tool
that lets her). Since the generic dumb text-editor is such a tool, and
it's widely available, I don't see a big problem here.

>> This actually is not much of an issue for CML, since it's a reasonable
>> assumption that any implementation of CML markup-display will have to do
>> lots of special things, of which whitespace is the least.
>
>No, the point was that CML wishes to re-use HTML and MathML as additional
>components in the document. And then meta-data, and ... So that the
>application will become bloated unless it can re-use the approaches from
>the rest of the community.

I'm afraid I don't see how you're going to share code with an HTML
processor. Nor can I psych myself up to believe that whitespace folding
code:
  while (isspace(c = getchar())) ;   /* swallow the rest of the whitespace run */
  outchar = ' ';                     /* one space stands in for the whole run */
is a big bloat problem in a program that can render organic chem reaction
diagrams.

>> I think XML's agnostic position is the correct one for the language.
>> Authors should probably assume (unless they anticipate absolutely no
>> re-use) that HTML-style draconian normalization might occur anywhere and
>> use markup rather than whitespace, or at least CDATA sections. This
>> position _may_ be moderated (a little) where a well-known DTD with
>> well-defined WS rules can be used (like the TEI or HTML).
>
>I agree on this.  The point I have been trying to promote is that it should
>be possible to collate the requirements of such systems and offer them
>on a re-usable basis.

If it's useful, just list some policies and be done with it, I guess. In
answering this mail I've found that I no longer believe that it's very
important, because I don't see how to use it effectively anywhere.

>An author could then say:
>	- the content of FOO, BAR, FLIP can be expected to be treated by
>XML-DEV-HTML-like WS normalisation.
>	- the content of BAZ, BLORT suffers WS stripping as described in
>XML-DEV-HTML-like-stripping.
>
>and that's about it. If we can get something along those lines, then
>I think a reasonable number of people would take note. It doesn't just have
>to apply to HTML DTDs.

Why not? Make a web page for the policies, create a notation declaration
that points at it, and then use that notation as a prefix on a PI to
declare these things. It can't do any harm other than maybe wasting time.

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________


From digitome at iol.ie  Mon Aug 25 17:47:50 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <199708251547.QAA26726@GPO.iol.ie>

[David Durand]
>
>The fact is that whitespace should be controlled by the application.

I disagree. Leaving it to the application lowers the level at which
XML applications can achieve a "lock in effect" on XML documents to
a level that I find worrying.

User A : "What file format is that?"
User B : "It's MicroScape XML."
User A : "I better buy a copy of MicroScape so - otherwise the white space
will get busted again".


From liamquin at interlog.com  Mon Aug 25 18:11:24 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
In-Reply-To: <199708251547.QAA26726@GPO.iol.ie>
Message-ID: <Pine.BSI.3.95.970825120742.29986C-100000@shell1.interlog.com>

On Mon, 25 Aug 1997, Sean Mc Grath wrote:
> User A : "What file format is that?"
> User B : "It's MicroScape XML."
> User A : "I better buy a copy of MicroScape so - otherwise the white space
> will get busted again".

If this happens, it will be time to standardise whitespace handling at the
application level, perhaps.  Right now, I find this argument totally bogus.
You might as well point out that Microsoft Excel (say) interprets
<formula> and <cell> in one way, and PrisonGlue interprets them differently.

Whitespace treatment needs to be specified in the CML specification,
for example, and then any conforming CML processor will do the right
thing and there's no problem.  Taking CML and passing it to a CDF processor
will result in different whitespace treatment, I expect... and also different
treatment of all the non-whitespace too!  And that's fine.

Lee

-- 
Liam Quin --  the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval

email address: liamquin, at host: interlog dot com


From digitome at iol.ie  Mon Aug 25 20:34:38 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <199708251834.TAA03755@GPO.iol.ie>

>On Mon, 25 Aug 1997, Sean Mc Grath wrote:
>> User A : "What file format is that?"
>> User B : "It's MicroScape XML."
>> User A : "I better buy a copy of MicroScape so - otherwise the white space
>> will get busted again".

[Liam Quin]
>If this happens, it will be time to standardise whitespace handling at the
>application level, perhaps.  Right now, I find this argument totally bogus.

What are you saying? Let's wait and see if the horse bolts - if he
does we will lock the barn door?


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com


From mrc at allette.com.au  Tue Aug 26 01:21:50 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
References: <9708250211.AA01302@lute.apsdc.ksp.fujixerox.co.jp> <v03007801b0274c0920b2@[205.181.197.109]>
Message-ID: <3402129C.22871191@allette.com.au>

David G. Durand wrote:

> > It does involve parsing it, but only until
> >it sees mixed content. If elements are assumed to be ELEMENT until
> >proven otherwise, surely this wouldn't be a massive overhead.
>
> It might involve buffering large amounts for whitespace across an
> arbitrary parser lookahead, since there is no limit on the size of an
> element, or where the non-space PCDATA might show up. One would have
> to buffer the entire document in the parser before one could decide
> whether to emit any whitespace in the root element. This might be a
> bit of a memory performance hit...

Why would you need to buffer anything? Every element starts with a
default value of 'element'. As they're shown to be otherwise, their
status is revised. This involves tracking open elements, not picking up
chunks and reviewing them. One linear pass of the document tells you all
you need to know.
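
A minimal sketch of that single pass, in Python with the standard-library
SAX parser (the class and function names are assumptions, not part of any
XML tool discussed here). Every element type starts out as 'element' and is
revised to 'mixed' the first time non-whitespace character data appears
directly inside it; only a stack of open elements is kept.

```python
import xml.sax

class ContentClassifier(xml.sax.ContentHandler):
    """Assume element content until non-space text proves otherwise."""

    def __init__(self):
        super().__init__()
        self.open_elements = []   # names of the elements currently open
        self.status = {}          # element name -> "element" or "mixed"

    def startElement(self, name, attrs):
        self.status.setdefault(name, "element")   # the default assumption
        self.open_elements.append(name)

    def characters(self, content):
        # Whitespace alone proves nothing; any other text revises the status
        # of the innermost open element.
        if self.open_elements and content.strip():
            self.status[self.open_elements[-1]] = "mixed"

    def endElement(self, name):
        self.open_elements.pop()

def classify(xml_text):
    handler = ContentClassifier()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.status

statuses = classify("<poem>\n  <line>Roses are <em>red</em></line>\n</poem>")
```

In this sketch the status of an element type is only final once the pass
has finished.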

> Many people have claimed that they use editors incapable of
> functioning without inserting linends (of their local flavor) every 200
> characters or so. I (personally) wasn't very sympathetic to this
> argument, but it stood in for the empirical observation that people
> are very loose with whitespace/linends, and that forcing tools not to
> emit whatever line-ending codes it wants could be a problem.

This would still respect the limits set by the user in the same way an
application would behave when you turn off hyphenation - the line might
be shorter, but it's broken in a sensible place.

> The real problem is that there's an assumption that a generic
> processor can solve the "whitespace problem" -- and that is not really
> true. In a very real sense the meaning of whitespace is a product of
> the document _and_ the application. For instance, line breaks (as
> indicated by whitespace) might be critical in a typesetting
> application for poetry (but _only_ in <poem> elements). The same
> document, however, would be best processed with some form of
> whitespace-collapsing everywhere, when indexed by a full-text search
> engine. The same data may have different significance when processed
> differently.

If line breaks are critical, they should be marked explicitly. If you
gave a hand written poem to a data entry person with no knowledge of
poetry, you may have to specify that you want the current line
boundaries respected. Why should an application not be given the same
info?

> The fact is that whitespace should be controlled by the application.
> For typesetting and display, this means that practically, it's going
> to be part of the "stylesheet" or other processing mechanism.

Whitespace is also a mechanism used to make data readable. In that
sense, a space is a character in its own right, not just something that
appears around words. Imagine the response if it wasn't whitespace that
was being discussed, it was the letter 'x', and we were telling people
'x' may or may not appear in their data.

> > As
> >far as I can see, much of the functionality in XML (such as linking)
> >relies on a DTD, so it's not going to be foreign to most XML
> >applications anyway.
>
> This is not necessarily the case. It's also harder to detect mixed
> content from DTD declarations, than simply to recognize #FIXED
> attributes.

It can't be that hard. If parameter entities (I assume they're allowed?)
have to be unravelled anyway, surely it's just a case of looking at the
content model? If it starts with #PCDATA and contains anything else,
it's mixed content.
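
A minimal sketch of that check, in Python, assuming the content model is
handed over as a string with all parameter entities already expanded. The
function name is made up and the tokenizing is deliberately rough.

```python
import re

def is_mixed(content_model):
    """True if the model starts with #PCDATA and also names other elements."""
    # Pull out #PCDATA and element names; connectors and occurrence
    # indicators such as | , * + ? are ignored.
    tokens = re.findall(r"#?[A-Za-z_][\w.:\-]*", content_model)
    return bool(tokens) and tokens[0] == "#PCDATA" and len(tokens) > 1

mixed = is_mixed("(#PCDATA | em | strong)*")
text_only = is_mixed("(#PCDATA)")
children_only = is_mixed("(title, para+)")
```

Here a model of plain (#PCDATA) is reported as not mixed, following the
rule as worded above.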

> >I would really like to
> >see XML and SGML stay in synch - I think anything else would be to
> >everyone's disadvantage.
>
> Yes, this is very true -- and this battle has been won by the
> compatibility camp -- they are in synch. SGML has a new "pass all
> whitespace" option for the declaration. This is not going to be a big
> problem for existing implementations, since it's incredibly easy for
> parsers to implement -- most have had to anyway, if they attempt to
> support SGML->SGML transformation tools. I think SP already can do the
> right thing.

"Pass all whitespace" will go some distance toward fixing the problem,
but what else does it impact? Does it mean that inclusions and
exclusions suddenly appear differently than they did in the 'old SGML'?

> And since there is, in any case, no universal way to handle markup
> without an external processing spec (that can include whitespace among
> its many other factors) there's no reason to make the parser cause
> applications more problems than they will have to solve already.

My understanding is that one of the basic requirements of XML was that
the applications had to be easy to write, so things could be allowed to
happen quickly. As much as I do agree that this would have to be a good
thing, (and as you pointed out, applications are coming out already) I
would argue that maybe they should be more difficult to write, but
should address this issue correctly.


--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________


From mrc at allette.com.au  Tue Aug 26 01:25:31 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
References: <199708251834.TAA03755@GPO.iol.ie>
Message-ID: <340213FD.CF236CC2@allette.com.au>

Sean Mc Grath wrote:

> >On Mon, 25 Aug 1997, Sean Mc Grath wrote:
> >> User A : "What file format is that?"
> >> User B : "It's MicroScape XML."
> >> User A : "I better buy a copy of MicroScape so - otherwise the white space
> >> will get busted again".
>
> [Liam Quin]
> >If this happens, it will be time to standardise whitespace handling at the
> >application level, perhaps.  Right now, I find this argument totally bogus.
>
> What are you saying? Let's wait and see if the horse bolts - if he does
> we will lock the barn door?

By then, there will be far too many hands on the door to think about
locking it; the best you can hope for is to kiss the horse goodbye on
the way past.


--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________


From Jon.Bosak at eng.Sun.COM  Tue Aug 26 07:36:27 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
In-Reply-To: <340152AF.F224A51C@allette.com.au> (message from Marcus Carr on Mon, 25 Aug 1997 19:38:56 +1000)
Message-ID: <199708260532.WAA00995@boethius.eng.sun.com>

It's not up to me to tell this group what to talk about, but I think
that you should be aware that the WG discussed the issue of whitespace
to the point of complete exhaustion during no less than three separate
phases of the design process, and the chances of it being formally
reconsidered in the XML 1.0 time frame are exactly zero.  A discussion
of conventions for specific classes of user agents (e.g., web
browsers) is useful, but I feel that it's my obligation to point out
to anyone mistakenly thinking that this issue might conceivably be
reconsidered in the current XML specification that it is not going to
happen.

Jon


From Peter at ursus.demon.co.uk  Tue Aug 26 10:31:19 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <9649@ursus.demon.co.uk>

In message <199708260532.WAA00995@boethius.eng.sun.com> Jon.Bosak@eng.Sun.COM (Jon Bosak) writes:
> It's not up to me to tell this group what to talk about, but I think
> that you should be aware that the WG discussed the issue of whitespace
> to the point of complete exhaustion during no less than three separate
> phases of the design process, and the chances of it being formally
> reconsidered in the XML 1.0 time frame are exactly zero.  A discussion
                                             ^^^^^^^^^^^^
This is the position I have been taking - there is no suggestion that we 
should ask the WG for a change to the spec.  My suggestions to this group 
were based on the assumption that there was a group of developers who were
sufficiently interested in this problem that they could develop some protocols
which might be helpful to the community.

The following mechanisms are consistent with the current spec and do not
require changes:
	1. stylesheets. The authors can describe how they expect stylesheet
	processors to treat their documents. 
	2. PIs (e.g. <?WHITESPACE ... ?>)
	3. additional elements in the DTD (e.g. NEWLINE).
	4. implicit conventions (i.e. 'always replace CR/LF with CR').

(Have I missed anything?)

We are clear that this has been discussed at great length on the WG and are
not seeking to re-open that discussion. My suggestion here is that we are
trying to see how the WG's conclusion can be implemented.

> of conventions for specific classes of user agents (e.g., web
> browsers) is useful, but I feel that it's my obligation to point out
            ^^^^^^^^^
Some people think this is a waste of time.  Perhaps it may turn out to be.
Unlike the discussions on the spec, this group has no stated goals and exists
to provide mutual support for those developing XML applications. If a number
of people feel this is worth discussing, then let's see if they can
achieve anything. If *they* wish to spend the time trying to do this, it
needn't waste other people's ... :-)

My own feelings are that only mechanisms 1 and 2 above are likely to find
favour. I think that PIs can be further explored in this discussion.
(Perhaps I should not have used <?XML-WHITESPACE .. ?> as this would (I think)
require WG approval, so I would rephrase this as <?XDEV-WHITESPACE .. ?>)
Given that, it seems possible to include PI statements within the document as
to how the author intends the whitespace to be treated.
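
A minimal sketch of how an application might pick up such PIs, in Python
with the standard-library SAX parser. There is no agreed syntax for what
follows the target, so the "policy=... elements=..." data used here is
purely hypothetical, as are the class and function names.

```python
import xml.sax

class WhitespacePIReader(xml.sax.ContentHandler):
    """Collect the data of every <?XDEV-WHITESPACE ...?> instruction."""

    TARGET = "XDEV-WHITESPACE"

    def __init__(self):
        super().__init__()
        self.declarations = []

    def processingInstruction(self, target, data):
        # PIs with any other target are not ours to interpret.
        if target == self.TARGET:
            self.declarations.append(data.strip())

def read_whitespace_pis(xml_text):
    handler = WhitespacePIReader()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.declarations

doc = "<?XDEV-WHITESPACE policy=collapse elements=FOO,BAR?><FOO>a   b</FOO>"
declarations = read_whitespace_pis(doc)
```

The parser only hands over the raw string after the target; interpreting it
is left entirely to whatever convention the community settles on.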

It may be argued that this can be done better with stylesheets. Perhaps I'm
conservative, but I see PIs embedded in a document as 'being part of the
document' to a greater extent than stylesheets which are more likely to be
changed by people other than the document's authors. 

> to anyone mistakenly thinking that this issue might conceivably be
> reconsidered in the current XML specification that it is not going to
> happen.
> 
	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/


From mrc at allette.com.au  Tue Aug 26 10:37:13 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
References: <199708260532.WAA00995@boethius.eng.sun.com>
Message-ID: <34029587.52568DFA@allette.com.au>

Jon Bosak wrote:

> It's not up to me to tell this group what to talk about, but I think
> that you should be aware that the WG discussed the issue of whitespace
> to the point of complete exhaustion during no less than three separate
> phases of the design process, and the chances of it being formally
> reconsidered in the XML 1.0 time frame are exactly zero.

I did go out of my way in my mail yesterday to recognise the work that
has been done on the standard, and I can appreciate how it must bore you
to see all this re-hashed for the hundredth time, but not all of us have
had the benefit/curse of the extensive exposure to this topic that you
have.

> A discussion of conventions for specific classes of user agents (e.g.,
> web browsers) is useful, but I feel that it's my obligation to point
> out to anyone mistakenly thinking that this issue might conceivably be
> reconsidered in the current XML specification that it is not going to
> happen.

I'm not asking for anything to happen, but I do believe these things
should be allowed to be discussed. If people tire of the topic, they'll
stop talking about it - knocking healthy (even if misguided) discussion
on the head contributes nothing.


--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________


From neil at bradley.co.uk  Tue Aug 26 10:39:12 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <199708260838.JAA23835@andromeda.ndirect.co.uk>


>Sean Mc Grath
> >On Mon, 25 Aug 1997, Sean Mc Grath wrote:
> >> User A : "What file format is that?"
> >> User B : "It's MicroScape XML."
> >> User A : "I better buy a copy of MicroScape so - otherwise the white space
> >> will get busted again".
> 
> [Liam Quin]
> >If this happens, it will be time to standardise whitespace handling at the
> >application level, perhaps.  Right now, I find this argument totally bogus.
> 
> What are you saying? Let's wait and see if the horse bolts - if he
> does we will lock the barn door?
> 
> Sean Mc Grath
 
I agree with you totally. The horse will bolt, for certain. I want to 
be able to use XML editor A, and allow people to view the 
output on browser B and C, publish it on DTP system D,
send the data to someone else using editor E,
and let people search for pseudo-elements using extended pointers
in products E and F, and all without extra spaces appearing or
vital spaces disappearing at any point.

I cannot understand why some people think this will not be a problem. 
We are getting extreme views here, from let the XML processor handle 
it, to let every application do its own thing. Neither position is acceptable. 
OK, let's rule out special cases. I can accept that CML and CDF etc 
will have their own strict rules, perhaps, but I am far more 
concerned with general document editing and publishing (the sort of 
things HTML and SGML have been primarily used for).

Personally, I am happy to say this issue is beyond the XML processor, 
and should be handled by the application. Fine. But let all 
PUBLISHING RELATED applications adopt the same guidelines. Too many 
developers are going to miss problems which we could help avoid if we 
could arrive at even a partial set of guidelines. Personally, I think 
we can achieve more than this.

Do we want XML to gain a reputation as an unreliable 
data exchange and publishing format?

We should not have to burden document authors with processing codes, 
etc. People want the ease of use of HTML (and, dare I say it, SGML 
too, in this respect at least). I still think this is unnecessary. 

Others have recently proposed the style sheet as the answer, and I 
agree. My original proposal to base some of the rules on in-line/block definitions 
assumed this approach. It is more reliable than 
element content versus mixed content. I do not, however, think we 
need to go as far as waiting for the official DSSSL based style sheet 
to be completed. I for one do not believe all XML-aware applications 
will use it, and certainly not in the short term. Any config file or 
style sheet will suffice.

People are also proposing all kinds of Unicode special characters to 
perform vital tasks. Let's remember here that few people even have 
the specification, let alone use this set extensively. I am sure its 
time will come, but let us be realistic. XML is going to be in 
widespread use first, and needs to be workable with 7-bit ASCII, if 
possible, and ISO 8859 if not.

I did not expect the rules I (nervously and tentatively) proposed to be acceptable. 
But I did hope they could form the basis of detailed discussion, from 
which a better set of rules would emerge. Unfortunately, we seem to 
be getting nowhere. I am trying not to despair. But it's hard.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From Peter at ursus.demon.co.uk  Tue Aug 26 14:00:46 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <9653@ursus.demon.co.uk>

In message <199708260838.JAA23835@andromeda.ndirect.co.uk> "Neil Bradley" writes:
> 
[...]
> OK, let's rule out special cases. I can accept that CML and CDF etc 
> will have their own strict rules, perhaps, but I am far more 

Actually I would like to develop CML *without* its own set of rules as far
as possible. OK, only chemists want to know how to display <ATOMS>, but
there is just as much material of the form:
<P> We took
<VAR TYPE="float">23.03+e02</VAR>
<UNIT>gram</UNIT>
of water
</P>
and we want to know whether there is whitespace round the contained elements.
As I have repeatedly said I would like to borrow a communal solution rather 
than invent yet another one.

> concerned with general document editing and publishing (the sort of 
> things HTML and SGML have been primarily used for).
> 
> Personally, I am happy to say this issue is beyond the XML processor, 
> and should be handled by the application. Fine. But let all 
> PUBLISHING RELATED applications adopt the same guidelines. Too many 
> developers are going to miss problems which we could help avoid if we 
> could arrive at even a partial set of guidelines. Personally, I think 
> we can achieve more than this.

CML is actually aimed very much at the publishing process.  I want to be
able to combine text, images, vector graphics, maths, and chemistry and for
a technically oriented publisher to be able to process it. I accept that
some people think this merging of XML from different sources is 
unrealistic, but there are others who share the same vision - we'll find
out soon enough whether it's a disaster! In any case, we can always 
mix and match using XML-LINK EMBED.
> 
> Do we want XML to gain a reputation as an unreliable 
> data exchange and publishing format?
> 
> We should not have to burden document authors with processing codes, 
> etc. People want the ease of use of HTML (and, dare I say it, SGML 
> too, in this respect at least). I still think this is unnecessary. 
> 
> Others have recently proposed the style sheet as the answer, and I 
> agree. My original proposal to base some of the rules on in-line/block definitions 
> assumed this approach. It is more reliable than 
> element content versus mixed content. I do not, however, think we 
> need to go as far as waiting for the official DSSSL based style sheet 

Could you expand on this? Is it intended to produce a single official style
sheet that covers all of this?

> to be completed. I for one do not believe all XML-aware applications 
> will use it, and certainly not in the short term. Any config file or 
> style sheet will suffice.
> 
> People are also proposing all kinds of Unicode special characters to 
> perform vital tasks. Let's remember here that few people even have 
> the specification, let alone use this set extensively. I am sure its 
> time will come, but let us be realistic. XML is going to be in 
> widespread use first, and needs to be workable with 7-bit ASCII, if 
> possible, and ISO 8859 if not.

I would strongly argue against Unicode characters at this stage. *I* wouldn't
know where to get them from, and typing by hand could be a disaster. It
will take a while before Unicode is natural to HTML authors.
> 
> I did not expect the rules I (nervously and tentatively) proposed to be acceptable. 
> But I did hope they could form the basis of detailed discussion, from 
> which a better set of rules would emerge. Unfortunately, we seem to 
> be getting nowhere. I am trying not to despair. But it's hard.
     ^^^^^^^^^^^^^^^
Don't despair.  There seem to be a group of people on this list who think it's
worth pursuing.  Several ideas have been suggested. If nothing else it's
probably worth summarising what they can do and where they fall down
(seriously). If they can be encapsulated in a stylesheet, perhaps so much
the better.

The problem is probably knowing where to draw the boundary as to what these 
rules will accomplish. Solve part of the problem and see if it appeals to
a sufficient number of people. 

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From digitome at iol.ie  Tue Aug 26 15:09:13 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
Message-ID: <199708261308.OAA25545@mail.iol.ie>

>
>>Sean Mc Grath
>> >On Mon, 25 Aug 1997, Sean Mc Grath wrote:
>> >> User A : "What file format is that?"
>> >> User B : "It's MicroScape XML."
>> >> User A : "I better buy a copy of MicroScape so - otherwise the white space
>> >> will get busted again".
>> 
>> [Liam Quin]
>> >If this happens, it will be time to standardise whitespace handling at the
>> >application level, perhaps.  Right now, I find this argument totally bogus.
>> 
>> What are you saying? Let's wait and see if the horse bolts - if he
>> does we will lock the barn door?
>> 
>> Sean Mc Grath

[Neil Bradley] 
>I agree with you totally. The horse will bolt, for certain. I want to 
>be able to use XML editor A, and allow people to view the 
>output on browser B and C, publish it on DTP system D,
>send the data to someone else using editor E,
>and let people search for pseudo-elements using extended pointers
>in products E and F, and all without extra spaces appearing or
>vital spaces disappearing at any point.
>
[Lots of v. good points about WS elided]

Is this a fair summary of the position then? :-

1) WS handling is an application convention - not part of the XML standard

2) Different applications are free to have different conventions

3) There is a generally agreed need to work out some conventions/idioms
because:-
        a) They will give app. developers a leg up on a potentially
           difficult topic
        b) They will hopefully contain the "distilled essence" of the WS
           intelligentsia
        c) They will give tools purchasers a stick with which to beat
           vendors IFF divagations from the conventions prove troublesome.
           I.e. "does your XML tool support the Bradley conventions for
           white space handling...?"

If so, let's go for it. How about we concentrate in the first instance on
inter-operability of straight XML editing tools?

                


From dgd at cs.bu.edu  Tue Aug 26 15:52:37 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:19 2004
Subject: Whitespace
In-Reply-To: <199708251834.TAA03755@GPO.iol.ie>
Message-ID: <v03007800b02895fcc6a4@[205.181.197.109]>

At 1:07 PM -0500 8/25/97, Sean Mc Grath wrote:
>[Liam Quin]
>>If this happens, it will be time to standardise whitespace handling at the
>>application level, perhaps.  Right now, I find this argument totally bogus.
>
>What are you saying? Let's wait and see if the horse bolts - if he
>does we will lock the barn door?

This is a lovely example of how quoting out of context can replace giving a
counter-argument. Here's the _substantive_ part of Liam's note:

>Whitespace treatment needs to be specified in the CML specification,
>for example, and then any conforming CML processor will do the right
>thing and there's no problem.  Taking CML and passing it to a CDF processor
>will result in different whitespace treatment, I expect... and also different
>treatment of all the non-whitespace too!  And that's fine.

The point is (and I also made this at length before) there are few ways to
meaningfully process markup without knowing the DTD or having a stylesheet,
and there are even fewer ways to process such markup (w/out stylesheet or
knowledge of the DTD) such that ignoring whitespace is something that you
_need_ to do. XML already provides you with _all_ the whitespace (unlike
original flavor SGML) -- so there's no problem with the parser hiding
significant whitespace. The only question is whether we should be adding
features to note that some whitespace is _insignificant_. In fact I believe
that the small set of ways to process markup (that you don't know the meaning
of, without a processing spec), and where you _have to_ collapse or
otherwise mangle the whitespace, is the _null set_.

As I asked before, I'd like to see even one example of an application
that needs this.

   -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From dgd at cs.bu.edu  Tue Aug 26 15:52:49 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
In-Reply-To: <199708260838.JAA23835@andromeda.ndirect.co.uk>
Message-ID: <v03007801b02897e53987@[205.181.197.109]>

At 5:51 PM -0500 8/25/97, Neil Bradley wrote:
>I want to
>be able to use XML editor A, and allow people to view the
>output on browser B and C, publish it on DTP system D,
>send the data to someone else using editor E,
>and let people search for pseudo-elements using extended pointers
>in products E and F, and all without extra spaces appearing or
>vital spaces disappearing at any point.


Vital spaces will never disappear in _XML parsing_ because all whitespace
is literally passed along. This means that the safe thing is just to leave
it in, and define stylesheets so they can strip any excess space.
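
A minimal sketch of the "all whitespace is passed along" behaviour, assuming
Python's standard xml.etree.ElementTree as a stand-in parser (it is not a tool
mentioned in this thread) and a made-up sample document:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the newlines and indentation are ordinary character data.
SOURCE = "<doc>\n  <title>Whitespace</title>\n  <p>All of it  is data.</p>\n</doc>"

root = ET.fromstring(SOURCE)
title, para = list(root)

# The parser reports the whitespace between tags instead of dropping it.
leading = root.text    # between <doc> and <title>
between = title.tail   # between </title> and <p>
trailing = para.tail   # between </p> and </doc>
inner = para.text      # the double space inside <p> survives too

print(repr(leading), repr(between), repr(trailing), repr(inner))
```

Any stripping of the indentation would then be a decision taken by the
application or its stylesheet, not by the parser.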

They'll only be disappearing if applications have bugs (which can be dealt
with app-by-app), or if XML processors start "doing favors" for applications
by "pre-normalizing" the data.

>I cannot understand why some people think this will not be problem.

I don't understand how it _can_ be a problem (in general, rather than due
to particular bugs).

>We are getting extreme views here, from let the XML processor handle
>it, to let every application do its own thing. Neither position is
>acceptable.
>OK, let's rule out special cases. I can accept that CML and CDF etc
>will have their own strict rules, perhaps, but I am far more
>concerned with general document editing and publishing (the sort of
>things HTML and SGML have been primarily used for).
In general document editing, you still have DTDs and will still have
conventions for whitespace. In particular, any formatting application _must
have_ a stylesheet or other formatting spec. That is the correct place for
formatting information about whitespace collapse to be specified.

>Do we want XML to gain a reputation as an unreliable
>data exchange and publishing format?

Then we'd better not start dropping data in the parser!

>We should not have to burden document authors with processing codes,
>etc. People want the ease of use of HTML (and, dare I say it, SGML
>too, in this respect at least). I still think this is unnecessary.

>Others have recently proposed the style sheet as the answer, and I
>agree. My original proposal to base some of the rules on in-line/block
>definitions
>assumed this approach. It is more reliable than
>element content versus mixed content. I do not, however, think we
>need to go as far as waiting for the official DSSSL based style sheet
>to be completed. I for one do not believe all XML-aware applications
>will use it, and certainly not in the short term. Any config file or
>style sheet will suffice.

Personally, despite the slight nausea engendered by the thought, I expect
that some CSS variation will be the one in common use -- and that CSS will
usually fold space like HTML does now.

>People are also proposing all kinds of Unicode special characters to
>perform vital tasks. Let's remember here that few people even have
>the specification, let alone use this set extensively. I am sure its
>time will come, but let us be realistic. XML is going to be in
>widespread use first, and needs to be workable with 7-bit ASCII, if
>possible, and ISO 8859 if not.

XML is _defined_ to be Unicode, and the only way to do simple 8-bit
processors is to use UTF-8 -- but of course, that just makes special
unicode chars look like "escape sequences". Not so bad, really.
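
A small illustration of the "escape sequence" remark, sketched in Python. The
two characters chosen here (NO-BREAK SPACE and EM SPACE) are assumptions for
the example; the thread does not say which special characters were proposed.

```python
# Hypothetical choice of "special" whitespace characters for illustration.
NO_BREAK_SPACE = "\u00a0"
EM_SPACE = "\u2003"

ascii_space_bytes = " ".encode("utf-8")        # one byte, same as ASCII
nbsp_bytes = NO_BREAK_SPACE.encode("utf-8")    # two bytes: 0xC2 0xA0
em_space_bytes = EM_SPACE.encode("utf-8")      # three bytes: 0xE2 0x80 0x83

for name, raw in [("space", ascii_space_bytes),
                  ("no-break space", nbsp_bytes),
                  ("em space", em_space_bytes)]:
    print(name, [hex(b) for b in raw])
```

An 8-bit tool that passes bytes through untouched would carry these multi-byte
sequences along without needing to understand them.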

>I did not expect the rules I (nervously and tentatively) proposed to be
>acceptable.
>But I did hope they could form the basis of detailed discussion, from
>which a better set of rules would emerge. Unfortunately, we seem to
>be getting nowhere. I am trying not to despair. But it's hard.

All I care about is that XML-dev not give the impression that generic XML
processors should start folding whitespace, since we explicitly removed
whitespace processing from XML to avoid the "vanishing space problem".

If we can find any applications other than formatting, and that don't
depend on knowing the meanings of the tags, then we need to consider using
PIs to declare special whitespace folding in a document. I don't currently
believe that such applications exist -- because I can't come up with any.
When I thought they _might_ exist, I thought that this kind of spec. would
be a good idea. Now it just seems to add confusion where we had made
simplicity.

I still think "all whitespace is significant" is the simplest rule we can
use that allows everything that we can do today.

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From dgd at cs.bu.edu  Tue Aug 26 16:26:36 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
In-Reply-To: <9653@ursus.demon.co.uk>
Message-ID: <v03007804b028a18a7dbc@[205.181.197.114]>

At 7:39 AM -0500 8/26/97, Peter Murray-Rust wrote:
>Actually I would like to develop CML *without* its own set of rules as far
>as possible. OK, only chemists want to know how to display <ATOMS>, but
>there is just as much material of the form:
><P> We took
><VAR TYPE="float">23.03+e02</VAR>
><UNIT>gram</UNIT>
>of water
></P>
>and we want to know whether there is whitespace round the contained elements.
>As I have repeatedly said I would like to borrow a communal solution rather
>than invent yet another one.
But there is whitespace around the contained elements. If you don't want
it, don't put it in...

XML passes all whitespace in the source to the application.

<P> We took <VAR TYPE="float">23.03+e02</VAR><UNIT>gram</UNIT> of water</P>

has no space. An author can be told to enter either one, depending on what
they want. If you want the effect of my markup with the source you gave,
that's a CML convention, to the effect that VAR and UNIT "eat" adjacent
whitespace...

or a formatting convention.
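
The difference between the two forms can be checked with a short sketch,
assuming Python's standard xml.etree.ElementTree as the parser (an assumption
of this example, not something used in the thread). The helper name
character_data is made up for illustration.

```python
import xml.etree.ElementTree as ET

# The markup as originally posted, with line breaks around VAR and UNIT.
SPACED = """<P> We took
<VAR TYPE="float">23.03+e02</VAR>
<UNIT>gram</UNIT>
of water
</P>"""

# The same content with no whitespace between VAR and UNIT.
TIGHT = '<P> We took <VAR TYPE="float">23.03+e02</VAR><UNIT>gram</UNIT> of water</P>'

def character_data(xml_text):
    """Concatenate all character data in document order, tags removed."""
    return "".join(ET.fromstring(xml_text).itertext())

spaced_text = character_data(SPACED)
tight_text = character_data(TIGHT)

print(repr(spaced_text))
print(repr(tight_text))
```

The parser hands over two different character streams; making them equivalent
is exactly the kind of rule a CML or formatting convention would have to state.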

The worst problem I see with whitespace is one that can't be solved by a
parser easily:

if I have a document bit like:

This is an end of paragraph.</P><p>And this is the start of another.

There's no way to tell that there isn't a word "paragraph.And" in the
document, without knowing the meaning of the tags. Of course there is only
one word in:

<font size=+5>L</font>arge initial letter

But this tends to bear out my view that whitespace handling is just the tip
of an iceberg only soluble with a lot of semantic knowledge -- that it is
the duty of stylesheet and DTD authors to determine.
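
A rough sketch of the word-boundary problem, assuming Python's standard
xml.etree.ElementTree and a lowercase, well-formed variant of the fragments
above. BLOCK_ELEMENTS stands for the semantic knowledge a DTD or stylesheet
author would supply; the name and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = ("<body><p>This is an end of paragraph.</p>"
       "<p>And this is the start of another.</p>"
       "<p><font size='+5'>L</font>arge initial letter</p></body>")

# Assumed semantic knowledge: <p> ends a word, <font> does not.
BLOCK_ELEMENTS = {"p"}

def naive_text(root):
    """Text extraction with no knowledge of what the tags mean."""
    return "".join(root.itertext())

def text_with_block_breaks(root, block_elements):
    """Text extraction that inserts a break around block-level elements."""
    parts = []

    def walk(node):
        is_block = node.tag in block_elements
        if is_block:
            parts.append("\n")
        if node.text:
            parts.append(node.text)
        for child in node:
            walk(child)
            if child.tail:
                parts.append(child.tail)
        if is_block:
            parts.append("\n")

    walk(root)
    return "".join(parts)

root = ET.fromstring(DOC)
naive_words = naive_text(root).split()
aware_words = text_with_block_breaks(root, BLOCK_ELEMENTS).split()
print(naive_words)
print(aware_words)
```

Without the element list, the naive extraction fuses "paragraph." and "And"
into one token; with it, "Large" still stays a single word.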


_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From digitome at iol.ie  Tue Aug 26 16:26:55 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
Message-ID: <199708261426.PAA01594@mail.iol.ie>

>At 1:07 PM -0500 8/25/97, Sean Mc Grath wrote:
>>[Liam Quin]
>>>If this happens, it will be time to standardise whitespace handling at the
>>>application level, perhaps.  Right now, I find this argument totally bogus.
>>
>>What are you saying? Let's wait and see if the horse bolts - if he
>>does we will lock the barn door?
>
>This is a lovely example of how quoting out of context can replace giving a
>counter-argument. Here's the _substantive_ part of Liam's note:

[David Durand]
My counter argument (not reproduced above) *followed* the sentence you *have*
reproduced. A lovely example of how quoting....

Here is a concrete scenario that either illustrates the problem or
illustrates my ignorance.

I want to know how two XML applications that apply different
WS conventions can inter-operate losslessly. Specifically, why is this
scenario wrong? :-

I wish to perform a null transformation across two editing tools App A and
App B.

foo.xml --> App A --> App B --> bar.xml

I want foo.xml == bar.xml

App A : reads foo.xml and treats WS according to APPA-WS-RULES
        writes temp.xml

App B : reads temp.xml and treats WS according to APPB-WS-RULES
        writes bar.xml

Result : foo.xml != bar.xml
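
The scenario can be sketched in a few lines of Python. Both "applications"
and their whitespace rules are invented for the example (they stand in for
APPA-WS-RULES and APPB-WS-RULES), and they operate crudely on the serialized
text rather than on a parsed tree.

```python
import re

def app_a_save(text):
    """Hypothetical App A: collapses every whitespace run to one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)

def app_b_save(text):
    """Hypothetical App B: drops whitespace-only runs between tags."""
    return re.sub(r">\s+<", "><", text)

def preserving_save(text):
    """An editor that leaves whitespace exactly as it found it."""
    return text

foo = "<doc>\n  <p>one  two</p>\n  <p>three</p>\n</doc>\n"

temp = app_a_save(foo)
bar = app_b_save(temp)
untouched = preserving_save(preserving_save(foo))

print(repr(temp))
print(repr(bar))
print(bar == foo, untouched == foo)
```

Under these assumed rules the null transformation is lossy; only the pair of
editors that change nothing round-trips the file byte for byte.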




From dgd at cs.bu.edu  Tue Aug 26 16:27:24 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
In-Reply-To: <3402129C.22871191@allette.com.au>
References: <9708250211.AA01302@lute.apsdc.ksp.fujixerox.co.jp>
 <v03007801b0274c0920b2@[205.181.197.109]>
Message-ID: <v03007802b0289ba01a1d@[205.181.197.109]>

At 6:17 PM -0500 8/25/97, Marcus Carr wrote:
>David G. Durand wrote:
>
>> > It does involve parsing it, but only until
>> >it sees mixed content. If elements are assumed to be ELEMENT until
>> >proven otherwise, surely this wouldn't be a massive overhead.
>>
>> It might involve buffering large amounts for whitespace across an
>> arbitrary parser lookahead, since there is no limit on the size of an
>> element, or where the non-space PCDATA might show up. One would have
>> to buffer the entire document in the parser before one could decide
>> whether to emit any whitespace in the root element. This might be a
>> bit of a memory performance hit...
>
>Why would you need to buffer anything? Every element starts with a
>default value of 'element'. As they're shown to be otherwise, their
>status is revised. This involves tracking open elements, not picking up
>chunks and reviewing them. One linear pass of the document tells you all
>you need to know.

Well, you can't send any of the data within an element while you are
"tracking", since you don't know if whitespace is data or noise. So you
have to buffer element opens and closes, and any PCDATA, until the element
is over (or you find non-WS PCDATA). The easy to see worst case has
megabytes of doc with one lone  string: "The end" in the content of the
top-level element, that otherwise contains only elements and no data.
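
The buffering cost can be made concrete with a sketch in Python. The event
tuples ("start", "text", "end") and the function name are hypothetical, not
any real parser's interface; the filter drops whitespace-only text from
elements that turn out to contain no other character data, and it reports how
many events it had to hold back at its peak.

```python
from collections import deque

def strip_element_content_whitespace(events):
    """Return (output_events, peak_number_of_buffered_events)."""
    out, pending, stack = [], deque(), []
    peak = 0
    for kind, value in events:
        owner = None
        if kind == "start":
            stack.append({"state": "undecided"})
        elif kind == "end":
            elem = stack.pop()
            if elem["state"] == "undecided":
                elem["state"] = "element-only"
        elif value.strip() == "":
            owner = stack[-1]            # fate depends on the enclosing element
        else:
            stack[-1]["state"] = "mixed"  # real data seen: whitespace is data too
        pending.append(((kind, value), owner))
        peak = max(peak, len(pending))
        # Emit in document order, stopping at the first undecided whitespace.
        while pending and (pending[0][1] is None
                           or pending[0][1]["state"] != "undecided"):
            event, decided_by = pending.popleft()
            if decided_by is None or decided_by["state"] == "mixed":
                out.append(event)
    return out, peak

# Worst case from the text: many child elements, then one lone string at the end.
N = 1000
events = [("start", "doc"), ("text", "\n")]
for _ in range(N):
    events += [("start", "item"), ("end", "item"), ("text", "\n")]
events += [("text", "The end"), ("end", "doc")]

output, peak = strip_element_content_whitespace(events)
print(len(events), peak)
```

Nothing after the first whitespace can be released until "The end" arrives, so
the buffer grows with the size of the document rather than staying constant.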

>> Many people have claimed that they use editors incapable of
>> functioning without inserting linends (of their local flavor) every 200
>> characters or so. I (personally) wasn't very sympathetic to this
>> argument, but it stood in for the empirical observation that people
>> are very loose with whitespace/linends, and that forcing tools not to
>> emit whatever line-ending codes it wants could be a problem.
>
>This would still respect the limits set by the user in the same way an
>application would behave when you turn off hyphenation - the line might
>be shorter, but it's broken in a sensible place.

Exactly. So as I said, this potential justification for whitespace mangling
is a non-starter. Thanks for the support.

>
>> The real problem is that there's an assumption that a generic
>> processor can solve the "whitespace problem" -- and that is not really
>> true. In a very real sense the meaning of whitespace is a product of
>> the document _and_ the application. For instance, line breaks (as
>> indicated by whitespace) might be critical in a typesetting
>> application for poetry (but _only_ in <poem> elements). The same
>> document, however, would be best processed with some form of
>> whitespace-collapsing everywhere, when indexed by a full-text search
>> engine. The same data may have different significance when processed
>> differently.
>
>If line breaks are critical, they should be marked explicitly. If you
>gave a hand written poem to a data entry person with no knowledge of
>poetry, you may have to specify that you want the current line
>boundaries respected. Why should an application not be given the same
>info?

It should. In a stylesheet. I've still not seen an example of a case where
an application that doesn't know the DTD, and doesn't have a processing
spec needs to _collapse_ whitespace. XML always passes all the whitespace,
so it is never lost except by _explicit application action_.
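
As a sketch of what "in a stylesheet" could mean in practice, the following
Python fragment keeps the whitespace rules outside the document, one rule
table per application. The rule tables, element names and function name are
all hypothetical, and xml.etree.ElementTree is an assumed stand-in parser.

```python
import re
import xml.etree.ElementTree as ET

DOC = "<doc><p>Some   prose\n   text.</p><poem>Line one\nLine two</poem></doc>"

# Hypothetical per-application rules; anything not listed is collapsed.
TYPESETTER_RULES = {"poem": "preserve"}
INDEXER_RULES = {}

def apply_whitespace_rules(xml_text, rules, default="collapse"):
    """Return (tag, text) for each child of the root, after applying rules."""
    result = []
    for child in ET.fromstring(xml_text):
        text = "".join(child.itertext())
        if rules.get(child.tag, default) == "collapse":
            text = re.sub(r"\s+", " ", text).strip()
        result.append((child.tag, text))
    return result

typeset = apply_whitespace_rules(DOC, TYPESETTER_RULES)
indexed = apply_whitespace_rules(DOC, INDEXER_RULES)
print(typeset)
print(indexed)
```

The document itself is untouched in both cases; the line break in the poem is
kept or folded purely by explicit application action.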

>
>> The fact is that whitespace should be controlled by the application.
>> For typesetting and display, this means that practically, it's going
>> to be part of the "stylesheet" or other processing mechanism.
>
>Whitespace is also a mechanism used to make data readable. In that
>sense, a space is a character in its own right, not just something that
>appears around words. Imagine the response if it wasn't whitespace that
>was being discussed, it was the letter 'x', and we were telling people
>'x' may or may not appear in their data.

You are the one arguing that it must be possible to "turn off x's" when
convenient. XML _Passes all whitespace_. The only kind of convention we can
create is one that turns off some whitespace. I see that as dangerous, for
the reasons you give. We may be in raging agreement!

>
>> > As
>> >far as I can see, much of the functionality in XML (such as linking)
>> >relies on a DTD, so it's not going to be foreign to most XML
>> >applications anyway.
>>
>> This is not necessarily the case. It's also harder to detect mixed
>> content from DTD declarations, than simply to recognize #FIXED
>> attributes.
>
>It can't be that hard. If parameter entities (I assume they're allowed?)
>have to be unravelled anyway, surely it's just a case of looking at the
>content model? If it starts with #PCDATA and contains anything else,
>it's mixed content.

If you don't have a DTD, you don't have content models. Even if you do have
the DTD, a minimal parse would involve "entity unravelling" -- a serious
increment in complexity just to be able to ignore a few spaces. In any
case, XML has decided to eliminate SGML's arcane whitespace rules, since
the ISO has agreed to create an SGML declaration option that will have the
same effect.

>"Pass all whitespace" will go some distance toward fixing the problem,
>but what else does it impact? Does it mean that inclusions and
>exclusions suddenly appear differently than they did in the 'old SGML'?

There are no inclusions or exclusions in XML. If you are using the new
declaration in the new SGML you'll have to read the spec and find out, but
it's irrelevant to XML.

The XML authors worked through the consequences, but it wasn't very hard,
since most of the problematic features of SGML (inclusion exceptions,
shortrefs, minimization) were already gone, so the interactions were simple.

The one weird thing is that the distinction between whitespace behavior for
element and mixed content no longer exists. You see all whitespace
regardless. This was essentially required for DTDless and DTDfull parsing
to produce equivalent results.

So in "pure XML" whitespace is never "source-code formatting", but is
_always_ data.

>My understanding is that one of the basic requirements of XML was that
>the applications had to be easy to write, so things could be allowed to
>happen quickly. As much as I do agree that this would have to be a good
>thing, (and as you pointed out, applications are coming out already) I
>would argue that maybe they should be more difficult to write, but
>should address this issue correctly.

They do -- even when correctly means compatible with ISO SGML -- but we did
get ISO to simplify some of the hard bits of SGML.

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From dgd at cs.bu.edu  Tue Aug 26 16:27:28 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
In-Reply-To: <9649@ursus.demon.co.uk>
Message-ID: <v03007803b028a0302c58@[205.181.197.114]>

At 4:01 AM -0500 8/26/97, Peter Murray-Rust wrote:
>The following mechanisms are consistent with the current spec and do not
>require changes:
>	1. stylesheets. The authors can describe how they expect stylesheet
>	processors to treat their documents.
>	2. PIs (e.g. <?WHITESPACE ... ?>)
>	3. additional elements in the DTD (e.g. NEWLINE).
>	4. implicit conventions (i.e. 'always replace CR/LF with CR').
>

>(Have I missed anything?)
>> of conventions for specific classes of user agents (e.g., web
>> browsers) is useful, but I feel that it's my obligation to point out
>            ^^^^^^^^^
>Some people think this is a waste of time.  Perhaps it may turn out to be.
>Unlike the discussions on the spec, this group has no stated goals and exists
>to provide mutual support for those developing XML applications. If a number
>of people feel this is worth discussing,  then let's see if they can
>achieve anything. If *they* wish to spend the time trying to do this, it
>needn't waste other people's ... :-)

>My own feelings are that only mechanisms 1 and 2 above are likely to find
>favour. I think that PIs can be further explored in this discussion.
>(Perhaps I should not have used <?XML-WHITESPACE .. ?> as this would (I think)
>require WG approval, so I would rephrase this as <?XDEV-WHITESPACE .. ?>)
>Given that, it seems possible to include PI statements within the document as
>to how the author intends the whitespace to be treated.

I'm afraid that I must ask what these are to be used for. I used to think
that this was a problem, and now I don't see how we really need these
declarations. They only seem to be relevant for typesetting, and if
typesetting is the task, then you'll only get correct results with a
well-known DTD or stylesheet in any case -- so why have the declarations?
I'm not concerned about readers taking correct stylesheets and later
mucking them up -- that will always be possible.

>It may be argued that this can be done better with stylesheets. Perhaps I'm
>conservative, but I see PIs embedded in a document as 'being part of the
>document' to a greater extent than stylesheets which are more likely to be
>changed by people other than the document's authors.

My problem is that I'm no longer able to see why this information has to go
with the document... I can't think of a case where it's necessary, and tons
of other information about the meanings of tags, etc, is not also necessary.

Also, since XML passes all whitespace, the only case we can deal with is
one where it's essential to _ignore_ whitespace in the source document.

   -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From dgd at cs.bu.edu  Tue Aug 26 16:46:17 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
In-Reply-To: <199708261426.PAA01594@mail.iol.ie>
Message-ID: <v03007805b028a83a100b@[205.181.197.114]>

At 9:26 AM -0500 8/26/97, Sean Mc Grath wrote:
>[David Durand]
>My counter argument (not reproduced above) *followed* the sentence you *have*
>reproduced. A lovely example of how quoting....

I knew that I should have held my fingers (but hit send too soon).
Apologies for implying that you are not trying for understanding.

>Here is a concrete scenario that either illustrates the problem or
>illustrates my ignorance.
>
>I want to know how two XML applications that apply different
>WS conventions can inter-operate losslessly. Specifically, why is this
>scenario wrong? :-
>
>I wish to perform a null transformation across two editing tools App A and
>App B.
>
>foo.xml --> App A --> App B --> bar.xml
>
>I want foo.xml == bar.xml
>
>App A : reads foo.xml and treats WS according to APPA-WS-RULES
>        writes temp.xml
>
>App B : reads temp.xml and treats WS according to APPB-WS-RULES
>        writes bar.xml
>
>Result : foo.xml != bar.xml

Editing tools that change whitespace are not preserving the XML data stream
that would be returned by a parser on the document. A tool that works like
this is simply buggy, since it reads in data that would return one data
stream to applications, and produces output that would produce a different
stream.

On the current definition, even tools that normalize CRLF to LF are
potentially damaging the document. This last is the only point that worries
me much.

Editors are _not allowed_ to blindly apply application conventions, unless
they can _ensure_ that the document was created for, and will only be
processed by, that application.

The beauty of not having whitespace normalization is that it's easy to tell
if you've changed anything because the only way not to change it, is to
change nothing.

The only safe rule for an editor is to preserve whitespace just as it is,
unless it knows something about the DTD, or stylesheet, or if the author
requests special handling because she knows something about these.
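
The "preserve it just as it is" rule can be checked mechanically. What
follows is only a sketch in Python, using the standard xml.dom.minidom
module as a stand-in for "a parser"; the function names and the sample
document are invented for illustration and do not come from any tool
discussed in this thread. A null transformation hands back every text
node untouched, while a tool that applies its own whitespace convention
produces a different stream.

```python
from xml.dom import minidom


def null_transform(xml_text):
    """Parse and re-serialize without touching any text node."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.toxml()


def normalizing_tool(xml_text):
    """A hypothetical editor that collapses whitespace in every text node."""
    doc = minidom.parseString(xml_text)

    def walk(node):
        for child in list(node.childNodes):
            if child.nodeType == child.TEXT_NODE:
                child.data = " ".join(child.data.split())
            else:
                walk(child)

    walk(doc.documentElement)
    return doc.documentElement.toxml()


# Sample document (an assumption): LF line ends only, no XML declaration.
source = '<FOO>\n  <BAR B="b"/>\n  <P>some  text </P>\n</FOO>'

print(null_transform(source) == source)    # the stream is unchanged
print(normalizing_tool(source) == source)  # the stream has been altered
```

The comparison is on the serialized element only; a real check would
also have to cover the prolog, comments, and the line-end question
raised above.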

  --- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From digitome at iol.ie  Tue Aug 26 18:45:21 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
Message-ID: <199708261645.RAA22663@mail.iol.ie>

[David Durand]
>
>Editing tools that change whitespace are not preserving the XML data stream
>that would be returned by a parser on the document. A tool that works like
>this is simply buggy, since it reads in data that would return one data
>stream to applications, and produces output that would produce a different
>stream.
>
>On the current definition, even tools that normalize CRLF to LF are
>potentially damaging the document. This last is the only point that worries
>me much.

It worries me too! Here is a concrete example of a CRLF bug that I hit
today.

I have just used an OffLine Browser called Snake to download a web site
authored in MS FrontPage. Some of the links have been correctly munged to
local links and some have not. By inspecting the HTML it emerged that
correctly munged links looked like this:-

<AREA ... HREF="http://www.a.com/foo.htm">

whilst un-munged links looked like this:-

<AREA ...
HREF = "http://www.a.com/foo.htm">

It is easy to see what has happened here. The s/w developers have
a pattern for matching AREA elements that does not countenance the presence
of a CRLF.
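
For what it is worth, a pattern that does allow for line breaks is not
much longer. The sketch below is in Python rather than whatever the
browser's authors used, and the SHAPE attribute merely stands in for the
"..." in the examples above. A real parser would be the safer choice;
this only shows that whitespace, CRLF included, can be permitted before
the attribute name and on either side of the "=".

```python
import re

# \s matches spaces, tabs, CR and LF, so a line break before HREF or
# around "=" no longer defeats the match.
HREF = re.compile(r'<AREA\b[^>]*?\sHREF\s*=\s*"([^"]*)"', re.IGNORECASE)


def area_links(html):
    """Return the HREF value of every AREA element found in the text."""
    return HREF.findall(html)


one_line = '<AREA SHAPE="rect" HREF="http://www.a.com/foo.htm">'
wrapped = '<AREA SHAPE="rect"\r\nHREF = "http://www.a.com/foo.htm">'

print(area_links(one_line))
print(area_links(wrapped))
```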

How should analogous problems in XML be addressed? Doing WS processing makes
pattern matching/state space handling easier, but at the expense of making it
very difficult to reproduce the elided WS to ensure lossless transformation.




From tbray at textuality.com  Tue Aug 26 19:04:03 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
Message-ID: <3.0.32.19970826100108.00ab7bb0@pop.intergate.bc.ca>

At 05:45 PM 26/08/97 +0100, Sean Mc Grath wrote:
>It is easy to see what has happened here. The s/w developers have
>a pattern for matching AREA elements that does not countenance the presence
>of a CRLF.

Gimme a break; the software developers in this case have screwed up;
there is a technical term to describe this behavior: "wrong".  There may
in fact be productive things to be said about particular application
profiles for whitespace handling, but this example is a complete
red herring. 

>How should analogous problems in XML be addressed?

By writing software correctly.  -T.



From Peter at ursus.demon.co.uk  Tue Aug 26 19:22:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:20 2004
Subject: Whitespace
Message-ID: <9668@ursus.demon.co.uk>

There is clearly a wide spectrum of opinion on this - and everyone is being very
helpful and patient.  I think I see where (at least some of) the differences 
lie and hope this is helpful:

In message <v03007803b028a0302c58@[205.181.197.114]> dgd@cs.bu.edu (David G. Durand) writes:
> 
> I'm afraid that I must ask what these are to be used for. I used to think
> that this was a problem, and now I don't see how we really need these
> declarations. They only seem to be relevant for typesetting, and if

I think this highlights that what we are doing is going through a learning
process and David (and others) have already been through this :-). It took
several months for XML-WG to arrive at the present position (there were 
intermediate drafts which included munging of various sorts). [It reminds me of 
a story of a very famous physicist (I forget who) who, when asked to justify
an equation in a lecture, stated it was trivial, then looked at it in silence 
for 15 mins, and then re-iterated 'Yes, it is trivial'.] 

The problem we have is not a technical one, but a variety of human perceptions
and preconceptions.

We agree that:
	1. this is NOT a parser concern, and all whitespace is passed to the
		application.
	2. that it is always *possible* to create an XML document in which no 
		non-significant whitespace appears.
	3. the XML-WG, in its wisdom, has found it useful to allow authors
		to pass the attribute XML-SPACE="DEFAULT" to the application.

I believe that (2) is David's position which is logical and consistent. If
(2) is universally applied then I can see no value in (3). It suggests that
there is value in passing non-significant whitespace to the application and
processing it in some application-dependent way. If we are processing 
whitespace by stylesheet, then isn't DEFAULT 
irrelevant? My problem is probably mainly because, after *much* debate, (3) 
has been included in the spec and I don't see what it is for.

[David suggests that one reason to add whitespace is that it should appear in
the final typeset version - this makes it significant (though I suspect that
some people would prefer to pass explicit markup).  Personally I do not wish
to do this.]

As  David says, it is possible to produce an XML document with no line-ends
and no other non-significant whitespace. If additional whitespace (e.g. 
for paragraphs) is to be included in the processed document, then it can
either be explicitly included as markup, or deduced from markup through
stylesheets or other methods.

The reasons I can see that non-significant whitespace is contained in XML 
documents are:
	- the documents are produced to be human-readable
	- the authoring/editing tools used introduce non-significant whitespace
	- non-significant whitespace is required to allow various tools to
		process the documents 
	- humans edit the XML documents

I can conceive of a time (perhaps 2 years hence) when there are a wide variety
of XML authoring tools and when the HTML community is educated about XML. In 
that state, perhaps, documents will be always created without non-significant
whitespace. Then, perhaps, we shall have a non-problem.

At present we have (at least) the viewpoints:
	- whitespace matters and authors must define precisely what they want
		in a document. The SGML community can understand and manage
		whitespace. If newcomers find it difficult, they'll have to
		learn the rules, or use proper tools.
	- most of the people who will want to use XML will graduate from HTML.
		This has 'taught' them that whitespace is not significant and
		gets normalised somewhere. They will start creating XML by 
		analogy with HTML. XML will not succeed unless we can
		offer some support for this transitional period.

As is fairly obvious, I take the second viewpoint.  I am trying to 'sell' CML
to a community which has never heard of SGML, but knows about HTML. I cannot
sell them files which they can't read (because they have no line breaks) or
force them to understand where space conventions differ from HTML.  Remember
that many XML files are going to be authored by people who never go near an
SGML tool - the molecular community will probably use C programs.

So - David asks for examples :-)

I want to be able to state that these two XML documents are to be interpreted
to give identical results:

<FOO><META DC.AUTHOR="foo"/><META DC.TITLE="baz"/><BAR B="b"/></FOO>

and

<FOO>
  <META DC.AUTHOR="foo"/>
  <META DC.TITLE="baz"/>
  <BAR B="b"/>
</FOO>

Almost everyone who posts **examples** of XML files shows them prettyprinted
in some fashion.  No-one posts 1000 character lines to this list, or to
XML-SIG - they wouldn't be popular! So the impression is probably universal
outside the XML experts that XML files can be prettyprinted ad lib. 

I would like to preserve this prettyprinting - I suspect this is a major
motive for trying to see some way forward here.
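
One way an application might declare the two FOO documents above
equivalent is sketched here in Python with the standard ElementTree
module. The helper names are invented, and the rule used ("if an element
holds only child elements and blank text, the blank text is not
significant") is a guess made without a DTD. It would also swallow a
lone space between two inline elements in mixed content, so it is a
sketch of the idea rather than a safe general rule.

```python
import xml.etree.ElementTree as ET


def is_blank(text):
    return text is None or not text.strip()


def drop_blank_text(elem):
    """Remove whitespace-only text from elements that look element-only."""
    element_only = (
        len(elem) > 0
        and is_blank(elem.text)
        and all(is_blank(child.tail) for child in elem)
    )
    if element_only:
        elem.text = None
        for child in elem:
            child.tail = None
    for child in elem:
        drop_blank_text(child)
    return elem


def canonical(xml_text):
    """Serialize the document after the application-level clean-up."""
    return ET.tostring(drop_blank_text(ET.fromstring(xml_text)), encoding="unicode")


compact = '<FOO><META DC.AUTHOR="foo"/><META DC.TITLE="baz"/><BAR B="b"/></FOO>'
pretty = """<FOO>
  <META DC.AUTHOR="foo"/>
  <META DC.TITLE="baz"/>
  <BAR B="b"/>
</FOO>"""

print(compact == pretty)                        # the raw text differs
print(canonical(compact) == canonical(pretty))  # the application view agrees
```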

A second example could be the one that I posted earlier:
<PARA> We took
<VAR TYPE="float">23.02+02</VAR>
<UNIT>gram</UNIT>
water
</PARA>

This clearly contains 'text' and my community is conditioned to reading
this in the same way as HTML (i.e. that the line-ends are normalised to 
a single space.) It seems to me that this is likely to be valuable in many
applications and that interoperability and code re-use would be greatly
helped by giving it a label and a set of rules. As I have said more than once
I would like to avoid having to develop both my own rules and my own code.
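
As a starting point for such a rule, here is a small Python sketch of
the HTML-like reading of the PARA example: gather all character data,
turn every run of whitespace (line-ends included) into one space, and
trim the ends. The function name is made up, and this is only one
candidate rule, not a statement of what browsers actually do.

```python
import re
import xml.etree.ElementTree as ET


def browser_text(elem):
    """Join all character data, then collapse each whitespace run to one space."""
    joined = "".join(elem.itertext())
    return re.sub(r"\s+", " ", joined).strip()


para = ET.fromstring(
    """<PARA> We took
<VAR TYPE="float">23.02+02</VAR>
<UNIT>gram</UNIT>
water
</PARA>"""
)

print(browser_text(para))  # We took 23.02+02 gram water
```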

I have a fear (and I think it is shared by my community) that data within a
document can be changed by changing a stylesheet.  The *meaning* of the
(HTML) file below differs according to whether the line-end is normalised to a 
space or not:

<P> I saw a <B>black</B>
<B>bird</B>
</P>

Since stylesheets can be (and will be) imposed by people other than the 
author (publishers, browsers, readers, etc.) there is a danger that stylesheet
imposed WS processing can change meaning. Of course you can argue that the 
author above should have taken greater trouble to create an unambiguous 
text, but this is the way that I expect many newcomers to XML to approach it.
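
The two readings of the blackbird example can be put side by side. This
is an illustrative Python sketch only; the variable names are invented
and neither rule is taken from any stylesheet language.

```python
import xml.etree.ElementTree as ET

p = ET.fromstring("<P> I saw a <B>black</B>\n<B>bird</B>\n</P>")
raw = "".join(p.itertext())

# Rule 1: a line-end counts as whitespace and collapses to one space.
as_space = " ".join(raw.split())

# Rule 2: a line-end is dropped before the remaining spaces are collapsed.
as_nothing = " ".join(raw.replace("\n", "").split())

print(as_space)    # I saw a black bird
print(as_nothing)  # I saw a blackbird
```

The same bytes give two different sentences, which is the risk when the
rule lives in a stylesheet the author does not control.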

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From gannon at commerce.net  Tue Aug 26 19:32:54 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:21 2004
Subject: Papers Comparing MCF, CDF, D-C & RDF?
Message-ID: <01BCB209.51EF0540@arrow-d29.sierra.net>

Does anyone know of any papers that discuss and compare/contrast the scope of the following standards efforts:
	MCF - Meta Content Framework (Apple/Netscape)
	CDF - Channel Definition Format (Microsoft)
	D-C - Dublin Core
	RDF - Resource Description Framework (W3C)

I recognize that much of the discussion around these various topics indicates they are in various stages of development and review.  What is not clear is the precise scope of each of these endeavors.  What problem sets are they trying to solve?

I have read most of the relevant documentation describing each of these proposals (except RDF, since there is no public document describing the scope of the W3C RDF WG that I could find).

So, is there any paper or soon-to-be-written paper that addresses the relative scope of these efforts?

Patrick Gannon
-----------------------------------------
President & CEO
Internet Shopping Directory, Inc.
702-831-2251   702-831-3925 (Fax)
mailto://patrick@shoppingdirect.com
http://www.shoppingdirect.com




From tbray at textuality.com  Tue Aug 26 19:45:21 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
Message-ID: <3.0.32.19970826103807.00ab0e60@pop.intergate.bc.ca>

At 05:11 PM 26/08/97 GMT, Peter Murray-Rust wrote:
>	2. that it is always *possible* to create an XML document in which no 
>		non-significant whitespace appears.
>	3. the XML-WG, in its wisdom, has found it useful to allow authors
>		to pass the attribute XML-SPACE="DEFAULT" to the application.
>
>I believe that (2) is David's position which is logical and consistent. If
>(2) is universally applied then I can see no value in (3). It suggests that
>there is value in passing non-significant whitespace to the application and
>processing it in some application-dependent way. If we are processing 
>whitespace by stylesheet, then isn't DEFAULT 
>irrelevant? My problem is probably mainly because, after *much* debate, (3) 
>has been included in the spec and I don't see what it is for.

Well DEFAULT is 'irrelevant' in that it expresses no opinion about what
should be done with whitespace.  The PRESERVE value exists to support
constructs like HTML's <PRE>.  Yes, putting XML-SPACE="PRESERVE" on
something with element content is at the least questionable; but the
fact that this can be used to do something stupid does not mean it
isn't useful.
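
A rough Python sketch of how an application might honour the attribute
follows. It uses the XML-SPACE spelling from this thread (the XML 1.0
Recommendation as later published spells it xml:space), and it assumes
an application whose own default is to collapse whitespace runs. The
function name and the sample document are invented for illustration.

```python
import re
import xml.etree.ElementTree as ET


def render(elem, preserve=False):
    """Return the text of elem, collapsing whitespace unless PRESERVE applies."""
    mode = elem.get("XML-SPACE")
    if mode == "PRESERVE":
        preserve = True
    elif mode == "DEFAULT":
        preserve = False  # hand the decision back to the application default

    def treat(text):
        if text is None:
            return ""
        return text if preserve else re.sub(r"\s+", " ", text)

    parts = [treat(elem.text)]
    for child in elem:
        parts.append(render(child, preserve))  # the setting is inherited
        parts.append(treat(child.tail))        # tail text belongs to elem
    return "".join(parts)


doc = ET.fromstring(
    '<DOC><P>one\n   two</P><CODE XML-SPACE="PRESERVE">a\n   b</CODE></DOC>'
)
print(repr(render(doc)))
```

The P content comes back with a single space, while the CODE content
keeps its line break and indentation, much as HTML's PRE would.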

>At present we have (at least) the viewpoints:
>	- whitespace matters and authors must define precisely what they want
>		in a document. The SGML community can understand and manage
>		whitespace. If newcomers find it difficult, they'll have to
>		learn the rules, or use proper tools.

Well, they only have to learn one rule: the whitespace you put in
the document is the whitespace that is in the document.  XML neither
addeth nor taketh away.

>	- most of the people who will want to use XML will graduate from HTML.
>		This has 'taught' them that whitespace is not significant and
>		gets normalised somewhere. They will start creating XML by 
>		analogy with HTML. XML will not succeed unless we can
>		offer some support for this transitional period.

Uh, if they are using it for browser applications, I am quite sure that
browsers, while doing XML, will duplicate the HTML whitespace semantics,
i.e. eat most of it, and people will just not notice the difference.
Another way to say this is that the "HTML" whitespace semantic should
probably be renamed the "browser" whitespace semantic.

It would be a good and useful thing to write down (precisely) what
that browser semantic is; it's a little subtler than you'd think.

When they get into more ambitious apps than just browsing, they will
be glad of XML's transparency.
 - T.



From tbray at textuality.com  Tue Aug 26 19:48:50 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:21 2004
Subject: Papers Comparing MCF, CDF, D-C & RDF?
Message-ID: <3.0.32.19970826104550.00a72930@pop.intergate.bc.ca>

At 10:17 AM 26/08/97 -0700, Patrick Gannon wrote:
>Does anyone know of any papers that discuss and compare/contrast the scope 
>of the following standards efforts:

No such exist, to my knowledge.

>So, is there any paper or soon-to-be-written paper that addresses the 
>relative scope of these efforts?

Several of these are about to be rolled together into RDF.  There is 
a big RDF meeting tomorrow and Thursday in Seattle at which this
process gets going.

For those who are W3C members, check out
 http://www.w3.org/Metadata/RDF/Group/9708/27Agenda.html
 -Tim



From Peter at ursus.demon.co.uk  Tue Aug 26 20:16:40 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
Message-ID: <9686@ursus.demon.co.uk>

Thanks Tim - I think this helps (me) considerably :-)

In message <3.0.32.19970826103807.00ab0e60@pop.intergate.bc.ca> Tim Bray writes:
[...]
> 
> Well DEFAULT is 'irrelevant' in that it expresses no opinion about what
> should be done with whitespace.  The PRESERVE value exists to support

so when might it be used (in preference to a stylesheet, for example?)

> constructs like HTML's <PRE>.  Yes, putting XML-SPACE="PRESERVE" on

Since the whitespace is all passed, presumably a stylesheet is capable of
keeping it all?
 
> something with element content is at the least questionable; but the
> fact that this can be used to do something stupid does not mean it
> isn't useful.

It sounds as if there isn't really very much need for XML-SPACE, and maybe
that has distorted my viewpoint...

> 
> >At present we have (at least) the viewpoints:
> >	- whitespace matters and authors must define precisely what they want
> >		in a document. The SGML community can understand and manage
> >		whitespace. If newcomers find it difficult, they'll have to
> >		learn the rules, or use proper tools.
> 
> Well, they only have to learn one rule: the whitespace you put in
> the document is the whitespace that is in the document.  XML neither
> addeth nor taketh away.

Understood. It is also the whitespace that your authoring tool puts in :-)
 
> >	- most of the people who will want to use XML will graduate from HTML.
> >		This has 'taught' them that whitespace is not significant and
> >		gets normalised somewhere. They will start creating XML by 
> >		analogy with HTML. XML will not succeed unless we can
> >		offer some support for this transitional period.
> 
> Uh, if they are using it for browser applications, I am quite sure that
> browsers, while doing XML, will duplicate the HTML whitespace semantics,
> i.e. eat most of it, and people will just not notice the difference.
> Another way to say this is that the "HTML" whitespace semantic should
> probably be renamed the "browser" whitespace semantic.
> 
> It would be a good and useful thing to write down (precisely) what
> that browser semantic is; it's a little subtler than you'd think.

I think this is the key to much of this discussion. (I am under no illusions
that it may be subtler than I can think :-) It was certainly true that early
HTML browsers could display whitespace very differently and I imagine
that there are still differences.

So - with Tim's encouragement - this seems like a useful thing to aim for.
This semantic seems to be one of the things we are chasing.
 
	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From digitome at iol.ie  Tue Aug 26 22:07:06 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
Message-ID: <199708262006.VAA13479@GPO.iol.ie>

>At 05:45 PM 26/08/97 +0100, Sean Mc Grath wrote:
>>It is easy to see what has happened here. The s/w developers have
>>a pattern for matching AREA elements that does not countenance the presence
>>of a CRLF.

[Tim Bray]
>Gimme a break; the software developers in this case have screwed up;
>there is a technical term to describe this behavior: "wrong".  There may
>in fact be productive things to be said about particular application
>profiles for whitespace handling, but this example is a complete
>red herring. 
>

I presented this "red herring" because it was *real*. I could have
contrived a more realistic one:-) This is an example of a *real*
programmer screwing up in a real application.

I am interested in avoiding screwups. WS is a screwup "happy hunting
ground" for us normal programmers who make mistakes day in day out.

At least I think it is. Perhaps (hopefully) I'm wrong.

I doubt if I will get this right but I will try and formulate the programming
problem as I see it. 

Here goes:-

XML processing applications that read/write XML have to faithfully
reproduce white space to avoid data loss. In the course of XML processing,
actions will regularly be triggered by context, i.e. "element X within
element Y", "first data content chunk below element X" etc.

Take a really simple context, "X followed by Y". In order to faithfully
reproduce WS on output, the simple pattern "XY" must be transformed into
(in rusty Perl)

"(w*)X(w*)Y(w*)"

Where "w" represents the pattern for White Space.

As the state spaces get more complex (i.e. realistic) doesn't this problem
escalate?

Could someone out there who reckons this is easy kindly put
me out of my misery by showing how it can be best handled?
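
One possible way to keep this from escalating, sketched in Python rather
than Perl: carry the whitespace along with each symbol instead of
writing it into every pattern. The single-character symbols X, Y and Z
are stand-ins for real parse events, and the function names are invented
for this sketch. Patterns are then matched on the symbols alone and the
whitespace is re-emitted untouched.

```python
import re


def tokenize(stream):
    """Split into (leading_whitespace, symbol) pairs plus trailing whitespace."""
    pairs = [(m.group(1), m.group(2)) for m in re.finditer(r"(\s*)(\S)", stream)]
    consumed = sum(len(ws) + len(sym) for ws, sym in pairs)
    return pairs, stream[consumed:]


def rewrite(stream, first="X", second="Y", replacement="Z"):
    """Replace `second` with `replacement` when it directly follows `first`."""
    pairs, trailing = tokenize(stream)
    out = []
    for i, (ws, sym) in enumerate(pairs):
        # The context test looks at symbols only; whitespace is never inspected.
        if sym == second and i > 0 and pairs[i - 1][1] == first:
            sym = replacement
        out.append(ws + sym)
    return "".join(out) + trailing


print(repr(rewrite(" X \n Y\t")))  # whitespace around the match survives
print(repr(rewrite("X Q Y")))      # no match, so the stream is unchanged
```

With this layout the "XY" rule stays as simple as it was, and the null
transformation is just joining the pairs back together.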


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From mrc at allette.com.au  Wed Aug 27 00:39:48 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
References: <199708260532.WAA00995@boethius.eng.sun.com> <34029587.52568DFA@allette.com.au>
Message-ID: <34035B04.FC63D9D6@allette.com.au>

Marcus Carr wrote:

> Jon Bosak wrote:
>
> > A discussion of conventions for specific classes of user agents
> > (e.g., web browsers) is useful, but I feel that it's my obligation
> > to point out to anyone mistakenly thinking that this issue might
> > conceivably be reconsidered in the current XML specification that
> > it is not going to happen.
>
> I'm not asking for anything to happen, but I do believe these things
> should be allowed to be discussed. If people tire of the topic,
> they'll stop talking about it - knocking healthy (even if misguided)
> discussion on the head contributes nothing.

It has been pointed out to me in private mail that my answer could be
perceived as somewhat unfair criticism and that I may have
misinterpreted the tone of Jon's mail. I'll plead all the usual excuses
(it was late, my cat had been run over, etc), offer my apologies and
rephrase my point.

Given that there is nothing binding the suggestions of this group to the
standard, I feel that even the most radical suggestions should be
entertained as an exercise in lateral thinking (and perhaps tolerance).
As long as we all accept that this has no impact on the formation of the
standard, and we must, this list can act as a well of diversity,
tempered only by the delete keys of its readership.

Again, Jon, my apologies.


--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________




From JohnGo at asymetrix.com  Wed Aug 27 00:41:16 1997
From: JohnGo at asymetrix.com (John Gossman)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <c=US%a=_%p=asymetrix%l=ASYMEXCHANGE-970826224605Z-4483@exchange.asymetrix.com>

>
>
>
>    To make a long story short:  I have been developing a file format for
>data exchange between applications.  The essential purpose is to provide a
>format that objects can stream their persistent state to, for saving or
>exchanging of data.  Further I have a number of criteria for this format: 
>    1.  It must be simple 
>    2.  It must be robust--resistant to data loss 
>    3.  Flexible -- all sorts of data 
>    4.  Extensible -- developers and users can add their own data and
>datatypes 
>    5.  Human readable -- easy to understand 
>    6.  Support versioning easily 
>    7.  Support strong typing--no confusion 
>    I knew from my experience with Autodesk's DXF (Drawing eXchange Format)
>that these goals were achievable, and knew where DXF fell down.  My essential
>idea is data come in two forms--primitive fields and structured records.  For
>primitive fields I realized I needed to store 3 things--type,name, and value.
> The original format I came up with was quite simple, in fact I'll just give
>an example of a button object's data: 
>
>start button 
>    string caption="Click Here" 
>    int left = 50 
>    int right = 100 
>    int top = 80 
>    int bottom = 100 
>end 
>  
>
>    Easy to parse, easy to output, easy to read (helps if you are a
>programmer used to a typed language), and no special characters except the
>almost universally understood '='.  Several of my co-workers asked why I
>didn't use MCF or XML.  My answer was that these formats are too complex, but
>after further study of XML I realized I could make an XML-compliant version
>of the syntax quite easily.  After several iterations I arrived at this: 
>
><button> 
>    <caption string "Click Here"/> 
>    <left    int    50/> 
>    <right  int    100/> 
>    <top    int    80/> 
>    <bottom    int    100/> 
></button> 
>
>    Last week in Montreal, Tim Bray confirmed my suspicion that XML did not
>allow the suppression of attribute names as a form of shorthand, which is
>going to necessitate one more change.  However, on further thought, I also
>wonder if I have violated something of the spirit of XML by including all the
>data in attributes--all structure no content.  Option 1 then is the
>following: 
>
><button> 
>    <caption type="string" value="Click Here"/> 
>    <left    type="int"    value="50"/> 
>    <right    type="int"    value="100"/> 
>    <top   type="int"    value="80"/> 
>    <bottom type="int"    value="100"/> 
></button> 
>
>    There is precedent for such a thing, in HTML's IMG tag for example, which
>is an empty tag with all the "data" in attributes.  My question then.  Is
>this better?:
>
><button> 
>    <caption type="string">"Click Here"</caption> 
>    <left    type="int">50</left> 
>    <right    type="int">100</right> 
>    <top   type="int">80</top> 
>    <bottom type="int">100</bottom> 
></button> 
>
>    So, I am asking for the kind advice of those most familiar with XML.
>Opinions please, either here or by private e-mail (johngo@asymetrix.com), on
>this question or anything else that comes to mind.
>
>    Many thanks in advance, 
>
>    John Gossman 
>    Asymetrix 
>
>    P.S.  The format (which I call OXF for Open Exchange Format) is fully
>defined in a spec written here.  It includes the ability to create data
>schema and use inheritance to extend them, and is specifically designed to be
>non-validating (for robustness:  you don't want to throw away all the data
>because of a few problems).  I would rather not post the spec. until I have
>settled these last few issues, but I will provide a draft for the asking. 
>
>
>
>
>



From ricko at allette.com.au  Wed Aug 27 01:49:40 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <199708262357.JAA00084@jawa.chilli.net.au>

 
> From: John Gossman <JohnGo@asymetrix.com>
 
> >Option 1 then is the
> >following: 
> >
> ><button> 
> >    <caption type="string" value="Click Here"/> 
> >    <left    type="int"    value="50"/> 
> >    <right    type="int"    value="100"/> 
> >    <top   type="int"    value="80"/> 
> >    <bottom type="int"    value="100"/> 
> ></button> 
> >
> >    There is precedent for such a thing, in HTML's IMG tag for example, which
> >is an empty tag with all the "data" in attributes.  My question then.  Is
> >this better?:
> >
> ><button> 
> >    <caption type="string">"Click Here"</caption> 
> >    <left    type="int">50</left> 
> >    <right    type="int">100</right> 
> >    <top   type="int">80</top> 
> >    <bottom type="int">100</bottom> 
> ></button> 
 
According to your taste, you can weigh these general rationales and
come to your own decision--

1) Attributes are really shorthand so that you don't need complex
content models, and to allow a measure of stronger typing, in particular
for ID and IDREF attributes.   This suggests it doesn't matter which you
use: you don't have a complex content model and your value attribute
is just CDATA.

2) The content is the thing primarily described by the GI. So an empty
element with an attribute called "value" is always an over-elaborate
design.  This suggests you should use Option 2.

3) The content of an element is the text that a dumb browser that is not
aware of your document type will display.   Therefore the text
should be in the nature of an alternative string for guidance.  So
<caption> should be content, and <left> etc. should use attributes.

4) You may at some future stage want to extend how <caption>, 
<left>, etc work.  So option 1 leaves you free to define a content
model later, for some other functionality.

5) Using a value attribute is more familiar to HTML people who
like the meta tag.
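
From the processing side the two options cost about the same. The
following Python sketch (standard ElementTree module; the CASTS table
and the function name are invented here, and the samples are cut down to
two fields with the literal quotes around the caption left out) reads
either form into the same typed record.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the type names in the examples to converters.
CASTS = {"int": int, "string": str}


def load_fields(xml_text):
    """Read a record written in either Option 1 or Option 2 style."""
    fields = {}
    for field in ET.fromstring(xml_text):
        cast = CASTS[field.get("type")]
        raw = field.get("value")    # Option 1: value held in an attribute
        if raw is None:
            raw = field.text or ""  # Option 2: value held as element content
        fields[field.tag] = cast(raw)
    return fields


option1 = (
    '<button><caption type="string" value="Click Here"/>'
    '<left type="int" value="50"/></button>'
)
option2 = (
    '<button><caption type="string">Click Here</caption>'
    '<left type="int">50</left></button>'
)

print(load_fields(option1))
print(load_fields(option2))
```

Both calls give the same dictionary, so the choice comes down to the
design arguments above rather than to parsing effort.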


You should also consider:

<button 
  left="50" 
  right="100" 
  top="80" 
  bottom="100">
 "Click Here"</button> 

The XML element type declaration for this is:

<!ELEMENT button  (#PCDATA) >
<!ATTLIST button
  left  CDATA  #REQUIRED
  right CDATA  #REQUIRED 
  top   CDATA  #REQUIRED
  bottom CDATA #REQUIRED >
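
A non-validating reader for this form could look something like the
Python sketch below. The REQUIRED tuple simply mirrors the #REQUIRED
attributes in the declaration above, the function name is invented, and
the int() conversion is the application's own job, since the DTD only
says CDATA.

```python
import xml.etree.ElementTree as ET

REQUIRED = ("left", "right", "top", "bottom")  # mirrors the ATTLIST above


def load_button(xml_text):
    """Return (geometry, caption) for a <button> in the attribute-based form."""
    button = ET.fromstring(xml_text)
    missing = [name for name in REQUIRED if name not in button.attrib]
    if missing:
        raise ValueError("missing required attributes: " + ", ".join(missing))
    geometry = {name: int(button.get(name)) for name in REQUIRED}
    return geometry, (button.text or "")


sample = (
    '<button\n  left="50"\n  right="100"\n  top="80"\n  bottom="100">\n'
    ' "Click Here"</button>'
)
geometry, caption = load_button(sample)
print(geometry)
print(repr(caption))
```

Note that the caption comes back with its leading line break, space and
quotation marks intact: the parser passes all of it through, and
trimming is left to the application.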

I hope this is some help.

Rick Jelliffe



From ricko at allette.com.au  Wed Aug 27 02:38:05 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
Message-ID: <199708270046.KAA01200@jawa.chilli.net.au>


> From: Sean Mc Grath <digitome@iol.ie>
 
> Could someone out there who reckons this is easy kindly put
> me out of my misery by showing how it can be best handled?

Without addressing your dolorous (if not rubescent) herring, 
Knuth's comment in "The Errors of TeX" is useful:

"The stickiest issue in TeX has always been the treatment of 
blank spaces.  Users tend to insert spaces in their computer
files so that files look nice, but document processors must also
treat spaces as objects that appear in the final output...
I kept searching for rules that would be simple enough to be
easily learned, yet natural enough that they could be applied 
almost unconsciously.  I finally concluded that no such rules
existed, and I opted for the best compromise I could find."

Charles Goldfarb commented at the Barcelona WG8 meeting 
that whitespace handling was one of the design areas that 
he felt SGML got wrong (by which I think he did not mean
that the SGML86 rules are not a workable, justifiable and 
rational compromise -- given the constraint of having to work
with fixed-line-length text editors, which is the nub of the
design decision for SGML86 -- merely that perhaps the XML 
'solution' of making it someone else's problem would 
have deflected some consternation away from ISO 8879, and 
partitioned functionality more neatly).

The solution that I think XML *now* has is this:

1) There are ISO 10646 characters available for lots of different
kinds of spaces. These can be specified directly by numeric 
character references, or indirectly using the ISO public entities.
Some of these entities are already familiar to HTML people: in
particular  &nbsp;  is generated almost pathologically by some
versions of Netscape's HTML editor.  So if you want to force 
a break or space, these should be used.

2) If you want to force that normal spaces should not be collapsed,
then the attribute  XML-SPACE="preserve" should be specified on 
the containing element.

3) Otherwise, you should use spaces and newlines only when you
need them, and expect whitespace sequences to be collapsed.
XML generators that have access to the DTD should strip out
confusing whitespaces from element and mixed content.

4) SGML86 and XML have different whitespace rules. So you should
expect to have to process the files to add or remove space when
you convert between the two, unless you write your SGML DTD
without mixed content and/or impose some stricter discipline on 
document creation.

5) If you need to prettyprint your document text, then you are best
advised to use whitespace within tags, rather than between tags.
For example:

<p  x="1"
>An element</p
>

Rather than
<p>
An element
</p>

If this looks strange to XML people, then remember that Bert Bos 
found it natural to do (something like) this in a paper he wrote:

	<x	>blah<	/x>
	<x	>blurt<	/x>
So I do not think that we should assume too much about how HTML 
people naturally view tag integrity.  (In SGML and XML, Bert's
experimental markup would be invalid and not well-formed, despite
its nice pretty-printing: ETAGO  '</' cannot be divided by whitespace.)

6) The XML stylesheet language must be strong enough to handle forcing
spaces between elements. It must be possible to define that, for example,
a keyword element must be separated by whitespace or punctuation (or
superscripted note references) from adjacent words, in languages that 
use spaces as word separators.
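
Points 1 to 3 can be sketched in Python. This is only an illustration under stated assumptions: it uses the standard library's xml.etree.ElementTree as the processor, it uses the xml:space spelling later adopted by XML 1.0 rather than the draft's XML-SPACE, and collapse() is a hypothetical helper, not part of any library. It collapses runs of ordinary whitespace but leaves U+00A0 (from a numeric character reference) and anything under a "preserve" element untouched.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the reserved XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
# Only the four XML whitespace characters; U+00A0 is deliberately excluded.
XML_WS = re.compile(r"[ \t\r\n]+")


def collapse(elem, preserve=False):
    """Collapse whitespace runs in place, except under xml:space="preserve"."""
    setting = elem.get(XML_SPACE)
    if setting == "preserve":
        preserve = True
    elif setting == "default":
        preserve = False
    if not preserve and elem.text:
        elem.text = XML_WS.sub(" ", elem.text)
    for child in elem:
        collapse(child, preserve)
        # A child's tail is content of *this* element, so this element's
        # setting decides whether it is collapsed.
        if not preserve and child.tail:
            child.tail = XML_WS.sub(" ", child.tail)


doc = ET.fromstring(
    "<doc><p>two  spaces&#160;&#160;kept\n here</p>"
    '<pre xml:space="preserve">a   b\n  c</pre></doc>'
)
collapse(doc)
print(repr(doc.find("p").text))
print(repr(doc.find("pre").text))
```

Note that Python's str.split() treats U+00A0 as whitespace, which is why the sketch uses an explicit character class instead.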


I think these are good enough. If developers implement their systems to
allow them, then users will learn to tailor their documents 
appropriately.  Users will always be able to mark up documents incorrectly,
no matter how hard we try, I tend to think.
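
The effect described in point 5 can be checked with a small Python sketch, assuming the standard library's xml.etree.ElementTree parser and quoted attribute values as XML requires. Line breaks placed inside the tags do not reach the character data, while line breaks placed between the tags do.

```python
import xml.etree.ElementTree as ET

# Pretty-printing with the line breaks inside the start and end tags.
inside_tags = ET.fromstring('<p  x="1"\n>An element</p\n>')

# Pretty-printing with the line breaks between the tags.
between_tags = ET.fromstring('<p x="1">\nAn element\n</p>')

print(repr(inside_tags.text))
print(repr(between_tags.text))
```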


Rick Jelliffe 



From JohnGo at asymetrix.com  Wed Aug 27 04:38:02 1997
From: JohnGo at asymetrix.com (John Gossman)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <c=US%a=_%p=asymetrix%l=ASYMEXCHANGE-970827024254Z-4911@exchange.asymetrix.com>

	Thanks for the summary in points 1-5.  Those are exactly the sort of
points I am seeking clarity on.  
	The other option would be fine if I were defining a format with types
like "button".  But OXF is designed to describe generic data, the button
was just an example.  The DTD is strictly optional, perhaps even harmful
in the case of OXF, since the whole purpose is to make it so the reader
can salvage even partial or poorly formed files.

	John Gossman
	Asymetrix
>----------
>From: 	Rick Jelliffe[SMTP:ricko@allette.com.au]
>Sent: 	Tuesday, August 26, 1997 4:47 PM
>To: 	'xml-dev@ic.ac.uk'
>Subject: 	Re: Request for advice defining an XML based syntax
>
>
>You should also consider:
>
><button 
>  left="50" 
>  right="100" 
>  top="80" 
>  bottom="100">
> "Click Here"</button> 
>
>The XML element type declaration for this is:
>
><!ELEMENT button  (#PCDATA) >
><!ATTLIST button
>  left  CDATA  #REQUIRED
>  right CDATA  #REQUIRED 
>  top   CDATA  #REQUIRED
>  bottom CDATA #REQUIRED >
>
>
>
>



From ricko at allette.com.au  Wed Aug 27 04:43:11 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:21 2004
Subject: Whitespace
Message-ID: <199708270251.MAA05357@jawa.chilli.net.au>

 
> From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
  
> I would strongly argue against Unicode characters at this stage. *I* wouldn't
> know where to get them from, and typing by hand could be a disaster. 

I have attached a table showing how XML, by adopting ISO 10646, allows developers
to handle spaces, hyphenation and breaking.  I hope people find it useful. (I have
previously sent around versions of the ISO public entity sets converted for XML
use: these are available on Robin Cover's website at the Summer Institute of
Linguistics. The table has a copyright note against printing because I have 
prepared it for my forthcoming book "The SGML Cookbook", out soon.)

You can get more information from:

* the Unicode 2.0 book, available in book stores
* the ISO 10646 standard, available from your national standards body
* an online listing of the characters at the Unicode Consortium's
website, and an independent one on the SGML Oslo archive site
* the SPREAD public entity set
* on NT, the keycaps viewer, which shows (printing) characters in Unicode
fonts.


>It will take a while before Unicode is natural to HTML authors.

ISO 10646 provides a very rich set of characters to handle spaces and 
newlines.  It is very important that XML developers understand and implement
them, because that simplifies what people need to do in their XML scripts.
It moves spacing from being a "how to format this element" issue to being
a "how to render this character" issue, which is neater.  If developers ignore
these unambiguous characters, they then have to overload space and -, with
unpredictable results. To get definite results you need definite markup:
developers should not confuse the visual simplicity of the space and hyphen
with the complexity of what must be marked-up to get them to work.

There is *no* natural way for HTML people to do most of the things that
ISO 10646 offers for control of spaces, hyphenation and breaking. 
However, it is more like what users of word processors will find natural.


Rick Jelliffe
-------------- next part --------------
A non-text attachment was scrubbed...
Name: space.htm
Type: application/octet-stream
Size: 2841 bytes
Desc: space.htm (Internet Document (HTML))
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970827/bc40c921/space.obj
From ricko at allette.com.au  Wed Aug 27 04:57:42 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <199708270305.NAA05672@jawa.chilli.net.au>


> From: John Gossman <JohnGo@asymetrix.com>
   
> 	The other option would be fine if I were defining a format with types
> like "button".  But OXF is designed to describe generic data, the button
> was just an example.  The DTD is strictly optional, perhaps even harmful
> in the case of OXF, since the whole purpose is to make it so the reader
> can salvage even partial or poorly formed files.

Declarations are also useful to describe what you want to get after 
salvaging. So they can be documentation for humans too.


Rick Jelliffe



From mike at datachannel.com  Wed Aug 27 18:07:31 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <01BCB2C8.7AB050F0@NEMO>

I also have some (philosophical) questions about elements and attributes in XML.

Rick J's point:
> 3) The content of an element is the text that a dumb browser that is not
> aware of your document type will display.   Therefore the text 
> should be in the nature of an alternative string for guidance.  So
> <caption> should be content, and <left> etc should use attributes.

made a lot of sense to me. I think, however, that John G's application of XML
is such that the properties of objects 'are' the content, and therefore it's not required 
for other viewers to skip that information.

I would like to hear some pros and cons about the following four styles 
(continuing John Gossman's example):

1 Attributes within element
<button top="20" left="20" bottom="40" right="100">
<caption>Click me!</caption>
</button>

2 Attributes as single specific sub-element
<button>
<region top="20" left="20" bottom="40" right="100" />
<caption>Click me!</caption>
</button>

3 Attributes as several specific sub-elements 
NOTE: the properties of the button are stored as in style 1 (i.e. within the element) so 
other viewers can skip them.
<button>
<top value="20"/>
<left value="20"/>
<bottom value="40"/>
<right value="100"/>
<caption>Click me!</caption>
</button>

4 Attributes as several generic sub-elements
NOTE: The properties of the button are stored as content, since the document is 
intended to be storage for objects & their properties (i.e. the properties 'are' the content).
<button>
<prop name="top">20</prop>
<prop name="left">20</prop>
<prop name="bottom">40</prop>
<prop name="right">100</prop>
<caption>Click me!</caption>
</button>


In addition I have two questions about elements and attributes.
1. Generic tag with 'type' attribute
When should you use a generic <object type="button"> versus a specific <button>:
generic:
<object type="button">
Click this!
</object>

specific:
<button>
Click this!
</button>

2. Attributes as sub-element(s) 
When should you move attributes to a sub-element:

as attributes:
<button top="20" left="20" bottom="40" right="100">
</button>

as sub-element:
<button>
<region top="20" left="20" bottom="40" right="100" />
</button>




From JohnGo at asymetrix.com  Wed Aug 27 19:34:40 1997
From: JohnGo at asymetrix.com (John Gossman)
Date: Mon Jun  7 16:58:21 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <c=US%a=_%p=asymetrix%l=ASYMEXCHANGE-970827173929Z-5686@exchange.asymetrix.com>

	These questions are a better summary of the basic ones I had.  What is
content and what is an attribute?  Why use one or another?

	-JG

>----------
>From: 	mike@datachannel.com[SMTP:mike@datachannel.com]
>Sent: 	Wednesday, August 27, 1997 9:06 AM
>To: 	'xml-dev@ic.ac.uk'
>Subject: 	RE: Request for advice defining an XML based syntax
>
>I also have some (philosophical) questions about elements and attributes in
>XML.
>



From digitome at iol.ie  Wed Aug 27 19:35:15 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
Message-ID: <199708271734.SAA10438@mail.iol.ie>


>digitome@iol.ie (Sean Mc Grath) writes:
>[...]
>> I doubt if I will get this right but I will try and formulate the programming
>> problem as I see it. 
>> 
>> Here goes:-
>> 
>> XML processing applications that read/write XML have to faithfully
>> reproduce white space to avoid data loss. In the course of XML processing,
>> actions will regularly be triggered by context. I.e. "element X within
>> element Y",
>> "first data content chunk below element X" etc.

[Murray Altheim]
>Aha! The culprit: 'XML processing applications'. I think where the confusion
>lies here is with the lack of differentiation between processor and
>application.
>We are defining an XML _processor_, which in all cases preserves whitespace
>and hands it on to the _application_. An application's handling of whitespace
>will be entirely dependent upon the needs of the application. For example,
>'XML as a data format' might normalize or even eliminate all whitespace,
>whereas 'XML as a document markup' may rely on some type of default handling
>under certain circumstances, or rely entirely on stylesheets. A browser, as a
>specific case of application, will have different WS handling than a database
>engine, and different again than an XML text-based editor.
>
>If the processor faithfully passes all WS to the application, the application
>can generate character-accurate offsets for links, etc. with no problems
>due to WS data loss. I see no problems in the current spec, although I must 
>agree with Tim and others that XML-SPACE="DEFAULT" seems to have no discernible
>meaning in this context.
>
>Does that help at all?
>
The "processor" is the XML parser and the "XML application" is the
editor, browser, spell checker, indexer etc. Okay. My concern is how these apps
will *interoperate* in the face of application specific WS conventions. To
do the right thing they need to faithfully reproduce the WS. I think
this is a hard problem. I await with interest some code examples
that illustrate XML->XML interoperability.

Am I completely off base in thinking that WS makes for some hairy issues
in XML->XML applications? What apps have been written that read/write XML? How
have they handled WS integrity? Are patterns emerging that can usefully
become part of XML-DEV lore?




From digitome at iol.ie  Thu Aug 28 00:02:45 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
Message-ID: <199708272202.XAA04683@GPO.iol.ie>

[Sean Mc Grath]
>>I await with interest some code examples
>> that illustrate XML->XML interoperability.
>
[Murray Altheim]
>By this I assume you mean interoperation between applications, not between
>parsers and applications, since across a common API parsers should be
>interoperable, at least in theory.
Yes I mean between applications.
>...
>Browsers don't manipulate files, merely display them

What about "File-Save As"?

[Sean Mc Grath]
>> Am I completely off base in thinking that WS makes for some hairy issues
>> in XML->XML applications? What apps have been written that read/write XML? How
>> have they handled WS integrity? Are patterns emerging that can usefully
>> become part of XML-DEV lore?
>
[Murray Altheim]
>In applications that do modify source documents (such as editors), I don't
>expect them to mangle/reformat whitespace, unless whitespace is simply
>not an issue (such as XML-as-database apps).

Why is WS not an issue for XML-as-database apps?
Because the data stream is a single line of XML?
I use Borland Brief and Borland's 32-bit grep.exe all the time.
Both have line length limits. I cannot use these with WS-free XML.

Won't our desperate Perl hackers' beloved $_ variable be significantly less
useful if it contains the *entire* document?

>...
>I imagine WYSIWYG XML
>editing/word processing applications that completely reformat whitespace
>will not be capable of proper link creation, so those would then be simply
>considered 'broken applications' and probably not be successful.
>
The market loves WYSIWYG (however "pseudo" the reality is). Isn't XML
in trouble if user-friendly (read "WYSIWYG") apps cannot do linking?

Cheers,
Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From srn at techno.com  Thu Aug 28 00:30:14 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:22 2004
Subject: Papers Comparing MCF, CDF, D-C & RDF?
In-Reply-To: <01BCB209.51EF0540@arrow-d29.sierra.net> (message from Patrick
	Gannon on Tue, 26 Aug 1997 10:17:48 -0700)
Message-ID: <199708272226.SAA02015@bruno.techno.com>

> Does anyone know of any papers that discuss and compare/contrast the =
> scope of the following standards efforts:
> 	MCF - Meta Content Framework (Apple/Netscape)
> 	CDF - Channel Definition Format (Microsoft)
> 	D-C - Dublin Core
> 	RDF - Resource Description Framework (W3C)

This list happens to be the same list that I use as an argument in
favor of answering Tim Bray's "p2" requirement in the design of XML:
to allow mixing and matching any classes of objects (element types)
from any number of architectures, even if their semantics overlap and
their syntaxes conflict.  This seemingly outlandish goal can be
accomplished, and simply.  HyTime* shows what I believe to be
(conceptually, at least) the right way to do it.  (In XML, the
up-front declarative syntax may have to be a little different.)

*See Annex A.3 of the Second Edition of the HyTime standard (ISO
 10744:1997).  (http://www.drmacro.com/n1920/html/clause-A.3.html)


-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From jones at nceas.ucsb.edu  Thu Aug 28 02:00:04 1997
From: jones at nceas.ucsb.edu (Matt Jones)
Date: Mon Jun  7 16:58:22 2004
Subject: NXP/MSXML confusion
Message-ID: <3404C00C.7E2ED4A9@nceas.ucsb.edu>

Hello,
We are trying to use NXP and MSXML to validate XML documents with a specific
DTD, but we are getting inconsistent results from the two programs.
The results from the two parsers, plus the simple DTD and XML files, are
at the end of this message.

Does anybody know which one produces the proper output? Does the lack of
an error indication indicate that MSXML thinks this is a valid doc according
to the DTD?  Do you think it is a valid XML DTD and document (the specific
problem in NXP seems to arise in association with the
"<!element te ((x | y),(s,t))>" construct)?

Thanks for your help,
Zheng and Matt


*****************************************
MSXML parser output:
*****************************************
java msxml -d te.xml
<?XML VERSION="1.0" RMD="ALL"?>
<!DOCTYPE TE SYSTEM "te.dtd">
<TE>
   <Y>
        yyyy
   </Y>
    <S>
        ssss
    </S>
    <T>
        tttt
    </T>
</TE>


*****************************************
NXP parser output:
*****************************************
java Cl -v -f te.xml

NXP - Norbert's XML Parser 0.96 - 20.05.1997

Fetch file : te.xml
Start parsing ...
Parsing Started ......
Fetch file: ./te.dtd
<TE>
<Y> "
                yyyy
        "
</Y>
<S>
****** Invalid content model !
"
                ssss
        "
</S>
<T>
****** Invalid content model !
"
                tttt
        "
</T>
</TE>
Parsing finished - Time : 433 msec.


*****************************************
DTD file "te.dtd"
*****************************************
<!element te    ((x | y),(s,t))>
<!element x (#PCDATA)>
<!element y (#PCDATA)>
<!element z (#PCDATA)>
<!element s (#PCDATA)>
<!element t (#PCDATA)>

*****************************************
XML file "te.xml"
*****************************************
<?XML version="1.0" RMD="all" ?>
<!DOCTYPE te SYSTEM "te.dtd">
<te>
        <y>
                yyyy
        </y>
        <s>
                ssss
        </s>
        <t>
                tttt
        </t>
</te>


******************************************************************
Matt Jones                                    jones@nceas.ucsb.edu
http://www.nceas.ucsb.edu/    Ph: 805-892-2508   Fax: 805-892-2510
National Center for Ecological Analysis and Synthesis (NCEAS)
******************************************************************




From clovett at microsoft.com  Thu Aug 28 05:31:18 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun  7 16:58:22 2004
Subject: NXP/MSXML confusion
Message-ID: <41135C785691CF11B73B00805FD4D2D7035865E0@RED-17-MSG.dns.microsoft.com>

	The way I see it the <Y> element satisfies the choice (x|y)
allowing it to move on to the sequence (s,t) which appears in the XML
correctly.  So I think MSXML is correct in saying the document is valid,
don't you agree?
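
One way to check this reading by hand is to translate the content model into a regular expression over the child element names. The Python sketch below is only an illustration: TE_MODEL and children_match() are made-up names for this example, the translation was done by hand, and nothing here describes how MSXML or NXP validate internally.

```python
import re

# Hand translation of the content model ((x | y),(s,t)).
# Each child element name becomes one token followed by a single space.
TE_MODEL = re.compile(r"(?:x |y )(?:s t )")


def children_match(names):
    """Return True if the sequence of child element names fits the model."""
    return TE_MODEL.fullmatch("".join(name + " " for name in names)) is not None


# The children of <te> in te.xml are y, s, t in that order.
print(children_match(["y", "s", "t"]))
# Swapping s and t, or supplying both x and y, should not match.
print(children_match(["y", "t", "s"]))
print(children_match(["x", "y", "s", "t"]))
```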

	By the way you can run msxml without the -d argument "java msxml
te.xml" and if no errors are displayed then the XML is valid.  



From nmikula at edu.uni-klu.ac.at  Thu Aug 28 07:48:47 1997
From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula)
Date: Mon Jun  7 16:58:22 2004
Subject: NXP/MSXML confusion
References: <3404C00C.7E2ED4A9@nceas.ucsb.edu>
Message-ID: <34059315.5A55@edu.uni-klu.ac.at>

Matt Jones wrote:
> 
> Hello,
> We are trying to use NXP and MSXML to validate xml documents with a
> specific
> DTD, but we are getting  inconsistent results from the two programs.
> The results from the two parsers, plus the simple dtd and xml files, 
> are at the end of this message.

In one of the last releases I introduced a very stupid bug
in the validation module. I am already working on a new release
which will fix this bug (and others). I am very sorry about
the inconvenience that this might have caused. 

Thanks for pointing this out. Please watch this and other 
lists for the announcement of the upcoming new release.

-- 
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java 
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at 
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================



From tbray at textuality.com  Thu Aug 28 07:53:26 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:22 2004
Subject: Request for advice defining an XML based syntax
Message-ID: <3.0.32.19970827224759.009fe96c@pop.intergate.bc.ca>

At 10:39 AM 27/08/97 -0700, John Gossman wrote:
>	These questions are a better summary of the basic ones I had.  What is
>content and what is an attribute?  Why use one or another?

There is no automated decision procedure; it all comes down, in the
end, to design aesthetics.  Software should recognize this and be prepared 
to pull the information it needs out of either elements or attributes. -Tim
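
As a rough illustration of software that accepts either form, here is a Python sketch using the standard library's xml.etree.ElementTree. The get_property() helper and the lookup order are assumptions made for this example, and the four sample documents are the button styles from earlier in this thread with attribute values quoted.

```python
import xml.etree.ElementTree as ET


def get_property(elem, name):
    """Look up a named property wherever the author chose to put it.

    Returns the value as a string, or None if it cannot be found.
    """
    if name in elem.attrib:                    # <button top="20">
        return elem.attrib[name]
    child = elem.find(name)
    if child is not None:
        if "value" in child.attrib:            # <top value="20"/>
            return child.attrib["value"]
        return (child.text or "").strip()      # <caption>Click me!</caption>
    for prop in elem.findall("prop"):          # <prop name="top">20</prop>
        if prop.get("name") == name:
            return (prop.text or "").strip()
    for child in elem:                         # <region top="20" ... />
        if name in child.attrib:
            return child.attrib[name]
    return None


STYLES = [
    '<button top="20" left="20"><caption>Click me!</caption></button>',
    '<button><region top="20" left="20" /><caption>Click me!</caption></button>',
    '<button><top value="20"/><left value="20"/>'
    '<caption>Click me!</caption></button>',
    '<button><prop name="top">20</prop><prop name="left">20</prop>'
    '<caption>Click me!</caption></button>',
]

for source in STYLES:
    button = ET.fromstring(source)
    print(get_property(button, "top"), get_property(button, "caption"))
```

The fixed lookup order is itself a design choice; a real reader would need a rule for documents that supply the same property in two places.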



From dgd at cs.bu.edu  Thu Aug 28 15:52:08 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
In-Reply-To: <199708272202.XAA04683@GPO.iol.ie>
Message-ID: <v03007800b02b3cc39417@[205.181.197.125]>

At 4:35 PM -0500 8/27/97, Sean Mc Grath wrote:
>What about "File-Save As"?
If the document is intended to be equivalent to the one you read, it should
have whitespace in the same places.

Is this such a hard rule? Seems easier to me than any other I can think of.
>>[Murray Altheim]
>>In applications that do modify source documents (such as editors), I don't
>>expect them to mangle/reformat whitespace, unless whitespace is simply
>>not an issue (such as XML-as-database apps).
Right. If you don't know the application, you _always preserve_ the space
that you saw on input.
>
>Why is WS not an issue for XML-as-database apps?
In _some such applications_ you will know that line breaks don't matter --
or that certain elements (e.g. <RECORD>) are element content. If you _know_
the purpose of the data, you might be able to normalize whitespace. But if
you're writing a general XML editor, you would be foolish to assume that
you have such knowledge.

>Because the data stream is a single line of XML?
Might be, or might not be. Author's decision.
>I use Borland Brief and Borlands 32 bit grep.exe all the time.
>Both have line length limits. I cannot use these with WS-free XML.

True, so you'd best not process such files with them. What's the point, really?

If you're creating documents you can put WS in. Even HTML parsers are
accepting arbitrary-length lines nowadays -- because lots of database HTML
tools produce them.

>
>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>useful if it contains the *entire* document.

Just as it's not useful in processing HTML. Regexps that don't match across
line boundaries are the most common problem I've seen in HTML-processing
Perl scripts. Looks like that will continue until people figure out that
Perl's line "Feature" is just a bug when used with XML/HTML.
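
The same pitfall can be shown outside Perl. The Python sketch below is an illustration with a made-up two-link document: a pattern applied line by line misses a start tag that is broken across lines, while the same pattern applied to the whole text finds it.

```python
import re

# The first start tag is split across two lines, which is legal markup.
doc = '<a\n  href="x.html">link</a>\n<a href="y.html">other</a>\n'

pattern = re.compile(r'<a\s+href="([^"]*)"')

# Line-oriented processing: each line is matched in isolation.
per_line = [m.group(1) for line in doc.splitlines() for m in pattern.finditer(line)]

# Whole-document processing: \s is free to match the line break.
whole = [m.group(1) for m in pattern.finditer(doc)]

print(per_line)
print(whole)
```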

>>I imagine WYSIWYG XML
>>editing/word processing applications that completely reformat whitespace
>>will not be capable of proper link creation, so those would then be simply
>>considered 'broken applications' and probably not be successful.

This comment of Murray's is right on!

>The market loves WYSIWYG (however "pseudo" the reality is). Isn't XML
>in trouble if user-friendly (read "WYSIWYG") apps cannot do linking?

This has nothing to do with whitespace, but is an issue of how you choose
to display things on your screen. One could choose to present a nicely
formatted display and still track whitespace explicitly. I've  often wished
that tools like MS WORD would remember _not_ to typeset two spaces after a
period, for instance.

It sounds to me like what we really need is a small paper (about 5
paragraphs) explaining whitespace for developers.

The 3 sentence version would be as follows:

We're serious about all whitespace being significant. If you're not dealing
with an element in a document type that warrants some form of whitespace
normalization, then you _should not_  output different whitespace without
the user being aware that a significant change has been made in the
document. Such notification might take many forms in an interface: an
option, an interface that displays the whitespace as read, or an explicit
operation to "normalize" whitespace.
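
A minimal Python sketch of that rule, assuming the standard library's xml.etree.ElementTree and a made-up sample document: round_trip() writes back exactly the character data it read, and normalize() is a separate operation that has to be asked for. Both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SOURCE = "<te>\n  <y>\n    yyyy\n  </y>\n  <s>ssss</s>\n</te>"


def round_trip(text):
    """Parse and re-serialize without touching any character data."""
    return ET.tostring(ET.fromstring(text), encoding="unicode")


def normalize(text):
    """Explicit, opt-in step: drop text that consists only of whitespace."""
    root = ET.fromstring(text)
    for elem in root.iter():
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        if elem.tail is not None and not elem.tail.strip():
            elem.tail = None
    return ET.tostring(root, encoding="unicode")


print(round_trip(SOURCE) == SOURCE)
print(normalize(SOURCE))
```

The sample avoids attributes and empty elements on purpose, since a serializer may rewrite quoting and empty-element syntax even when it leaves character data alone.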

   -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From digitome at iol.ie  Thu Aug 28 17:21:32 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
Message-ID: <199708281521.QAA17187@mail.iol.ie>


>>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>>useful if it contains the *entire* document.
>
>Just as it's not useful in processing HTML. Regexps that don't match across
>line boundaries are the most common problem I've seen in HTML-processing
>Perl scripts. Looks like that will continue until people figure out that
>Perl's line "Feature" is just a bug when used with XML/HTML.
>

Bang goes the notion of a lightweight XML app, then! Thou shalt always
parse!

XML as a friendly format for, say, the DPH needs some explaining. To use
Perl to read/write XML you *must* use an XML parser. Indeed, any tool
intending to read/write XML needs to use a *fully blown parser* to get at
the document. Bye bye the entire Unix family of line oriented text
processing apps :-(

>It sounds to me like what we really need is a small paper (about 5
>paragraphs) explaining whitespace for developers:
>
I think this is an excellent idea!




From matt at wdi.disney.com  Thu Aug 28 18:19:48 1997
From: matt at wdi.disney.com (Matthew Fuchs)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
In-Reply-To: <199708281521.QAA17187@mail.iol.ie>; from "Sean Mc Grath" at Aug 28, 97 04:21:22 pm
Message-ID: <199708281621.JAA05124@scrumpox.rd.wdi.disney.com>

> 
> 
> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
> >>useful if it contains the *entire* document.
> >
> >Just as it's not useful in processing HTML. Regexps that don't match across
> >line boundaries are the most common problem I've seen in HTML-processing
> >Perl scripts. Looks like that will continue until people figure out that
> >Perl's line "Feature" is just a bug when used with XML/HTML.
> >
> 
> Bang goes the notion of a lightweight XML app, then! Thou shalt always
> parse!
> 
Nonsense! Regexps that fail across line boundaries are only due to
lazy DPHs.  The "s" modifier to a regex makes "." match newlines too, so
the entire string (i.e., document) can be treated as a single target.  The
problem here is that _insignificant_ whitespace (a newline) is treated
significantly.  A regex modifier which treated newlines, tabs, etc., as
spaces would really help reduce this problem. (Larry Wall doesn't follow
this mailing list, does he?)
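
To make that concrete, here is a rough sketch in Python rather than Perl (an assumption of convenience: Python's re.DOTALL flag plays the role of Perl's "s" modifier, and the sample text and variable names are invented for illustration). It shows a pattern failing across a newline, the same pattern working once "." may match newlines, and \s+ standing in for "any whitespace" between a tag name and an attribute:

```python
import re

# Made-up sample: a tag and a title that both span line boundaries.
doc = "<book\n    id=TWENTYKDOWN\n><title>20,000 Leagues\nUnder the Sea</title></book>"

# Without DOTALL, "." stops at a newline, so the title is not found.
line_bound = re.search(r"<title>(.*)</title>", doc)

# With DOTALL, "." also matches newlines and the whole string is one target.
whole_doc = re.search(r"<title>(.*?)</title>", doc, re.DOTALL)

# \s+ accepts spaces, tabs and newlines alike between the name and the attribute.
flexible = re.search(r"<book\s+id=(\w+)\s*>", doc)

print(line_bound)
print(repr(whole_doc.group(1)))
print(flexible.group(1))
```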


> XML as a friendly format to, say, DPH needs some explaining. To use Perl to
> read/write XML 
> you *must* use an XML parser. Indeed any tool intending to read/write XML
> needs to use a 
> *fully blown parser* to get at the document. Bye bye the entire Unix family
> of line oriented text processing apps:-(
> 

Maybe you just need to put a filter at the beginning of your pipeline
to normalize whitespace to whatever you need.

Matthew

-----------------------------------------------------
Matthew Fuchs
matt@wdi.disney.com
http://cs.nyu.edu/phd_students/fuchs
-----------------------------------------------------



From agreene at bitstream.com  Thu Aug 28 18:56:14 1997
From: agreene at bitstream.com (Andrew Greene)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
In-Reply-To: <199708281621.JAA05124@scrumpox.rd.wdi.disney.com>
	(matt@wdi.disney.com)
Message-ID: <19970828165215.AAA6091@AGREENE-PC.bitstream.com>

But perl doesn't have to break $_ on newlines. Whenever I do SGML
"parsing" with perl, I start off with

    $/ = "<";

which says "the record-break character is '<', instead of newline."
Then, within my while (<>) loop, each $_ contains a single tag and
some content, (roughly) matching the regexp:

    ($etagP, $gi, $attlist, $content) =
       /(\/?)(\w+)\s*([^>]*)\>(.*)/s;   # /s lets . match newlines in content

[For purists only: Yes, GIs can contain a different set of characters
than \w+, and attributes can contain > if it's enclosed in quotations,
and this doesn't chop off the '<' at the end of all tags except the
last one, and so on and so forth.... For SGML, it assumes that the
first character of ETAGO is the same as STAGO; for XML, it doesn't
handle the /> syntax... but it's simplified to make a point.]

The point is that perl doesn't care whether you have whitespace or
not, and if your perl script is splitting on newlines then you're
probably not going to correctly handle tags that contain newlines,
such as

    <book
        id=TWENTYKDOWN
        authorid=VERNEJ
        pubid=PENGUIN
    ><title>20,000 Leagues Under the Sea</title
    ></book
    >
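
The same idea carries over to other languages. A minimal sketch in Python, assuming that splitting the text on "<" is an acceptable stand-in for setting $/, and carrying over all the simplifications admitted above (the function name records is invented for illustration):

```python
import re

# Same shape as the Perl pattern: optional "/", name, attribute text, ">", content.
TAG_RECORD = re.compile(r"(/?)(\w+)\s*([^>]*)>(.*)", re.DOTALL)

def records(text):
    """Yield (is_end_tag, name, attributes, content) for each '<'-delimited record."""
    for chunk in text.split("<")[1:]:   # anything before the first '<' is skipped
        m = TAG_RECORD.match(chunk)
        if m:
            end, name, attrs, content = m.groups()
            yield (end == "/", name, attrs.strip(), content)

sample = """<book
    id=TWENTYKDOWN
    authorid=VERNEJ
    pubid=PENGUIN
><title>20,000 Leagues Under the Sea</title
></book
>"""

parsed = list(records(sample))
for record in parsed:
    print(record)
```

The newlines inside the tags never matter here, because the records are cut at "<" rather than at line ends.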

- Andrew Greene

        


From digitome at iol.ie  Thu Aug 28 19:52:35 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:22 2004
Subject: Whitespace
Message-ID: <199708281752.SAA01530@mail.iol.ie>


[Sean Mc Grath]
>> >>Won't our desperate Perl hackers' beloved $_ variable be significantly less
>> >>useful if it contains the *entire* document.

[David Durand]
>> >Just as it's not useful in processing HTML. Regexps that don't match across
>> >line boundaries are the most common problem I've seen in HTML-processing
>> >Perl scripts. Looks like that will continue until people figure out that
>> >Perl's line "Feature" is just a bug when used with XML/HTML.

[Sean Mc Grath]
>> Bang goes the notion of a lightweight XML app, then! Thou shalt always
>> parse!
>> 

[Matthew Fuchs]
>Nonsense! Regexps that fail across line boundaries are only due to
>lazy DPHs.

I think you miss my point. I am concerned about what happens to
line oriented tools when there are *no* line breaks, not when there are.
Brief, grep, awk, head, tail, xargs etc. are all line oriented tools.

I am concerned about their utility with WS-less XML. (Approximately zero?)
From memory, Perl allows you to set the Record End pattern to whatever you
like. You certainly cannot with grep, tail etc., to my knowledge.

>> XML as a friendly format to, say, DPH needs some explaining.
>>To use Perl to read/write XML 
>> you *must* use an XML parser. Indeed any tool intending
>>to read/write XML needs to use a 
>> *fully blown parser* to get at the document. Bye bye the
>> entire Unix family of line oriented text processing apps:-(
>> 

[Matthew Fuchs]
>Maybe you just need to put a filter at the beginning of your pipeline
>to normalize whitespace to whatever you need.

I think you have missed my point again. I said *"read/write"* XML
applications. Putting a filter at the start to normalize the WS *blows* my
ability to losslessly write the result. If I munge WS I have munged the doc.
A doc with WS leads to more complex cross translations than, say, Monastic
SGML, because of the escalating state space that intermingled significant
WS brings with it.
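
A small sketch of why such a filter is lossy, in Python (the normalize function and the sample strings are hypothetical; a real filter might normalize differently): two documents that differ only in whitespace come out identical, so nothing downstream can write the original back.

```python
import re

def normalize(text):
    """Collapse every run of whitespace to a single space (a lossy filter)."""
    return re.sub(r"\s+", " ", text)

original = "<poem>\n  <line>Roses  are red</line>\n</poem>\n"
other = "<poem> <line>Roses are\tred</line> </poem> "

filtered = normalize(original)

# The filter is many-to-one: both inputs map to the same output,
# so the output alone cannot tell us which one we started from.
print(filtered == normalize(other))
print(filtered == original)
```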

FINALLY,

I would like to thank David Durand for the suggestion posted to this group
that a few paragraphs on WS be put together for developers. I think this
would be really, really useful and would have the inestimably beneficial
side-effect of shutting me up :-)

I am acutely aware that this thread is annoying to some (many?!) and I
would like to take this opportunity to bow out and await the WS explanatory
note...




From dgd at cs.bu.edu  Thu Aug 28 21:09:22 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:22 2004
Subject: Re  Whitespace
Message-ID: <v03007801b02b854d9f99@[205.181.197.125]>

>From: Sean Mc Grath <digitome@iol.ie>
>
>>Just as it's not useful in processing HTML. Regexps that don't match across
>>line boundaries are the most common problem I've seen in HTML-processing
>>Perl scripts. Looks like that will continue until people figure out that
>>Perl's line "Feature" is just a bug when used with XML/HTML.
>>
>
>Bang goes the notion of a lightweight XML app, then! Thou shalt always
>parse!
>
>XML as a friendly format to, say, DPH needs some explaining. To use Perl to
>read/write XML
>you *must* use an XML parser. Indeed any tool intending to read/write XML
>needs to use a
>*fully blown parser* to get at the document. Bye bye the entire Unix family
>of line oriented text processing apps:-(

Come on, this is a crock. I've set that cryptic little variable
(funny that everything in Perl deserves that description) so that
line ends won't block regexp matches. Once that was done, I wrote a few
regexps and parsed HTML just fine (it takes 1 line for a simple tag
pattern match, and 10 for a loop to create a reasonably full parse
into elements, content, and attribute values). I'm sure a "real" Perl
programmer (unlike me) can shrink that down to 2-3 lines of
twisty little characters, all of them different.

XML should be no harder. My understanding of the goal for the DPH was
always that XML would be no worse than HTML -- i.e., for quick and dirty
transformations or operations, quick and dirty parsers would work. As
far as I can tell, "dirty" means that you know (or are pretty sure)
they will work with one document or corpus of documents, not
necessarily that they will work with any arbitrary document.

If you never break tags across lines in your documents, your Perl
desperation may work without worrying about this case; if you do, you
have to have smarter desperation. For _reliable_ parsing of
_arbitrary_ documents, you probably do need a full parser of the
instance language (10 productions in the standard, or so, wasn't it?).
There's no reason that that level of parsing can't be implemented
within no more than 20 lines of Perl. I can't remember (or abide) the syntax of
Perl enough to write it, but I'm sure there's a DPH on the list who
would love to volunteer.
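
For what it's worth, here is one way such a quick and dirty tokenizer might look, sketched in Python rather than Perl (the patterns and the tokenize name are illustrative assumptions, not a conforming parser: comments, PIs, CDATA sections, entities, and empty-element tags are all ignored).

```python
import re

# Either a tag (whose quoted attribute values may contain ">") or a run of text.
TOKEN = re.compile(r"""<(/?)([\w:.-]+)((?:[^>"']|"[^"]*"|'[^']*')*)>|([^<]+)""")
# name="value" or name='value'
ATTR = re.compile(r"""([\w:.-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')""")

def tokenize(text):
    """Return a flat list of ('start', name, attrs), ('end', name), ('text', chars)."""
    tokens = []
    for m in TOKEN.finditer(text):
        end, name, rest, chars = m.groups()
        if chars is not None:
            tokens.append(("text", chars))       # whitespace is kept as read
        elif end:
            tokens.append(("end", name))
        else:
            # findall yields '' for the quote style that did not match
            attrs = {k: dq or sq for k, dq, sq in ATTR.findall(rest)}
            tokens.append(("start", name, attrs))
    return tokens

sample = '<a href="x>y" id=\'1\'>hi <b>there</b>\n</a>'
tokens = tokenize(sample)
for token in tokens:
    print(token)
```

As with any "dirty" parser, this is only meant to work on documents you already know something about.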

>>It sounds to me like what we really need is a small paper (about 5
>>paragraphs) explaining whitespace for developers:
>>
>I think this is an excellent idea!

Well, I gave the three sentence version. Feel free to expand it...
Actually I think the three sentences sum it up pretty well.

  --
David------------------------------------------+----------------------------
David Durand                 dgd@cs.bu.edu| david@dynamicDiagrams.com
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/




From dgd at cs.bu.edu  Thu Aug 28 21:09:31 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:22 2004
Subject: My 5 paragraphs
Message-ID: <v03007802b02b868feb32@[205.181.197.125]>

I decided that it wasn't very helpful to punt on my own suggestion. A few
<P>'s on whitespace follow. Adapt/preserve/wrap-fish-in them as you like.

Whitespace in XML is always significant wherever data may appear. This means
that XML truly treats each character that is not inside a tag or a comment
as a potentially meaningful piece of information. As comments are generally
not significant to most applications (but must be preserved by editors and
other transduction processes), whitespace in a few contexts is generally
best not made the crux of a critical distinction: For instance, in an
element whose sole allowed content according to the DTD is other elements
(SGML term: element content) it is probably a bad idea to base processing
semantics on the presence or absence of whitespace.

However, since linking depends critically on where any characters not in
tags occur, even such space should not be casually deleted, as it may cause
hyperlinks in other documents to break. This happens because hyperlinks
have to be able to count sub-parts of an element's content, and the
whitespace between two elements is such a sub-part.
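
A rough illustration in Python, using the standard xml.dom.minidom module (child position here is only a stand-in for whatever counting a real link would do; the sample documents and the children helper are made up): the second child of the root is a different node once the whitespace between elements is removed.

```python
from xml.dom import minidom

pretty = "<list>\n  <item>one</item>\n  <item>two</item>\n</list>"
compact = "<list><item>one</item><item>two</item></list>"

def children(xml_text):
    """Return the root element's child nodes, whitespace text nodes included."""
    return list(minidom.parseString(xml_text).documentElement.childNodes)

# Whitespace between elements shows up as text nodes of its own.
print(len(children(pretty)), len(children(compact)))

# "Child number 2" therefore names different content in the two versions.
second_in_pretty = children(pretty)[1]
second_in_compact = children(compact)[1]
print(second_in_pretty.firstChild.data, second_in_compact.firstChild.data)
```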

XML provides some hints as to the treatment of whitespace in the SPACE
attribute. For most applications like browsing or typesetting or importing
into a database, normalizing such whitespace should be harmless to the
semantic content. You might break links in these cases as well, but any
change can break a link.

If you are creating an editing or transduction application, you should
_not_ change any whitespace without explicit authorization from the author
-- such changes may damage links and file offsets that the user wants to
preserve. This is the same kind of restriction as you must observe when
editing an XML file with comments or PIs in it. Even if you don't use or
understand these, they must be preserved in the general case.

In sum, there are 2 kinds of applications that can use XML. The first kind,
like editors, should preserve all the information they can (including all
whitespace, comments, PIs, etc.) unless instructed otherwise. We might call
these transduction applications, because they produce a representation of
the document they read as their output. The other sort of application --
call them processing applications -- is responsible for processing the
results of an XML parser, and may ignore comments and PIs, normalize
whitespace (as warranted by knowledge of the DTD, tags, or XML-SPACE
hints), and so forth. Such applications are generally creating a specific
view or result from the data in an XML document, and may do that in any way
that produces the desired result.

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From jharmon at telecnnct.com  Thu Aug 28 21:51:22 1997
From: jharmon at telecnnct.com (Jim Harmon)
Date: Mon Jun  7 16:58:22 2004
Subject: Novice request...
Message-ID: <3405D2EA.31D2DE92@telecnnct.com>

Flameshield UP...

Off topic semi-related request...

I've been monitoring this list for a month-or-so now, and have found
much of the discussion enlightening.

I'm a pure Novice with XML, and trying hard to follow some of the deeper
discussions.

This request is only slightly related---if you follow XML heritage back
to SGML, then to one of XML's cousins, CGI based HTML...

Does anyone know of a CGI-developers list like this one?  I have a
situation that needs some deeper experience than I've got, and I don't
want to waste bandwidth here on the question...

Thanks for your indulgence...

Flameshield DOWN...
-- 
   Jim Harmon                           The Telephone Connection
jim@telecnnct.com                          Rockville, Maryland



From digitome at iol.ie  Thu Aug 28 23:03:47 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:23 2004
Subject: Re  Whitespace
Message-ID: <199708282103.WAA04324@GPO.iol.ie>

>> Bye bye the entire Unix family of line oriented text processing apps:-(
>
>Come on, This is a crock.

[Discussion about a *single* tool - Perl - from the genus "Unix family of line
oriented text processing apps" elided]

Since when is Perl == Unix family of line oriented text processing apps?

The world is littered with s/w tools that have line length
limits. These tools are *blown* by WS-less XML.

Throw out that grep, that text editor, that fgets(), that diff, sort, uniq utility.
They're all busted for XML use.

P.S.

"Crock". I'll add that to my collection of spicy ripostes I have
accumulated over the course of this thread. :-)

Time to end.

If nothing else, David's five paragraphs have been born from this.
I suggest they should be mandatory reading for anyone approaching
XML development.

It is clear that I see a problem that others don't.
Thus the odds are I am wrong.
I hope so.


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From digitome at iol.ie  Thu Aug 28 23:03:50 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:23 2004
Subject: My 5 paragraphs
Message-ID: <199708282103.WAA04320@GPO.iol.ie>

>I decided that it wasn't very helpful to punt on my own suggestion. A few
><P>'s on whitespace follow. Adapt/preserve/wrap-fish-in them as you like.
>
Thanks. I found your 5 paragraphs very helpful. I'm sure other developers will
as well.

Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From merchise at iosphere.net  Fri Aug 29 01:27:23 1997
From: merchise at iosphere.net (Merchise)
Date: Mon Jun  7 16:58:23 2004
Subject: XAPI documentation
Message-ID: <01BCB3E8.B751B0A0@kryster.merchise.edu>

Hi everybody. It's my first message on this list. Can anyone tell me where I can
find documentation on XAPI-J? I would also appreciate hearing about XML processors developed in other languages.

Jose Ramon Rodriguez Concepcion




From dgd at cs.bu.edu  Fri Aug 29 17:06:30 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:23 2004
Subject: Re  Whitespace
In-Reply-To: <199708282103.WAA04324@GPO.iol.ie>
Message-ID: <v03007800b02c94228b3c@[205.181.197.125]>

At 3:36 PM -0500 8/28/97, Sean Mc Grath wrote:
>>> Bye bye the entire Unix family of line oriented text processing apps:-(
>>
>>Come on, This is a crock.
>
>[Discussion about a *single* tool - Perl - from the genus "Unix family of
>line oriented text processing apps" elided]

Perl is of course the tool whose usage was made a part of the design goals
of XML. It's also the most common language of web-hackers, by far.

>Since when is Perl == Unix family of line oriented text processing apps?

>The world is littered with s/w tools that have line length
>limits. These tools are *blown* by WS-less XML.

The mainframe world was littered with tools that couldn't edit anything
other than 80 character fixed length records -- but that eventually changed.

I think a little less passion is in order here: there's _no requirement_
that XML tools not use whitespace, nor is there a requirement that they
_do_ use whitespace. People will do what is convenient for them, and for
the people whose convenience they care about.

This is as it always is. I suspect that line-breaks will in fact be common
in XML files for some time to come. The thing that worries me is that most
tools are not as smart as the editor I use on my Mac, that can edit and
save files in their native line-ending convention without even worrying
about it. And it is unfortunately true that stupid processors (like
emailers and non-XML editors) _are_ going to "convert" files. This won't
mess up PCDATA chunk counts, but will destroy character offsets (a risky
linking mechanism anyway). It is likely to cause problems for verbatim-style
formatting in carelessly written stylesheets, and I don't see any way
(other than painful experience) that solutions are to be found to this --
because the solutions are either reformed behavior (don't convert line-end
strings) or smarter processing software (prepare to accept CR, LF, or CRLF
at any time).
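
A sketch of both halves of that, in Python (the helper name split_lines and the sample text are invented for illustration): a reader that accepts any of the three conventions sees the same lines, while a character offset recorded against one version points somewhere else in another.

```python
import re

def split_lines(text):
    """Split on CRLF, CR, or LF without assuming any one convention."""
    return re.split(r"\r\n|\r|\n", text)

unix = "<p>one</p>\n<p>two</p>\n"
dos = unix.replace("\n", "\r\n")   # what a "helpful" converter might produce
mac = unix.replace("\n", "\r")

# A tolerant reader sees identical lines in all three versions.
print(split_lines(unix) == split_lines(dos) == split_lines(mac))

# A character offset into the file is not so lucky.
print(unix.index("<p>two"), dos.index("<p>two"))
```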

This is a problem that XML has not created, but simply tries not to make
worse, by at least picking a simple rule that can be understood.

>Throw out that grep, that text editor, that fgets(), that diff, sort, uniq
>utility.
>They're all busted for XML use.

gets is of course Broken As Designed, as the cause of most security bugs in
Unix systems.

Again, they are broken for XML use with files created a particular way.
They are also broken for HTML files created the same way, and I don't hear
the weeping and wailing.

>"Crock". I'll add that to my collection of spicy ripostes I have
>accumulated over the course of this thread. :-)

I meant it as a description, in a similar, (but IMHO) slightly less-frantic
tone.

>Time to end.

Can you suggest any solution to the "grep" problem other than requiring a
fixed line-max in XML? Do you think that that hideous hack to accommodate
defective (if very useful) tools is really worth it? Can you suggest how we
would determine that buffer size? (Test grep and AWK on our favorite 5
unices? What about wc, and Minix?) There are too many arbitrary lines that
would have to be drawn in the sand to "solve" that problem. What about
card-format editors like XEDIT, where editing lines of more than 72
characters is inconvenient (and editing lines of more than about 1800
characters is unbelievably inconvenient)? There's still a lot of IBM iron
out there. Or should we only worry about _your_ favorite tools being able
to handle any XML document?

Certainly authors can work within the limits of their chosen tools with
XML. I don't see that we can realistically provide them with more.

>If nothing else, David's five paragraphs have been born from this.
>I suggest they should be mandatory reading for anyone approaching
>XML development.

Edits for clarity would be appreciated, and if they pass muster by other
experts, maybe they should move to a section of the XML-FAQ for developers.
If there isn't such a section, maybe we should start one!

>It is clear that I see a problem that others don't.
>Thus the odds are I am wrong.
>I hope so.

Actually, I agree with you that there are problems (there are legal XML
documents that won't work with grep). There are plausible and common file
operations, like changing line-end marking conventions, that _may well_
cause problems with some documents and stylesheets. I just don't see any
solutions to these problems other than to let them work themselves out in
the many different environments where they must be worked out. There is no
solution that isn't so complex in its ramifications and details that it
wouldn't simply be another problem for some reasonable application of XML.

  -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From Peter at ursus.demon.co.uk  Fri Aug 29 21:59:20 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:23 2004
Subject: XML-DEV resources
Message-ID: <9727@ursus.demon.co.uk>

[catalysed by the WS discussion]
In message <199708282103.WAA04320@GPO.iol.ie> digitome@iol.ie (Sean Mc Grath) writes:
[DavidD]
> >I decided that it wasn't very helpful to punt on my own suggestion. A few
> ><P>'s on whitespace follow. Adapt/preserve/wrap-fish-in them as you like.
> >
> Thanks. I found your 5 paragraphs very helpful. I'm sure other developers will
> as well.

Agreed. I had finally come to the conclusion that the most important thing was 
to provide a guide on this. I hope that readers of this list will feel that, 
although the discussion has been long, it has been valuable. An important 
aspect of this list is that it is hypermailed and that people can refer back to 
seminal contributions or even threads. (I can see the current thread being
useful for some people revisiting this topic. It may not mirror their thought
processes, but it shows - I hope - that this is not an easy topic and
newcomers may take time to adjust.)

Although the list is indexed automatically (thanks Henry), I had thought about
providing a manually created set of pointers to what (I feel) have been 
useful contributions to come out of this list and to which people might wish
to refer. (For example, I frequently refer to Eliot Kimber's treatise on links,
and David's WS paras would be similar.  There are also quite a few URLs of value
mentioned.) OTOH, I know that Robin Cover indexes things from this list.

As for an XML-DEV FAQ, I am sure this would be valuable in the future - is
there a need for it at present? (remember that there are XML FAQ from Peter
Flynn, and Robin Cover's remarkable collection of everything *ML-like.) 
There was also - at one stage - enthusiasm for collecting and mirroring 
XML resources, so as to avoid long download problems - any volunteers (or
even sites I have missed :-) If there is a need and volunteers, this may be
a good time to start.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From gannon at commerce.net  Fri Aug 29 23:56:19 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:23 2004
Subject: CommerceNet Portfolio Meeting & Launch of XML/EDI Task Force
Message-ID: <01BCB489.9CC97960@arrow-d27.sierra.net>

To all interested parties.

CommerceNet will be conducting its quarterly members meeting and research portfolio meetings September 15-17, 1997.  On Tuesday, Sep. 16, there will be two meetings that may be of interest to participants of this email list.  If you plan to attend, please be sure to register online.  If you would like to participate via conference call facilities, contact Patrick Gannon <mailto:gannon@commerce.net> or Rik Drummond <mailto:drummond@commerce.net>.

http://www.commerce.net/services/sept_97

Information Access Portfolio
Members Meeting
Tuesday, Sep. 16, 1997
9:30am - 11:30am
TechMart - Santa Clara, CA


9:30 am	Introductions & Status of Projects, Patrick Gannon, Executive Director, CN-IAP.  mailto:gannon@commerce.net

http://members.commerce.net/pw/portfolios/access/


9:40 am	Automating the Web, Phillip Merrick, President webMethods, Inc.  mailto:phillip@webmethods.com

This presentation and demonstration will show how webMethods is applying XML to provide automated access to Web data and services from mainstream business applications. A multi-carrier shipping and tracking service will be highlighted to illustrate real business use of the technology.

Phillip Merrick is founder and President of webMethods, Inc., the leading supplier of Web automation and integration solutions to the Global 2000. Over his career he has architected  and managed several application development tools in use at large IT organizations around the world.

webMethods
http://www.webmethods.com/


10:10 am	Shopping Assistants in Internet Commerce, Dan Weld, CTO, Netbot, Inc.   mailto:weld@cs.washington.edu

Dan will discuss Netbot's "parallel pull" agent technology, and how it has been used in the "Jango" internet shopping assistant. He will compare how this technology relates to comparable agent technologies and database integration technologies. He will also discuss NetBot's experience in using it for information integration across stores, review sites, and other sources. Dan will also include a discussion of the "comparison shopping" issue, and our experience with retailers.

Dan Weld is a founder and CTO of Netbot Inc. He is also a professor in the Department of Computer Science and Engineering at the University of Washington. He has led the Intelligent Agent research group for several years, during which they have developed many innovative approaches to aggregating information from a wide variety of Internet information sources.

Netbot Jango Download and information
http://www.jango.com

Parallel Pull from the Invisible Web
http://www.jango.com/company/tech/tech.html

Dan Weld
http://www.cs.washington.edu/homes/weld/weld.html


10:40 am	XML Catalog Project Results, Dr. Terry Allen, CNgroup.  mailto:terry.allen@cngroup.com

Dr. Allen is leading the project to create three sample merchant catalogs and one Document Type Definition file using XML.  He will be explaining the process and showing the sample results.

Dr. Allen is a specialist in technical standards that support complex electronic publishing applications, including information discovery and retrieval, metadata, and internationalization.  He is a codesigner of the Docbook DTD, the SGML application most commonly used for computer documentation. He has participated in IETF and W3C working groups on HTML, XML, URIs, MIME, SGML, WEBDAV, and the OCLC metadata group.  He has a BA and MA from the Univ of Pennsylvania and a Ph.D. from Harvard U.


11:20 am	XML iMarket Action Planning, All Attendees

This project is being designed to build a demonstration virtual marketplace which utilizes the multiple vendor XML catalogs with standard DTDs and allows shoppers to search for products across vendors by specifying product and merchant attributes.  The use of style sheets to enable merchant differentiation and promotion of brand equity is also planned.


11:30 am	Adjourn.  

Attendees are invited to attend the EDI & Network Services Portfolio meeting at 3:00pm to 6:00pm where there will be an XML Tutorial and ically definable. 

SPEAKER BIO

Dr. Robert J. Glushko is the Director for Component-Based Commerce at CNgroup, a recent "spin-off" from CommerceNet that provides software and services in support of open Internet markets.  Until May 1997 he was the Chief Scientist and Vice President for Strategy at Passage Systems, a consulting and systems integration firm specializing in SGML-based publishing (which he co-founded in 1992). He has nearly twenty years of research, development, and consulting experience in online publishing and commerce, SGML and XML, and user interfaces for hypertext systems and information retrieval.  


All attendees must register online.  Non-CommerceNet members are invited to attend the portfolio meetings on Tuesday, Sep. 16.  The Wednesday meeting is for CommerceNet members only.

http://www.commerce.net/services/sept_97/




From john at datachannel.com  Sat Aug 30 06:25:41 1997
From: john at datachannel.com (John Tigue)
Date: Mon Jun  7 16:58:23 2004
Subject: XAPI documentation
References: <01BCB3E8.B751B0A0@kryster.merchise.edu>
Message-ID: <340795A0.5BD481D@datachannel.com>

Xapi-J documentation is at http://www.datachannel.com/xml/dev


Merchise wrote:

> Hi everybody. It's my first message on this list. Can anyone tell me
> where I can find documentation on XAPI-J? I would also appreciate
> hearing about XML processors developed in other languages.
>
> Jose Ramon Rodriguez Concepcion


--
John Tigue
Sr. Software Architect
DataChannel
http://www.datachannel.com
jtigue@datachannel.com
206-462-1999

From jtauber at jtauber.com  Sun Aug 31 05:24:06 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:58:23 2004
Subject: Re  Whitespace
Message-ID: <01BCB5FD.4EFEB540.jtauber@jtauber.com>

With all this talk about the desperate Perl hacker...

Wouldn't life have been much easier for the DPH if > was forbidden in PCDATA 
and only one type of quote was used as a delimiter?
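
To make the point concrete, here is a minimal sketch of the kind of
regex-only tag scanner a desperate hacker might write (in Python rather
than Perl). It assumes a hypothetical restricted syntax in which attribute
values are always double-quoted and a bare > never occurs in character
data; neither restriction comes from the XML draft, and the name tags()
is purely illustrative. Comments, CDATA sections, and declarations are
not handled.

```python
import re

# Assumed restrictions (hypothetical, not taken from the XML draft):
#   1. attribute values are always delimited by double quotes, and
#   2. a bare ">" never appears in character data.
# With one quote type, a quoted value is simply "[^"]*".
TAG = re.compile(r'<(/?)([\w.:-]+)((?:\s+[\w.:-]+\s*=\s*"[^"]*")*)\s*(/?)>')
ATTR = re.compile(r'([\w.:-]+)\s*=\s*"([^"]*)"')

def tags(text):
    """Yield (kind, name, attributes) for each tag found in text."""
    for match in TAG.finditer(text):
        closing, name, attr_text, empty = match.groups()
        if closing:
            kind = "end"
        elif empty:
            kind = "empty"
        else:
            kind = "start"
        # Each attribute is name="value"; collect them into a dict.
        yield kind, name, dict(ATTR.findall(attr_text))

if __name__ == "__main__":
    sample = '<catalog vendor="Acme"><item id="1" note="a>b"/>text</catalog>'
    for tag in tags(sample):
        print(tag)
```

With two quote types the value pattern needs an alternation for each
delimiter, and a scanner that does not model quoting at all can be misled
by a > inside a value or in running text, which is the difficulty the
question is pointing at.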
--
James K. Tauber / jtauber@jtauber.com
Perth, Western Australia




From tbray at textuality.com  Sun Aug 31 06:55:18 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:23 2004
Subject: Re  Whitespace
Message-ID: <3.0.32.19970830215223.008bb100@pop.intergate.bc.ca>

At 11:36 AM 30/08/97 -0700, James K. Tauber wrote:
>With all this talk about the desperate Perl hacker...
>
>Wouldn't life have been much easier for the DPH if > was forbidden in PCDATA 
>and only one type of quote was used as a delimiter?

Yes.  Maybe we should do this. -T.
