From ht at cogsci.ed.ac.uk  Mon Sep  1 18:09:22 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:23 2004
Subject: New release of LT XML toolkit, including Windows95/Windows NT binaries
Message-ID: <4005.199709011605@grogan.cogsci.ed.ac.uk>

The HCRC Language Technology Group is pleased to announce a new
release of LT XML, the first high-performance publicly available XML
toolset written in C.

For further information and access to the software distribution, see

  http://www.ltg.ed.ac.uk/software/xml/

The LT XML tool-kit includes stand-alone tools for a wide range of
processing of well-formed XML documents, including searching and
extracting, down-translation (e.g. report generation, formatting),
tokenising and sorting.  If you've been waiting for high throughput
XML tools with simple command-line interfaces to explore the potential
of XML, LT XML is just what you need to get started.  Basic throughput
is under 3 seconds/megabyte on a Pentium 133, fast enough to make
processing substantial XML datasets feasible.

LT XML is an integrated set of XML tools and a developers' tool-kit,
including a C-based API. As well as sources, this release includes
executable images for a range of platforms, including Windows 95 and
Windows NT, FreeBSD, Linux and Solaris.  A preliminary partial
Macintosh version is also available.  This release is restricted to
8-bit character input/output, and does NOT do validation, although it
does process and make use of DTDs in documents which include them.

LT XML tools can be pipelined together to achieve complex results.
This release includes the following tools:

  * sggrep -- extract sub-parts of XML documents, using patterns over
              element structure and text content;

  * textonly -- extract text content only;

  * sgsort -- reorder sub-elements within specified elements;

  * sgmltrans -- pattern+action down-translation tool;

  * sgrpg -- structure-based transformation tool;

  * simple, simpleq -- event- and fragment-based examples of API use.

For special purposes beyond what the pre-constructed tools can
achieve, extending their functionality and/or creating new tools is
easy using the LT XML API, which provides both event-oriented and
tree-fragment oriented access to the input document stream. Minimal
applications require less than one-half page of C code to express.

LT XML is available to anyone free of charge for non-commercial purposes.


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From tfj at apusapus.demon.co.uk  Tue Sep  2 11:22:01 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun  7 16:58:23 2004
Subject: Other  whitespace problems was Re: Whitespace rules (v2)
In-Reply-To: <3.0.32.19970818162238.00902760@pop.intergate.bc.ca>
Message-ID: <199709020007.tfj.2207@apusapus.demon.co.uk>

> At 09:52 PM 18/08/97 +0000, Trevor Jenkins wrote:
> > I'm 
> >convinced that as they stand the separator rules in XML are 
> >ambiguous.
> 
> Yes; Michael Sperberg-McQueen and I both agree that these need
> some more work.

Only "some". ;-)

> If it weren't for the $#*!@#%#!ing Parameter Entities, ...

These do seem to be allowed in some very odd places. Even for 
compatibility I see no reason to allow them in element declarations 
where %Name occurs. In SGML these were a useful feature; in XML they 
are obscurantist.

> all this would be simple and straightforward - designing a grammar
> for the SGML element declaration language is not exactly rocket
> science.

But it is computing science. I know some adherents of this list
despise computing scientists (I heard one of you say so publicly a
few months ago) but we can fix this problem.

> But when you try to pollute the grammar by saying where you can
> and can't replace chunks of it with PE references, it all of a
> sudden gets hideously difficult.

I've been on holiday since my original posting and relaxed by trying
to define an equivalent grammar to describe XML that does not have
the convolutions of the existing BNF one.

> ... SGML gets around this with the clever device ...

I get around this with the cunning plan of using a W-grammar rather
than BNF. Some may recall W-grammars as the formalism used to define
the Algol-68 programming language.

> ...

> Anyhow, further grammar engineering is in order.  One thing to 
> think about is simply to drop the 'S' (space) nonterminal, write
> a couple of simple tokenization rules, and take it that way.  CMSMcQ
> has investigated this at length, but it has problems too.

My equivalent W-grammar for XML does not have any S nonterminals at 
all. The number of rules is roughly the same as the "official" BNF 
set. I think that mine are simpler and correct. However, I did add 
some meta-productions and hyper-rules to accommodate the 
parameter entity problem and to enforce the quoting rules. This 
increase in size is justified as I also made the grammar LL(1), 
which the official one is not.

> Pardon me for whining; I'm sure we'll figure out something. -Tim

Anyone interested in my version of the grammar should email me 
and I'll gladly send a copy. Be warned, though: you have to be a 
computing scientist to understand it. :-) If there's enough interest 
I'll post it to the list.

Regards, Trevor.

--

"Real Men don't Read Instruction Manuals"
   Tim Allen, Home Improvement



From tfj at apusapus.demon.co.uk  Wed Sep  3 18:43:21 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun  7 16:58:23 2004
Subject: Parameter Entity Reference Considered Harmful
Message-ID: <199709031539.tfj.2212@apusapus.demon.co.uk>

In making one more pass through the official grammar for XML, before 
I despatch my alternative version to the 5 people who've requested 
copies, I spotted a real dumb error in the doctype declaration.

The existing definition says:

doctypedecl ::= '<!DOCTYPE' S  Name (S
                ExternalID)? S? ('['
                %markupdecl* ']'  S?)? '>'

Now the notational device of prefixing a production name with %, and
I quote, "specifies that <i>in the external DTD subset</i>..."
(emphasis copied from the definition). But notice that this
%markupdecl is NOT in the external DTD subset at all! Also the
definition of the % device introduces another set of ambiguities
from white space.

Methinks the existing official grammar is in desperate need of 
a rewrite.

Regards, Trevor.

--

"Real Men don't Read Instruction Manuals"
   Tim Allen, Home Improvement



From gannon at commerce.net  Wed Sep  3 20:34:29 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:23 2004
Subject: Internet Week Article on CommerceNet & XML
Message-ID: <01BCB85C.8DC17860@arrow-d86.sierra.net>

XML Grabs Markup Baton -- 
CommerceNet pilot aims push enabler at EDI, Web catalogs.

You can read the Internet Week article at:
http://www.techweb.com/se/directlink.cgi?INW19970901S0087

A good overview of XML and CommerceNet's activities using XML.

Enjoy!

Patrick Gannon, Executive Director
Information Access Portfolio, CommerceNet
http://www.commerce.net
-----------------------------------------
President & CEO
Internet Shopping Directory, Inc.
865 Tahoe Blvd., Suite 211
Incline Village, NV  89451
702-831-2251   702-831-3925 (Fax)
mailto://patrick@shoppingdirect.com
http://www.shoppingdirect.com




From tbray at textuality.com  Wed Sep  3 21:54:40 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:23 2004
Subject: Character classification
Message-ID: <3.0.32.19970903125120.00796e20@pop.intergate.bc.ca>

I've been working on making Lark really do Unicode.  JDK 1.1 is supposed
to have, unlike 1.0, a usable input method; thus the problem is to check,
when you're reading a GI or Attribute name, whether the characters are
legal namestart/name characters.

It turns out to be quite a lot of work, so this is an offer to share.
I wrote a program (based on Lark) that pulls the relevant character
classes out of the XML spec, picks apart the markup, and writes another
Java class that has some static arrays and offers two methods:

package textuality.lark;
public class CharClasses
{
 public static boolean isNameC(char c)
 public static boolean isNameStart(char c)
}

It needs about 4k of tables (which it binary-searches); it might be faster
with 128k of byte-addressable tables or 16K of bitmaps, neither of which
would be hard to implement.

(a) is this a waste of time, i.e. are there Unicode library calls that
    do it?
(b) if not, has everyone else already done this?
(c) if not, if I'm going to publish this, is the API above OK?

I've attached the current Java source file for those who find the 
explanation above insufficiently clear.
-------------- next part --------------
// Synthetically generated; do not edit!
//
package textuality.lark;
import java.util.*;
public class CharClasses
{

  static final char[] sNameStart =
  {
    170,170, 181,181, 186,186, 192,214, 216,246, 
    248,501, 506,535, 592,680, 688,696, 699,705, 
    736,740, 890,890, 902,902, 904,906, 908,908, 
    910,929, 931,974, 976,982, 986,986, 988,988, 
    990,990, 992,992, 994,1011, 1025,1036, 1038,1103, 
    1105,1116, 1118,1153, 1168,1220, 1223,1224, 1227,1228, 
    1232,1259, 1262,1269, 1272,1273, 1329,1366, 1369,1369, 
    1377,1415, 1488,1514, 1520,1522, 1569,1594, 1601,1610, 
    1649,1719, 1722,1726, 1728,1742, 1744,1747, 1749,1749, 
    1765,1766, 2309,2361, 2365,2365, 2392,2401, 2437,2444, 
    2447,2448, 2451,2472, 2474,2480, 2482,2482, 2486,2489, 
    2524,2525, 2527,2529, 2544,2545, 2565,2570, 2575,2576, 
    2579,2600, 2602,2608, 2610,2611, 2613,2614, 2616,2617, 
    2649,2652, 2654,2654, 2674,2676, 2693,2699, 2701,2701, 
    2703,2705, 2707,2728, 2730,2736, 2738,2739, 2741,2745, 
    2749,2749, 2784,2784, 2821,2828, 2831,2832, 2835,2856, 
    2858,2864, 2866,2867, 2870,2873, 2877,2877, 2908,2909, 
    2911,2913, 2949,2954, 2958,2960, 2962,2965, 2969,2970, 
    2972,2972, 2974,2975, 2979,2980, 2984,2986, 2990,2997, 
    2999,3001, 3077,3084, 3086,3088, 3090,3112, 3114,3123, 
    3125,3129, 3168,3169, 3205,3212, 3214,3216, 3218,3240, 
    3242,3251, 3253,3257, 3294,3294, 3296,3297, 3333,3340, 
    3342,3344, 3346,3368, 3370,3385, 3424,3425, 3585,3630, 
    3632,3632, 3634,3635, 3648,3653, 3713,3714, 3716,3716, 
    3719,3720, 3722,3722, 3725,3725, 3732,3735, 3737,3743, 
    3745,3747, 3749,3749, 3751,3751, 3754,3755, 3757,3758, 
    3760,3760, 3762,3763, 3773,3773, 3776,3780, 3804,3805, 
    3904,3911, 3913,3945, 4256,4293, 4304,4342, 4352,4441, 
    4447,4514, 4520,4601, 7680,7835, 7840,7929, 7936,7957, 
    7960,7965, 7968,8005, 8008,8013, 8016,8023, 8025,8025, 
    8027,8027, 8029,8029, 8031,8061, 8064,8116, 8118,8124, 
    8126,8126, 8130,8132, 8134,8140, 8144,8147, 8150,8155, 
    8160,8172, 8178,8180, 8182,8188, 8319,8319, 8450,8450, 
    8455,8455, 8458,8467, 8469,8469, 8472,8477, 8484,8484, 
    8486,8486, 8488,8488, 8490,8497, 8499,8504, 8544,8578, 
    12295,12295, 12321,12329, 12353,12436, 12449,12538, 12549,12588, 
    12593,12686, 19968,40869, 44032,55203, 63744,64045, 64256,64262, 
    64275,64279, 64287,64296, 64298,64310, 64312,64316, 64318,64318, 
    64320,64321, 64323,64324, 64326,64433, 64467,64829, 64848,64911, 
    64914,64967, 65008,65019, 65136,65437, 65440,65470, 65474,65479, 
    65482,65487, 65490,65495, 65498,65500
  };

  static final char[] sNameC =
  {
    170,170, 181,181, 183,183, 186,186, 192,214, 
    216,246, 248,501, 506,535, 592,680, 688,696, 
    699,705, 720,721, 736,740, 768,837, 864,865, 
    890,890, 902,906, 908,908, 910,929, 931,974, 
    976,982, 986,986, 988,988, 990,990, 992,992, 
    994,1011, 1025,1036, 1038,1103, 1105,1116, 1118,1153, 
    1155,1158, 1168,1220, 1223,1224, 1227,1228, 1232,1259, 
    1262,1269, 1272,1273, 1329,1366, 1369,1369, 1377,1415, 
    1425,1441, 1443,1465, 1467,1469, 1471,1471, 1473,1474, 
    1476,1476, 1488,1514, 1520,1522, 1569,1594, 1600,1618, 
    1632,1641, 1648,1719, 1722,1726, 1728,1742, 1744,1747, 
    1749,1768, 1770,1773, 1776,1785, 2305,2307, 2309,2361, 
    2364,2381, 2385,2388, 2392,2403, 2406,2415, 2433,2435, 
    2437,2444, 2447,2448, 2451,2472, 2474,2480, 2482,2482, 
    2486,2489, 2492,2492, 2494,2500, 2503,2504, 2507,2509, 
    2519,2519, 2524,2525, 2527,2531, 2534,2545, 2562,2562, 
    2565,2570, 2575,2576, 2579,2600, 2602,2608, 2610,2611, 
    2613,2614, 2616,2617, 2620,2620, 2622,2626, 2631,2632, 
    2635,2637, 2649,2652, 2654,2654, 2662,2676, 2689,2691, 
    2693,2699, 2701,2701, 2703,2705, 2707,2728, 2730,2736, 
    2738,2739, 2741,2745, 2748,2757, 2759,2761, 2763,2765, 
    2784,2784, 2790,2799, 2817,2819, 2821,2828, 2831,2832, 
    2835,2856, 2858,2864, 2866,2867, 2870,2873, 2876,2883, 
    2887,2888, 2891,2893, 2902,2903, 2908,2909, 2911,2913, 
    2918,2927, 2946,2947, 2949,2954, 2958,2960, 2962,2965, 
    2969,2970, 2972,2972, 2974,2975, 2979,2980, 2984,2986, 
    2990,2997, 2999,3001, 3006,3010, 3014,3016, 3018,3021, 
    3031,3031, 3047,3055, 3073,3075, 3077,3084, 3086,3088, 
    3090,3112, 3114,3123, 3125,3129, 3134,3140, 3142,3144, 
    3146,3149, 3157,3158, 3168,3169, 3174,3183, 3202,3203, 
    3205,3212, 3214,3216, 3218,3240, 3242,3251, 3253,3257, 
    3262,3268, 3270,3272, 3274,3277, 3285,3286, 3294,3294, 
    3296,3297, 3302,3311, 3330,3331, 3333,3340, 3342,3344, 
    3346,3368, 3370,3385, 3390,3395, 3398,3400, 3402,3405, 
    3415,3415, 3424,3425, 3430,3439, 3585,3630, 3632,3642, 
    3648,3662, 3664,3673, 3713,3714, 3716,3716, 3719,3720, 
    3722,3722, 3725,3725, 3732,3735, 3737,3743, 3745,3747, 
    3749,3749, 3751,3751, 3754,3755, 3757,3758, 3760,3769, 
    3771,3773, 3776,3780, 3782,3782, 3784,3789, 3792,3801, 
    3804,3805, 3864,3865, 3872,3881, 3893,3893, 3895,3895, 
    3897,3897, 3902,3911, 3913,3945, 3953,3972, 3974,3979, 
    3984,3989, 3991,3991, 3993,4013, 4017,4023, 4025,4025, 
    4256,4293, 4304,4342, 4352,4441, 4447,4514, 4520,4601, 
    7680,7835, 7840,7929, 7936,7957, 7960,7965, 7968,8005, 
    8008,8013, 8016,8023, 8025,8025, 8027,8027, 8029,8029, 
    8031,8061, 8064,8116, 8118,8124, 8126,8126, 8130,8132, 
    8134,8140, 8144,8147, 8150,8155, 8160,8172, 8178,8180, 
    8182,8188, 8204,8207, 8234,8238, 8298,8303, 8319,8319, 
    8400,8412, 8417,8417, 8450,8450, 8455,8455, 8458,8467, 
    8469,8469, 8472,8477, 8484,8484, 8486,8486, 8488,8488, 
    8490,8497, 8499,8504, 8544,8578, 12293,12293, 12295,12295, 
    12321,12335, 12337,12341, 12353,12436, 12441,12446, 12449,12538, 
    12540,12542, 12549,12588, 12593,12686, 19968,40869, 44032,55203, 
    63744,64045, 64256,64262, 64275,64279, 64286,64296, 64298,64310, 
    64312,64316, 64318,64318, 64320,64321, 64323,64324, 64326,64433, 
    64467,64829, 64848,64911, 64914,64967, 65008,65019, 65056,65059, 
    65136,65470, 65474,65479, 65482,65487, 65490,65495, 65498,65500
  };


  public static boolean isNameC(char c)
  {
    return find(c, sNameC);
  }
  public static boolean isNameStart(char c)
  {
    return find(c, sNameStart);
  }

  // binary-search to find out if C is in one of the ranges in the
  //  map.  Remember that the map consists of pairs, not individuals.
  // If this turns into a horrible performance bottleneck, we could
  //  put the maps into a 64k byte array or as a compromise 2 * 8k bitmaps; the
  //  pair-array trick uses about 4k for both, at the cost of all this
  //  binary searching

  private static boolean find(char c, char[] map)
  {
    int high, low, probe;

    low = -1; high = map.length/2;

    while ((high - low) > 1)
    {
      // invariant: 'high' and 'low' index pairs, not array slots;
      //  either high is one past the last pair, or the pair at 'high'
      //  starts strictly above c (map[high * 2] > c)
      probe = (high + low) / 2;
      if (c < map[probe * 2])
        high = probe;
      else
        low = probe;
    }     
    return (low != -1 && c >= map[low*2] && c <= map[(low*2) + 1]);
  }
}
-------------- next part --------------


Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592
From andrewl at microsoft.com  Wed Sep  3 22:39:59 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
Message-ID: <7BB61B44F197D011892800805FD4F79201436095@RED-03-MSG.dns.microsoft.com>

JDK 1.1 is still broken for Unicode.  Take a look at the code in the
Microsoft XML Parser (http://www.microsoft.com/standards/xml) to see our
work-arounds.

--Andrew Layman
   AndrewL@microsoft.com




From istvanc at microsoft.com  Thu Sep  4 00:24:57 1997
From: istvanc at microsoft.com (Istvan Cseri)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
Message-ID: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG>

For better speed I would suggest an alternative solution: use a quick
array lookup for characters below 256 and go to the more expensive
method above... It will do wonders with your parser.
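
A minimal sketch of that suggestion, for illustration only: a
direct-indexed table answers for characters below 256, and anything
else falls back to a binary search over (first, last) range pairs.
The class name and the short RANGES excerpt are assumptions made for
the example; they are not the posted CharClasses tables, and '_' and
':' are left out.

```java
// Sketch only: fast path for code points below 256, range search otherwise.
final class FastNameStart {
    // Sorted, non-overlapping (first, last) pairs. Illustrative excerpt,
    // NOT the complete XML name-start class.
    private static final char[] RANGES = {
        'A', 'Z', 'a', 'z', 170, 170, 181, 181, 186, 186,
        192, 214, 216, 246, 248, 501, 506, 535
    };

    // One precomputed answer per character in 0..255.
    private static final boolean[] LOW = new boolean[256];
    static {
        for (int c = 0; c < LOW.length; c++) {
            LOW[c] = search((char) c);
        }
    }

    static boolean isNameStart(char c) {
        // Fast path: a single array index for the Latin-1 range.
        return c < 256 ? LOW[c] : search(c);
    }

    // Slow path: classic binary search over the range pairs.
    private static boolean search(char c) {
        int lo = 0;
        int hi = RANGES.length / 2 - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            if (c < RANGES[mid * 2]) {
                hi = mid - 1;
            } else if (c > RANGES[mid * 2 + 1]) {
                lo = mid + 1;
            } else {
                return true;
            }
        }
        return false;
    }
}
```

The fast path costs 256 booleans per character class; whether it pays
off depends on how much of the input is Latin-1.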

Istvan




From colds at nwlink.com  Thu Sep  4 01:56:21 1997
From: colds at nwlink.com (Chris Olds)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
References: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG>
Message-ID: <340DF8CE.FCD49856@nwlink.com>

How are people dealing with UTF-8 vs. Unicode vs. Latin-1?  I have been
working on a lexer (using Flex) that assumes the input stream is either
Latin-1 or UTF-8 and returns byte strings to the caller.  Since Java
chars are Unicode, I assume that the Java XML parsers are doing the
opposite, right?  Is there any consensus on what form PCDATA or GI names
should take when they are returned to the application?  On a related
note, when do character entities get replaced - in the lexer or later
on?  My reading of the draft is that the scanner must do the replacement
if the examples of rescanning are to work.

	/cco

Chris Olds	colds@nwlink.com



From istvanc at microsoft.com  Thu Sep  4 17:01:36 1997
From: istvanc at microsoft.com (Istvan Cseri)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
Message-ID: <91B7E292027DCF1195CD08002BB690B00245740C@RED-93-MSG>

The Java parser is using Java char-s and Strings for storage so it is
using Unicode. The GI-s are actually 'atomized' for memory savings and
returned that way. PCDATA is stored in String chunks. The entities are
preserved in special nodes but can be made transparent to the reader
(user) of the parsed tree.

Istvan




From tbray at textuality.com  Thu Sep  4 17:33:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
Message-ID: <3.0.32.19970904083026.008f1a90@pop.intergate.bc.ca>

At 04:54 PM 03/09/97 -0700, Chris Olds wrote:
> Is there any consensus on what form PCDATA or GI names
>should take when they are returned to the application?  On a related
>note, when do character entities get replaced - in the lexer or later
>on?  My reading of the draft is that the scanner must do the replacement
>if the examples of rescanning are to work.

Like Istvan says, Java chars and Strings.  However, you have to do 
lazy evaluation; if you foolishly make every little chunk of text you
read into a String, you'll spend all your time in the Java String class
implementation, and none doing useful work.

Character entities have to be replaced in two places: when you find
them in an entity definition, and when you find them in free text or
an attribute value. -T.



From tfj at apusapus.demon.co.uk  Thu Sep  4 17:54:06 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun  7 16:58:24 2004
Subject: Parameter Entity Reference Considered Harmful
In-Reply-To: <199709031539.tfj.2212@apusapus.demon.co.uk>
Message-ID: <199709040153.tfj.2217@apusapus.demon.co.uk>

> In making one more pass through the official grammar for XML, before 
> I despatch my alternative version to the 5 people who've requested 
> copies, I spotted a real dumb error in the doctype declaration.

Since posting this I did something even more useful. :-) I went back 
to ISO 8879 in which, of course, the use of parameter entity 
references is allowed in both the "internal" and "external" subsets.

As a programmer I reckon that the different handling of parameter 
entities between the internal and external subsets makes things MORE 
complicated rather than simpler.

I knew there was something wrong with the avowed claim that XML was a
subset of SGML. If they're to be allowed (and they are allowed in
some odd places in XML) then let them occur wherever SGML lets them
occur. 

Regards, Trevor.

--

"Real Men don't Read Instruction Manuals"
   Tim Allen, Home Improvement



From Patrice.Bonhomme at loria.fr  Thu Sep  4 21:38:05 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun  7 16:58:24 2004
Subject: parameter entity with msxml
Message-ID: <199709041935.VAA00751@chimay.loria.fr>


Hi there,

I am using msxml to read/parse my XML documents within a Java application. I 
wondered if msxml takes care of parameter entity. This example doesn't work :

<?XML version="1.0" encoding="UTF-8" ?>
<!DOCTYPE tei.2 [

<!ENTITY % ISOtech SYSTEM "/users/autoinfo/bonhomme/XML/ISOtech.pen" >
<!ENTITY % ISOlat1 SYSTEM "/users/autoinfo/bonhomme/XML/ISOlat1.pen" >
<!ENTITY % ISOlat2 SYSTEM "/users/autoinfo/bonhomme/XML/ISOlat2.pen" >
<!ENTITY % ISOgrk1 SYSTEM "/users/autoinfo/bonhomme/XML/ISOgrk1.pen" >
<!ENTITY % ISOgrk2 SYSTEM "/users/autoinfo/bonhomme/XML/ISOgrk2.pen" >
<!ENTITY % ISOgrk3 SYSTEM "/users/autoinfo/bonhomme/XML/ISOgrk3.pen" >
<!ENTITY % ISOgrk4 SYSTEM "/users/autoinfo/bonhomme/XML/ISOgrk4.pen" >

%ISOtech;
%ISOlat1;
%ISOlat2;
%ISOgrk1;
%ISOgrk2;
%ISOgrk3;
%ISOgrk4;
]>
<tei.2>
<p>
<s>J'ai d&eacute;cide d'&eacute;crire un livre sur l'Espace et le Temps 
&agrave; l'intention du grand public apr&egrave;s les conf&eacute;rences Loeb 
que j'ai donn&eacute;es &agrave; Harvard en 1982.</s></p>
</tei.2>

I've got this exception :
Error: test-ent.xml(21,17)
Context:  - <null> - <TEI.2> - <P> - <S>
com.ms.xml.ParseException: Missing entity eacute
        at com.ms.xml.Parser.error(Parser.java:110)
        at com.ms.xml.Parser.scanEntityRef(Parser.java:440)
        at com.ms.xml.Parser.scanText(Parser.java:395)
        at com.ms.xml.Parser.parseText(Parser.java:1223)
        at com.ms.xml.Parser.parseElement(Parser.java:1081)
        at com.ms.xml.Parser.parseDocument(Parser.java:643)
        at com.ms.xml.Parser.parse(Parser.java:47)
        at com.ms.xml.Document.load(Document.java:171)
        at msxml.main(msxml.java:50)

Thanks for any help...

Pat.
  ==============================================================
  bonhomme@loria.fr               |      Office : B.228
  http://www.loria.fr/~bonhomme   |      Phone  : 03 83 59 20 37
  --------------------------------------------------------------
    * Projet Aquarelle    
          http://aqua.inria.fr
    * Serveur Silfide     
          http://www.loria.fr/Projet/Silfide
    * Multilingual Concordancing
          http://www.loria.fr/~bonhomme/lingua/
  ==============================================================




From tbray at textuality.com  Fri Sep  5 01:40:56 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:24 2004
Subject: Lark 0.91 available
Message-ID: <3.0.32.19970904163748.00838210@pop.intergate.bc.ca>

Hi - Lark 0.91 is now available at 
 http://www.textuality.com/Lark

Only one real difference - it now does Unicode.  It reads the BOM and thus
UCS-2/UTF-16 (even byte-swaps); if there's no BOM, reads and tries to 
use the encoding declaration, boots it if it says anything but "UTF-8" or 
"UTF8".  Successfully parses Murata-san's translation of the XML
spec, would love to get my hands on some more internationalized
XML; in particular with non-ASCII markup.
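
For readers unfamiliar with the BOM check, here is a rough sketch of
the idea. It is not Lark's code: the class, enum and method names are
hypothetical. The only facts relied on are the standard UTF-16
byte-order marks (FE FF for big-endian, FF FE for little-endian).

```java
// Hypothetical helper, not taken from Lark: sniff a UTF-16 byte-order mark.
final class BomSniffer {
    enum Order { BIG_ENDIAN, LITTLE_ENDIAN, NONE }

    // Inspect the first two bytes of the input for U+FEFF in either byte order.
    static Order sniff(byte[] head) {
        if (head.length < 2) {
            return Order.NONE;
        }
        int b0 = head[0] & 0xFF;
        int b1 = head[1] & 0xFF;
        if (b0 == 0xFE && b1 == 0xFF) {
            return Order.BIG_ENDIAN;
        }
        if (b0 == 0xFF && b1 == 0xFE) {
            return Order.LITTLE_ENDIAN;
        }
        return Order.NONE;
    }

    // Combine two bytes into one 16-bit code unit, swapping if little-endian.
    static char unit(byte first, byte second, Order order) {
        int hi = (order == Order.LITTLE_ENDIAN ? second : first) & 0xFF;
        int lo = (order == Order.LITTLE_ENDIAN ? first : second) & 0xFF;
        return (char) ((hi << 8) | lo);
    }
}
```

When sniff returns NONE, a parser would fall back to the encoding
declaration, as described above.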

Another 6K of .class files for I18n, sigh.

Lots of bug-fixes in the event-stream module.  I had to write a 
significant event-stream Lark application to pull the character classes
out of the XML spec in order to build the CharClasses.java file, and
ran across a few bodacious bugs in end-tag handling.

It's a bit bogus because it really doesn't do UTF-8 yet, just ASCII
masquerading as such.  UTF-8 Real Soon Now.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From jjc at jclark.com  Fri Sep  5 07:10:54 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:58:24 2004
Subject: Character classification
References: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG>
Message-ID: <340F92EB.E84772CB@jclark.com>

Istvan Cseri wrote:
> 
> For better speed I would suggest an alternative solution: use a quick
> array lookup for characters below 256 and go to the more expensive
> method above... It will do wonders with your parser.

Except of course when you're parsing non-Latin scripts.

There's another technique which exploits the fact that characters on the
same page often have similar properties, and this is true even more so
for characters in the same column.

The idea is to have a three-level table, the first level with 256
entries, the second and third levels with 16 entries each.  The entries
for the first and second levels are a (possibly null) pointer to a
sub-table plus a value; the entries for the third level are just values.
To look up the value for a character, you use the high 8 bits to index
into the first-level table; if the pointer part of the entry is null,
then return the value part of the entry; otherwise use the sub-table
addressed by the pointer, use the next 4 bits to index into that in a
similar way, and, if necessary, use the bottom 4 bits to index into the
bottom table.

This is I believe quite a well-known technique; I got it from Glenn
Adams.

You can use this to implement case-folding by storing the difference
between a character and its upper-case equivalent modulo 2^16.

There's a C++ implementation of this in SP in include/CharMap.h.
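For readers without the SP sources to hand, here is a minimal Java sketch of the three-level table and the case-folding trick described above. It is not the CharMap.h code: the class and method names are made up for illustration, it assumes 16-bit characters, and it keeps the "pointer" and "value" parts of each entry in parallel arrays rather than in one record.

```java
import java.util.Arrays;

// Illustrative sketch only: a sparse three-level table over 16-bit character codes.
final class ThreeLevelCharTable {
    private final int[] pageValue = new int[256];          // level 1: value shared by a whole page
    private final int[][] columnValue = new int[256][];    // level 2: null while the page is uniform
    private final int[][][] cellValue = new int[256][][];  // level 3: null while a column is uniform

    int lookup(char c) {
        int page = c >>> 8, column = (c >>> 4) & 0xF, cell = c & 0xF;
        if (columnValue[page] == null) return pageValue[page];
        if (cellValue[page][column] == null) return columnValue[page][column];
        return cellValue[page][column][cell];
    }

    void set(char c, int value) {
        if (lookup(c) == value) return;                     // a shared entry already says so
        int page = c >>> 8, column = (c >>> 4) & 0xF, cell = c & 0xF;
        if (columnValue[page] == null) {                    // split the page into 16 columns
            columnValue[page] = new int[16];
            Arrays.fill(columnValue[page], pageValue[page]);
            cellValue[page] = new int[16][];
        }
        if (cellValue[page][column] == null) {              // split the column into 16 cells
            cellValue[page][column] = new int[16];
            Arrays.fill(cellValue[page][column], columnValue[page][column]);
        }
        cellValue[page][column][cell] = value;
    }

    // Case folding: store (upper - c) modulo 2^16 for every 16-bit code.
    static ThreeLevelCharTable upperCaseDeltas() {
        ThreeLevelCharTable table = new ThreeLevelCharTable();
        for (int c = 0; c <= 0xFFFF; c++) {
            table.set((char) c, (Character.toUpperCase((char) c) - c) & 0xFFFF);
        }
        return table;
    }

    char toUpper(char c) {
        return (char) (c + lookup(c));                      // the cast wraps modulo 2^16
    }

    public static void main(String[] args) {
        ThreeLevelCharTable fold = upperCaseDeltas();
        System.out.println(fold.toUpper('x'));
    }
}
```

Pages on which every character has the same value (the CJK ideograph pages, for instance, where the case delta is always 0) never get a sub-table, which is where the space saving comes from.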

James




From crism at ora.com  Fri Sep  5 16:15:09 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun  7 16:58:25 2004
Subject: DSSSL Digest now publicly available
In-Reply-To: <998.199709051356@grogan.cogsci.ed.ac.uk> (ht@cogsci.ed.ac.uk)
Message-ID: <199709051417.KAA01715@geode.ora.com>

The announcement of the DSSSL Digest (or reference) at
<URL:ftp://ftp.ornl.gov/pub/sgml/WG8/DSSSL/digest.htm> sparked me to
get around to announcing my SGML reference on comp.text.sgml.  For
those of you who don't read c.t.s, it's at
<URL:http://www.oreilly.com/people/staff/crism/sgmldefs.html>.  I find
it very useful day-to-day, especially when checking that XML remains
valid HTML.

As posted to c.t.s, this information is copyright ISO, and is intended
only as a supplement to ISO 8879.  (You won't find it very useful
without the accompanying text anyhoo.)

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>



From istvanc at microsoft.com  Fri Sep  5 18:02:59 1997
From: istvanc at microsoft.com (Istvan Cseri)
Date: Mon Jun  7 16:58:25 2004
Subject: Character classification
Message-ID: <91B7E292027DCF1195CD08002BB690B00245740F@RED-93-MSG>

You are right, it is a well known technique, Java JDK1.1 in fact uses
very similar code for character classification. I replaced that with the
simple 256 member array lookup (for characters in that range) and it
sped up the parser ~10%.

Istvan

> ----------
> From: 	James Clark[SMTP:jjc@jclark.com]
> Reply To: 	James Clark
> Sent: 	Thursday, September 04, 1997 10:04 PM
> To: 	xml-dev@ic.ac.uk
> Subject: 	Re: Character classification
> 
> Istvan Cseri wrote:
> > 
> > For better speed I would suggest an alternative solution: use a
> quick
> > array lookup for characters below 256 and go to the more expensive
> > method above... It will do wonders with your parser.
> 
> Except of course when you're parsing non-Latin scripts.
> 
> There's another technique which exploits the fact that characters on
> the
> same page often have similar properties, and this is true even more so
> for characters in the same column.
> 
> The idea is to have a three-level table, the first level with 256
> entries, the second and third levels with 16 entries.  The entries for
> the first and second levels are a (possibly null) pointer to a
> sub-table
> plus a value; the entries for the third level are just values. To look
> up the value for a character, you use the high 8 bits to index into
> the
> first-level table; if the pointer part of the entry is null, then
> return
> the value part of entry; otherwise use the sub-table table addressed
> by
> the pointer; use the next 4 bits to index into that in a similar way,
> and, if necessary, the bottom 4 bits to index into the bottom table.
> 
> This is I believe quite a well-known technique; I got it from Glenn
> Adams.
> 
> You can use this to implement case-folding by storing the difference
> between a character and its upper-case equivalent modulo 2^16.
> 
> There's a C++ implementation of this in SP in include/CharMap.h.
> 
> James
> 
> 
> 



From tbray at textuality.com  Fri Sep  5 18:47:47 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:25 2004
Subject: Character classification
Message-ID: <3.0.32.19970905094447.008fcad0@pop.intergate.bc.ca>

At 12:04 PM 05/09/97 +0700, James Clark wrote:
>There's another technique 

Of course, then there's the space/time trade-off.  In particular, in XML,
the proportion of times when you're going to change parsing state based
on whether something's a NameChar/NameStart is not that high; so how much
table space & traversal code is it worth investing in speeding up that case?
Maybe a lot, maybe not.

What we need is a truly good profiler.  Anyone with a good Java profiler 
experience to share?  I speeded Lark up substantially with 0.91, just by 
code-walking and guessing.  This is not the right way to do it. -T.



From Support at EpiphanySoftware.com  Fri Sep  5 20:37:52 1997
From: Support at EpiphanySoftware.com (Andrew Cogan)
Date: Mon Jun  7 16:58:25 2004
Subject: Resolving links
Message-ID: <341050F4.2932184C@EpiphanySoftware.com>

Introductory apology: I'm a newcomer to XML, so forgive me if this topic
has already been covered.

Does/will XML include a way to resolve links by using a mapping table
external to the originating document, or alternatively by calling a
process?

In this scenario, there would be either a "library manager" process, or a
registry file containing a list of symbolic document names along with
their physical locations. This would enable a link in document "A" to
refer to document "B" without concern for whether document B's location
is on a CD-ROM, a hard disk, or the Web. It would also allow document
B's location to change over time without invalidating the link in
document A.

-- 
 Andy Cogan
 Epiphany Software

E-mail: andrew@EpiphanySoftware.com
Voice: (408) 378-6145 
Web: http://www.EpiphanySoftware.com



From jwrobie at mindspring.com  Fri Sep  5 21:30:10 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:25 2004
Subject: SGML/XML Developer's Group, Research Triangle area, North
  Carolina
Message-ID: <1.5.4.32.19970905192951.009cdb5c@pop.mindspring.com>

I am interested in starting a user's group for people developing SGML
and XML applications in the Research Triangle area of North Carolina.
I would like this group to be oriented towards developers rather than
end users.  The goal of the group would be to learn from each other
about the emerging XML-based standards and APIs, new design techniques
using architectural forms and components, and tools; to discuss various
program architectures and document designs; and to get to know the
other people who are working on SGML and XML projects in our local
area.

If anybody would be interested in such a group, please contact me via
email.

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From jwrobie at mindspring.com  Fri Sep  5 22:51:40 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:25 2004
Subject: SGML/XML Developer's Group, Research Triangle area, North
  Carolina
Message-ID: <1.5.4.32.19970905205128.009fc340@pop.mindspring.com>

Very interesting - I'll contact you in private email. I'd be interested in
having a presentation on XML/EDI, and a focus group could develop out of that.

Let's do the rest offline (and over beer!)

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From zwang at pstat.ucsb.edu  Fri Sep  5 23:07:54 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun  7 16:58:25 2004
Subject: Developer's Group
Message-ID: <Pine.GSO.3.95.970905140016.3674A-100000@fisher>

I have followed the discussion for some time. I think it is time to
summarize the discussion up to now and let the developers begin to work.
This group will be of much help in this respect.

Zheng Wang
Department of Statistics and Applied Probability 
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang




From eliot at isogen.com  Sat Sep  6 01:00:55 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:25 2004
Subject: Resolving links
Message-ID: <3.0.32.19970905175827.00a1dbf4@swbell.net>

At 11:35 AM 9/5/97 -0700, Andrew Cogan wrote:
>Introductory apology: I'm a newcomer to XML, so forgive me if this topic
>has already been covered.
>
>Does/will XML include a way to resolve links by using a mapping table
>external to the originating document, or alternatively by calling a
>process?
>
>In this scenario, there would be either a "library manager" process, or a
>registry file containing a list of symbolic document names along with
>their physical locations. This would enable a link in document "A" to
>refer to document "B" without concern for whether document B's location
>is on a CD-ROM, a hard disk, or the Web. It would also allow document
>B's location to change over time without invalidating the link in
>document A.

This problem, that of changes in the resource pointed to requiring changes
in the documents that point to it, is one of the fundamental weaknesses of
URLs as a form of address.  You cannot have "industrial strength"
addressing without some form of indirection that lets you isolate
references in A from changes in B.  SGML provides one fundamental form of
indirect address, the entity reference, which when used with public IDs
(rather than system IDs) protects the entity declarations from changes in
the system identifiers of storage objects.  However, entities alone cannot
protect you from changes inside a storage object, so you must have some way
of indirecting references to objects inside storage objects.

The current XML Link spec does not allow entity references as a form of
resource address.  It also does not provide any other form of indirection.
However, you're not limited to using only XML Link with XML documents--you
can use anything you want, including normal SGML mechanisms and other
public addressing architectures, such as the TEI and the HyTime architecture.

Here's how you do entity-based indirection:

<?XML Version=1.0?>
<!DOCTYPE MyDoc [
  <!NOTATION XML "-//W3C//NOTATION eXtensible Markup Language//EN" >
  <!ENTITY YourDoc PUBLIC "-//You//DOCUMENT Your Document//EN" 
           CDATA XML >
  <!ELEMENT Link EMPTY >
  <!ATTLIST Link
     resource   ENTITY #REQUIRED
  >
]>
<MyDoc>
 <link resource="YourDoc"/><!-- NOTE: this isn't legal XML syntax today -->
</MyDoc>

Somewhere else, you'd have a mapping for the public ID to the system ID:

-- SGML Open catalog --
PUBLIC "-//You//DOCUMENT Your Document//EN" 
       "/home/you/docs/mydoc.xml"
-- End of catalog --

You could imagine a service analogous to DNS that would resolve public IDs
to storage IDs (or rather, would resolve owner IDs to public ID servers,
that is "-//You" would be associated with your public ID server, which then
takes the rest of the public ID and resolves it to a storage object).
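A small Java sketch of the lookup a catalog-driven resolver performs, assuming only the simple PUBLIC "public-id" "system-id" entry form shown above. The class name PublicIdCatalog and its methods are hypothetical, and real SGML Open catalogs have more entry types than this handles.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: maps public identifiers to system identifiers.
final class PublicIdCatalog {
    // One entry: the keyword PUBLIC followed by two quoted strings.
    private static final Pattern PUBLIC_ENTRY =
        Pattern.compile("PUBLIC\\s+\"([^\"]*)\"\\s+\"([^\"]*)\"");

    private final Map<String, String> systemIds = new LinkedHashMap<>();

    static PublicIdCatalog parse(String catalogText) {
        PublicIdCatalog catalog = new PublicIdCatalog();
        Matcher m = PUBLIC_ENTRY.matcher(catalogText);
        while (m.find()) {
            // The first mapping for a public ID wins; later duplicates are ignored.
            catalog.systemIds.putIfAbsent(m.group(1), m.group(2));
        }
        return catalog;
    }

    // Returns the system ID for a public ID, or null if the catalog has no entry.
    String resolve(String publicId) {
        return systemIds.get(publicId);
    }

    public static void main(String[] args) {
        String text = "-- SGML Open catalog --\n"
            + "PUBLIC \"-//You//DOCUMENT Your Document//EN\"\n"
            + "       \"/home/you/docs/mydoc.xml\"\n"
            + "-- End of catalog --\n";
        PublicIdCatalog catalog = parse(text);
        System.out.println(catalog.resolve("-//You//DOCUMENT Your Document//EN"));
    }
}
```

Swapping in a different catalog text changes where the entity resolves to without touching either document, which is the point of the extra level of indirection.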

XML Lang, of course, does allow you to declare ENTITY attributes, as I've
done above, it's just that XML Link does not associate any particular semantic
with ENTITY attributes.  So you can do the above, but you can't depend on
systems that only support XML Lang and XML Link to help you (but any
existing SGML system should handle the above).

Both the TEI spec and the HyTime architecture provide indirect addresses
that you can use to isolate a reference from the ultimate location of the
target.  For example, using HyTime indirect addresses, you could have a
separate document that provided the mapping of persistent object names to
URLs for those objects:

<?XML Version=1.0?>
<!-- URL of this document is "http://www.me.com/docs/urlmap.xml" -->
<!DOCTYPE URL.map.Table [
 <?IS10744 ArcBase HyTime ?>
 <!ELEMENT URL.map.Table (URLloc+) >
 <!NOTATION URL PUBLIC "-//IETF//NOTATION Uniform Resource Locator//EN" >
 <!ELEMENT URLloc  (#PCDATA) > <!-- Content is a URL -->
 <!ATTLIST URLloc 
    ID     ID   #REQUIRED
    HyTime NAME #FIXED "queryloc"
    notation NOTATION (url) #FIXED "url"
 >
]>
<URL.map.Table>
<urlloc id="my.document.1">http://www.me.com/docs/mydoc1.xml</urlloc>
<urlloc id="my.document.2">http://www.me.com/docs/mydoc2.xml</urlloc>
</URL.map.Table>

You could then use the mapping by making references to the URLloc elements:

<?XML Version=1.0?>
<!DOCTYPE MyDoc [
 <?IS10744 ArcBase HyTime ?>
 <!NOTATION URL PUBLIC "-//IETF//NOTATION Uniform Resource Locator//EN" >
 <!ELEMENT Link (#PCDATA) >
 <!ATTLIST Link
     href  CDATA #REQUIRED 
     HyTime NAME #FIXED "clink"
     loctype CDATA #FIXED "href queryloc URL"
     HyNames CDATA #FIXED "linkend href"
 >
]>
<MyDoc>
<link href="http://www.me.com/docs/urlmap.xml#my.document.1">Click here</link>
</MyDoc>

The HREF in my document points to a URLloc in the URL map document, which
then gets us to the real URL, which may change at any time.

One advantage of the entity approach is that you can use different catalogs
without changing any of the documents involved (because the entity
declaration and public ID provide an additional level of indirection, which
is outside of any documents, namely in the public ID mapping catalog).

As the XML Link specification is not yet finalized, it's possible that we
may include a way to address entities as resources of links and do indirect
addressing.

It should be clear from the above that the mechanism at work is pretty
simple: given a two part address (storage object and ID within that
object), use it to look up the next stage in the address (i.e., the URL in
the content of the URLloc elements).  That's all there is to it, and the
above is 100% HyTime conforming (and if you implemented the above, you
could call your system a conforming HyTime application).
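The two-part lookup can be sketched in a few lines of Java. This is only an illustration of the address arithmetic, not a HyTime engine: the class IndirectAddressResolver and its methods are invented names, and the URL map is held in memory rather than parsed from the map document.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: resolves "mapDocument#id" references through a URL map.
final class IndirectAddressResolver {
    // storage object URL -> (ID within that object -> content of the URLloc element)
    private final Map<String, Map<String, String>> mapDocuments = new HashMap<>();

    void addLocation(String mapDocumentUrl, String id, String targetUrl) {
        mapDocuments.computeIfAbsent(mapDocumentUrl, k -> new HashMap<>()).put(id, targetUrl);
    }

    // Returns the real URL behind an indirect reference, the href itself if it has
    // no "#id" part, or null if the map document or the ID is unknown.
    String resolve(String href) {
        int hash = href.indexOf('#');
        if (hash < 0) {
            return href;                                   // already a direct address
        }
        String storageObject = href.substring(0, hash);   // part 1: the map document
        String id = href.substring(hash + 1);             // part 2: the ID within it
        Map<String, String> locations = mapDocuments.get(storageObject);
        return locations == null ? null : locations.get(id);
    }

    public static void main(String[] args) {
        IndirectAddressResolver resolver = new IndirectAddressResolver();
        String map = "http://www.me.com/docs/urlmap.xml";
        resolver.addLocation(map, "my.document.1", "http://www.me.com/docs/mydoc1.xml");
        resolver.addLocation(map, "my.document.2", "http://www.me.com/docs/mydoc2.xml");
        System.out.println(resolver.resolve(map + "#my.document.1"));
    }
}
```

When a target moves, only the entry in the map changes; every href that points at the map stays as it was.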

Cheers,

Eliot



From Dick_Hardt at ActiveState.com  Sat Sep  6 02:16:03 1997
From: Dick_Hardt at ActiveState.com (Dick Hardt)
Date: Mon Jun  7 16:58:25 2004
Subject: Perl utilities for XML
Message-ID: <3.0.1.32.19970905153037.016c4fac@pop3.activestate.com>

Hello all,

I searched the archive and it looks like there is some Perl development re:
XML but I have not seen anything specific. Does anyone have anything, or is
anyone interested in a Perl module for XML?

-- Dick



From matt at wdi.disney.com  Sat Sep  6 02:25:03 1997
From: matt at wdi.disney.com (Matthew Fuchs)
Date: Mon Jun  7 16:58:25 2004
Subject: Perl utilities for XML
In-Reply-To: Dick Hardt <Dick_Hardt@activestate.com>
        "Perl utilities for XML" (Sep  5,  3:30pm)
References: <3.0.1.32.19970905153037.016c4fac@pop3.activestate.com>
Message-ID: <9709051727.ZM4032@scrumpox.rd.wdi.disney.com>

I've been hacking away desperately, but I'm not sure if I can put anything in
the public domain.  I'll have to check and see.

Matthew

On Sep 5,  3:30pm, Dick Hardt wrote:
> Subject: Perl utilities for XML
> Hello all,
>
> I searched the archive and it looks like there is some Perl development re:
> XML but I have not seen anything specific. Does anyone have anything, or is
> anyone interested in a Perl module for XML?
>
> -- Dick
>
>
>-- End of excerpt from Dick Hardt


-- 
-----------------------------------------------------
Matthew Fuchs
matt@wdi.disney.com
http://cs.nyu.edu/phd_students/fuchs
-----------------------------------------------------



From tbray at textuality.com  Sat Sep  6 23:10:00 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:25 2004
Subject: On Case and Performance
Message-ID: <3.0.32.19970906140647.008f1830@pop.intergate.bc.ca>

Recently, I whined:

>What we need is a truly good profiler.  Anyone with a good Java profiler 
>experience to share?  I speeded Lark up substantially with 0.91, just by 
>code-walking and guessing.  This is not the right way to do it. -T.

Disgusted with myself, I went and found the Java Workshop Beta from
java.sun.com, downloaded it (16M!) and ran its profiler.  Well well, 
surprise, Lark was spending 91% of its time in this little routine
that looks up a GI to see if we've seen it before.  And in that 
routine, it was spending most of its time in Character.toUpperCase.

Ouch. The code used to be:
 for (i = 0; i < name.length; i++)
   name[i] = Character.toUpperCase(name[i]);

Now it says

 for (i = 0; i < name.length; i++)
   if (name[i] < 127)
     name[i] = sToUpper[name[i]]; // 127-entry upcasing table
   else 
     name[i] = Character.toUpperCase(name[i]);

Note that toUpperCase is called only in the case when non-ASCII 
characters show up in GI/Attribute/Entity names.
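For anyone who wants to try the same trick, a self-contained Java sketch of the table-plus-fallback loop follows. It is not the Lark source: the class name NameUpcaser and the way the table is filled are assumptions, and only the 127-entry table and the loop shape come from the snippet above.

```java
// Hypothetical stand-alone version of the ASCII fast path for upcasing names.
final class NameUpcaser {
    // 127-entry upcasing table covering codes 0..126; filled once at class load.
    private static final char[] sToUpper = new char[127];

    static {
        for (int i = 0; i < sToUpper.length; i++) {
            sToUpper[i] = Character.toUpperCase((char) i);
        }
    }

    // Upcases the name in place: table lookup for ASCII, library call otherwise.
    static void upcase(char[] name) {
        for (int i = 0; i < name.length; i++) {
            if (name[i] < 127) {
                name[i] = sToUpper[name[i]];
            } else {
                name[i] = Character.toUpperCase(name[i]);
            }
        }
    }

    public static void main(String[] args) {
        char[] gi = "termdef".toCharArray();
        upcase(gi);
        System.out.println(new String(gi));
    }
}
```

The table costs 254 bytes and keeps the common all-ASCII names away from the library call entirely.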

Resulting performance improvement in Lark, in processing the XML spec:
a factor of 11.9.

The Sun profiler is not quite as slick as gprof of yore, but it's
not bad at all. -T.




From Jon.Bosak at eng.Sun.COM  Sat Sep  6 23:36:50 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:25 2004
Subject: On Case and Performance
In-Reply-To: <3.0.32.19970906140647.008f1830@pop.intergate.bc.ca> (message from Tim Bray on Sat, 06 Sep 1997 14:06:53 -0700)
Message-ID: <199709062134.OAA03801@boethius.eng.sun.com>

[Tim Bray:]

| Resulting performance improvement in Lark, in processing the XML spec:
| a factor of 11.9.

Now you've made me too curious to resist asking.  What's the
performance difference if you just compare codes directly and don't
bother with case folding?

Jon




From tbray at textuality.com  Sat Sep  6 23:53:35 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:25 2004
Subject: On Case and Performance
Message-ID: <3.0.32.19970906145033.008f17e0@pop.intergate.bc.ca>

At 02:34 PM 06/09/97 -0700, Jon Bosak wrote:
>| Resulting performance improvement in Lark, in processing the XML spec:
>| a factor of 11.9.
>
>Now you've made me too curious to resist asking.  What's the
>performance difference if you just compare codes directly and don't
>bother with case folding?

To test that I'd have to go regularize the case of all the tags in the 
XML spec which <subtext>seems like an unreasonable amount of work</subtext>.  
Anyhow, the routine that checks whether we've seen a GI (where this stuff is)
is taking 8.7% of the total time.  So the gain from skipping the monocasing
entirely is not going to be dramatic.  In fact, it's now spending more
time in BufferedInputStream.read() (oh for a good old-fashioned getc()
macro). -Tim



From phxsoft at ibm.net  Sun Sep  7 05:01:05 1997
From: phxsoft at ibm.net (phxsoft@ibm.net)
Date: Mon Jun  7 16:58:25 2004
Subject: unsubscribe phxsoft@ibm.net
Message-ID: <9709070306.AA0224@slip166-72-179-153.or.us.ibm.net>


 unsubscribe phxsoft@ibm.net



From tikvas at agentsoft.com  Sun Sep  7 12:55:17 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun  7 16:58:25 2004
Subject: Examples for new XML demo
Message-ID: <34127982.3142@agentsoft.com>

AgentSoft Ltd. is about to put out a demo XML application in the
next week or two.

     What we are missing is real valid xml content (dtd & xml files)
that we can use with our demo.

     If you have an idea of examples I can use please let me know.

     I'll let you know when you can actually see the demo.
        
        Thanks.
 
          Tikva Schmidt.

--------------------------------------------------------------------
Tikva Schmidt.
email: tikvas@agentsoft.co.il
corp:  Agentsoft Ltd.     http://www.agentsoft.co.il
Phone: 972-2-6480573
---------------------------------------------------------------------



From Jon.Bosak at eng.Sun.COM  Mon Sep  8 17:11:02 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:25 2004
Subject: Examples for new XML demo
In-Reply-To: <34127982.3142@agentsoft.com> (tikvas@agentsoft.com)
Message-ID: <199709081508.IAA04251@boethius.eng.sun.com>

[Tikva Schmidt:]

|      What we are missing is real valid xml content (dtd & xml files)
| that we can use with our demo.
| 
|      If you have an idea of examples I can use please let me know.

These chestnuts are still available:

http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.01.xml.zip
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.01.xml.zip

The collections aren't complex enough to test parser features against
(there are no attributes and no empty elements), but what they lack in
complexity they make up for in size, so they're good for certain kinds
of benchmarking.

Jon



From dcarlson at ontogenics.com  Mon Sep  8 21:31:16 1997
From: dcarlson at ontogenics.com (Dave Carlson)
Date: Mon Jun  7 16:58:25 2004
Subject: parsing xml-data schemas
Message-ID: <2.2.32.19970908192629.00bd9900@pop.dimensional.com>

I'm new to XML, but have been going through all the specs, papers, and old
mail list archives that I can find.  I am especially interested in the
metadata proposals, which seem to be centered around MCF and XML-Data.
Apparently, these are being combined into the Resource Description Framework, and
a secret meeting was held in Redmond 2 weeks ago.  Well, at least secret to
those of us who are not allowed access to the W3C working group :-(   So,
I'm left to guess.

I want to start building a prototype using XML-Data, and probably
Microsoft's XML Java parser.  Am I wasting my time building something to
this spec, or is the current RDF completely different? 

Assuming that it's useful to proceed... I've created a small schema
according to xml-data and successfully parsed it using the DTD from Appendix
A.  My question (finally) is how can I use this schema to validate an XML
file that conforms to it?  After parsing the schema, should I convert it
within the XML processor to DTD objects, then proceed _as_if_ the schema
really originated from a DTD file?

It seems wrong to duplicate the existing validation logic that works for
DTD's and create another for schemas.  This is probably part of the argument
in support of those who say that the xml-data schema is a bad idea, and we
should write all schemas directly  in DTD syntax.  However, coming from an
artificial intelligence background, the idea of a metadata representation
language appeals to me.

Thanx for any thoughts or advice!
  Dave Carlson
  Ontogenics Corp.
  Boulder, CO




From tbray at textuality.com  Tue Sep  9 08:28:23 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:25 2004
Subject: Lark 0.92 available
Message-ID: <3.0.32.19970908232250.007a8ad0@pop.intergate.bc.ca>

Hi - Lark 0.92 is now available at 
 http://www.textuality.com/Lark

Pardon the quick releases, but thanks to Sun's JWS profiler, Lark 0.92
is now 11.9 times faster than 0.91.  Secondly, the accompanying "xh"
application, which formats the XML spec and related documents (including
what you get at the URL above) has been upgraded so that it now can
process the Japanese version of the XML specification and produce beautiful 
UCS-2 Japanese HTML output.  (Go to www.bitstream.com and download their
Cyberbit font if you want to see some damn nice-looking stuff on your 
screen - Netscape can do it, but be warned that Communicator 4 + Cyberbit 
between them will use all your memory, no matter how much you have).

When you can have a few tens of K of code do this kind of transformation 
on two violently different character sets, it bespeaks, I think, a couple 
of standards (Java and XML) in pleasing harmony.

The process of getting the Japanese formatting working would have been
completely impossible without all sorts of support and question-answering
and double-checking and pointing-to-useful-resources from Murata Makoto
of FXIS; many thanks to him.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592



From ht at cogsci.ed.ac.uk  Tue Sep  9 10:23:30 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:25 2004
Subject: parsing xml-data schemas
In-Reply-To: Dave Carlson's message of Mon, 08 Sep 1997 13:26:29 -0600
References: <2.2.32.19970908192629.00bd9900@pop.dimensional.com>
Message-ID: <1683.199709090823@grogan.cogsci.ed.ac.uk>

For what it's worth, my view is that translation into vanilla DTD IS
the right way to go in the short term, if for no other reason than to
forestall rapid incompatible divergence of schema DTDs and semantics.
If and when we get to the point of standardising a schema DTD, then
direct implementation makes sense.

ht



From Peter at ursus.demon.co.uk  Thu Sep 11 08:59:57 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:25 2004
Subject: XML-DEV Jewels
Message-ID: <9906@ursus.demon.co.uk>

XML-DEV has been active for about 7 months and generated around 1000 postings.
This information is searchable thanks to Henry Rzepa. However, some postings
are, I feel, of lasting value but not easy to locate by keyword, and some
threads as a whole have been useful (and perhaps re-usable by newcomers to
XML). I have therefore created a page of links to the archived postings,
which is at:

http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html

This does NOT attempt to duplicate the other XML resources such as the FAQ
and Robin Cover's comprehensive analysis. If you fail there and on the keyword 
search it may then be worth browsing this list.

Enjoy..

	P.
 
-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Jon.Bosak at eng.Sun.COM  Sat Sep 13 07:40:16 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:26 2004
Subject: Recent XML WG decisions
Message-ID: <199709130538.WAA07282@boethius.eng.sun.com>

While it is not our usual policy to post decisions of the XML Working
Group to xml-dev, the last three WG meetings have seen a number of
issues decided that bear directly on current experimental XML
implementations.  Following are reports prepared by C. M.
Sperberg-McQueen and Tim Bray detailing recent decisions that will be
incorporated into the next working draft.

Jon

----------------------------------------------------------------------
 Jon Bosak, Online Information Technology Architect, Sun Microsystems
     901 San Antonio Road, MPK17-101, Palo Alto, California 94303
----------------------------------------------------------------------
  ISO/IEC JTC1/SC18/WG8::NCITS V1::Davenport::SGML Open::W3C XML WG    
            It is earlier than we think. -- Vannevar Bush
----------------------------------------------------------------------

 From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
 Subject: XML WG decisions of 27 August 1997

 The XML Work Group discussed the following questions, and made the
 decisions indicated, in the meeting of 27 August 1997.

 Present:  Jon Bosak, James Clark, Steve DeRose, Eliot Kimber,
 Eve Maler, Makoto Murata, Peter Sharpe, C. M. Sperberg-McQueen.

 1.  A decision on case folding was postponed.

 Background: The current draft XML spec requires that most names
 (i.e. generic identifiers, attribute names, IDs, IDREFs, name tokens
 in attribute values, PI targets, notation names, and document type
 names) be case-folded, while entity names are case sensitive.  It has
 been repeatedly urged that this be changed and that all names be
 case-sensitive.  The arguments are familiar:

 For case folding: since the reference concrete syntax requires case
 folding, many current users of SGML and HTML are familiar with and
 have come to expect this behavior.

 For case sensitivity: since SGML parsers are required to fold up,
 rather than down, the XML spec is inconsistent with recommended
 Unicode practice.  (Unicode recommends folding down rather than up
 since there are slightly fewer unpleasant surprises and
 inconsistencies that way.)  There is *no* rule for case folding which
 works in the culturally expected manner for all speakers of all
 alphabetic languages: a lower-case e with acute accent is (correctly)
 uppercased one way in Quebec and a different way in metropolitan
 France.  Lowercase i (with a dot) is uppercased one way in Turkish and
 another way in other languages using the Latin alphabet.
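
The folding problem can be seen with any locale-independent case mapping. The following is a minimal sketch in Python (not part of the WG report); the `TURKISH_UPPER` table is a hypothetical illustration of what a Turkish reader would expect, not a library API.

```python
# Default Unicode case mappings, as implemented by Python's str methods.
# They are locale-independent, which is the difficulty described above.

def round_trips(ch: str) -> bool:
    """True if uppercasing and then lowercasing gives back the same string."""
    return ch.upper().lower() == ch

DOTLESS_I = "\u0131"      # Turkish lowercase dotless i
SHARP_S = "\u00df"        # German sharp s
CAPITAL_I_DOT = "\u0130"  # Turkish capital I with dot above

for ch in ("i", DOTLESS_I, SHARP_S):
    print(f"U+{ord(ch):04X} upper={ch.upper()!r} round_trips={round_trips(ch)}")

# A Turkish reader expects "i" to uppercase to U+0130, not to "I".
# Hypothetical table for illustration only.
TURKISH_UPPER = {"i": CAPITAL_I_DOT, DOTLESS_I: "I"}
```

With the default mappings, dotless i and sharp s do not survive an upper/lower round trip, so two names that differ before folding can compare equal afterwards.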

 A strong majority of those participating felt that we should make XML
 case sensitive and drop case folding, but in view of the sensitive
 nature of the decision, it was decided to postpone the decision until
 a larger fraction of the work group was present.


 2.  XML characters range from #x0 to #x10FFFF.

 Decision: Legal XML characters are those representable in UTF-16 /
 Unicode 2.0, i.e. those in the first seventeen planes of ISO/IEC 10646.
 Unanimous.

 Rationale: The current spec says that XML characters may include any
 character defined by ISO/IEC 10646.  Currently, that standard defines
 characters only within the Basic Multilingual Plane, each of which can
 be represented by a string of 16 bits; in principle, however, ISO/IEC
 10646 defines a 31-bit character space, and production 2 accordingly
 defines Character as covering the range #x0 to #x7FFFFFFF, with some
 gaps for forbidden characters.

 XML processors, however, are not required to support the flat 32-bit
 character encoding UCS-4, only the 16- and 8-bit encodings of UCS-2
 and UTF-8.  (The latter can represent all the characters of the 31-bit
 character space, but UCS-2 cannot.)  In many places, the XML spec
 suggests, or at least allows incautious readers to believe, that XML
 characters are only 16 bits wide.

 Either way, it's important to eliminate the ambiguity in the spec.

 In favor of restricting XML characters to 16 bits: it simplifies life
 for users of Java and other tools.  It seems clear that the full 31-bit
 space of 10646 will not be needed, even for extremely specialized
 applications, in the foreseeable future.

 In favor of defining XML characters to be 31 bits wide: 16 bits is
 manifestly too few for anyone working with historical texts in Han
 characters.  Politically, it would be unwise to give the impression
 that only the Basic Multilingual Plane is of importance.  The
 surrogate method, while clever, is clearly a hack which demonstrates
 that the original Unicode claim (16 bits is enough to build an
 absolutely flat character space which will last for all time) has
 fallen apart under the pressure of fact; the surrogate method
 abandons the flat character space which is one of the most important
 advantages of Unicode.

 The compromise (BMP plus the next 16 planes) appears
   - well understood
   - compatible with Java and other tools which assume 16-bit characters
   - sufficient for realistic expectations (even the most extensive of
 known collections of historical Chinese characters is unlikely to take
 much more than one of the additional planes; even the user area is
 sufficiently large, with 131,072 character positions)
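
The arithmetic behind the compromise can be checked with a short sketch. It assumes that the "user area" mentioned above means the two private-use planes 15 and 16; the names are illustrative only.

```python
PLANE_SIZE = 0x10000   # 65,536 code points per plane
PLANES = 17            # the BMP (plane 0) plus planes 1 through 16

MAX_CODE_POINT = PLANES * PLANE_SIZE - 1

def plane_of(code_point: int) -> int:
    """Return the plane number of a code point in the range #x0..#x10FFFF."""
    if not 0 <= code_point <= MAX_CODE_POINT:
        raise ValueError(f"outside the 17 planes: {code_point:#x}")
    return code_point // PLANE_SIZE

# Assumption: the "user area" is the pair of private-use planes 15 and 16.
PRIVATE_USE_PLANES = (15, 16)
private_use_positions = len(PRIVATE_USE_PLANES) * PLANE_SIZE

print(hex(MAX_CODE_POINT), private_use_positions)
```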


 3.  Processors must support UTF-16, not just UCS-2.

 Background: the current draft spec says (4.3.3): "All XML processors
 must be able to read entities in either UTF-8 or UCS-2."  It has been
 proposed to change this to require support for UTF-8 and UTF-16 (which
 is UCS-2 plus support for the surrogate-character mechanism by which
 characters outside the Basic Multilingual Plane may be encoded).

 Decision: (i) XML processors must support 16-bit data streams (i.e.
 UTF-16) for input.  (ii) They must not corrupt surrogate characters.
 (iii) If the processor uses a 16-bit buffer or a 16-bit interface to
 the downstream application, it must correctly represent numeric
 character references to non-BMP characters as pairs of surrogate
 characters.  Unanimous.

 Rationale: since all name characters in XML are in the Basic
 Multilingual Plane, characters outside the BMP can only appear in
 XML documents as data.  Since an XML processor is required to do
 nothing more to data than store it and pass it to the downstream
 application without corrupting it, no special handling is required for
 surrogate characters.  The only new requirement is that processors
 understand the surrogate-character mechanism for characters outside
 the BMP, and use it, when necessary, to handle numeric character
 references correctly.
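
Point (iii) of the decision amounts to the standard UTF-16 surrogate calculation. A minimal sketch, assuming a processor that hands 16-bit code units to the application; `expand_char_ref` is a hypothetical helper that accepts only the `&#NNN;` and `&#xHHH;` forms and does no other validation.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a non-BMP code point into (high, low) UTF-16 surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only code points outside the BMP need surrogates")
    offset = code_point - 0x10000            # 20 significant bits
    high = 0xD800 + (offset >> 10)           # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)          # bottom 10 bits
    return high, low

def expand_char_ref(ref: str) -> list[int]:
    """Turn a numeric character reference into 16-bit code units."""
    if not (ref.startswith("&#") and ref.endswith(";")):
        raise ValueError(f"not a numeric character reference: {ref!r}")
    body = ref[2:-1]
    code_point = int(body[1:], 16) if body.startswith("x") else int(body)
    if code_point < 0x10000:
        return [code_point]                  # BMP: one code unit
    return list(to_surrogate_pair(code_point))

print([hex(u) for u in expand_char_ref("&#x10000;")])
```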


 4.  XML will refer to Unicode 2.0 and ISO/IEC 10646 with Am. 1-7.

 The current draft spec refers to Unicode 2.0 and ISO/IEC 10646 with
 Amendments 1 through 5.  It has been suggested (a) that XML should refer
 *only* to Unicode, and (b) that the reference should be to "the current
 version" of Unicode, so that as Unicode is revised, XML automatically
 accepts the revisions.

 Decision:  refer to 10646 with Amendments 1 through 7, but otherwise
 retain the current reference.  I.e. do not drop the reference to
 ISO/IEC 10646, and do not phrase the reference so as to incorporate
 changes to Unicode automatically.  Unanimous.

 Rationale: the agreement between ISO/IEC JTC1/SC2 and the Unicode
 Consortium to keep Unicode and 10646 synchronized is extremely
 important to all users.  A joint reference to both standards makes
 clear to both parties that we, as users, wish them to honor that
 agreement.  A reference solely to Unicode would imply clearly that XML
 would follow Unicode even if Unicode were to diverge from ISO/IEC
 10646.  The joint reference makes clear our intent: if the Unicode
 Consortium and SC2 fail to keep the two standards in synch, then XML
 is not guaranteed to follow either of them.

 Reference to as yet unpublished standards (which is what reference to
 "the most recent version" amounts to) is unwise because there is and
 can be no guarantee that revisions in Unicode and 10646 will not
 require corresponding revisions to the XML spec.


 5.  Encoding of external text entities is kept as is.

 It has been suggested that by allowing external entities to be in
 different character encodings, XML is incompatible with ISO 8879,
 which does not allow this.

 The WG unanimously reaffirmed its belief that the current draft spec
 is in fact compatible with ISO 8879 under what is sometimes called the
 'new' character model.  SGML documents must have a single document
 character set declaration and thus a single document character set,
 but this reflects the output from, not the input to, the entity
 manager, and is thus independent of the character encoding encountered
 in the actual data stream of the external text entity.


 6.  Ideographic space is not white space.

 Decision (unanimous): ideographic space (#x3000) will be removed from
 the non-terminals S and PubidCharacter.

 Rationale:  Ideographic space corresponds more closely to the
 no-break space (#xA0, &nbsp;) than to the standard space character
 (#x20).  #xA0 is not allowed in S, and neither should ideographic
 space be.  It is unlikely, with current standard input methods for
 kanji, that any operator would unintentionally or accidentally insert an
 ideographic (#x3000) rather than a Latin (#x20) space within a tag.
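
One practical consequence is that a general-purpose "is this white space?" test is not the same thing as S. A small sketch, assuming that S is left with the four characters space, tab, CR and LF once #x3000 is removed:

```python
# Assumed contents of the non-terminal S after this decision.
XML_S = {"\u0020", "\u0009", "\u000D", "\u000A"}

def is_xml_space(text: str) -> bool:
    """True if text is non-empty and consists only of S characters."""
    return bool(text) and all(ch in XML_S for ch in text)

IDEOGRAPHIC_SPACE = "\u3000"
NO_BREAK_SPACE = "\u00a0"

for ch in (" ", NO_BREAK_SPACE, IDEOGRAPHIC_SPACE):
    # str.isspace() follows Unicode properties, so it accepts all three.
    print(f"U+{ord(ch):04X} isspace={ch.isspace()} xml_s={is_xml_space(ch)}")
```

An implementation that relies on a library predicate such as `str.isspace()` would therefore accept both #xA0 and #x3000 where S does not.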


 7.  Binding sources of information for character encodings will be
 specified.

 The current draft spec says nothing about the priority of various
 sources of information regarding character encodings.  Some
 participants (notably Gavin Nicol and Makoto Murata) have argued
 that this should be specified.

 Decision:  The spec should include wording to the following effect:

      If an XML document or entity is in a file, the Byte-Order Mark
    and encoding-declaration PI are used (if present) to determine
    the character encoding.  All other heuristics and sources of
    information are solely for error recovery.

      If an XML document is delivered via the HTTP protocol with a
    MIME type of text/xml, then the HTTP header determines the
    character encoding method; all other heuristics and sources of
    information are solely for error recovery.

      If an XML document is delivered via the HTTP protocol with a
    MIME type of application/xml, then the Byte-Order Mark and
    encoding-declaration PI are used (if present) to determine the
    character encoding.  All other heuristics and sources of
    information are solely for error recovery.
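
The three rules can be read as a small decision procedure. The sketch below is only an illustration: the function name, its parameters and the regular expression for the encoding declaration are assumptions, the declaration is only recognised in ASCII-compatible encodings, and only UTF-16 Byte-Order Marks are checked.

```python
import codecs
import re

# Assumed shape of the encoding-declaration PI; the draft syntax may differ.
ENCODING_DECL = re.compile(
    rb'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']',
    re.IGNORECASE,
)

def binding_encoding(data, delivery="file", mime_type=None, http_charset=None):
    """Return the encoding named by the binding source, or None if absent.

    A None result means the caller is left with heuristics, which the
    rules above treat as error recovery only.
    """
    if delivery == "http":
        if mime_type == "text/xml":
            return http_charset          # the HTTP header is binding
        if mime_type != "application/xml":
            return None                  # not covered by the rules above
    # File, or HTTP with application/xml: BOM first, then the declaration.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "UTF-16"
    match = ENCODING_DECL.match(data)
    return match.group(1).decode("ascii") if match else None

doc = b"<?xml version='1.0' encoding='ISO-8859-1'?><a/>"
print(binding_encoding(doc))
print(binding_encoding(doc, "http", "text/xml", "UTF-8"))
```

Note that under the text/xml rule the declaration inside the entity is ignored even when it disagrees with the HTTP header.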

 -C. M. Sperberg-McQueen


 From: "C. M. Sperberg-McQueen" <cmsmcq@hd.uib.no>
 Subject: XML WG decisions of 3 September 1997


 The XML Work Group met today (3 Sept 1997) and made the decisions 
 described below.  Present were Jon Bosak (JB), Tim Bray (TB), James
 Clark (JC), Dan Connolly (DC), Steve DeRose (SJD), Paul Grosso (PG), 
 Dave Hollander (DH), Eliot Kimber (EK), Murray Maloney (MMa), Makoto 
 Murata (MMu), Joel Nava (JN), Jean Paoli (JP), Peter Sharpe (PS), and 
 Michael Sperberg-McQueen (MSM).

 1.  Procedures for determination of character encoding to be 
 described in an appendix.

 Background:  last week's report of decisions (31 August, posting 
 from U35395@UICVM.UIC.EDU), included as item 7 a decision regarding 
 "Binding sources of information for character encodings".  The WG
 revisited the issue, noted that in fact no formal vote on it had
 been taken (error in the report), and discussed whether such rules
 belong in the XML language spec or not.  

 Against inclusion:  the rules really apply to the delivery of XML in 
 very specific protocol environments, and should be included in the 
 specification of the protocol.  XML will be delivered by many protocols, 
 some of them not yet invented; the language spec should not have to be 
 revised every time a new protocol is deployed or invented.  

 For inclusion:  such conventions are important for encouraging 
 interoperability of XML software.  Conforming processors reading 
 the same material in the same environment should make the same 
 decisions about the character encoding.

 Decision:  The rules for locating binding information about the character
 encoding of XML entities (reported last week) will be described
 in an appendix.  They will be accompanied by a note making clear
 that the rules about HTTP service properly belong in the RFC defining 
 the MIME types text/xml and application/xml, and that when those
 RFCs are available their text will supersede the recommendations
 of the appendix.

 The wording given in the posting of 31 August will be changed by
 replacing the phrases 'XML document or entity' and 'XML document' 
 with the phrase 'XML entity'.  (It has been argued that the term
 'entity' is not currently well defined in the XML spec; if the usage 
 of the term is later revised, this occurrence may be changed.)

 In favor:  all present.

 2.  A decision on case-folding was postponed again.

 A summary of the issues and a request for discussion by the SIG
 will be posted shortly.


 3.  XML processors to normalize CR, LF, and CRLF to LF.

 Background:  the current draft XML spec says nothing about whether 
 or how XML processors or applications should normalize the common
 line-break sequences CR, LF, and CRLF.  

 For normalization:  since the three sequences are intended, in practice,
 to have the same meaning, they can be normalized without loss of
 useful information.  If the XML processor does not normalize these
 sequences, every single downstream XML application will be forced to
 do so; experience shows that relying on them to do so will result in
 broken applications and inconsistent behavior.

 Against normalization:  right now the spec has no concept of line or
 line break; there is no need to introduce one, so for the sake of
 economy (and clarity) none should be introduced.

 For normalizing to LF:  thanks to C's standard IO model, it's what 
 most program libraries provide, and thus what most programs and most 
 programmers expect.

 For normalizing to CRLF:  it's more consistent with the specifications
 governing the Web.  Last time anybody looked at the ASCII spec, CRLF
 was the preferred form of this information.

 Against CRLF:  specifications?  On the Web?

 Decision:  When an XML processor encounters any of the character
 sequences CR (UTF-16 x000D), LF (UTF-16 x000A), or CR LF (UTF-16
 x000D x000A), the processor must pass a single LF character to the
 downstream application.  
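
A minimal sketch of this normalization, assuming the processor already holds the decoded text as a string (the function name is illustrative):

```python
def normalize_line_breaks(text: str) -> str:
    """Map CR LF and bare CR to a single LF.

    CR LF must be handled first; otherwise each CR LF pair would be
    turned into two LF characters.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")

sample = "one\r\ntwo\rthree\nfour"
print(repr(normalize_line_breaks(sample)))
```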

 (Note:  this formulation of the decision presupposes that the set of 
 information which XML processors may or must make visible to downstream 
 applications will be described more fully than it is in the current 
 draft spec.  If the WG decides against such a description, this 
 substantive decision will need to be expressed in some other form.
 If the processor disappears from the XML language specification, as
 has been proposed, this decision may be expressed as a constraint on
 whether the differences among line-break sequences in the input
 stream are 'visible' or 'significant'.)

 -C. M. Sperberg-McQueen
  University of Illinois at Chicago
  tei@uic.edu


 From: Tim Bray <tbray@textuality.com>
 Subject: XML WG decisions of Wed. Sep. 10

 The XML WG met on Wed. Sep. 10th.  Present: Bosak, Kimber, Murata,
 Clark, Sperberg-McQueen, Wood, Nava, Bos, Maler, Bray, Tigue, Maloney,
 Paoli, DeRose.

 Errors in discussion summaries are, as usual, mine.

 1. Discussion of case sensitivity

 Few new arguments arose in the discussion of case sensitivity, aside
 from Steve DeRose's observation that disallowing case folding will,
 by removing the possibility that attribute values are case-folded,
 reduce the number of instances where the results of parsing can
 be affected by the presence/absence of a DTD.  (Note that the 
 handling of white space can still be affected in the case where 
 attribute values are known to be tokenized, so the problem hasn't
 entirely gone away).

 This is a summary of points made in a brief last-chance-to-speak-
 your-mind go-around:

 For Case Sensitivity: 
 - XML will rarely be created by hand and when it happens, it'll be by 
   experts.  
 - This is a chance to do the right thing early in XML's history and
   avoid living with a compromise forever.  
 - Case sensitivity is very easy to specify and to understand.  
 - It would be nice to be able to map case-sensitive objects, for example 
   DSSSL flow objects, to element types.  
 - Internationalization experts are unanimously against folding.  
 - Pleasant experiences with case-sensitive programming languages.  
 - Casefolding problems are truly vile.  
 - It will be easy to make XML processors recognize typical user errors 
   and provide helpful error messages.

 For Case Folding: 
 - Case sensitivity would be the right thing to do if we were starting from scratch, but 
   it's too late now.  
 - There will be serious difficulties dealing with the XML-in-HTML 
   scenario.  
 - It will make it impossible for HTML ever to be specified as an 
   application of XML as opposed to SGML.  
 - The XML spec has been out for nine months now; it's late in the game 
   to be making this change.

 The Question: Modify the XML specification to achieve the effect of
 NAMECASE GENERAL NO in SGML.

 Yes: Bosak Kimber Murata Clark Sperberg-McQueen Nava Bos
      Bray Tigue Maloney Paoli DeRose
 No: Wood
 Abstain: Maler

 So XML is now case-sensitive.

 1a: Since XML is case sensitive, we must specify the case of
 our keywords, i.e. <!ELEMENT or <!element.  Names not recorded,
 vote was
 Upper:  7  Lower: 3  Abstain: 4
 (In this vote, some of the abstains should be taken as don't-cares).

 2. Chris Maden's suggestion that NOTATION System Identifiers 
 should be mime types.  The WG liked the idea, but declined to 
 modify the spec to achieve this effect; among other things,
 URLs and mime types are not syntactically distinguishable.  It
 was the feeling of the group that it would be desirable that a 
 new URL scheme be created to allow a URL to locate a mime type.
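
The syntactic point can be illustrated with a general-purpose URL parser: a string such as image/gif is a perfectly good relative URL. A small sketch using Python's standard `urllib.parse`; the helper name is illustrative.

```python
from urllib.parse import urlsplit

def looks_like_relative_url(system_id: str) -> bool:
    """True if the string parses as a URL with no scheme and no host."""
    parts = urlsplit(system_id)
    return not parts.scheme and not parts.netloc and bool(parts.path)

for system_id in ("image/gif", "text/xml", "notations/gif.html"):
    print(system_id, looks_like_relative_url(system_id))
```

All three strings parse the same way, so a processor cannot tell from syntax alone whether a system identifier names a MIME type or a relative location.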

 3. Discussion of the proposition that the XML spec should say
 more about what the processor passes the App.  John Tigue has
 volunteered to write an XML Grove Plan; while there is little 
 sentiment that this should be made normative, it might serve 
 usefully as either a separate application note or an appendix.

 The WG agreed that the editors should enrich the language of the
 spec sufficiently to make it clear (as it does with PIs and
 comments) what a processor may and must make available to an
 application.

 Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/

 PS: For your amusement, I attach the output produced by a 
 moments-ago-updated Lark when asked to process the XML spec:
 Loading
 Testing: Lark V0.92 Copyright (c) 1997 Tim Bray.
  All rights reserved; the right to use these class files for any purpose
  is hereby granted to everyone.
 Parsing...
 Syntax error at line 127:57: Start/End tags differ only in case: p/P
 Syntax error at line 367:23: Start/End tags differ only in case: ITEM/item
 Syntax error at line 369:51: Start/End tags differ only in case: ITEM/item
 Syntax error at line 370:69: Start/End tags differ only in case: item/ITEM
 Syntax error at line 454:4: Start/End tags differ only in case: P/p
 Syntax error at line 457:50: Start/End tags differ only in case: p/P
 Syntax error at line 750:50: Start/End tags differ only in case: termdef/TERMDEF
 Syntax error at line 752:34: Start/End tags differ only in case: lhs/LHS
 Syntax error at line 755:71: Start/End tags differ only in case: prod/PROD
 Syntax error at line 955:43: Start/End tags differ only in case: P/p
 Syntax error at line 956:7: Start/End tags differ only in case: ITEM/item
 Syntax error at line 959:19: Start/End tags differ only in case: p/P
 Syntax error at line 959:26: Start/End tags differ only in case: item/ITEM
 Syntax error at line 991:7: Start/End tags differ only in case: list/LIST
 Syntax error at line 1031:22: Start/End tags differ only in case: P/p
 Syntax error at line 1039:4: Start/End tags differ only in case: p/P
 Syntax error at line 1062:4: Start/End tags differ only in case: P/p
 Syntax error at line 1137:31: Start/End tags differ only in case: p/P
 Syntax error at line 1140:4: Start/End tags differ only in case: p/P
 Syntax error at line 1207:4: Start/End tags differ only in case: P/p
 Syntax error at line 1278:4: Start/End tags differ only in case: P/p
 Syntax error at line 1289:60: Start/End tags differ only in case: p/P
 Syntax error at line 1453:7: Start/End tags differ only in case: DIV2/div2
 Syntax error at line 1544:4: Start/End tags differ only in case: P/p
 Syntax error at line 1586:4: Start/End tags differ only in case: P/p
 Syntax error at line 1652:14: Start/End tags differ only in case: P/p
 Syntax error at line 1655:19: Start/End tags differ only in case: p/P
 Syntax error at line 1675:4: Start/End tags differ only in case: P/p
 Syntax error at line 1706:22: Start/End tags differ only in case: P/p
 Syntax error at line 1721:36: Start/End tags differ only in case: p/P
 Syntax error at line 1726:45: Start/End tags differ only in case: P/p
 Syntax error at line 1935:40: Start/End tags differ only in case: P/p
 Syntax error at line 2072:4: Start/End tags differ only in case: P/p
 Syntax error at line 2376:8: Start/End tags differ only in case: SCRAP/scrap
 Syntax error at line 2377:4: Start/End tags differ only in case: P/p
 Syntax error at line 2438:8: Start/End tags differ only in case: SCRAP/scrap
 Syntax error at line 2530:7: Start/End tags differ only in case: div3/DIV3
 Syntax error at line 2595:8: Start/End tags differ only in case: SCRAP/scrap
 Syntax error at line 2665:10: Start/End tags differ only in case: p/P
 Syntax error at line 2858:7: Start/End tags differ only in case: DIV2/div2
 Syntax error at line 3650:19: Start/End tags differ only in case: p/P
 Done.
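
A check of the kind reported above can be sketched in a few lines. This is not Lark's implementation, only an illustration: it uses a simplified tag pattern, ignores comments, CDATA sections and PIs, and does not track line and column positions.

```python
import re

# Simplified pattern for start, end and empty-element tags.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.:-]*)[^<>]*?(/?)>")

def case_mismatches(xml_text: str) -> list[str]:
    """Return 'start/end' pairs whose names differ only in case."""
    stack = []
    problems = []
    for match in TAG.finditer(xml_text):
        closing, name, empty = match.groups()
        if empty:
            continue                      # <name/> opens and closes itself
        if not closing:
            stack.append(name)            # start tag
            continue
        if not stack:
            continue                      # stray end tag; ignored here
        start = stack.pop()
        if start != name and start.lower() == name.lower():
            problems.append(f"{start}/{name}")
    return problems

print(case_mismatches("<doc><p>x</P><ITEM>y</item></doc>"))
```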



From digitome at iol.ie  Sat Sep 13 11:12:04 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:26 2004
Subject: XML Grove Plan
Message-ID: <199709130911.KAA26836@GPO.iol.ie>

[From Jon Bosak]
> 3. Discussion of the proposition that the XML spec should say
> more about what the processor passes the App.  John Tigue has
> volunteered to write an XML Grove Plan; while there is little 
> sentiment that this should be made normative, it might serve 
> usefully as either a separate application note or an appendix.

I raised this issue a long time ago and I am delighted to see it is being
considered for inclusion in XML. Having a grove plan gives developers
a sanity checker for their parsers. Having a grove plan with a syntactic form
that can be output from a parser's internal tree representation provides a
mechanism for testing and comparing parsers. Having a grove plan allows apps
to be developed that process post-parse data structures as opposed to using
an API.

From my perspective the importance of this merits normative inclusion
in the spec. I am reminded of that well thumbed quintet of pages in
8879.  Annex G of appendix B, attachment 1. Otherwise known as ESIS - 
the starting point for many an SGML structure controlled application.


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From Peter at ursus.demon.co.uk  Sat Sep 13 13:28:21 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:26 2004
Subject: Recent XML WG decisions
Message-ID: <9971@ursus.demon.co.uk>

In message <199709130538.WAA07282@boethius.eng.sun.com> Jon.Bosak@eng.Sun.COM (Jon Bosak) writes:
> While it is not our usual policy to post decisions of the XML Working
> Group to xml-dev, the last three WG meetings have seen a number of
> issues decided that bear directly on current experimental XML
> implementations.  Following are reports prepared by C. M.
> Sperberg-McQueen and Tim Bray detailing recent decisions that will be
> incorporated into the next working draft.

I would like to thank the XML-WG for posting the results of these decisions
and for providing so much of the detail.  [Note that the records are in 
chronological order, so that the final decision on case-folding comes towards
the end :-)]. I am sure that all xml-dev readers are aware that XML is still
at draft stage so that decisions which alter the current draft spec are
still possible.

As someone privileged to be part of the XML-SIG discussion group I can confirm
that the discussion on these issues has been extremely constructive. The 
decision-making on the XML project is an impressive achievement in itself.
Whilst there is not, and will not be, any formal transmission from XML-DEV to XML-WG
it is carefully scanned by members of the WG and issues discussed here
constructively are taken note of.

Readers will note John Tigue's very generous offer to develop an API for the
Grove Plan, and that this may accompany the spec in the future. I hope that
members of XML-DEV will help in this endeavour where appropriate.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Peter at ursus.demon.co.uk  Sat Sep 13 13:28:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:26 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
Message-ID: <9975@ursus.demon.co.uk>

In message <199709130538.WAA07282@boethius.eng.sun.com> Jon.Bosak@eng.Sun.COM (Jon Bosak) writes:
[... decision of XML-WG omitted...]
> 
>  2. Chris Maden's suggestion that NOTATION System Identifiers 
>  should be mime types.  The WG liked the idea, but declined to 
>  modify the spec to achieve this effect; among other things,
>  URLs and mime types are not syntactically distinguishable.  It
>  was the feeling of the group that it would be desirable that a 
>  new URL scheme be created to allow a URL to locate a mime type.

I am not wanting to re-open this discussion/decision, but I'd be very
grateful for clarification as to how a SystemID is used to identify the
type of a NOTATION. If I wish to identify it as 'image/gif', how do I
do this in practice? Is there a set of URLs that map onto current MIME types,
or is it impossible in XML to state what the MIME type of a NOTATION is?
[If so this is a pity, especially since HTTP, Java, etc. support MIME types.]
If it *is* impossible, how is a URL used with a NOTATION in practice, other 
than simply holding a textual description relating to it?

Does the last sentence mean that the XML-WG hopes to come up with such a
scheme or that some other body (e.g. IETF) may/might do so?

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From Jon.Bosak at eng.Sun.COM  Sat Sep 13 19:02:39 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:26 2004
Subject: Religion.1.02.xml and Shakespeare.1.02.xml
Message-ID: <199709131700.KAA07483@boethius.eng.sun.com>

I've updated my Religion and Shakespeare collections to be in what I
*hope* is accordance with the new case sensitivity rules.  (I am
firmly in favor of case sensitivity, but I'm the first to admit that
it will take some getting used to.)  I would appreciate it if the
parser-builders would check out these collections as soon as they've
incorporated case sensitivity and tell me whether I've got it right.

http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip

As usual, I note that these collections don't really exercise very
many XML features, but they are useful for benchmarking and certain
kinds of stress testing.  In addition to being interesting reading, of
course.

Jon




From eliot at isogen.com  Sat Sep 13 22:17:15 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:26 2004
Subject: XML Grove Plan
Message-ID: <3.0.32.19970913150653.00bc0140@swbell.net>

Note that a grove plan is not a property set: a grove plan is simply a
statement of which classes and properties are included in the property set
used by a particular processor or process.  For example, the HyTime default
grove plan is:

<grovplan propset=SGMLProp id=htdefgp>
<title>HyTime Default SGML Grove Plan</title>
<desc>
Removes processing instructions (pi) from and
adds pseudo-elements (pelement) to the default SGML
grove plan defined in the SGML property set.
</desc>
<inclmod>
pelement
</inclmod>
<omitclas>
pi
</omitclas>
</grovplan>

Which is itself a delta on the SGML default grove plan (indicated by the
presence of the "default" attribute on those modules, classes, and
properties included in the SGML default grove plan).

The discussion of grove plans can be found at:

http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-7.1.html#clause-7.1.4.2
and
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.1.html#clause-A.4.1.6

At 09:44 AM 9/13/97 +0100, Sean Mc Grath wrote:
>I raised this issue a long time ago and I am delighted to see it is being
>considered for inclusion in XML. Having a grove plan gives developers
>a sanity checker for their parsers. Having a grove plan with a syntactic form
>that can be output from a parser's internal tree representation provides a
>mechanism for testing and comparing parsers. Having a grove plan allows apps
>to be developed that process post-parse data structures as opposed to using
>an API.

There is a defined syntactic representation for *groves* (as opposed to
grove plans, which is what I think Sean meant), called the "canonical grove
representation" (CGR) document, described in
http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.5.html

CGR documents are designed such that two groves that are identical should
produce exactly the same CGR documents, character for character.  They are
designed specifically to enable the comparison of the groves produced by
different tools, which is useful both for checking tools and for doing
comparisons of documents by comparing their CGR documents (this allows
documents to be compared meaningfully without regard to their original
markup syntax as long as the groves used for comparison do not include any
markup properties).  CGR documents are also designed to be easy to process
with text processing tools like Perl so that they can be used much as you
would use the output of NSGMLS.
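
The idea of comparing documents through a canonical dump can be sketched as follows. This is not the CGR format defined in the HyTime draft; it is a hypothetical line-per-node dump built on Python's `xml.etree.ElementTree`, meant only to show why identical trees yield identical text whatever the original markup looked like.

```python
import xml.etree.ElementTree as ET

def canonical_lines(element, depth=0):
    """Yield one line per node, with attributes in sorted order."""
    lines = [f"{depth} element {element.tag}"]
    for name in sorted(element.attrib):
        lines.append(f"{depth} attribute {name}={element.attrib[name]!r}")
    if element.text:
        lines.append(f"{depth} data {element.text!r}")
    for child in element:
        lines.extend(canonical_lines(child, depth + 1))
        if child.tail:
            lines.append(f"{depth} data {child.tail!r}")
    return lines

def canonical_dump(xml_text: str) -> str:
    """Parse a document and return its hypothetical canonical dump."""
    return "\n".join(canonical_lines(ET.fromstring(xml_text)))

first = '<a x="1" y="2"><b/>t</a>'
second = "<a y='2'   x='1'><b></b>t</a>"
print(canonical_dump(first) == canonical_dump(second))
```

Because the dump is line oriented, it can also be compared or filtered with ordinary text tools.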

I'm in the process of creating a DSSSL spec to generate CGR documents using
Jade--I'll post something about it to comp.text.sgml when I get it working.

Cheers,

Eliot



From Jon.Bosak at eng.Sun.COM  Sun Sep 14 01:06:31 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:26 2004
Subject: Recent XML WG decisions
In-Reply-To: <341AAE35.5C0B583A@technologist.com> (message from Paul Prescod on Sat, 13 Sep 1997 11:16:05 -0400)
Message-ID: <199709132304.QAA08273@boethius.eng.sun.com>

Memo to Paul Prescod:

1. Please do not put me down as the author of everything you quote
from something that I've forwarded to the group.  Everything that you
have attributed to me was in fact written by C. M. Sperberg-McQueen or
Tim Bray.

2. Please do not mindlessly copy me when replying to messages I happen
to post to the list.  I am not interested in receiving two copies of
everything you say.

3. Please do not post to the w3c-xml-wg list.

Jon




From arnaud21 at club-internet.fr  Sun Sep 14 01:52:28 1997
From: arnaud21 at club-internet.fr (Arnaud Le Taillanter)
Date: Mon Jun  7 16:58:26 2004
Subject: Whitespace
Message-ID: <341B277D.3EFD@club-internet.fr>

Hello,

Still about white space, sorry :-)

First part : comments on the XML draft approach to
WS handling.
Second part : comments on Neil Bradley's five rules for
WS handling (version 1).

**First part** 

In the current draft, I see 3 rules concerning WS :

*Rule 1* : all WS is preserved and fed to the application.

A very simple rule indeed, in accordance with XML
design goals. But Neil Bradley's five rules are
simple to implement too (though incorrect). On the contrary, consider
parameter entities: the committee members acknowledged
they had some difficulty designing a grammar for DTD
declarations, because of PEs. So implementing such a grammar
won't be trivial (BTW, someone said he had designed
a W grammar. It could be interesting
to see what it looks like. Please post!),
far less trivial than replacing CR, LF, CRLF by
a single character! (NB: the WG agreed a few days ago
on that rule :-)
So the simplicity argument doesn't hold.
The real issue is that the application must be fed with
a credible tree structure. Take a document without
a DTD:

<DOC>CR
<PART>CR
<P> foo</P>CR
</PART>CR
</DOC>

What kind of tree structure will the processor offer us?
A root node "DOC". So far, so good. But everybody
expects now a single child node (the "PART" element).
The processor gives us *three* for the same price:
the very useful "CR" element. The "PART" element.
And another "CR" node. What kind of ridiculous
tree is that? A Chernobyl tree, I guess.
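
The three-child tree described above can be reproduced with any parser that passes all white space through. The following is a minimal sketch, assuming Python's standard xml.dom.minidom as the processor and using LF in place of the "CR" markers; the helper name child_names is only illustrative.

```python
from xml.dom import minidom

# LF stands in for the "CR" markers of the example above.
SOURCE = "<DOC>\n<PART>\n<P> foo</P>\n</PART>\n</DOC>"


def child_names(xml_text):
    """Return the node names of the root element's direct children."""
    root = minidom.parseString(xml_text).documentElement
    return [node.nodeName for node in root.childNodes]


if __name__ == "__main__":
    # Text nodes are reported as "#text"; elements by their tag name.
    print(child_names(SOURCE))
```

With no DTD available, the processor cannot tell that the line ends around PART are mere layout, so they surface as text nodes on either side of the element.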

*Rule 2*: a validating parser must distinguish WS in
element content and signal to the application that such
WS is not significant.

I observe that it is not said how the parser will tell
the application about such insignificant WS. A minor point,
I concede. Whether the parser is validating or not, a
solution should be found where WS in element content
is *discarded* : this is the important point. No node
with only WS in it : it is completely against the
philosophy of SGML/XML: (well)*structured* content.
If the parser is able to distinguish what is element
content and what is not (the hard part without a DTD),
it should discard those completely useless WSs (the
easy part).

*Rule 3*: A special attribute may be inserted in documents
to signal an intention that the element to which this
attribute applies requires all white space to be treated as
significant by applications.
The value DEFAULT signals that applications' default white-space
processing modes are acceptable for this element; the value PRESERVE
indicates the intent that applications preserve all
the white space.

As someone observed, this is contradictory with the
position "the application should manage WS issues, the
parser doesn't intervene".
BTW, the attribute is hardly useful: suppose I put on the web a
document, with a "FOO" element with the attribute
"XML-SPACE" set to "DEFAULT". Application A
normalizes WS by default. Application B does nothing
with WS by default. As a result, an attribute set to "DEFAULT"
conveys absolutely no information. It will be the same as
"PRESERVE" with some applications. Basically, it
will be a mess :-) But we are used to that :-))
What is strange, too, is that no default value is declared
for this attribute. Those SGML guys are really
subtle :-)) A default value of "DEFAULT" would seem to be
natural, but in that case the application does anything
it wants to, so who cares :-)

**Second part**

Neil Bradley proposed some simple rules (this is "version 1", a second
version, a little more complex, but simple enough, was proposed). I
really like the approach, even if it doesn't work for the moment.

*Rule 1*: standardization of input from different OSs.
 CR, LF, CRLF are translated to a line end code.
OBVIOUS!!!!!
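
A minimal sketch of this normalization, assuming LF is chosen as the single "line end code" (the rule itself does not say which character is used) and that the function name is only illustrative:

```python
import re

# CRLF must be tried first so that it maps to one line end, not two.
_LINE_END = re.compile(r"\r\n|\r|\n")


def normalize_line_ends(text, line_end="\n"):
    """Translate CR, LF and CRLF to a single line end code."""
    return _LINE_END.sub(line_end, text)


if __name__ == "__main__":
    print(repr(normalize_line_ends("Mac\rDOS\r\nUnix\n")))
```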

*Rule 2*: line end codes after a start tag or before an end tag are
discarded. A simple rule. For usual elements, it is exactly what you
expect :
<P>
blabla
</P>
becomes <P>blabla</P>
for PRE-like elements:
<PRE>
SPSPblabla
</PRE>
becomes <PRE>SPSPblabla</PRE>, so two line ends are discarded.
It seems nevertheless natural that these line ends are dropped.
BTW, this rule was in the first (11/14/96) XML draft.
There is a first problem with this approach: in
default content (preserved content will be examined later):
<P><EM>Two
</EM>words</P>
becomes
<P><EM>Two</EM>words</P>
The space between "Two" and "words" evaporated.
Same thing with:
<P><EM>
Two
</EM>words</P>
I don't think this particular problem is important: the encoding
is not natural. It should be an error!
 I think everybody would write:
<P><EM>Two</EM> words</P>, or
<P>
<EM>Two</EM> words
</P>, etc...
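
The behaviour of Rule 2, including the evaporating space, can be sketched with two regular expressions. This is only an approximation under stated assumptions: line ends are already normalized to LF, tags contain no "<" or ">" inside attribute values, and empty-element tags such as <BR/> are not treated as start tags. The name apply_rule_2 is hypothetical.

```python
import re

# A start tag (not an end tag, not an empty-element tag) followed by a line end.
_EOL_AFTER_START_TAG = re.compile(r"(<[A-Za-z][^<>]*(?<!/)>)\n")
# A line end immediately followed by an end tag.
_EOL_BEFORE_END_TAG = re.compile(r"\n(</[A-Za-z][^<>]*>)")


def apply_rule_2(text):
    """Discard one line end after each start tag and before each end tag."""
    text = _EOL_AFTER_START_TAG.sub(r"\1", text)
    return _EOL_BEFORE_END_TAG.sub(r"\1", text)


if __name__ == "__main__":
    print(apply_rule_2("<P>\nblabla\n</P>"))
    print(apply_rule_2("<P><EM>Two\n</EM>words</P>"))
```

The second call shows the problem case: the only separator between "Two" and "words" was the line end before </EM>, and the rule removes it.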

 
Inside a preserved element, line end codes are wrongly discarded
after element start tags and before element end tags:
<PRE XML-SPACE="PRESERVE">
         blabla <EM>
         bloblo</EM>
         blublu
</PRE>

The coding in this case is natural: bla, blo and blu are very
aesthetically aligned!
But: a line end code is discarded after "<EM>", it shouldn't be.
So: preserved elements need a special rule. It seems quite natural
they need a special rule concerning line end codes (and
space codes).
A possibility: the parser closes a "default" (not preserved) element,
and opens a "preserved" element: the line end codes after the start tag
and before the end tag are discarded. But for a preserved element
directly embedded in a preserved element, line end codes
are left intact.
  
*Rule 3*: WS in element content is discarded.
WS in element content *must* be discarded. The problem
is: without a DTD, one doesn't know if an element contains only
other elements.
Suppose we have :
<P><EM>blabla</EM>SP<EM>bloblo</EM></P>
We could choose a rule like: an element in which the parser
finds only other elements and WS (no characters) is an element
content element. But as the above example shows, it doesn't work.
If we follow this rule, we have a tree with a root node "P" and
two child nodes "EM". And what we want is a root node with three
child nodes: two "EM" elements and between the two a "PCDATA"
element (the space between "blabla" and "bloblo").
So a different method must be found.
A radical constraint put on the user would be: don't input a single
space character in element content. With this rule the parser
will be able to recognize easily element content. But you
can forget about indentation in that case. The rule for the
user would be: "when you type a space, you mean a space".
BTW, this is always the case, except for indentation.
If the semantic overloading for the space character is removed
(a space is either a "real" space or an indentation space),
things are so much easier.
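
The failure of the "only elements and WS" heuristic discussed under Rule 3 can be shown in a few lines. This is a sketch under assumptions: xml.dom.minidom stands in for the parser, and the function names are hypothetical.

```python
from xml.dom import minidom


def looks_like_element_content(element):
    """Heuristic: the element holds child elements and, at most, WS-only text."""
    has_child_element = False
    for node in element.childNodes:
        if node.nodeType == node.ELEMENT_NODE:
            has_child_element = True
        elif node.nodeType == node.TEXT_NODE and node.data.strip():
            return False  # real character data: mixed content
    return has_child_element


def text_of(element):
    """Concatenate all character data below the element."""
    parts = []
    for node in element.childNodes:
        if node.nodeType == node.TEXT_NODE:
            parts.append(node.data)
        elif node.nodeType == node.ELEMENT_NODE:
            parts.append(text_of(node))
    return "".join(parts)


def flatten_with_heuristic(xml_text):
    """Parse, drop WS-only nodes where the heuristic fires, return the text."""
    root = minidom.parseString(xml_text).documentElement
    if looks_like_element_content(root):
        for node in list(root.childNodes):
            if node.nodeType == node.TEXT_NODE:
                root.removeChild(node)
    return text_of(root)


if __name__ == "__main__":
    print(flatten_with_heuristic("<P><EM>blabla</EM> <EM>bloblo</EM></P>"))
```

The single space between the two EM elements is significant, yet the heuristic classifies P as element content and discards it.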

*Rule 4*: Except in preserved elements (elements
with a space attribute set to "PRESERVE") line end codes are
discarded when preceded by a hard or
soft hyphen (in the process, a soft hyphen is also discarded) and
remaining line end codes are treated as space. 
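
A minimal sketch of Rule 4 for default (not preserved) content. The assumptions are mine: line ends are already normalized to LF, the soft hyphen is U+00AD, the hard hyphen is the plain "-" character, and the function name is illustrative.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: soft hyphen is U+00AD
HARD_HYPHEN = "-"       # assumption: hard hyphen is the plain hyphen-minus


def apply_rule_4(text):
    """Apply the hyphen and line end handling of Rule 4 to default content."""
    # Soft hyphen at line end: drop both the hyphen and the line end.
    text = text.replace(SOFT_HYPHEN + "\n", "")
    # Hard hyphen at line end: keep the hyphen, drop the line end.
    text = text.replace(HARD_HYPHEN + "\n", HARD_HYPHEN)
    # Every remaining line end is treated as a space.
    return text.replace("\n", " ")


if __name__ == "__main__":
    print(apply_rule_4("a well-\nknown rule\nfor line ends"))
```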

The rule concerning hyphens is not necessary. If it's a hard hyphen,
don't put it at line end (who would do that?)
Moreover, there is no use in an XML source file to put a soft
hyphen at line end. Who would do that? In my poor life, I have no occa-
sion to see some text with hyphens at line end.

There is a possible problem with the replacement of line end codes
in default (that is, not preserved) elements by a space character.
Suppose we have a text coded with Unicode (that could
happen :-)), with Chinese ideographs. In Chinese,
there is no concept of a word (sequence of letters): each ideograph is a
"word".
I don't know how in fact the Chinese encode their texts, but there
is obviously no utility in putting a space after each ideograph.
The Chinese must nevertheless use the end of line
character. And one shouldn't replace such a character by a space, which
would be an error, but simply discard it.
Depending on the class
of characters, there could be a different treatment of line end codes.
But this becomes complex :-(
Another approach: simply ignore line end codes. But you
have to put a space at the end of a line. The idea is quite
natural: line end codes are there for our eyes, they don't add
anything to the meaning of a text. The XML tree should
reflect the substance of a text, not the particular way it
was input:
<P>
We should 
get rid of 
line end 
codes 
</P>
and
<P>We should get rid of line end codes</P>
should give the same node in the document tree.
If line end codes must be preserved: use a preserved element, or
an empty element (<BR/>).
 

*Rule 5*: except in preserved elements, consecutive WS characters
are reduced to a single space.

I don't like this rule. If I put two spaces after a full stop, I mean two
spaces. It's a typographic decision.
Rule 5 is meant to allow some indentation:

<P>
He said:
     <QUOTE>
           I need some
           indentation.SPSPIndentation is needed.
     </QUOTE>
</P>

In the above example, it is necessary to get rid of  spaces caused
by indentation. But the two spaces marked "SP" should be retained.
So the new rule would be: SPs at the beginning of a line should be
discarded.
This rule must happen before line end codes are discarded, i.e. before
rule 2. What a headache :-)
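
A sketch of the proposed "discard SPs at the beginning of a line" step, assuming it runs on text whose line ends are already normalized to LF and before any line end is removed. The function name is hypothetical, and tabs are treated like spaces as a further assumption.

```python
import re

# Spaces (and, by assumption, tabs) at the start of any line.
_LEADING_SPACES = re.compile(r"^[ \t]+", re.MULTILINE)


def strip_indentation(text):
    """Discard white space at the beginning of every line, nothing else."""
    return _LEADING_SPACES.sub("", text)


if __name__ == "__main__":
    sample = (
        "He said:\n"
        "     <QUOTE>\n"
        "           I need some\n"
        "           indentation.  Indentation is needed.\n"
        "     </QUOTE>"
    )
    print(strip_indentation(sample))
```

The two spaces after "indentation." survive because only spaces at the start of a line are touched.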
Perhaps a simple rule could be: don't use indentation in XML files, or
you'll get burned.
More generally, if we want the parser to produce a clean data structure
out of an XML file, some burden will have to be put on the user's shoulders.
The contract could be: the user accepts some limitations on the way to
input the source code. He could have to write instead of the above
something like:

<P>
He said: 
<QUOTE>
I need some
indentation.SPSPIndentation is needed.
</QUOTE>
</P>

The reward (invaluable) will be: a clean data
structure available for applications.

Thanks for your attention!

Regards,

Arnaud



From dgd at cs.bu.edu  Sun Sep 14 02:33:49 1997
From: dgd at cs.bu.edu (David Durand)
Date: Mon Jun  7 16:58:26 2004
Subject: Arnaud Le Taillanter on whitespace
Message-ID: <199709140033.UAA08971@csb.bu.edu>


  I'll be very brief. There's little chance that there will be any new
whitespace ignoring rules in XML. Everyone involved has read (and
written!) literally hundreds of messages on the topic. Every variation
you discussed has been gone over and they all were either:
  1. unworkably complex (like the current SGML rules, which few
remember and even fewer remember correctly).
  2. Not compatible with SGML, or unworkably ugly like the proposal to
quote all literal text.
  3. Failed to work without a DTD. This is the kicker, and it's
required by XML because you don't always have the DTD, and different
results in the has-DTD/doesn't-have-DTD cases are unacceptable.


The recent change (to normalize all line ends) fills the one hole the
previous proposal had -- because it was nearly certain that some
processes would blindly change CRLF and their ilk anyhow.

My advice: don't waste your bytes complaining about this -- we've
heard it _all_ before -- and the solution that works best is to leave
it to the application.

Aside:
  XML-SPACE doesn't affect this -- it's along the lines of a "standard
hint" that will allow applications like web-crawlers and full-text
indexers to make more sense out of markup according to DTDs about
which they lack special knowledge. So it doesn't contradict the "pass
all space" philosophy, but rather supplements it, to enhance document
re-use.

   -- David




From neil at bradley.co.uk  Sun Sep 14 10:26:11 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun  7 16:58:26 2004
Subject: Whitespace
Message-ID: <199709140826.JAA29099@andromeda.ndirect.co.uk>


> Reply-to:      Arnaud Le Taillanter <arnaud21@club-internet.fr>

> Neil Bradley proposed some simple rules (this is "version 1", a second
> version, a little more complex, but simple enough, was proposed). I
> really like
> the approach, even if it doesn't work for the moment.

I agree they are inadequate, but I think my second attempt was more 
accurate than my first, so I am surprised that you now dissect the 
first attempt. Still, I am happy to see this issue continue to be 
aired.

 
> *Rule 1*: standardization of input from different OSs.
>  CR, LF, CRLF are translated to a line end code.
> OBVIOUS!!!!!

Absolutely, but perhaps not to some programmers unfamiliar with, for 
example, the Mac line-end conventions.
 
> *Rule 2*: line end codes after a start tag or before an end tag are
> discarded. A simple rule. For usual elements, it is exactly what you
> expect :

> <P><EM>Two
> </EM>words</P>
> becomes
> <P><EM>Two</EM>words</P>
> The space between "Two" and "words" evaporated.
> Same thing with:
> <P><EM>
> Two
> </EM>words</P>
> I don't think this particular problem is important: the encoding
> is not natural. It should be an error!
>  I think everybody would write:
> <P><EM>Two</EM> words</P>, or
> <P>
> <EM>Two</EM> words
> </P>, etc...

I have long thought that 'some' formatting options should simply be 
made illegal, and that we should then ensure widespread knowledge of 
restrictions to future document authors. This is the main example I 
had already considered.

> Inside a preserved element, line end codes are wrongly discarded
> after element start tags and before element end tags:
> <PRE XML-SPACE="PRESERVE">
>          blabla <EM>
>          bloblo</EM>
>          blublu
> </PRE>

Again, I think this coding is very unnatural. 


> *Rule 4*: Except in preserved elements (elements
> with a space attribute set to "PRESERVE") line end codes are
> discarded when preceded by a hard or
> soft hyphen (in the process, a soft hyphen is also discarded) and
> remaining line end codes are treated as space. 
> 
> The rule concerning hyphens is not necessary. If it's a hard hyphen,
> don't put it at line end (who would do that?)

It is in fact a very natural action, which I have seen many times.

> Moreover, there is no use in an XML source file to put a soft
> hyphen at line end. Who would do that? In my poor life, I have no occa-
> sion to see some text with hyphens at line end.

I have. Many times.
 
> *Rule 5*: except in preserved elements, consecutive WS characters
> are reduced to a single space.
> 
> I don't like this rule. If I put two spaces after a point, I mean two
> spaces.
> It's a typographic decision.
> Rule 5 is meant to allow some indentation:
> 
> <P>
> He said:
>      <QUOTE>
>            I need some
>            indentation.SPSPIndentation is needed.
>      </QUOTE>
> </P>

NO IT WAS NOT! I have never said this, and I did not intend to imply 
this. The reason for this rule was purely to remove surplus spaces 
generated by the effect of previous rules.
 
> Arnaud

I am more than happy for people to pull-apart my proposed rules. That 
is what I put them here for. But please refer to the second attempt, 
not the first.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk



From digitome at iol.ie  Sun Sep 14 11:00:12 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:26 2004
Subject: XML Grove Plan
Message-ID: <199709140859.JAA08220@GPO.iol.ie>

[Eliot Kimber]
>
>There is a defined syntactic representation for *groves* (as opposed to
>grove plans, which is what I think Sean meant), called the "canonical grove
>representation" (CGR) document, described in
>http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.5.html
>
Thanks for the correction + the pointer Eliot.

>CGR documents are designed such that two groves that are identical should
>produce exactly the same CGR documents, character for character. 
Wonderful.

> They are
>designed specifically to enable the comparison of the groves produced by
>different tools,
Wonderful++.

> CGR documents are also designed to be easy to process
>with text processing tools like Perl so that they can be used much as you
>would use the output of NSGMLS.
pow(Wonderful,10)

>
>I'm in the process of creating a DSSSL spec to generate CGR documents using
>Jade--I'll post something about it to comp.text.sgml when I get it working.

Thanks again Eliot. Can I ask John Tigue if he is thinking CGR as part of
his XML grove work? Can XML-DEVers do anything to help???


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From Peter at ursus.demon.co.uk  Sun Sep 14 11:27:43 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:26 2004
Subject: Whitespace
Message-ID: <10005@ursus.demon.co.uk>

In message <199709140826.JAA29099@andromeda.ndirect.co.uk> "Neil Bradley" writes:

> > Reply-to:      Arnaud Le Taillanter <arnaud21@club-internet.fr>
> 
> > Neil Bradley proposed some simple rules (this is "version 1", a second
> > version, a little more complex, but simple enough, was proposed). I
> > really like
> > the approach, even if it doesn't work for the moment.
> 
> I agree they are inadequate, but I think my second attempt was more 
> accurate than my first, so I am surprised that you now dissect the 
> first attempt. Still, I am happy to see this issue continue to be 
> aired.

Any constructive discussion on this subject is appropriate for XML-DEV. 
As we have archives, it's important that posters read them beforehand,
especially on this subject.

[...]
> 
> I am more than happy for people to pull-apart my proposed rules. That 
> is what I put them here for. But please refer to the second attempt, 
> not the first.

Two procedural points (I am not commenting on the content):

- the postings are all referenceable by URLs on Henry Rzepa's archive,
so please use these if there is a chance of confusion.

[David D]
- please try to keep the same subject for the thread so that it can
later be read in hypermailed form more easily.

	P.


-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From ricko at allette.com.au  Sun Sep 14 12:26:38 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:26 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
Message-ID: <199709141028.UAA14599@jawa.chilli.net.au>

 
> From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
 > 
> I am not wanting to re-open this discussion/decision, but I'd be very
> grateful for clarification as to how a SystemID is used to identify the
> type of a NOTATION. If I wish to identify it as 'image/gif', how do I
> do this in practice?  

Peter asked me to on-post this.

The standard way to stick a MIME type into a system identifier is
given as part of HyTime '97. First we have a notation declaration
(which is really only for documentation, so you don't need it
if you don't want it).

<!NOTATION mimetype PUBLIC "-//IETF/RFC1521//NOTATION
                    FSISM PORTABLE
                    MIME Content Type//EN"><!-- Refer RFC 1700 -->

This notation declaration allows us to use "mimetype" in
Formal System Identifiers, which are system identifiers with
little pseudo-start tags giving the notation used in the rest
of the string. So we can then declare the notation "gif"
to be the mime type "image/gif" by

<!NOTATION gif SYSTEM "<mimetype>Content-Type=image/gif">
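
To make the "pseudo-start tag" idea concrete, here is a rough sketch of how an application might split such a system identifier into its notation name and the rest of the string. It is a simplification and not the HyTime FSI grammar: the function name split_fsi is hypothetical, and the pattern assumes no ">" appears inside the pseudo-tag.

```python
import re

# <name optional-attributes>remainder ; a simplification, not the full FSI syntax.
_FSI = re.compile(r"^<([A-Za-z][A-Za-z0-9.-]*)([^>]*)>(.*)$", re.DOTALL)


def split_fsi(system_id):
    """Return (notation, attribute text, remainder), or None if no pseudo-tag."""
    match = _FSI.match(system_id)
    if match is None:
        return None
    notation, attributes, remainder = match.groups()
    return notation, attributes.strip(), remainder


if __name__ == "__main__":
    print(split_fsi("<mimetype>Content-Type=image/gif"))
```

A plain URL has no leading pseudo-tag, so the sketch returns None for it and the caller can fall back to ordinary system identifier handling.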

A full form for this with both public and system identifiers 
would be 

<!NOTATION gif PUBLIC
	"ISBN 0-7923-91::Graphic Notation//NOTATION
	Compuserve Graphic Interchange Format//EN"
	"<mimetype>Content-Type=image/gif">

Presumably you could also stick other MIME parameters in also,
after semicolons, e.g.

<!notation multipart-mime 
	PUBLIC "-//IETF/RFC1521//NOTATION MIME Content Type Multipart Mixed//EN"
	SYSTEM '<mimetype>Content-Type=multipart/mixed;boundary="--@QQQ@--"'>


(There is also provision of a notation called simply "mime", which
can be used for burrowing into a MIME file for specific parts. )

Rick Jelliffe



From Peter at ursus.demon.co.uk  Sun Sep 14 13:43:08 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:27 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
Message-ID: <10011@ursus.demon.co.uk>

Thanks very much Rick,
In message <199709141028.UAA14599@jawa.chilli.net.au> "Rick Jelliffe" writes:
[...]
> The standard way to stick a MIME type into a system identifier is
> given as part of HyTime '97. First we have a notation declaration
> (which is really only for documentation, so you don't need it
> if you don't want it).
> 
> <!NOTATION mimetype PUBLIC "-//IETF/RFC1521//NOTATION
>  FSISM PORTABLE
>  MIME Content Type//EN"><!-- Refer RFC 1700 -->

Being picky, this is not valid XML since prod [74] requires a SystemLiteral
as well as the PubidLiteral.
> 
> This notation declaration allows us to use "mimetype" in
> Formal System Identifiers, which are system identifiers with
> little pseudo-start tags giving the notation used in the rest
> of the string. So we can then declare the notation "gif"
> to be the mime type "image/gif" by
> 
> <!NOTATION gif SYSTEM "<mimetype>Content-Type=image/gif">

This is fine for my purposes, but I'm not clear how it fits with the XML spec.
4.3.2 says:
'The SystemLiteral that follows the keyword SYSTEM [...] is a URL, ...'
It says nothing about SystemLiterals which follow the PubidLiteral (your 
example is clearly not a URL). So my reading of the XML spec is that your
code above is invalid XML :-). If so, it would be useful if the WG had some way
that it was allowed.

[...]
	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From arnaud21 at club-internet.fr  Sun Sep 14 18:21:34 1997
From: arnaud21 at club-internet.fr (Arnaud Le Taillanter)
Date: Mon Jun  7 16:58:27 2004
Subject: whitespace
References: <199709140033.UAA08971@csb.bu.edu>
Message-ID: <341C0F4B.5FDB@club-internet.fr>

David Durand wrote:

> 
>   I'll be very brief. There's little chance that there will be any new
> whitespace ignoring rules in XML. Everyone involved has read (and
> written!) literally hundreds of messages on the topic.

Inside the XML WG mailing list the WS issue was surely
extensively discussed, but I don't have access to
the archive of this discussion. I know it's already
a favor that the XML draft is made public (all drafts
and standards of W3C are public, I think this
helps) and that XML WG members are participating
in the xml-dev mailing list (they could avoid it).
Well, I ask for another favor: could you please make the
discussion about WS that led to the WG decision
available on line? After such a reading, everybody
could become convinced of the appropriate nature
of the WG decision. Please!

> Every variation
> you discussed has been gone over and they all were either:
>   1. unworkably complex (like the current SGML rules, which few
> remember and even fewer remember correctly).

Agreed.

>   2. Not compatible with SGML, or unworkably ugly like the proposal to
> quote all literal text.

If SGML rules concerning WS are to be discarded, any
other rule adopted is incompatible, including the draft rule.

>   3. Failed to work without a DTD. This is the kicker, and it's
> required by XML because you don't always have the DTD, and different
> results in the has-DTD/doesn't-have-DTD cases are unacceptable.

I agree. The tree structures must be exactly the same in either case.
Some constraint regarding WS is necessary on the way to input an
XML text I assume.

> 
> The recent change (to normalize all line ends) fills the one hole the
> previous proposal had -- because it was nearly certain that some
> processes would blindly change CRLF and their ilk anyhow.
> 
> My advice: don't waste your bytes complaining about this -- we've
> heard it _all_ before -- and the solution that works best is to leave
> it to the application.

I am sure I will get
convinced when I read the WG discussion :-)
Or I fear the WG members will have to hear it all (and more)
again :-))

> 
> Aside:
>   XML-SPACE doesn't affect this -- it's along the lines of a "standard
> hint" that will allow applications like web-crawlers and full-text
> indexers to make more sense out of markup according to DTDs about
> which they lack special knowledge. So it doesn't contradict the "pass
> all space" philosophy, but rather supplements it, to enhance document
> re-use.
> 
>    -- David
> 

Arnaud



From ricko at allette.com.au  Sun Sep 14 18:24:56 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:27 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
Message-ID: <199709141627.CAA26463@jawa.chilli.net.au>

This is off topic for XML-DEV. Apologies.


> From: Peter Murray-Rust <Peter@ursus.demon.co.uk>
 
> In message <199709141028.UAA14599@jawa.chilli.net.au> "Rick Jelliffe" writes:
> [...]
> > The standard way to stick a MIME type into a system identifier is
> > given as part of HyTime '97.

Sorry, maybe I should have capitalized "standard" to be clearer.  XML is certainly
neither standard (common) nor Standard (adopted by a reputable open not-for-profit 
body whose job is to set standards without undue proprietary influence) at 
the moment. 


> Being picky, this is not valid XML since prod [74] requires a SystemLiteral
> as well as the PubidLiteral.

Yep.  And do the < and > have to be entity references too in XML?


> > <!NOTATION gif SYSTEM "<mimetype>Content-Type=image/gif">
> 
> This is fine for my purposes, but I'm not clear how it fits with the XML spec.

Yep, XML does not support "formal" system identifiers as I understand it. 
I think it is a shame, since there are things that are not URLs that would 
be nice as identifiers, even in web systems.  But support for FSIs can be 
retrofitted at some later stage to XML.  I hope there is no chance 
of them being added to XML 1.0.  But I hope people keep FSIs in mind as 
a good way to ramp up the power of URIs and other identifiers in the
near future, in particular for selecting particular system identifier 
notations (schemas).

For example, assuming hrefs could be FSIs, you could have

<a  href="&lt;xml publickey='ASDASKDKJHDFKSJH#(@#$HAHAJSDLKASHD'&gt;x.txt" />

in which data about the transfer and unpacking of the resource (e.g. here a
public key for encryption) is also marked up as a part of the system identifier. 


Rick Jelliffe



From tikvas at agentsoft.com  Mon Sep 15 12:13:12 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun  7 16:58:27 2004
Subject: New AgentSoft XML demo
Message-ID: <341CFB78.658D@agentsoft.com>

New AgentSoft XML demo is now available on the Web 
at  http://www.agentsoft.com/xml/.

    In a nutshell, the demo reads an XML file along with its associated
DTD file and uses the information in the DTD file to guide the
user in specifying a semantically meaningful query.  The XML file
is then searched for elements matching the query.  While the system
will work on any valid XML and DTD files, Java applet security limits it
to files on our own server, which now consist of CDF files and an act
from a Shakespeare play. We would be happy to add any valid XML samples
to the demo.

    The demo has been developed as part of AgentSoft's initiative to
integrate XML support into our LiveAgent Pro system.  LiveAgent Pro
allows users to record agents that automate Web access and 
interaction.  For more information on LiveAgent Pro see our main Web
page at http://www.agentsoft.com.

Feel free to send any comments you have on our demo
to xml@agentsoft.com.

     Tikva Schmidt.
--------------------------------------------------------------------
Tikva Schmidt.
email: tikvas@agentsoft.co.il
corp:  Agentsoft Ltd.     http://www.agentsoft.co.il
Phone: 972-2-6480573
---------------------------------------------------------------------



From russc at livepage.com  Mon Sep 15 15:27:18 1997
From: russc at livepage.com (Russell Chamberlain)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
Message-ID: <3.0.1.32.19970915092937.007a73d0@livepage.com>

Hi,

I'm _extremely_ happy that someone (Microsoft) has put forward an
XML-formatting proposal (XSL - Extensible Style Language) to the W3C that:

    1) Is represented in XML

       This is _absolutely_ necessary if XML is to have any mass appeal.
       Using a non-XML format (eg. DSSSL) flies in the face of what
       XML is hoping to accomplish. I confess that I laughed out loud
       when I heard that DSSSL was the chosen processing environment for XML.
       (Purely for the fact that DSSSL isn't represented in XML, and not
       for any other reason!)

       A major advantage of XML representation is, of course, that you
       can use your favourite XML editor as a stylesheet editor. The
       daunting task of matching braces and finding syntax errors is
       greatly reduced.

    2) Is complementary to DSSSL

       The proposal states explicitly that it is _complementary_ to DSSSL,
       with the same "principles and processing model". This will help to
       ensure consistent processing, regardless of its representation.

    3) Is (predominantly) declarative

       The programmatic nature of DSSSL is something that can severely
       limit its appeal. Remember the "Is DSSSL Hard?" thread in
       comp.text.sgml? My impression was that most of the folks who
       answered "No" to the above question were people who were 
       hard-core programmers. I am such a person, but I would answer
       a loud "Yes!". Maybe I've interacted with more non-programmers
       and/or users. Nevertheless, since formatting is what most
       novices start with, it is best that their tools be easier.
       The common processing model should allow for easy migration
       to DSSSL, if its greater power is desired.

       3.1) ...while retaining programmatic features

            Power is a good thing, so long as its presence doesn't
            prevent simple things from staying simple. I think that
            the scripting features of XSL are nicely unobtrusive.

    4) Lets you reorder and restructure the elements

       This is a _big_ plus. Most (all?) of the declarative formatting
       environments that I've been exposed to don't let you change the
       structure or sequence of the elements during formatting. When the
       chapter number came after the title, you could never put it in
       front of the title. The lack of such power may have kept a few
       of us from using/designing declarative processing environments.
       Not any more.

    5) Has inline styles

       This mechanism lets you specify formatting properties on the
       element itself. This is a remarkably simple way of formatting
       that _one_ element that has to be different from all the rest,
       but whose context is too complicated or difficult to express.
       Here's an example from the proposal:

           <para xsl:font-weight="bold">

       Note that this is from the source document itself, and not
       from an XSL stylesheet. Neat. 

    6) Supports named modes

       A mode is simply a named formatting scheme. Only those rules, 
       etc. that apply to the current mode are used. This lets you
       store rules for different presentations in the same stylesheet.
       In their example, the "toc-mode" mode is used for a Table of
       Contents presentation only, and the default mode is used for
       the usual presentation.

       This should also cut down on the duplication that usually
       occurs when different stylesheets are used for different
       presentations. It'll cut down on duplication errors, too,
       because it is possible for most of the rules, etc. to
       be centralized and shared.

    7) Has a clearly-defined conflict-resolution mechanism 

       Some formatting environments specify that the "first"
       applicable style in the stylesheet is always the one to
       be applied. A stylesheet's behaviour should not change
       based on the location of a style in the stylesheet's
       source file. XSL will let authors organize their styles
       in any way they see fit, with no effect on behaviour.
       
       Some environments also allow _multiple_ styles to be
       applied. Which ones, and in what order? Yuck! 
       XSL explicitly states that at most a single pattern
       will be chosen. Good idea.
       
So much for my praise of a terrific standards initiative.

I have a few questions regarding the proposal itself, though:

- Some XSL tags seem to be mutable, in that they can be empty
  or non-empty. The <target-element> tag, in particular, is used
  both ways in the examples, eg:

    <rule>
      <!-- EMPTY -->
      <target-element type="orders"/>
      . . .
    </rule>

    and later:

    <rule>
      <!-- non-EMPTY -->
      <target-element type="table">
        <element type="title"/>
      </target-element>
    </rule>

  Is this proper XML? Am I wrong in thinking that <tag/> is reserved
  for tags that are _always_ empty? Is this just a notational convenience
  within the proposal?

- The DTD contains (gasp!) an exclusion rule. What's going on here?
  The fact that exactly one <target-element> should appear per rule
  is something that the XSL application must enforce. I recommend
  using a comment, instead, so that the DTD can eventually be valid XML.


All in all, I'm much more excited about the future of XML.

Sorry if this isn't the place to discuss this.

As usual, I can't end without an XSLvely bad pun,

 - Russ

PS - You can get the XSL spec at:

    http://www.microsoft.com/standards/xsl/xslspec.htm

------------------------------------------------------
Russ Chamberlain - Software Developer
INFORIUM (The Information Atrium Inc)
Waterloo, Ontario, Canada


xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)


From crism at ora.com  Mon Sep 15 17:02:22 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun  7 16:58:27 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
In-Reply-To: <199709141627.CAA26463@jawa.chilli.net.au> (ricko@allette.com.au)
Message-ID: <199709151505.LAA26058@geode.ora.com>

[Rick Jelliffe]
> [Peter Murray-Rust]
> > Being picky, this is not valid XML since prod [74] requires a
> > SystemLiteral as well as the PubidLiteral.
> 
> Yep.  And do the < and > have to be entity references too in XML?
> 
> Yep, XML does not support "formal" system identifiers as I
> understand it.  I think it is a shame, since there are things that
> are not URLs that would be nice as identifiers, even in web systems.
> But support for FSIs can be retrofitted at some later stage to XML.
> I hope there is no chance of them being added to XML 1.0.  But I
> hope people keep FSIs in mind as a good way to ramp up the power of
> URIs and other identifiers in the near future, in particular for
> selecting particular system identifier notations (schemas).

FSIs were discussed at the beginning.  A decision was made that they
were better left for later, and I agree.

A decision was also made that all system identifiers would have an
implicit FSI identifier of <URL>, which I also think is usually a good
idea.  This allows FSIs to be added later, and any unlabeled system ID
is implied to have <URL>.

What I was suggesting on the SIG was that for system identifiers in
notation declarations, the assumed FSI label would be <mimetype>.  As
Rick pointed out, this is legal HyTime 2 FSI notation, and would be
very useful.  However, the WG has made its decision.

I believe that XML authors are largely going to refer to images simply
by URLs instead of entities; in that case, file system associations or
HTTP headers can be used to ascertain the entity's type.  In cases
where NDATA entities are used, I would recommend that XML implementors
ignore the system identifier of the notation, and make their decision
based on the entity itself.

-Chris
-- 
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>



From ht at cogsci.ed.ac.uk  Mon Sep 15 18:15:00 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
In-Reply-To: Russell Chamberlain's message of Mon, 15 Sep 1997 09:29:37 -0400
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <446.199709151614@grogan.cogsci.ed.ac.uk>

Thanks for all your kind words.  All the better, your queries are
easily answered:

1) <target-element/>

The August 7th draft of XML-lang introduced the use of NET for
contingently, as well as declared, empty elements.  Until the NetSGML
TC is passed, use of this feature will produce XML documents which are
NOT valid SGML.
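
To see the point about contingently empty elements in practice, here is a
minimal sketch using Python's standard xml.etree.ElementTree (a present-day
parser, not part of the XSL proposal). It assumes the element and attribute
names from Russ's examples, and the helper name describe is made up for
illustration. It shows that the empty-element form and an explicit start/end
pair parse to the same thing, while the same element type may also carry
children elsewhere.

```python
import xml.etree.ElementTree as ET

# The same element type written three ways (names taken from the examples
# quoted in this thread; the documents themselves are illustrative only).
empty_tag = ET.fromstring('<rule><target-element type="orders"/></rule>')
start_end = ET.fromstring(
    '<rule><target-element type="orders"></target-element></rule>'
)
nested = ET.fromstring(
    '<rule><target-element type="table">'
    '<element type="title"/>'
    '</target-element></rule>'
)


def describe(rule):
    """Return (type attribute, number of child elements) of the target."""
    target = rule.find("target-element")
    return (target.get("type"), len(target))


# <x/> and <x></x> are indistinguishable after parsing.
print(describe(empty_tag) == describe(start_end))
# The same element type is non-empty here.
print(describe(nested))
```

Whether a given element is allowed to be empty or non-empty is then a matter
for the DTD (if any), not for the tag syntax.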

2) The DTD

The DTD in the appendix was simply intended to clarify a few points
about the structure of patterns and actions.  We should have been
clearer that it is NOT a constitutive part of the proposal.  A more
complete (and XML conformant) DTD should be forthcoming soon.

ht
-----------
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
      2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
               Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk  
                      URL: http://www.cogsci.ed.ac.uk/~ht/



From russc at livepage.com  Mon Sep 15 18:18:08 1997
From: russc at livepage.com (Russell Chamberlain)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
In-Reply-To: <199709151605.MAA06963@nathaniel.eps.inso.com>
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <3.0.1.32.19970915121630.007c1c60@livepage.com>

Hi,

At 12:05 PM 97/09/15 -0400, you [Gavin Nicol] wrote:
>>I'm _extremely_ happy that someone (Microsoft) has put forward an
>>XML-formatting proposal (XSL - Extensible Style Language) to the W3C that:
>
>This is NOT a Microsoft proposal... others were (heavily) involved.

Thousands of apologies!!!!!!!

Here is a full list of the folks who deserve credit
(as mentioned in the proposal itself):

    Sharon Adler, Inso Corporation
    Anders Berglund, Inso Corporation
    James Clark
    Istvan Cseri, Microsoft Corporation
    Paul Grosso, ArborText
    Jonathan Marsh, Microsoft Corporation
    Gavin Nicol, Inso Corporation
    Jean Paoli, Microsoft Corporation
    David Schach, Microsoft Corporation
    Henry S. Thompson, University of Edinburgh
    Chris Wilson, Microsoft Corporation

Good work!

 - Russ




From russc at livepage.com  Mon Sep 15 20:04:55 1997
From: russc at livepage.com (Russell Chamberlain)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
In-Reply-To: <341D708C.26E27765@EpiphanySoftware.com>
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <3.0.1.32.19970915140339.007c2800@livepage.com>

Hi Andy (& XML-DEVers),

At 10:29 AM 97/09/15 -0700, you [Andy Cogan] wrote:
>Hi Russell,
>
>Russell Chamberlain wrote:
>> I'm _extremely_ happy that someone (Microsoft) has put forward an
>> XML-formatting proposal (XSL - Extensible Style Language) to the W3C that:
>> 
>>     1) Is represented in XML
>[...snip...]
>
>First, I agree with your points in your original mail message. Well
>said! I've only recently started following the development of XML, and
>DSSSL-O looked pretty intimidating. I like the direction of XSL.
>
>How did you find out about the XSL initiative? It seems like a major new
>development, and I hate the feeling of being blindsided by being
>ignorant of such important efforts.

It came from the "XML/EDI Group Mailing List". 
The subject line was "More good news for XML/EDI !!!".
I must confess that it was forwarded to me by a co-worker (thanks, Rich!)
who subscribes to the list. I wouldn't have heard of it, otherwise.
That's why I posted to XML-DEV, since I thought it was important, yet
nobody had mentioned it.

>Finally, I've gotten the impression that XML formatting can happen via
>CSS, or XSL, or DSSSL-O. Can that be right? It seems odd to offer
>three distinct formatting languages. Or am I just completely confused (a
>likely alternative!)?

I certainly have heard all three mentioned in an XML context.
Would anyone care to clarify this? I know that there may not be an
answer yet, as the XML style issues are still in draft form. Last I
heard, the deadline was around December of this year.

I'm not involved with the XML-format (XML-style?) discussions, so I don't
know the answer. I am willing to _guess_ that one reason for including all
three might be because various organizations have investments in one, but not
the other(s), so restricting to one just might upset a few people
(big understatement). 

Also, I can see a definite trend in ease-of-use and power that goes
CSS-->XSL-->DSSSL. Which one you want may depend on where your needs
lie on the spectrum.

There's also an existing application that has already been targeted
for XML: the WWW. This already has CSS defined.

>
>-- 
> Andy Cogan
> Epiphany Software
>****************************************
>* E-mail: support@EpiphanySoftware.com *
>* Voice: (408) 378-6145                *
>* Web: http://www.EpiphanySoftware.com *
>****************************************

Your guess speaker,

 - Russ




From peter at techno.com  Mon Sep 15 21:15:34 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:27 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
In-Reply-To: <199709151505.LAA26058@geode.ora.com> (message from Chris Maden
	on Mon, 15 Sep 1997 11:05:12 -0400)
Message-ID: <199709151859.OAA13746@exocomp.techno.com>

> Date: Mon, 15 Sep 1997 11:05:12 -0400
> From: Chris Maden <crism@ora.com>
> 
> I believe that XML authors are largely going to refer to images simply
> by URLs instead of entities; in that case, file system associations or
> HTTP headers can be used to ascertain the entity's type.  In cases
> where NDATA entities are used, I would recommend that XML implementors
> ignore the system identifier of the notation, and make their decision
> based on the entity itself.

I would caution against ignoring the declared notation for an entity,
since it may be used to specify an interpretation other than the
default interpretation that would be made by the system.

By associating notations with chunks of data, entity declarations
allow the same chunk of data to be viewed in different ways.  The
"classic" example of this is an XML document that is treated as XML in
some places and as plain text in others (possibly as an example in a
book about XML).

It is true that most near-term applications can probably ignore
declared notations, since the web community is already used to the
limitations involved.  This may change, however, as documents become
increasingly object-oriented, providing different views of themselves
for different audiences (as is done with SGML architectures).

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From mrc at allette.com.au  Tue Sep 16 00:42:02 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <341DB990.AA5A35AD@allette.com.au>

Russell Chamberlain wrote:

> I'm _extremely_ happy that someone has put forward an XML-formatting proposal
> (XSL - Extensible Style Language) to the W3C...

Can you direct us to the draft proposal?

--
Regards

Marcus Carr                  email:  mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia)  email:  info@allette.com.au
Level 10, 91 York Street     www:    http://www.allette.com.au
Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
                             fax:    +61 2 9262 4774
_______________________________________________________________




From donpark at quake.net  Tue Sep 16 02:39:09 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
Message-ID: <199709160038.RAA18714@gw.quake.net>

 http://www.microsoft.com/standards/xml/


-----Original Message-----
From: Marcus Carr <mrc@allette.com.au>
To: xml-dev@ic.ac.uk <xml-dev@ic.ac.uk>
Date: Monday, September 15, 1997 3:42 PM
Subject: Re: Microsoft's XSL Proposal


>Russell Chamberlain wrote:
>
>> I'm _extremely_ happy that someone has put forward an XML-formatting proposal
>> (XSL - Extensible Style Language) to the W3C...
>
>Can you direct us to the draft proposal?
>
>--
>Regards
>
>Marcus Carr                  email:  mrc@allette.com.au
>_______________________________________________________________
>Allette Systems (Australia)  email:  info@allette.com.au
>Level 10, 91 York Street     www:    http://www.allette.com.au
>Sydney 2000 NSW Australia    phone:  +61 2 9262 4777
>                             fax:    +61 2 9262 4774
>_______________________________________________________________




From jtauber at jtauber.com  Tue Sep 16 03:06:44 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
Message-ID: <01BCC280.5F482FA0.jtauber@jtauber.com>

On Monday, 15 September 1997 11:04, Russell Chamberlain 
[SMTP:russc@livepage.com] wrote:
<snip/>
> At 10:29 AM 97/09/15 -0700, you [Andy Cogan] wrote:
> >How did you find out about the XSL initiative? It seems like a major new
> >development, and I hate the feeling of being blindsided by being
> >ignorant of such important efforts.
<snip/>
> I must confess that it was forwarded to me by a co-worker (thanks, Rich!)
> who subscribes to the list. I wouldn't have heard of it, otherwise.
<snip/>

I'll try to make information like this available as soon as possible on 
http://www.jtauber.com/xml/
If you fill out the form at the bottom of the page, Netmind will email you 
whenever the page has been updated.

James
--
James K. Tauber / jtauber@jtauber.com
Perth, Western Australia
XML Pages: http://www.jtauber.com/xml/




From jjc at jclark.com  Tue Sep 16 07:37:40 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <341E1692.5FF1A8B2@jclark.com>

Russell Chamberlain wrote:
 
>     7) Has a clearly-defined conflict-resolution mechanism
> 
>        Some formatting environments specify that the "first"
>        applicable style in the stylesheet is always the one to
>        be applied. A stylesheet's behaviour should not change
>        based on the location of a style in the stylesheet's
>        source file. XSL will let authors organize their styles
>        in any way they see fit, with no effect on behaviour.
> 
>        Some environments also allow _multiple_ styles to be
>        applied. Which ones, and in what order? Yuck!
>        XSL explicitly states that at most a single pattern
>        will be chosen. Good idea.

Only one construction rule can apply, but multiple style rules can
apply.  However, XSL does have a (hopefully) well defined conflict
resolution mechanism for dealing with this, and it doesn't depend on the
order of the rules in the stylesheet.

James




From jjc at jclark.com  Tue Sep 16 10:24:19 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:58:27 2004
Subject: XSL requests for clarification/suggestions for enhancement
References: <5044147A23FED01195BF00609712EB6B5FA1@FLPS-NTSERVER1>
Message-ID: <341E414E.819E6E71@jclark.com>

Daniel Rivers-Moore wrote:

> What is the best place to get information about just what can go into a
> script? Is there a publicly available specification of the ECMAScript
> language?

You can get the ECMAScript spec from:

  http://developer.netscape.com/library/documentation/javascript.html

James



From north at synopsys.com  Tue Sep 16 11:50:04 1997
From: north at synopsys.com (Simon North)
Date: Mon Jun  7 16:58:27 2004
Subject: XSL requests for clarification/suggestions for enhancement
In-Reply-To: <341E414E.819E6E71@jclark.com>
Message-ID: <199709160951.LAA00415@cadis.de>

You can get the official ECMA-262 (ECMAScript) spec 
in either MS-Word or Adobe Acrobat (PDF) form directly from 
the ECMA for free from:

http://www.ecma.ch/stand/ecma-262.htm

Simon.

Simon North                      north@synopsys.com
COSSAP Technical Writer, Aachen, Germany

To be or not to be, those are the parameters.



From David.Rosenborg at uab.ericsson.se  Tue Sep 16 15:19:04 1997
From: David.Rosenborg at uab.ericsson.se (David Rosenborg)
Date: Mon Jun  7 16:58:27 2004
Subject: Recent XML WG decisions
In-Reply-To: <199709130538.WAA07282@boethius.eng.sun.com>
References: <199709130538.WAA07282@boethius.eng.sun.com>
Message-ID: <199709161318.PAA11663@uabs19c25.eua.ericsson.se>


Tim Bray wrote:

>  So XML is now case-sensitive.

Sounds good, but what is the general opinion about case-sensitivity
in XML applications? My own feeling is that it might be appropriate
to have case insensitivity when you, for example, do a structural
search in an XML browser or editor. It could also be useful when
specifying patterns in XSL and the like. These things may of course fail if
the document designer has chosen to distinguish elements only by
case, but I think that's unlikely to happen. I also have the
feeling that the problem of case-insensitive string comparison
is not as difficult as that of case folding. Case folding
is a one-to-one mapping that might not be the same for different
languages, but when comparing strings you can treat groups of characters,
differentiated only in case and diacritics, as being the same. For
example, the characters i, ?, ?, ?, I, ? etc. could be treated as being
equal in this situation. Is this a correct assumption, or am I missing
something?
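
The kind of loose comparison described above can be sketched as follows. This
is only one possible reading, using Python's standard unicodedata module; the
function names loose_key and loosely_equal are made up for illustration. It
decomposes each character, drops combining marks (the diacritics), and then
applies Unicode case folding, so that names differing only in case and
diacritics compare equal. It deliberately ignores language-specific rules
(the Turkish dotless i, for instance, is not treated as equal to i).

```python
import unicodedata


def loose_key(name: str) -> str:
    """Build a comparison key that ignores case and diacritics."""
    # NFD splits an accented letter into base letter + combining mark(s).
    decomposed = unicodedata.normalize("NFD", name)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # casefold() is the Unicode-aware, comparison-oriented form of lower().
    return stripped.casefold()


def loosely_equal(a: str, b: str) -> bool:
    """True if a and b differ at most in case and diacritics."""
    return loose_key(a) == loose_key(b)


print(loosely_equal("T\u00edtulo", "TITULO"))
print(loosely_equal("title", "titles"))
```

An application such as a structural search could apply loose_key to element
names at query time while the parser itself stays strictly case-sensitive.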

Cheers,

</David>

______________________________________________________________________________
David.Rosenborg@uab.ericsson.se             Ericsson Utvecklings AB (UAB/K/UG)



From David.Rosenborg at uab.ericsson.se  Tue Sep 16 16:21:02 1997
From: David.Rosenborg at uab.ericsson.se (David Rosenborg)
Date: Mon Jun  7 16:58:28 2004
Subject: Case sensitivity (Clarification)
In-Reply-To: <199709161337.IAA32194@mcconnel.ac.sil.org>
References: <199709161337.IAA32194@mcconnel.ac.sil.org>
Message-ID: <199709161420.QAA11840@uabs19c25.eua.ericsson.se>


robin@mcconnel.ac.sil.org writes:

> I think the discussion was just related to NAMECASE GENERAL NO
> in the SGML declaration, having therefore to do only with SGML
> names.  Your post to XML-DEV made me wonder if you were talking
> about character text in content...

No, I was thinking of the SGML names. As far as I can understand,
case-sensitivity applies only to the XML language itself, i.e. start
and end tags should match in case and also match the case of
a possible element declaration etc. This implies that the
parser is also case-sensitive. But the actual application
accessing the resulting grove (if one is built) could
be case-insensitive even about SGML names. My question was what
people think of this, and also whether my assumptions about
comparing strings case-insensitively were right.

Thanks

</David>

______________________________________________________________________________
David.Rosenborg@uab.ericsson.se             Ericsson Utvecklings AB (UAB/K/UG)



From dgd at cs.bu.edu  Tue Sep 16 19:16:44 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: whitespace
In-Reply-To: <341C0F4B.5FDB@club-internet.fr>
References: <199709140033.UAA08971@csb.bu.edu>
Message-ID: <v03007805b0446fb1d028@[205.181.197.124]>

At 11:22 AM -0500 9/14/97, Arnaud Le Taillanter wrote:

>Inside the XML WG mailing list the WS issue was surely
>extensively discussed, but I don't have access to
>the archive of this discussion. I know it's already
>a favor that the XML draft is made public (all drafts
>and standards of W3C are public, I think this
>helps) and that XML WG members are participating
>in the xml-dev mailing list (they could avoid it).

I agree that it's rather unfair of me to make a reference to a discussion
that I can't produce.

>Well, I ask for another favor: could you please make the
>discussion about WS that led to the WG decision
>available on line? After such a reading, everybody
>could become convinced of the appropriate nature
>of the WG decision. Please!

Well, it's up to the W3C, not me -- as a member of the SIG (not even the
decision-making part of the working group) I have no power to do this.
There were some public archives of some parts of the discussion -- I think
this is no longer allowed for the current discussions, under the W3C's
confidentiality rules.

You could try an Altavista search for my name -- it used to come up with a
WWW archive of the old mailing list, and the URL may still work.

I do doubt that people will want to re-read that discussion, however, once
they have seen it. I was not exaggerating when I put the count at hundreds
of messages. Most of these were repetitive, because the total list of
factors involved, in the end, is the short list in my mail. The desire for
simple rules, and need to work without DTDs the same way as with DTDs, and
the desire for SGML compatibility all needed to be balanced. In fact, they
were incompatible -- SGML as it stands has complicated rules, that we
finally asked the ISO to relax. And _any_ solution that differentiates
element content from mixed content requires a DTD or other declaration
(under SGML rules or even new ones). The proposal to add a new declaration
for element content was abandoned because it's redundant with a DTD, and
confusing without -- a likely source of errors rather than a convenience.

>> Every variation
>> you discussed has been gone over and they all were either:
>>   1. unworkably complex (like the current SGML rules, which few
>> remember and even fewer remember correctly).
>
>Agreed.

So we have point 1 nailed down.

>>   2. Not compatible with SGML, or unworkably ugly like the proposal to
>> quote all literal text.
>
>If SGML rules concerning WS are to be discarded, any
>other rule adopted is incompatible, including the draft rule.

Yes, but the ISO was willing to add the pass-all-whitespace rule to SGML,
and it will be official in a few months. No other proposal also solved the
very real problems of SGML->SGML transformation caused by parsers hiding
whitespace, and so there was little independent reason to add them into
SGML.

That nails down point 2.
>
>>   3. Failed to work without a DTD. This is the kicker, and it's
>> required by XML because you don't always have the DTD, and different
>> results in the has-DTD/doesn't-have-DTD cases are unacceptable.
>
>I agree.

So that nails down point 3. And we really agree! :)

.... oh:
> The tree structures must be exactly the same in either case.
>Some constraint regarding WS is necessary on the way to input an
>XML text I assume.

I'm not sure what you mean, here. Any method for ignoring whitespace must:
  1. Allow explicit whitespace wherever it is wanted (including
near element boundaries).

  2. Preserve line-breaks for some (verbatim, or <pre>-style) elements.

  3. Not depend on the DTD or other declarations to control it.

The simplest proposal that does this is to pass all whitespace.

The only real drawback is that _some_ applications (like table formatters)
may have to explicitly ignore whitespace in _some_ contexts where a
traditional SGML parser would have been able to do it for them. Linking
applications must deal with (count), and can't ignore whitespace chunks
that in some cases may have little meaning to a user.

The benefits are "simplest possible rule", easy XML->XML transduction that
preserves the original formatting, a dependable way to count character data
in documents that contain whitespace, regardless of whether you have a DTD.
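
As a rough illustration of "pass all whitespace and leave it to the
application", here is a sketch using Python's standard xml.etree.ElementTree,
which hands whitespace through to the caller. The sample document, the
function name strip_ignorable and the choice of "cell" as the one element
whose text matters are all assumptions made for the example. The application,
not the parser, decides which whitespace-only chunks to discard.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<table>\n  <row>\n    <cell> a b </cell>\n  </row>\n</table>"
)

# The parser passes the indentation through untouched.
raw_text = doc.text


def strip_ignorable(elem, mixed=("cell",)):
    """Drop whitespace-only text in elements the application treats as
    element content; leave elements named in `mixed` alone.

    Simplification: descendants of a mixed element are not themselves
    treated as mixed unless they are named too.
    """
    if elem.tag not in mixed:
        if elem.text is not None and not elem.text.strip():
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip():
                child.tail = None
    for child in elem:
        strip_ignorable(child, mixed)


strip_ignorable(doc)
print(repr(raw_text))
print(ET.tostring(doc, encoding="unicode"))
```

A table formatter would make this call; an XML->XML transducer that wants to
preserve the original layout simply would not.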

>> The recent change (to normalize all linends) fills the one hole the
>> previous proposal had -- because it was nearly certain that some
>> processes would blindly change CRLF and their ilk anyhow.

Note that this is the only data normalization permitted in XML, and that it
only warrants processes like the changing of line-ending conventions (eg
from PC to Mac) -- that we all know would have taken place anyway, causing
errors, even if they were explicitly prohibited by the standard.
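
A minimal sketch of line-end normalization, assuming a single line feed as
the normalized form (which is what XML 1.0 eventually specified); the
function name normalize_line_ends is made up for illustration. PC-style CRLF
pairs are handled first so that they do not turn into two line feeds.

```python
def normalize_line_ends(text: str) -> str:
    """Map CRLF pairs and lone CRs to a single LF."""
    # Order matters: collapse CRLF first, then any remaining bare CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "pc line\r\nmac line\runix line\n"
print(repr(normalize_line_ends(sample)))
```

All other whitespace is left exactly as it was found.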

>> My advice: don't waste your bytes complaining about this -- we've
>> heard it _all_ before -- and the solution that works best is to leave
>> it to the application.
>
>I am sure I will get
>convinced when I read the WG discussion :-)
>Or I fear the WG members will have to hear it all (and more)
>again :-))

    My advice was just advice about what expectations you could have of
_results_ from whatever discussion ensues. Feel free to discuss whitespace
to your heart's content. But don't expect XML to change.

I'll see if there's any way the archives of the whitespace debate can be
made available, but I can honestly say that they're painful rather than
enlightening reading. Expect to devote several days to the reading, too, if
they do become public.

I was a chief proponent of the current approach, even at the beginning,
when most in the group did not want to do anything so radical, so I agree
that explanations of the decision are worthwhile -- and I've tried to
contribute such -- but I'm certainly not going to read an extended rehash
on the issue. I've devoted my pound(s) of flesh to whitespace already.

  -- David

RE delenda est!

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From digitome at iol.ie  Thu Sep 18 16:13:55 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
Message-ID: <199709181413.PAA06449@mail.iol.ie>


Sorry for the lateness of this reply. It got a bit lost in my out-box for a
while!

[Sean Mc Grath]
>>Throw out that grep, that text editor, that fgets(), that diff,sort,uniq
>>utility There all busted for XML use.
>
[David Durand]
>gets is of course Broken As Designed, as the cause of most security bugs in
>Unix systems.

Sorry David, I cannot let you get away with that one. I said *fgets()*, which
is an entirely different function from gets(). It takes three parameters, one
of which is the maximum number of characters to read. It is not Broken As
Designed.

>
>Again, they are broken for XML use with files created a particular way.
>They are also broken for HTML files created the same way, and I don't hear
>the weeping and wailing.

No weeping and wailing required, because it is typically possible to splice
line-ends into HTML *without affecting the content*. This is not the case
with XML.

>Can you suggest any solution to the "grep" problem other than requiring a
>fixed line-max in XML.

Yes. Ignore all line ends. I know this presents its own set of difficult
problems, but I'd prefer to tackle these - and maintain compatibility with a
decade's worth of tools - rather than break the tools.

> Do you think that that hideous hack to accomodate
>defective (if very useful) tools is really worth it.
Yes. Line oriented text processing has been a hugely popular paradigm for
many years now. I don't think of these tools as "defective" at all. I dare
say many wielders of these tools are of the same opinion. These people will
be rightly miffed at the suggestion that they are defective by virtue of the
use of a line oriented paradigm. They will also be rightly miffed that they
cannot bring their tools/skills to bear in the XML world.

>Can you suggest how we
>would determine that buffer size?
Question is Broken As Designed. No need for a silly fixed limit. Just a
recognition of the existence *of* limits and a standardised mechanism for
dealing with them.

Sean Mc Grath
sean@digitome.com
www.digitome.com




From dgd at cs.bu.edu  Thu Sep 18 18:09:41 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
In-Reply-To: <199709181413.PAA06449@mail.iol.ie>
Message-ID: <v03007807b0470ac84fbd@[205.181.197.101]>

>Sorry for the lateness of this reply. It got a bit lost in my out-box for a
>while!
>
>[Sean Mc Grath]
>>>Throw out that grep, that text editor, that fgets(), that diff,sort,uniq
>>>utility There all busted for XML use.
>>
>[David Durand]
>>gets is of course Broken As Designed, as the cause of most security bugs in
>>Unix systems.
>
>Sorry David, I cannot let you get away with that one. I said *fgets()* which
>is an entirely different function to gets(). It takes
>three paramaters one of which is the maximum number of characters to read.
>It is not Broken As Designed.

No, but fgets (unlike gets) can deal with long lines --- you have to
recognize that you overflowed and make accommodations, but you can do the
right thing. I was giving you the benefit of the doubt, since gets, at
least, has the problem that you are raising, while fgets does not.
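
The overflow handling described above can be sketched roughly as follows.
This is an illustration in Python rather than C: readline(size) stands in
for fgets() (both stop at a newline or at the size bound, whichever comes
first), and the name read_logical_lines is hypothetical.

```python
import io

def read_logical_lines(stream, bufsize):
    """Yield complete lines while never reading more than `bufsize`
    characters in a single call (an fgets()-style bounded read)."""
    pieces = []
    while True:
        chunk = stream.readline(bufsize)  # stops at newline or bufsize chars
        if chunk == "":                   # end of input
            if pieces:
                yield "".join(pieces)     # final line without a newline
            return
        pieces.append(chunk)
        if chunk.endswith("\n"):          # a whole line has been assembled
            yield "".join(pieces)
            pieces = []
        # otherwise the line overflowed the buffer, so keep reading

# Usage: a 10-character line is read through a 4-character buffer.
sample = io.StringIO("ab\n" + "x" * 10 + "\nend")
for line in read_logical_lines(sample, 4):
    print(repr(line))
```

The point of the sketch is only that a bounded read lets the caller detect
an overflow and continue; what the caller then does with a very long line
is a separate question.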

>>
>>Again, they are broken for XML use with files created a particular way.
>>They are also broken for HTML files created the same way, and I don't hear
>>the weeping and wailing.
>
>No weeping and wailing required because it is typically possible to splice in
>line-ends into HTML *without affecting the content*. This is not the case
>with XML.

Just try that in tables. You have to know the meaning of the markup, even
in HTML, if you want to do this. Now you can claim that table markup is
broken, and you might be right, but HTML does not support your argument.

Similarly for pre elements: You can't do anything to line-ends in there --
maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
rhetorical reason.

>>Can you suggest any solution to the "grep" problem other than requiring a
>>fixed line-max in XML.
>
>Yes. Ignore all line ends. I know this presents its own set of difficult
>problems
>but I'd prefer to tackle these - and maintain compatability with a decades
>worth
>of tools - rather than break the tools.

But this creates worse problems: lack of <pre>-style elements, inability to
write XML filters that preserve linespace just from generic XML parsers.
No way to use string offsets in linking.

>> Do you think that that hideous hack to accomodate
>>defective (if very useful) tools is really worth it.
>Yes. Line oriented text processing has been a hugely popular paradigm for
>many years now. I don't think of these tools as "defective" at all. I dare
>say many wielders of these tools are of the same opinion. These people will
>be rightly miffed at the suggestion that they are defective by virtue of the
>use of a line oriented paradigm. They will also be rightly miffed that they
>cannot bring their tools/skills to bear in the XML world.

But they can, they just need to limit their files to correspond to the
limitation of their tools. People do this all the time, without difficulty.
Of course if the world at large decides to abandon the "line paradigm" then
those who stick to it will be inconvenienced. But then if "the world" makes
the shift, then there's still not a very big problem, is there?

Even in that case, with some (usually minimal) human intervention, such
line-end conversion/insertion is trivial in practice.

I'm sorry I still don't see how this is _worse_ than what we have with text
files today. And compared to HTML and SGML, I think XML's rules are more
consistent, and useful for more things.

I deal with the Mac (where line == paragraph), as well as Unix, all the
time. This problem is not usually of more than 10 seconds concern on the
few times in a month that it comes to mind. On occasion, of course, I find
myself spending 1-10 minutes in an editor fixing things (usually by
invoking a "wrap" command of some sort).

>>Can you suggest how we
>>would determine that buffer size?
>Question is Broken As Designed. No need for a silly fixed limit. Just a
>recognition
>of the existence *of* limits and a standardised mechanism for dealing with
>them.

I can't imagine what such a mechanism is: IBM text editors for decades had
an 80-character limit. Some still work best with 72 column files. If XML is
supposed to require lines no longer than some limit, we need to specify
that limit in the standard. Otherwise all we can say is that any XML
processor is free to reject any document if the lines are "too long for
that tool". That's an even worse prescription for interoperability.

If there are limits, a standard has to tell you how to be safe and not
break any of those limits. At least, a good standard should.

 -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From digitome at iol.ie  Thu Sep 18 20:40:28 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
Message-ID: <199709181840.TAA04606@mail.iol.ie>

[Sean Mc Grath]
>>
>>Sorry David, I cannot let you get away with that one. I said *fgets()* which
>>is an entirely different function to gets(). It takes
>>three paramaters one of which is the maximum number of characters to read.
>>It is not Broken As Designed.
>
[David Durand]
>No, but fgets (unlike gets) can deal with long lines --- you have to
>recognize that you overflowed and make accomodations, but you can do the
>right thing. iw as giving you the benefit of the doubt, since gets, at
>least, has the problem that you are raising, while fgets does not.
>
[Sean Mc Grath]
You mentioned gets(). I didn't. How your insertion of an irrelevant reference
to gets() can be construed as giving me "the benefit of the doubt" I don't know.

[Sean Mc Grath]
>>No weeping and wailing required because it is typically possible to splice in
>>line-ends into HTML *without affecting the content*. This is not the case
>>with XML.
>
[David Durand]
>Just try that in tables. You have to know the meaning of the markup, even
>in HTML, if you want to do this. Now you can claim that table markup is
>broken, and you might be right, but HTML does not suport your argument.

[Sean Mc Grath]
Why not? Why cannot I replace say, "<TD>" with "<TD>\n" everywhere?
The problem then reduces to long data chunks such as...
pre elements:-

[David Durand]
>
>Similarly for pre elements: You can't do anything to lineneds in there --
>maybe I'm using a 20K line in <pre> to force horisontal scrolling for a
>rhetorical reason.

[Sean Mc Grath]
Absolutely agreed. The <data><line end><data> case is fundamentally different.
These line-ends are truly part of the data and a processor that adds new ones
is blowing the integrity of the data. Thus the plausible argument in favour
of not using line-end as data content.

[David Durand]
>
>>>Can you suggest any solution to the "grep" problem other than requiring a
>>>fixed line-max in XML.
>>
[Sean Mc Grath]
>>Yes. Ignore all line ends. I know this presents its own set of difficult
>>problems
>>but I'd prefer to tackle these - and maintain compatability with a decades
>>worth
>>of tools - rather than break the tools.
>

[David Durand]
>But this creates worse problems: 

[Sean Mc Grath]

Worse?

[David Durand]
>lack of <pre>-style elements

Broken As Designed. If something has to give I think <pre> elements should
be first to go.
Alternatively the problem can always be "arcformed" away. We use
     <!ATTLIST <e> DIGITOME CDATA #FIXED "PREFORM">
all the time. Our pretty printing, word wrapping SGML processing tools use
this to avoid adding extraneous WS that would blow the data content.

[David Durand]
>, inability to write XML filters that preserve linespace just from generic
>XML parsers.

[Sean Mc Grath]
Line ends (at least those) tipping up to start-end tags would *not* be part
of the data. They could thus be added/dropped without affecting the data. The
CGR output of the grove would be the final arbiter on "equivalence" and the
launching pad for offsets used in addressing.

>No way to use string offsets in linking.

If it ain't got a representation in the grove it ain't in the data and thus
is not counted when totting up offsets.

[David Durand]
>
>>> Do you think that that hideous hack to accomodate
>>>defective (if very useful) tools is really worth it.

[Sean Mc Grath]
>>Yes. Line oriented text processing has been a hugely popular paradigm for
>>many years now. I don't think of these tools as "defective" at all. I dare
>>say many wielders of these tools are of the same opinion. These people will
>>be rightly miffed at the suggestion that they are defective by virtue of the
>>use of a line oriented paradigm. They will also be rightly miffed that they
>>cannot bring their tools/skills to bear in the XML world.

[David Durand]
>But they can, they just need to limit their files to crrespond to the
>limitation of their tools. People do this all the time, without difficulty.


[Sean Mc Grath]
No difficulty?

Problem: I receive an XML file from a user who works with lines of <1024
characters in his tools.

I use <512. How do I munge his file to suit my tools? I can't without
blowing the data. If tag-tipping line ends were transient I could make
a stab at it. I would still have to address the "<data><line end><data>"
case. But hey! I never said this was simple! I just said that the alternate
set of problems this presents have the benefit of not throwing out our
existing line oriented tools and techniques.
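
The tag-tipping idea can be sketched as below, assuming (as proposed in this
message, and not as XML itself specifies) that a line end directly adjacent
to a tag carries no data. The regular expression is a deliberately naive
stand-in for a real parser, and wrap_at_tags is a hypothetical name. The
sketch breaks lines only at tag boundaries and reports any single token
that is still longer than the limit, which this approach alone cannot
shorten.

```python
import re

TAG = re.compile(r"(<[^>]*>)")  # naive: ignores comments, CDATA, ">" in attributes

def wrap_at_tags(text, limit):
    """Greedily pack tags and data runs into lines of at most `limit`
    characters, breaking only where a line end would touch a tag."""
    # Splitting on tags means every boundary between tokens touches a tag.
    tokens = [t for t in TAG.split(text) if t]
    lines, current, too_long = [], "", []
    for tok in tokens:
        if len(tok) > limit:
            too_long.append(tok)          # cannot be fixed by tag-tipping alone
        if current and len(current) + len(tok) > limit:
            lines.append(current)         # break here, next to a tag
            current = tok
        else:
            current += tok
    if current:
        lines.append(current)
    return lines, too_long

doc = "<a><b>hello</b><c>" + "x" * 20 + "</c></a>"
lines, too_long = wrap_at_tags(doc, 12)
print(lines)
print(too_long)
```

Joining the returned lines with nothing between them gives back the
original text, so under the stated assumption no data is altered; the
20-character run is the part that still breaks a tool with a 12-character
limit.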

[David Durand]
>Of course if the world at large decides to abandon the "line paradigm" then
>those who stick to it will be inconvenienced. But then if "the world" make
>the shift, then there's still not a very big problem, is there?

[Sean Mc Grath]
That is one-helluva shift IMHO! I am not sure to what extent the world is
   a) aware of this aspect of XML
   b) willing to bite that bullet.
 
[David Durand]
>if XML is
>supposed to require lines no longer than some limit, we need to specify
>that limit in the standard.

[Sean Mc Grath]
No we don't! We need to have a well defined mechanism whereby a tool with
a line length limit of N can work with XML with line length > N without
blowing the integrity of the data.

[David Durand]
>Otherwise all we can say is that any XML
>processor is free to reject any document if the lines are "too long for
>that tool". That's en even worse prescription for interoperability.
>
See above.

[David Durand]
>If there are limits, a standard has to tell you how to be safe and not
>break any of those limits. At least, a good standard should.
>

[Sean Mc Grath]
The standard does not have to establish a limit. It could help users
of "legacy" tools to *cope* with limits though. "Buy/build better tools" is one
line that can be taken but it is not the only one.


Sean Mc Grath
sean@digitome.com
www.digitome.com




From dgd at cs.bu.edu  Thu Sep 18 21:38:33 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
In-Reply-To: <199709181840.TAA04606@mail.iol.ie>
Message-ID: <v03007801b0473ad59b1c@[205.181.197.101]>

At 1:40 PM -0500 9/18/97, Sean Mc Grath wrote:
>[David Durand]
>>No, but fgets (unlike gets) can deal with long lines --- you have to
>>recognize that you overflowed and make accomodations, but you can do the
>>right thing. iw as giving you the benefit of the doubt, since gets, at
>>least, has the problem that you are raising, while fgets does not.
>>
>[Sean Mc Grath]
>You mentioned gets(). I didn't. How your insertion of an irrelevant reference
>to gets() can be construed as giving me "the benefit of the doubt" I don't
>know.


Well, as fgets does not support your argument that "long lines cause
problems" I thought it might be a typo for gets (which does have serious
problems with long lines, but is of course a canonical example of bad design,
and not something we want to accommodate).

As to fgets, I confess that I don't see that it should have any problem
with any file, newline-containing or not. Am I clear now?

>[David Durand]
>>Just try that in tables. You have to know the meaning of the markup, even
>>in HTML, if you want to do this. Now you can claim that table markup is
>>broken, and you might be right, but HTML does not suport your argument.
>
>[Sean Mc Grath]
>Why not? Why cannot I replace say, "<TD>" with "<TD>\n" everywhere?
>The problem then reduces to long data chunks such as...
>pre elements:-

Well, because people use tables to format, and that extra space queers the
pitch, inducing funny spacing behavior. Agreed that a better table model
could avoid this.

>[David Durand]
>>
>>Similarly for pre elements: You can't do anything to lineneds in there --
>>maybe I'm using a 20K line in <pre> to force horisontal scrolling for a
>>rhetorical reason.
>
>[Sean Mc Grath]
>Absolutely agreed. the <data><line end><data> case is fundamentally different.
>These line-ends are truly part of the data and a processor that adds new ones
>is blowing the integrity of the data. Thus the plausible argument in favour
>of not
>using line-end as data content.

I confess to not understanding why a lineend cannot occur at the beginning
of an element. Even SGML never proposed to remove more than _1_ such line
break.

So you want to take them all away, so that grep won't break.

>[David Durand]
>>
>>>>Can you suggest any solution to the "grep" problem other than requiring a
>>>>fixed line-max in XML.
>>>
>[Sean Mc Grath]
>>>Yes. Ignore all line ends. I know this presents its own set of difficult
>>>problems
>>>but I'd prefer to tackle these - and maintain compatability with a decades
>>>worth
>>>of tools - rather than break the tools.

Well, it makes data rather unrevealing.

And of course, the tools are only broken if common practice leads to the
use of long lines -- and if that becomes the case, then it will only have
been because the tools are _not_ actually that important.

This is a social argument that you have not addressed yet, but it cuts to
the core of why we should not do this... We get a simpler easier model, and
there is nothing to stop people from any self-imposed discipline their
tools require.

And if people are _not_ following such a discipline, then there's no reason
to worry about the tools, because it can only happen if people are not
using those tools for XML.

>[David Durand]
>>lack of <pre>-style elements
>
>Broken As Designed. If something has to give I think <pre> elements should
>be first to go.
Well, theoretically there's a lot of reasonableness to using explicit markup
for such line breaks. But, the pragmatist in me has to note that there has
been _no_ successful markup or document processing language without such a
feature (except for word-processors, but the case there is complicated
because the user never _sees_ the relevant representation).

>Alternatively the problem can alway be "arcformed" away. We use
>     <!ATTLIST <e> DIGITOME CDATA #FIXED "PREFORM">
>all the time. Our pretty printing, word wrapping SGML processing tools use
>this to
>avoid adding extraneous WS that would blow the data content.

Doesn't solve the problem you raised. That data has a long line in it and
grep crashes. You have to split the line, and take the consequences, or not
use grep.
If you don't allow arbitrary line-break introduction anywhere, you haven't
solved the legacy tool problem, which weakens your argument somewhat. If
you do, you've made it impossible to count on line-breaks _ever_ being
significant. The XML committee considered this and rejected it as too
divergent from current practice (that people did not want to give up).

>[David Durand]
>>, inability to write XML filters that preserve linespace just from generic
>>XML parsers.
>
>[Sean Mc Grath]
>Line ends (at least those) tipping up to start-end tags would *not* be part
>of the data. They
>could thus be added/dropped without effecting the data. The CGR output of
>the grove
>would be the final arbiter on "equivalence" and the launching pad for
>offsets used in
>addressing.

Yes, and the "looks the same in my editor" arbiter of equivalence would
fail. This has long been felt unacceptable by those who use such
transformations. If any hand-editing is involved it is unacceptable
behaviour to change all the line-ends.

>[Sean Mc Grath]
>>>Yes. Line oriented text processing has been a hugely popular paradigm for
>>>many years now. I don't think of these tools as "defective" at all. I dare
>>>say many wielders of these tools are of the same opinion. These people will
>>>be rightly miffed at the suggestion that they are defective by virtue of the
>>>use of a line oriented paradigm. They will also be rightly miffed that they
>>>cannot bring their tools/skills to bear in the XML world.

>[David Durand]
>>But they can, they just need to limit their files to crrespond to the
>>limitation of their tools. People do this all the time, without difficulty.

Yes, if your editor and tools have a 72 character line limit, you don't
create files with long lines. Then your tools always work. If you want
everyone's tools to always work, and you admit a maximum line-length for
tools, you need to pick that number so I can make files that won't toast
your software. Either that, or someone with different software will exceed
the limits of your software, of whose existence she has never even heard!

>
>[Sean Mc Grath]
>No difficulty?
>
>Problem : I receive an XML file from a user who works with <1024 lines in
>his tools.
>
>I use <512. how do I munge his file to suite my tools? I can't without
>blowing the data. If tag-tipping line ends were transient I could make
>a stab at it. I would still have to address the "<data><line end><data>"
>case. But hey! I never said this was simple! I just said that the alternate
>set of problems this presents have the benefit of not throwing out our
>existing line oriented tools and techniques.

Look, we have a solution. Proposing a new solution based on a new problem
(grep and other tools with hard line-length limitations) requires that the
new solution actually _solve_ the problem. Your solution does not solve the
problem you yourself pose, so it's hard for me to take seriously.

>[David Durand]
>>Of course if the world at large decides to abandon the "line paradigm" then
>>those who stick to it will be inconvenienced. But then if "the world" make
>>the shift, then there's still not a very big problem, is there?
>
>[Sean Mc Grath]
>That is one-helluva shift IMHO! I am not sure to what extent the world is
>   a) aware of this aspect of XML
>   b) willing to bite that bullet.

In that case, they create files with short lines, and there is no bullet to
bite. The only way this problem can become common is if long lines become
very popular. I don't see how long lines can become popular if they create
fatal tool problems with popular tools. Either long lines will not be
common, or tools that cope with long lines will be common along with the
long lines themselves.

It's a simple feedback loop. No need to change the standard, just let
people's desire to share data feed back into the general knowledge of what
data is shareable.
>[David Durand]
>>if XML is
>>supposed to require lines no longer than some limit, we need to specify
>>that limit in the standard.
>
>[Sean Mc Grath]
>No we don't! We need to have a well defined mechanism whereby a tool with
>a line length limit of N can work with XML with line length > N without
>blowing the integrity of the data.
How do we do this for legacy tools like grep with a hard-compiled limit
(that is not documented, and varies from vendor to vendor)?
If files that work with arbitrary tools are to be possible, we need to know
the constraints that those tools impose.

>[David Durand]
>>Otherwise all we can say is that any XML
>>processor is free to reject any document if the lines are "too long for
>>that tool". That's en even worse prescription for interoperability.
>>
>See above.

I saw. I didn't see how you're going to fix grep (for your data\ndata
case). Or rather the "40K of data with no \n" case which is the real killer.

>[David Durand]
>>If there are limits, a standard has to tell you how to be safe and not
>>break any of those limits. At least, a good standard should.
>>
>
>[Sean Mc Grath]
>The standard does not have to establish a limit. It could help users
>of "legacy" tools to *cope* with limits though. "Buy/build better tools"
>is one
>line that can be taken but it is not the only one.

Well, how could the standard do that?

Actually, since the standard is almost certainly not going to change, I
don't really care how it could do it. My sense is that people won't do
without <pre> equivalents -- so you can never get total freedom to
remove/add line-ends. So since the problem is unsolvable, let's not waste
time, and complicate the standard to get a partial solution (i.e. a solution
that fails to solve the problem) at the cost of a popular feature.

  -- David

I think that's it for me.

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________




From Peter at ursus.demon.co.uk  Fri Sep 19 01:58:21 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:28 2004
Subject: XML-WG and XML-SIG deliberations
Message-ID: <10145@ursus.demon.co.uk>

Two postings on XML-DEV have explicitly or implicitly referred to the 
discussion of XML-SIG and XML-WG. The formal position is that the
discussions of XML-WG (the current W3C-appointed decision-making body) and
XML-SIG (a group of experts who offer advice to XML-WG) are confidential to
W3C member organisations (and the invited experts). This confidentiality
is important as it represents part of the value of being a member of W3C.

There is potential confusion about the archives, since the XML discussion group
was originally called the 'WG' and its archives were (and are) public. They
ended about June 1997 (any precise dates and current URLs for these?). They
are of historical interest and there *might* be some useful discussion there
but there is a huge amount to read through. Maybe some of the whitespace 
discussion is in the public archives, though I wouldn't rush.
The archives of XML-WG since June 1997 (?) are not publicly available. Nor
are those of XML-SIG.

However the discussion on this list, and the publicly reported developments
contributed by posters/readers of this list are valued by the XML-groups. 
For example the recent WG posting emphasised the value of APIs and their
possible co-publication with XML specs. 

The proposal for XSL (XML-STYLE) *is* publicly visible and URLs have been
posted on this list. Unfortunately for XML-DEVers, any XML-SIG and XML-WG 
discussion on this is confidential.  I leave it to any XML-WG readers of this
list to keep XML-DEV aware of what is happening. Perhaps it could be useful
to remind us of the proposed milestones/timescales for the various XML 
components to be published/accepted.

	P. 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jeremy at allaire.com  Fri Sep 19 06:15:15 1997
From: jeremy at allaire.com (Jeremy Allaire)
Date: Mon Jun  7 16:58:28 2004
Subject: Custom Tags
Message-ID: <34220BAC.2F83@allaire.com>

For anyone interested in CFML custom tags:

http://www.allaire.com/TagGallery/



From Alice.Portillo at PSS.Boeing.com  Tue Sep 23 23:05:47 1997
From: Alice.Portillo at PSS.Boeing.com (Portillo, Christina)
Date: Mon Jun  7 16:58:28 2004
Subject: Use of Character Escape Codes
Message-ID: <F1B781B35CA8D011B10F00805FEA27F597BA18@xch-rtn-01.ca.boeing.com>

Thought I would share Peter Flynn's response on escape codes with you all.

Christina Portillo
Product Definition and Image Technology

The Boeing Company               Phone: 425.237.3351
PO Box 3707   M/S 6H-AF        Fax: 425.237.3428
Seattle, WA  98124-2207            christina.portillo@boeing.com


> ----------
> From: 	Peter Flynn[SMTP:pflynn@imbolc.ucc.ie]
> Sent: 	Monday, September 22, 1997 7:15 PM
> To: 	Christina Portillo
> Subject: 	Use of Escape Codes and Characters
> 
> At 20:13 22/05/97 +0100, you wrote:
> >Q == "Question: How do you encode in your XML document references to
> >characters above 126 in the ISO646 character set.
> >
> >So of the character classes defined in the standard: space, char,
> >letter, Base Char, Ideographic, CombiningChar, Letter, Digit, Ignorable,
> >and Extender which of these has to be escaped to be used in a document.
> >OR from what index value down must escape codes be used."
> 
> I'm sorry to have delayed answering this but the character set question
> became rather vexed :-)
> 
> The simple answer is you escape any code you can't type as a character
> or byte combination. In other words, if you are working in ASCII, but
> you can generate an e-acute with the correct code (ie ISO 10646, not
> Windows :-) then you should be able to do so, and embed that byte in
> the file. If you need a Hangul glyph and you can't type it, then you
> need to use the escaped code: presumably users on Hangul systems can
> generate all their own characters at the keyboard. 
> 
> But in practice I think we'll need to see how/if the browsers implement
> non-Latin character repertoires.
> 
> ///Peter
> 
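
The rule of thumb in the reply above (escape whatever you cannot enter
directly in the encoding you are working in) can be sketched as follows.
This is a minimal illustration, assuming the escaped form is an XML numeric
character reference such as &#233;; the function name escape_untypeable is
hypothetical, and the sketch does not escape markup characters such as "&"
or "<".

```python
def escape_untypeable(text, encoding="ascii"):
    """Replace every character that `encoding` cannot represent with a
    numeric character reference (&#NNNN;) and leave the rest untouched."""
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)

# An e-acute must be escaped in ASCII but can be stored directly in Latin-1.
print(escape_untypeable("caf\u00e9"))
print(escape_untypeable("caf\u00e9", "latin-1"))
# A Hangul syllable (U+D55C) has to be escaped in either encoding.
print(escape_untypeable("\ud55c", "latin-1"))
```

Which characters need escaping therefore depends on the encoding of the
file being written, not on a fixed index value.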

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


From gannon at commerce.net  Thu Sep 25 01:58:25 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:28 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <01BCC907.BE1F2F00@arrow-d83.sierra.net>

CommerceNet XML iMarket Project Team,

The XML iMarket Project Planning Team will meet on Monday, October 6, 1997, 9:00am to 12:00pm PDT.  

The meeting location will be the CommerceNet offices, 4005 Miranda Ave, Suite 175, Palo Alto, CA 94304 (650-858-1930) unless otherwise notified.  

I will arrange for 800# conference call facilities for those unable to attend in person and send the 800# information to those who have replied and confirmed their interest in participating.

If you can attend, please reply confirming whether you will be able to attend in person or whether you will attend via the 800# conference call.  Please note that attendance in person or by phone is limited to members of CommerceNet's Information Access Portfolio only.

The goal of the meeting is to develop a detailed project plan and Request For Proposal (if needed) to identify companies or consultants with expertise required to help on the project.  The iMarket Project is designed to take the XML catalog files and Document Type Definition files produced during the recently completed XML Catalog project.  The general plan is to build a demonstration virtual marketplace which utilizes the multiple vendor XML catalogs with standard DTDs and allows shoppers to search for products across vendors by specifying product and merchant attributes.  Another goal of this project is to demonstrate how the use of XML stylesheets will allow vendors/merchants to maintain "brand equity" while using common description templates (DTDs).

The XML Catalog tutorial and sample XML/DTD files are available for members at:
http://members.commerce.net/pw/portfolios/access/xml/xml-demo.html

CommerceNet IA Portfolio Members, please review these XML documents and let me know if you or someone else in your company is interested in participating.

Non-members, please reply if you are interested in becoming members or being put on the RFP list.

Thank you for your continued support.

Patrick Gannon, Executive Director
Information Access Portfolio, CommerceNet
http://www.commerce.net/services/portfolios/
------------------------------------------------------
President & CEO, Internet Shopping Directory, Inc.
865 Tahoe Blvd., Suite 211, Incline Village, NV  89451
702-831-2251   702-831-3925 (Fax)
mailto://patrick@shoppingdirect.com
http://www.shoppingdirect.com


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


From Jon.Bosak at eng.Sun.COM  Thu Sep 25 17:53:42 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:28 2004
Subject: XML iMarket Project Planning Meeting
In-Reply-To: <CMM.0.90.0.875192558.ark@Office.Stanford.EDU> (message from Arthur Keller on Thu, 25 Sep 97 6:02:38 PDT)
Message-ID: <199709251550.IAA13057@boethius.eng.sun.com>

| The requirement of standard DTDs by all vendors and participants
| presumes that these are adequate to satisfy the differentiation needs
| of the various participants.  "Brand equity" is not sufficient
| differentiation.  Rather, one company may use more detailed
| characteristics than another company in order to differentiate their
| products.

I think you're missing the point.

What I as a consumer want to be able to do is quite simple.  I want to
be able to say, "Hey, I need a new jacket," sit down at my computer,
call up my find-a-product robot, enter my jacket parameters, and then
come back a while later to find all the jackets that fit those
parameters offered by all the vendors whose products I'm interested in
considering.  If the catalog scheme isn't standardized enough to
support this, then I as a consumer am not interested in using it.  If
one of the vendors differentiates itself by adopting a scheme of data
representation that doesn't allow this kind of transparent direct
comparison, then it differentiates itself right out of the class of
vendors I'm interested in, because if all it's giving me is the
ability to cruise its catalog in isolation, I can get the same
functionality from the printed version; it no longer participates in a
way that allows the net to add value to me as a consumer.

I'm not denying that vendors will want to differentiate their
offerings, but if they can't do it in a way that supports detailed
direct comparisons based on the differentia that I am interested in
*as a consumer* then they are simply not in the game at all.

Jon




From srn at techno.com  Thu Sep 25 22:12:40 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
In-Reply-To: <199709251550.IAA13057@boethius.eng.sun.com>
	(Jon.Bosak@eng.Sun.COM)
Message-ID: <199709252008.QAA01199@bruno.techno.com>

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

Andrew Layman (whom I do not know, but would like to) has laid out a
list of requirements for the implementation of namespaces which, if
used as guidance in the development of XML's namespace features, will
create a need for authorities who give "standard" names to such things
as product characteristics.  The concentration of power in such
authorities will hinder innovation, by making it difficult to compare
products regarded as "out of category" for some authority's set of
defined names.  I quote from Andrew's "Universal Names" posting of 23
September 1997 on the w3c-xml-sig@w3.org list:

  [Andrew Layman:]

  I've agreed to summarize the set of requirements that I have
  championed in the past under the term "namespaces." Because this
  word has also meant several alternate sets of requirements, I'm
  temporarily using an entirely different term, "universal names," so
  that we can understand this set of requirements without being
  confused by other useful, but different, goals.  ...

  [Here] I'm going to describe one set of requirements, as best I
  understand it, in my own words. The name is not important. This set
  of requirements is.  ...

  Let me mention a few things that are not requirements of this
  facility.  They may be useful features in some other context, but
  they are not needed in order to have universal names, and should not
  be confused with universal names:

  We do not require an ability to rename elements, so that they can be
  called one thing in a schema and something else in a document instance.
  We do not require the ability to associate multiple semantic meanings
  with a single name.

  In short, what we need, and all that we need, is a facility that
  gives every element's type a universal name, and allows a single
  element type to be known by the same name across disparate
  documents, where the documents have different "document types" or
  where there is no specific document type.


When Andrew Layman says, "We do not require an ability to rename
elements, so that they can be called one thing in a schema and
something else in a document instance," he is backhandedly stating a
requirement that conflicts with the evolutionary process of defining
and marketing new products.  How will the catalog of everything that
is for sale handle a case where the same product characteristic, or
even the same entire product, arises from multiple industries
simultaneously, and each of those industries already uses its own
authoritative schema?  Will the contents of documents have to be
duplicated and translated so as to conform with multiple schemas, so
that different comparisons can be made?  If so, that will cause much
of the value of making the comparisons in the first place to be lost;
features regarded by authorities as "out of category" will simply
disappear.  Imagine a single device that is a fax machine, a
telephone, a copier, a computer, and a stereo sound system.  Should it
appear in a list of telephones?  Maybe.  Should the output wattage of
its amplifier be listable in a comparison with the output wattage of
other telephones?  Maybe.  Should the people who figure out what are
the interesting characteristics of telephones anticipate that output
wattage may be an important characteristic of telephones?  It's
completely unrealistic to expect those people to anticipate that.
And, yet, it's an interesting and relevant statistic and it may be
important to some consumers.

The ugly truth is that we can't predict whether information that is
now thought to be irrelevant to other information (or, maybe we don't
even know about the existence of the other information yet) will turn
out to be semantically identical or semantically mappable.  In my own
mind, anyway, the real justification for the existence of businesses
that provide "yellow pages on steroids" in support of internet
commerce is to provide the added value of mapping semantics to each
other in such a way that they can be directly compared, just as Jon
says.  That mapping can be expressed in some proprietary fashion, or
it can be done using SGML documents that inherit from multiple SGML
architectures, or, if XML supports it, it can be done with XML
documents that inherit from multiple XML architectures, with no limit
on the number of XML architectures that can be inherited, and no
limits on the number of architectures that can usefully be fielded by
old and new industries.  If Andrew Layman's much more limited
requirements govern the design of XML, though, XML documents that
represent such semantic mappings will be more costly to create and
maintain.  (I guess you'd have to do it all with hyperlinks.  Anything
can be done with hyperlinks, but that doesn't mean that everything
*should* be done with hyperlinks.  In general, hyperlinks are best
regarded by information managers as a last resort because they cost
more to maintain and their structure is arbitrary and external.  It's
better if the information, in effect, maps itself.  Inheritable SGML
architectures allow information to map itself in complex ways.  Why
shouldn't it be possible to accomplish the same end in XML, without
requiring the use of hyperlinks?)

So, I continue to harp on the importance of allowing a single element
to inherit multiple semantics (and/or the _same_ semantic differently
named or named within different namespaces).  Andrew Layman says, "We
do not require the ability to associate multiple semantic meanings
with a single name."  But, in my own mind, anyway, this really *is* a
requirement for cataloging companies to extract maximum value from
their listings at minimum information management cost in a dynamic,
non-authoritarian market environment.  It would allow internet catalog
providers to map each new DTD into their existing DTDs simply by
tweaking their existing DTDs.  For example, in the DTD for their
catalog of telephone products, when the output wattage issue first
arises (i.e., when a telephone appears on the market that lists an
output wattage), a declaration is added that allows the
characteristics listed in the DTD for the manufacturer's product
description document to be inherited.  In the same declaration, the
features of the product, such as its "colour", can be mapped to their
equivalents already in the DTD (such as "color").  The new feature,
"outputWattage", can be made to appear
with a default value of "not applicable", so now all the existing
telephone product listings have this feature, and they can all respond
meaningfully (if uninterestingly) to queries about it.  No need to
create and maintain (!) any hyperlinks.  No need to write or maintain
any extra documents.  One change in one place updates all telephone
products listed in the catalog, regardless of how many there are.  The
amount of information stored hardly increases at all, but the value of
the information increases quite a lot.  Essentially the same change
can be applied to the DTDs for stereo systems (now they can have a
redial feature, yes or no), the DTD for copiers, etc.  Cheap and very
powerful, no?  The catalog provider gets to add a terrific amount of
value at very little cost.  New products can be found by consumers
even if they didn't know the hybrid category existed.  ("I want a very
loud telephone.  Hmmm.")  New products for untried niches can be
usefully listed in multiple catalogs.  Innovation is not penalized for
being unanticipated by the authorities who created DTDs for product
listings in various categories, or by the failure to recognize a
viable category.  Indeed, there is no need for such authorities at
all.  There is only a need for catalogers who can read and understand
incoming DTDs and perform these cheap semantic mapping tricks.
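
The mapping step described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions, not SGML
architecture syntax: the table and function names (TELEPHONE_DEFAULTS,
VENDOR_TO_CATALOG, to_catalog_record) are hypothetical, and only
"colour", "color", "outputWattage" and the "not applicable" default come
from the text.

```python
# Hypothetical illustration: these names are assumptions made for the
# sketch and are not part of any SGML or XML specification.

# The cataloger's own feature names for telephones, with default values.
TELEPHONE_DEFAULTS = {"color": "unknown", "outputWattage": "not applicable"}

# One declaration per incoming vocabulary: vendor name -> catalog name.
VENDOR_TO_CATALOG = {"colour": "color", "outputWattage": "outputWattage"}


def to_catalog_record(vendor_record, mapping, defaults):
    """Rename vendor features to catalog names and fill in defaults."""
    record = dict(defaults)
    for name, value in vendor_record.items():
        # Names with no mapping are kept as they are rather than dropped.
        record[mapping.get(name, name)] = value
    return record


# An existing listing that never mentioned wattage still answers the query.
old_phone = to_catalog_record(
    {"colour": "black"}, VENDOR_TO_CATALOG, TELEPHONE_DEFAULTS
)
# A new hybrid product supplies its own value.
loud_phone = to_catalog_record(
    {"colour": "red", "outputWattage": "40"}, VENDOR_TO_CATALOG, TELEPHONE_DEFAULTS
)
print(old_phone)
print(loud_phone)
```

The one change to the defaults table is what gives every existing
listing an "outputWattage" value, which is the "one change in one place"
property argued for above.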

You can do all this now with SGML (as of August 1, 1997; see
http://www.ornl.gov/sgml/wg8/document/1920.htm).  The only question is
whether XML will be able to do it.  Maybe it doesn't matter; providers
of internet shopping directories can always maintain their source
information in SGML and simply deliver it in XML form, if they like.
(Or in HTML form, for that matter.)

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From jwrobie at mindspring.com  Thu Sep 25 22:34:16 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <1.5.4.32.19970925200436.01683f9c@pop.mindspring.com>

At 10:14 AM 9/25/97 -0700, ark@DB.Stanford.EDU wrote:

>I certainly agree with your goal, but I don't agree with the means.
>The experience I have is that standards do not work well in this area.
>What we need is an approach that allows the cross-comparison that you
>want, and yet allows for differentiation, experimentation, and
>evolution.

Perhaps the standards could describe architectural forms which would be the
basis for more individual DTDs created by each vendor. This allows searches
to be done for anything in the architectural forms, but still allows each
vendor to have additional information. Because each vendor has a DTD,
documents can still be validated when they are authored, even though they
have vendor-specific information. Because the DTDs are based on common
architectures, searches can be done across vendors.
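
A rough sketch of that idea in Python, with plain dictionaries standing
in for documents.  Everything named here (BASE_FIELDS, the vendor names,
the sample fields) is a hypothetical assumption for illustration; real
architectural forms are declared in DTDs, not in code like this.

```python
# Hypothetical names throughout; this only imitates the idea in plain Python.

# The shared "architecture": fields every vendor record must carry.
BASE_FIELDS = {"category", "price"}

# Each vendor adds its own fields on top of the shared ones.
catalogs = {
    "VendorA": [{"category": "jacket", "price": 80.0, "lining": "fleece"}],
    "VendorB": [
        {"category": "jacket", "price": 120.0, "waterproof": True},
        {"category": "boots", "price": 60.0},
    ],
}


def conforms(item):
    """A vendor record is searchable if it carries every shared field."""
    return BASE_FIELDS.issubset(item)


def search(catalogs, category, max_price):
    """Query the shared fields only; vendor-specific fields ride along."""
    return [
        (vendor, item)
        for vendor, items in catalogs.items()
        for item in items
        if conforms(item)
        and item["category"] == category
        and item["price"] <= max_price
    ]


hits = search(catalogs, "jacket", 100.0)
print(hits)
```

The search never looks at "lining" or "waterproof", so vendors can add
such fields freely without breaking cross-vendor queries.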

Jonathan


***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From gannon at commerce.net  Fri Sep 26 00:24:09 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <01BCC9C3.094BCEA0@sphynx-d105.sierra.net>

Steven,

Nice to hear from someone who "gets it" regarding the impact of XML on future usage & searchability of internet catalogs.

Since this topic has spilled over from the original meeting posting and generated significant interest, I will request a listserv be established for xml-catalog.  This will allow for application-oriented discussions of XML that are not related to development (XML-DEV) or EDI (XML-EDI) issues that have their own listserv.

Patrick Gannon




From peat at erols.com  Fri Sep 26 01:17:12 1997
From: peat at erols.com (peat)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <199709252308.TAA11756@smtp1.erols.com>

Before you do this, we need to ask ourselves: is there, or should there be,
a significant difference in namespace and other mechanisms depending on the
use of the object?  Is there that much difference in how we describe an
article, say a "red sweater", whether the item is in a catalog, stored in an
object repository, or exchanged in a Purchase Order?  Is it significant
enough to split the group?

Let me propose we introduce a collaborative means of keeping the (still
relatively small) collection of people together.  The XML/EDI Group will
soon have this capability through its subgroups and via a generous donation
from an outside corporation.  It should be up and running in a few weeks.
Just a thought, before splintering off the main path.

- Bruce
 



From Peter at ursus.demon.co.uk  Fri Sep 26 01:22:26 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <10312@ursus.demon.co.uk>

In message <01BCC9C3.094BCEA0@sphynx-d105.sierra.net> Patrick Gannon writes:
> Steven,
> 
> Nice to hear from someone who "gets it" regarding the impact of XML on 
> future usage & searchability of internet catalogs.
> 
> Since this topic has spilled over from the original meeting posting and 
> generated significant interest, I will request a listserv be established 
> for xml-catalog.  This will allow for application oriented discussions 

I think there is potential confusion in the word 'catalog', because of the
SGML Open Catalog.  Some XML software such as NXP supports such Catalogs,
although at present (I think) it is not formally part of XML.

If possible I would hope that 'XML Catalog' and xml-catalog (if they exist
at all) were reserved for this usage - otherwise there could be a lot of
confusion. 

A general point is the use of the XML-* prefix. Within XML itself it is
reserved (e.g. xml-space, xml-link) and I think we should avoid pre-empting
possible uses of XML-*.  Of course 'XML-DEV' falls into the same trap... :-)

I'm assuming that this is not a request for Henry and me to set up another
listserv, because one is about our limit :-).

	P.

> of XML that are now related to development (XML-DEV) or EDI (XML-EDI) 
> issues that have their own listserv.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From elm at arbortext.com  Fri Sep 26 01:35:54 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <3.0.32.19970925193549.00ab5490@village.doctools.com>

(I just posted this directly to xml-dev; if any of the iMarket folks wants
to post this to the original recipients of the thread, be my guest...)

At the Montreal face-to-face XML WG meeting, Eliot Kimber mentioned a cool
idea: Schemas can be in the business of providing synonyms for semantics
published in other schemas.  Schemas can also be in the business of
providing mappings from names to multiple schemas.

Thus, if you want to use your own name for something, you can create a
schema (why not even use AF syntax?) that does nothing but map your name to
the "standard" one or to several "standard" ones.  So my personal schema
can map eve:gazorninplat to both dc:subject and docbook:subject if I want
it to.

This could have some interesting consequences:

  o You could chain schemas as much as necessary to get your desired effect.

  o An interesting market in derivative schemas could develop.

  o XML-only documents wouldn't require full AFDR functionality.
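
To make the chaining idea concrete, here is a small Python sketch of
resolving a name through a chain of synonym schemas.  It is an
assumption-laden illustration, not AFDR syntax: the SYNONYMS table and
the resolve function are hypothetical, and "shop:topic" is an invented
name added only to show a second link in the chain.

```python
# Hypothetical mapping tables.  "eve:gazorninplat", "dc:subject" and
# "docbook:subject" come from the message; "shop:topic" is made up here.
SYNONYMS = {
    "shop:topic": ["eve:gazorninplat"],
    "eve:gazorninplat": ["dc:subject", "docbook:subject"],
}


def resolve(name, synonyms, seen=None):
    """Follow synonym chains until reaching names with no further mapping."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against cyclic schema chains
        return set()
    seen.add(name)
    targets = synonyms.get(name)
    if not targets:
        # No mapping declared: treat the name as a "standard" one.
        return {name}
    resolved = set()
    for target in targets:
        resolved |= resolve(target, synonyms, seen)
    return resolved


print(sorted(resolve("shop:topic", SYNONYMS)))
```

Each schema in the chain only needs to know the names one step away,
which is what would let derivative schemas be layered on one another.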

So Jonathan's suggestion below could be seen as a suggestion to create a
base schema using AFDR syntax, which others could use directly, or in
modified form by inserting another schema.

I don't know, maybe all this is obvious to everybody else, but seeing the
problem this way blows my mind.  It makes me think that (ironically?) the
first obvious candidate for "non-DTD" schema syntax is AFDRs.

	Eve

At 04:04 PM 9/25/97 -0400, Jonathan Robie wrote:
>Perhaps the standards could describe architectural forms which would be the
>basis for more individual DTDs created by each vendor. This allows searches
>to be done for anything in the architectural forms, but still allows each
>vendor to have additional information. Because each vendor has a DTD,
>documents can still be validated when they are authored, even though they
>have vendor-specific information. Because the DTDs are based on common
>architectures, searches can be done across vendors.



From srn at techno.com  Fri Sep 26 05:11:18 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: Retraction and apology
In-Reply-To: <199709252008.QAA01199@bruno.techno.com> (srn@techno.com)
Message-ID: <199709260306.XAA01444@bruno.techno.com>

Some of you who received the note I sent to you earlier today should
not have received the material written by Andrew Layman that I quoted
and which was previously distributed only within the confines of W3C.
I should not have quoted it in a note that was being publicly
distributed.

In my own (pretty weak) defense: I didn't notice that, for example,
the xml-dev list was in the address list; I merely scanned the list of
addresses to verify that, in fact, it was a list with a lot of
insiders.  I should have verified that the list contained no
*outsiders*, but I inexplicably failed to do that, blithely assuming
from the list's provenance, insider topic, insider tenor, and
recognizable insider addressees that it was a discussion taking place
within the family.  I should have been more careful; this was
definitely a poor algorithm.

I must ask you folks who were not supposed to see the Layman material
to destroy it and forget it.  Anyway, it's an internal discussion,
and, therefore, you can't know the context.

W3C people: I would not blame you for withdrawing my access to the
discussion.  My blunder has caused some pain, and I regret that.

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From Jon.Bosak at eng.Sun.COM  Fri Sep 26 17:33:38 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
In-Reply-To: <01BCC9C3.094BCEA0@sphynx-d105.sierra.net> (message from Patrick Gannon on Thu, 25 Sep 1997 14:55:12 -0700)
Message-ID: <199709261530.IAA13761@boethius.eng.sun.com>

| Since this topic has spilled over from the original meeting posting
| and generated significant interest, I will request a listserv be
| established for xml-catalog.  This will allow for application oriented
| discussions of XML that are now related to development (XML-DEV) or
| EDI (XML-EDI) issues that have their own listserv.

Thanks, Patrick.  Like Steve Newcomb, I didn't notice that this thread
was being copied to xml-dev when I posted to it.  We should start over
on the new list server.

Jon




From Peter at ursus.demon.co.uk  Fri Sep 26 21:18:34 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:29 2004
Subject: Retraction and apology
Message-ID: <10329@ursus.demon.co.uk>

In message <199709260306.XAA01444@bruno.techno.com> "Steven R. Newcomb" writes:
> 
> I must ask you folks who were not supposed to see the Layman material
> to destroy it and forget it.  Anyway, it's an internal discussion,
> and, therefore, you can't know the context.

Mailings to xml-dev are not only posted to subscribers, but also hypermailed.
I have no idea what people or robots copy material from this list, but I expect
that this happens. The messages are stored in a mail box, regenerated into 
hypertext at regular intervals and it isn't feasible to delete messages from
the archive without a great deal of work. The moving finger writes... sorry.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Sat Sep 27 00:30:46 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:29 2004
Subject: First XML Book?
Message-ID: <3.0.32.19970926152707.00944510@pop.intergate.bc.ca>

Just got my copy in the mail of "Presenting XML", mostly by Richard Light,
from SamsNet.   400 pages, suffers from being a snapshot of a moving target,
but, I think, a worthy first volume in the soon-to-be-large XML library.
ISBN 1-57521-334-6. -Tim



From srn at techno.com  Mon Sep 29 04:35:49 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: please consider whether
Message-ID: <199709290233.WAA01182@bruno.techno.com>


[Patrick Gannon:]

> Since this topic has spilled over from the original meeting posting
> and generated significant interest, I will request a listserv be
> established for xml-catalog.  This will allow for
> application-oriented discussions of XML that are not related to
> development (XML-DEV) or EDI (XML-EDI) issues that have their own
> listserv.

Patrick -- Here is a note to post on the listserv. -- Steve

**********************************************************************

This note asks those in the online product catalog business to
consider whether they need XML to support SGML Architectures --
multiple architectural inheritance.  (Others may also find it
interesting.)

The designers of XML want to know why multiple architectural
inheritance is a feature that should remain unsupported, at least
temporarily.

If you want to use and benefit from the "SGML Architectures" notion
outlined in my earlier note (attached below), I believe you should now
consider (while you still have an option in the matter) whether you
want to be able to use XML for your company-internal "information
source code" for all the information that is the essence of your
company's value.  An ISO standard alternative, SGML/HyTime, is also
available for that purpose.

On the one hand, SGML/HyTime is one helluva strong set of paradigms,
of which XML and all the things currently present in or planned for
XML (linking, addressing, metadata) are a proper subset.  Together,
these paradigms put the information manager and owner in maximum
control of the cost of creating and maintaining information about
information.

On the other hand, XML will have a wider audience.  XML data will flow
across the internet to an awful lot of users (or so we think, anyway)
who won't have full SGML/HyTime capabilities in their systems any time
soon.

If your internal databases are limited in functionality to the
representational power of XML, your internal applications cannot
deliver the cost-cutting power of SGML/HyTime for creating and
maintaining massive amounts of n-dimensional (and n-dimensionally
interrelated) information.  Maybe that's ok: the potential for higher
code maintenance costs may be worth the convenience of being able to
dump copies of sections of your metadata source code directly out to
the internet.  (Somehow the latter doesn't seem to me a very good
business idea, but that's for you to decide.)

You might be able to avoid having to make this decision early by
letting the w3c-xml-sig group know that your business applications
expect to benefit from multiple architectural inheritance a la SGML
Architectures, so you'd like to have XML support SGML Architectures
sooner, rather than later.

I'm not particular about whatever reason you may have for expressing
to the w3c-xml-sig group your interest (if any) in SGML Architectures;
I just think the online product catalog industry should consider doing
so, and very soon indeed.

I've already made clear my own reasons for bringing this issue up in
my earlier note.  For your convenience, I'm attaching it below (sans
some stuff I shouldn't have put in in the first place because it was
from an unpublished W3C discussion about XML).

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA

********************************************************************************

*** Not as originally posted.  Unpublished W3C material has been deleted. ***

Date: Thu, 25 Sep 1997 16:08:44 -0400 
Message-Id: <199709252008.QAA01199@bruno.techno.com>
From: "Steven R. Newcomb" <srn@techno.com>
To: Jon.Bosak@eng.Sun.COM
CC: ark@DB.Stanford.EDU, gannon@commerce.net, brucek@agentsoft.com,
         btait@mercantec.com, caallen@webmethods.com,
         claire_celeste_carnes@ccm.jf.intel.com, dmarquis@kinetoscope.com,
        f.deschamps@bull.com, harvey@eccnet.eccnet.com, jmt@commerce.net,
        Jon.Bosak@eng.Sun.COM, jonathan@poet.com, jonlewis@cngroup.com,
         marthao@icat.com, Michael.Leventhal@grif.fr, paul@arbortext.com,
        pjordan@microstar.com, ptrevithick@bitstream.com, rcw@commerce.net,
         smith@adobe.com, tbadger@kodak.com, trung@ondisplay.com,
         weld@cs.washington.edu, xml-dev@ic.ac.uk, andrewl@microsoft.com,
         higginsc@lanepowell.com
In-reply-to: <199709251550.IAA13057@boethius.eng.sun.com>
	(Jon.Bosak@eng.Sun.COM)
Subject: Re: XML iMarket Project Planning Meeting

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

*** Mr. (or Ms.) X *** (whom I do not know, but would like to) has
laid out a list of requirements for the implementation of namespaces
which, if used as guidance in the development of XML's namespace
features, will create a need for authorities who give "standard" names
to such things as product characteristics.  The concentration of power
in such authorities will hinder innovation, by making it difficult to
compare products regarded as "out of category" for some authority's
set of defined names.

*** [To say that there is no industrial requirement for XML to support
multiple architectural inheritance is to place the design of
XML in conflict] *** with the evolutionary process of defining
and marketing new products.  How will the catalog of everything that
is for sale handle a case where the same product characteristic, or
even the same entire product, arises from multiple industries
simultaneously, and each of those industries already uses its own
authoritative schema?  Will the contents of documents have to be
duplicated and translated so as to conform with multiple schemas, so
that different comparisons can be made?  If so, that will cause much
of the value of making the comparisons in the first place to be lost;
features regarded by authorities as "out of category" will simply
disappear.  Imagine a single device that is a fax machine, a
telephone, a copier, a computer, and a stereo sound system.  Should it
appear in a list of telephones?  Maybe.  Should the output wattage of
its amplifier be listable in a comparison with the output wattage of
other telephones?  Maybe.  Should the people who figure out what are
the interesting characteristics of telephones anticipate that output
wattage may be an important characteristic of telephones?  It's
completely unrealistic to expect those people to anticipate that.
And, yet, it's an interesting and relevant statistic and it may be
important to some consumers.

The ugly truth is that we can't predict whether information that is
now thought to be irrelevant to other information (or, maybe we don't
even know about the existence of the other information yet) will turn
out to be semantically identical or semantically mappable.  In my own
mind, anyway, the real justification for the existence of businesses
that provide "yellow pages on steroids" in support of internet
commerce is to provide the added value of mapping semantics to each
other in such a way that they can be directly compared, just as Jon
says.  That mapping can be expressed in some proprietary fashion, or
it can be done using SGML documents that inherit from multiple SGML
architectures, or, if XML supports it, it can be done with XML
documents that inherit from multiple XML architectures, with no limit
on the number of XML architectures that can be inherited, and no
limits on the number of architectures that can usefully be fielded by
old and new industries.  *** [Without multiple architectural
inheritance, XML documents that represent such semantic mappings will
be more costly to create and maintain.  (I guess you'd have to do it
all with hyperlinks.  Anything can be done with hyperlinks, but that
doesn't mean that everything *should* be done with hyperlinks.  In
general, hyperlinks are best regarded by information managers as a
last resort because they cost more to maintain and their structure is
arbitrary and external.  It's better if the information, in effect,
maps itself.  Inheritable SGML architectures allow information to map
itself in complex ways.  Why shouldn't it be possible to accomplish
the same end in XML, without requiring the use of hyperlinks?)

So, I continue to harp on the importance of allowing a single element
to inherit multiple semantics (and/or the _same_ semantic differently
named or named within different namespaces).  *** [Other opinions
notwithstanding,] *** in my own mind, anyway, this really *is* a
requirement for cataloging companies to extract maximum value from
their listings at minimum information management cost in a dynamic,
non-authoritarian market environment.  It would allow internet catalog
providers to map each new DTD into their existing DTDs simply by
tweaking their existing DTDs.  For example, in the DTD for their
catalog of telephone products, when the output wattage issue first
arises (i.e., when a telephone appears on the market that lists an
output wattage), a declaration is added that allows the
characteristics listed in the DTD for the manufacturer's product
description document to be inherited.  In the same declaration, the
features of the product, such as its "colour", can be mapped to their
equivalents already in the DTD (such as "color").  The new feature,
"outputWattage", can be made to appear
with a default value of "not applicable", so now all the existing
telephone product listings have this feature, and they can all respond
meaningfully (if uninterestingly) to queries about it.  No need to
create and maintain (!) any hyperlinks.  No need to write or maintain
any extra documents.  One change in one place updates all telephone
products listed in the catalog, regardless of how many there are.  The
amount of information stored hardly increases at all, but the value of
the information increases quite a lot.  Essentially the same change
can be applied to the DTDs for stereo systems (now they can have a
redial feature, yes or no), the DTD for copiers, etc.  Cheap and very
powerful, no?  The catalog provider gets to add a terrific amount of
value at very little cost.  New products can be found by consumers
even if they didn't know the hybrid category existed.  ("I want a very
loud telephone.  Hmmm.")  New products for untried niches can be
usefully listed in multiple catalogs.  Innovation is not penalized for
being unanticipated by the authorities who created DTDs for product
listings in various categories, or by the failure to recognize a
viable category.  Indeed, there is no need for such authorities at
all.  There is only a need for catalogers who can read and understand
incoming DTDs and perform these cheap semantic mapping tricks.
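
The mapping step described above (rename "colour" to "color", give every existing listing a default "outputWattage") can be illustrated with a minimal sketch. This is only an analogy in Python dictionaries, not SGML architectural processing; the names NAME_MAP, DEFAULTS and normalize, and the sample listings, are hypothetical and chosen for illustration.

```python
# Hypothetical illustration: plain dictionaries stand in for product listings.
NOT_APPLICABLE = "not applicable"

# Catalog-side declaration: vendor characteristic name -> catalog name.
NAME_MAP = {"colour": "color"}

# Catalog-side defaults, changed once when a new characteristic first appears.
DEFAULTS = {"color": NOT_APPLICABLE, "outputWattage": NOT_APPLICABLE}


def normalize(listing):
    """Return the listing under catalog names, with defaults filled in."""
    result = dict(DEFAULTS)
    for name, value in listing.items():
        result[NAME_MAP.get(name, name)] = value
    return result


# An existing listing that never mentioned wattage, and a new vendor listing.
old_phone = normalize({"color": "black"})
loud_phone = normalize({"colour": "red", "outputWattage": "40"})

print(old_phone["outputWattage"])  # the old listing can now answer the query
print(loud_phone["color"])         # the vendor's "colour" is comparable
```

The point of the sketch is that the change lives in one place (the two tables), while the stored listings themselves are untouched.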

You can do all this now with SGML (as of August 1, 1997; see
http://www.ornl.gov/sgml/wg8/document/1920.htm).  The only question is
whether XML will be able to do it.  Maybe it doesn't matter; providers
of internet shopping directories can always maintain their source
information in SGML and simply deliver it in XML form, if they like.
(Or in HTML form, for that matter.)

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From paul_madsen at qmail.newbridge.com  Mon Sep 29 16:11:21 1997
From: paul_madsen at qmail.newbridge.com (Paul Madsen)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <n1336594837.96237@qmail.ca.newbridge.com>

                                          9:31 AM             29/09/97

Hi, I posted this to comp.text.sgml but didn't get much response (thanks J.R.)
_________

The XML-Data specification from Microsoft
(http://www.sil.org/sgml/xml-data9706223.htm) proposes that the logic
traditionally expressed in the DTD (content models, attribute lists,
entity definitions, etc.) be expressed using the syntax of XML
instances instead.

For instance, instead of the DTD element declaration 

<!ELEMENT book - - (p+) > 

the XML-Data scheme rule would be something like 

<elementType id="book"> 
     <elt href="#p" occurs="PLUS"/> 
</elementType> 

I'm attracted to the idea if only because it seems "cool".

But what does this gain us? What deficiencies with the DTD formalism
does it address?

Is it the ability to extend object types so that one class of object
is a specialization of another more general class?

Don't architectural forms give the traditional DTD syntax just that
ability?

Thanks for any insight. 

Paul 




From RMcDouga at JetForm.com  Mon Sep 29 16:26:38 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <c=CA%a=_%p=JetForm%l=ROSSINI-970929142331Z-50359@rossini.jetform.com>

If I remember correctly, the advantages are listed in the spec.  The main
advantage is that you can include the XML-Data definition within the XML
file itself, so that you can send a completely self-describing file
that can be read by a single (XML) parser.

Rob
=======================================================
Rob McDougall            Phone:  (613)751-4800 ext.5232
JetForm Corporation      Fax:    (613)594-8886
http://www.jetform.com   mailto:rmcdouga@jetform.com
=======================================================




From michael at textscience.com  Mon Sep 29 17:41:18 1997
From: michael at textscience.com (Michael Leventhal)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <n1336594837.96237@qmail.ca.newbridge.com>
Message-ID: <3.0.1.32.19970929080238.0083c5c0@aimnet.com>

At 09:46 AM 9/29/97 -0400, Paul Madsen wrote:
>But what does this gain us? What deficiencies with the DTD formalism does it
>address? 
>
>Is it the ability to extend object types so that one class of object is a
>specialization of another more general class? 

IMHO, this is a strong reason to chuck DTDs as they now exist.  But not
a goal of XML-DATA.

>Do not Architectural forms provide the traditional DTD syntax just that
>ability? 

So say some but not really.

Michael Leventhal

______________________________________________________________________
  Michael Leventhal           Internet  : http://www.grif.com
  G R I F , S. A.             Email     : Michael.Leventhal@grif.fr
  VP, Technology              Telephone : 510-444-2962
  1800 Lake Shore Ave Ste 14  Fax       : 510-444-1672
  Oakland, California  94606  France    : (011) 33 1 30121430 (fr US)
______________________________________________________________________



From jwrobie at mindspring.com  Mon Sep 29 17:51:05 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <1.5.4.32.19970929154902.00a202a4@pop.mindspring.com>

XML-Data adds several features that hard-core object oriented folks
appreciate:

1. True inheritance, with semantics more similar to that of OO
languages than indirect mechanisms used to simulate inheritance when
using architectural forms. Architectural forms do not really give us
what OO folks call inheritance.

2. Reflection - the ability to modify the content model at run-time.

3. The syntax for the content model is the same as the syntax for
data, making it easier to write code to manipulate both.
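
Point 3 can be illustrated with a minimal sketch: because the content model is itself XML, an ordinary XML parser is enough to read it and, for instance, print the equivalent DTD declaration. The sketch uses Python's standard xml.etree.ElementTree. The function name to_dtd and the OCCURS table are hypothetical, and only the "PLUS" value seen earlier in this thread is handled; other occurrence values are not assumed.

```python
import xml.etree.ElementTree as ET

# The element type from the start of this thread, as ordinary XML.
SCHEMA = """
<elementType id="book">
  <elt href="#p" occurs="PLUS"/>
</elementType>
"""

# Assumed mapping to DTD occurrence indicators; only "PLUS" appears in the thread.
OCCURS = {"PLUS": "+"}


def to_dtd(element_type):
    """Build an XML-style <!ELEMENT ...> declaration from one elementType."""
    parts = []
    for elt in element_type.findall("elt"):
        name = elt.get("href").lstrip("#")      # "#p" refers to element "p"
        parts.append(name + OCCURS.get(elt.get("occurs"), ""))
    return "<!ELEMENT %s (%s)>" % (element_type.get("id"), ", ".join(parts))


declaration = to_dtd(ET.fromstring(SCHEMA))
print(declaration)
```

The same few lines of tree-walking code that read document data also read the schema, which is the convenience being claimed.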

Of course, all existing SGML and XML tools know how to deal with DTDs,
and this is a rather major departure from traditional SGML. It has not
been blessed by any standardization committee. Given the way Microsoft
has approached Java, insisting that it need not implement the portable
libraries everyone else is using, and encouraging people to use their
platform-specific libraries instead, it is easy to wonder what will
happen to the SGML world if Microsoft is in control of an alternative
method of specifying content models.

According to MS representatives, there *will* be tools to transform
XML-Data content models into DTDs, but still, the "real" content model
is in the XML-Data. Is it worth it in order to gain true inheritance
and reflection? For some applications, it may well be. If Microsoft
controls XML-Data, and some vendors support it but others do not, will
we have the same kind of market fragmentation that we have in the Java
world today, where Microsoft is refusing to support the Java standard
libraries, and instead insists that developers should use their own
libraries, which run only on Windows operating systems?

Who knows!

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From gray at interlog.com  Mon Sep 29 18:08:46 1997
From: gray at interlog.com (Graydon Hoare)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <n1336594837.96237@qmail.ca.newbridge.com>
Message-ID: <Pine.BSI.3.95.970929115726.27169A-100000@shell1.interlog.com>


> I'm attracted to the idea if only because it seems "cool".

I think the general reasoning behind xml-data and XSL (shiver of horror) 
is that if we settle on a uniform representation for graph-structured data
in transit then we can (soon) live in a world where nobody has to write a
parser for the stuff ever again. I mean, a Scheme parser isn't exactly
brain surgery so I'm less inclined to enjoy this argument when used in
favour of XSL, but XSL has other reasons for existing. Writing a DTD
parser with architectural forms support is just another stumbling block to
wide deployment of XML, and xml-data nicely circumvents the question. You
can just write an XML parser (in a shoddy one-off proof of concept as many
people are busy writing) and write your validator in terms of the objects
the tried and true parser hands you.  Given that those objects have really
simple property-querying methods, it makes your code better encapsulated,
less likely to mix validating with the parsing of architectural forms.
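
As a rough sketch of "write your validator in terms of the objects the parser hands you", the following uses Python's standard xml.etree.ElementTree for both the schema and the document. It is a toy check, not an implementation of XML-Data: the wrapper element `schema`, the functions allowed_children and validate, and the rule that only occurs="PLUS" is enforced are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# A toy schema in the XML-Data style quoted in this thread.
SCHEMA = ET.fromstring("""
<schema>
  <elementType id="book">
    <elt href="#p" occurs="PLUS"/>
  </elementType>
  <elementType id="p"/>
</schema>
""")


def allowed_children(schema, name):
    """Map child name -> occurs value for a declared type, else None."""
    for element_type in schema.findall("elementType"):
        if element_type.get("id") == name:
            return {elt.get("href").lstrip("#"): elt.get("occurs")
                    for elt in element_type.findall("elt")}
    return None


def validate(schema, element):
    """Return a list of problems; an empty list passes this toy check."""
    rules = allowed_children(schema, element.tag)
    if rules is None:
        return ["undeclared element: " + element.tag]
    problems = []
    for child in element:
        if child.tag not in rules:
            problems.append("%s not allowed in %s" % (child.tag, element.tag))
        problems.extend(validate(schema, child))
    for name, occurs in rules.items():
        if occurs == "PLUS" and element.find(name) is None:
            problems.append("%s needs at least one %s" % (element.tag, name))
    return problems


print(validate(SCHEMA, ET.fromstring("<book><p/><p/></book>")))
print(validate(SCHEMA, ET.fromstring("<book/>")))
```

Nothing in the validator touches character-level parsing; it only queries simple properties of the parsed objects, which is the encapsulation argued for above.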

at least that's the principal advantage I see. 

cool side note: you can use a DSSSL engine to customize an XML-DATA grove
and dump out a new document type ;) or at very least typeset the metadata
in a nice way..

-graydon <graydon@pobox.com>
______________________
peccatum poena peccati 




From peter at techno.com  Mon Sep 29 18:46:19 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <1.5.4.32.19970929154902.00a202a4@pop.mindspring.com> (message
	from Jonathan Robie on Mon, 29 Sep 1997 11:49:02 -0400)
Message-ID: <199709291643.MAA29767@exocomp.techno.com>

[Jonathan Robie <jwrobie@mindspring.com> on Mon, 29 Sep 1997 11:49:02 -0400]
> XML-Data adds several features that hard-core object oriented folks
> appreciate:
> 
> 1. True inheritance, with semantics more similar to that of OO
> languages than indirect mechanisms used to simulate inheritance when
> using architectural forms. Architectural forms do not really give us
> what OO folks call inheritance.

Could you elaborate upon this distinction between architectural form
inheritance and "true OO inheritance"?  What about XML-data makes it
capable of supporting "truer" inheritance than architectural forms?

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From jwrobie at mindspring.com  Mon Sep 29 19:29:20 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <1.5.4.32.19970929172831.0098672c@pop.mindspring.com>

At 12:43 PM 9/29/97 -0400, Peter Newcomb wrote:
>[Jonathan Robie <jwrobie@mindspring.com> on Mon, 29 Sep 1997 11:49:02 -0400]
>> XML-Data adds several features that hard-core object oriented folks
>> appreciate:
>> 
>> 1. True inheritance, with semantics more similar to that of OO
>> languages than indirect mechanisms used to simulate inheritance when
>> using architectural forms. Architectural forms do not really give us
>> what OO folks call inheritance.
>
>Could you elaborate upon this distinction between architectural form
>inheritance and "true OO inheritance"?  What about XML-data makes it
>capable of supporting "truer" inheritance than architectural forms?

Let me preface this by saying that I am fairly new to both XML-data and
architectural forms, and I am perfectly willing to be shown wrong on this
statement. Let me explain some properties I see in XML-Data which I have not
yet been able to mirror completely using architectural forms. Since you know
much more about architectural forms than I do, I'll let you tell me if there
is an exact equivalent using architectural forms. In fact, this could be a
great opportunity to do a better comparison than I can do by myself.

In C++, Java, Smalltalk, and other OO languages, if I say that "a duck is an
animal", that means: (1) a duck always has all the data associated with an
animal, (2) a duck has the behavior associated with an animal (unless you
specifically say that a duck does certain things differently), and (3)
references to generic animals can also point to ducks.  To put this in
traditional OO terms, Duck inherits data, behavior, and type from Animal. In
SGML, it can't inherit behavior, but it can inherit data and type.
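
The three properties listed above can be seen in a minimal Python sketch. The class and function names (Animal, Duck, introduce) are purely illustrative.

```python
class Animal:
    def __init__(self, name):
        self.name = name              # (1) data every animal has

    def describe(self):               # (2) behavior every animal has
        return self.name + " is an animal"

    def sound(self):
        return "..."


class Duck(Animal):
    def sound(self):                  # a duck does this one thing differently
        return "quack"


def introduce(animal):
    """(3) Written for generic animals, but accepts ducks as well."""
    return animal.describe() + " that says " + animal.sound()


print(introduce(Duck("Donald")))
```

Duck declares no name field and no describe method of its own, yet has both, and it can be passed wherever an Animal is expected.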

Microsoft's XML-Data allows me to inherit data and type in a manner very
similar to OO languages. For instance, their description of XML-Data at
their XML standards page gives the following example:

<xml:schema>
  <elementType id="animalFriends">
    <elt href="#pet" occurs="PLUS"/>
  </elementType>

  <elementType id="pet">
    <any/>
    <attribute id='name'/>
    <attribute id='owner'/>
  </elementType>

  <elementType id="cat" extends="#pet">
    <elt href='#kittens'/>
    <attribute id='lives' type='NMTOKEN'/>
  </elementType>

  <elementType id="dog" extends="#pet">
    <elt href='#puppies'/>
    <attribute id='breed'/>
  </elementType>
</xml:schema>

Now I can use this type declaration to create an animalFriends element,
which is a list of pets:

<animalFriends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animalFriends>

So the pet hrefs can point to pets, cats, or dogs.
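
A minimal sketch of what "extends" implies for a processor, assuming it means that the derived type inherits the base type's attributes and may appear wherever the base type is referenced. The code uses Python's standard xml.etree.ElementTree on a cut-down copy of the schema; the wrapper element `schema` and the helper names ancestry, is_a and attributes are hypothetical, not part of XML-Data.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the pet schema, attributes only.
SCHEMA = ET.fromstring("""
<schema>
  <elementType id="pet">
    <attribute id="name"/>
    <attribute id="owner"/>
  </elementType>
  <elementType id="cat" extends="#pet">
    <attribute id="lives"/>
  </elementType>
  <elementType id="dog" extends="#pet">
    <attribute id="breed"/>
  </elementType>
</schema>
""")

TYPES = {t.get("id"): t for t in SCHEMA.findall("elementType")}


def ancestry(name):
    """Return [name, parent, grandparent, ...] by following extends."""
    chain = []
    while name is not None:
        chain.append(name)
        parent = TYPES[name].get("extends")
        name = parent.lstrip("#") if parent else None
    return chain


def is_a(name, base):
    """True if an element of type `name` may stand where `base` is expected."""
    return base in ancestry(name)


def attributes(name):
    """Inherited attributes first, then the type's own."""
    found = []
    for type_name in reversed(ancestry(name)):
        found.extend(a.get("id") for a in TYPES[type_name].findall("attribute"))
    return found


print(is_a("dog", "pet"), is_a("pet", "dog"))
print(attributes("cat"))
```

Under these assumptions a reference to "#pet" accepts cat and dog elements, and cat carries name and owner without redeclaring them.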

How would I create this schema using architectural forms?

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


From eliot at isogen.com  Mon Sep 29 20:21:12 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19971129131853.00b5a2c8@swbell.net>

At 01:28 PM 9/29/97 -0400, Jonathan Robie wrote:

>                                                             To put this in
>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>SGML, it can't inherit behavior, but it can inherit data and type.

In fact, you can inherit behavior if your processor is architecture aware
such that you can write rules that will apply the architecture-specific
behavior in the absence of element-specific behavior.  This could either be
indirectly through object-oriented processors where the implementing
element-specific objects inherit from architecture-specific objects or
explicitly through scripts that embody the architecture derivation rules,
e.g., something like this in DSSSL (here using a 'query' element rule):

(query (case (arch-form-of (current-node) 'myarch')
        (('foo')
         (make paragraph ...))
        (('bar')
         (make sequence ...))))

Behavior is simply processing code associated with types--the only question
is how is the binding done.  With SGML, the binding is [almost] always
loose and indirect and architecture-based binding is just another level of
indirection, similar to, if not identical to, the indirection you get by
inheriting methods from supertypes.
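
The same fallback can be sketched outside DSSSL. The Python below is only an
illustration of the binding order described here (element-specific rule
first, architecture rule otherwise); every table and name in it is
hypothetical and does not come from any real architecture engine:

```python
# Hypothetical tables; none of these names come from DSSSL or a real engine.
ARCH_FORM = {"cat": "pet", "dog": "pet", "pet": "pet"}

ARCH_RULES = {"pet": lambda name: f"paragraph({name})"}
ELEMENT_RULES = {"dog": lambda name: f"sequence({name})"}


def render(gi, name):
    """Apply the element-specific rule if one exists, otherwise the rule
    bound to the element type's architectural form."""
    rule = ELEMENT_RULES.get(gi) or ARCH_RULES.get(ARCH_FORM.get(gi))
    if rule is None:
        raise KeyError(f"no rule for element type {gi!r}")
    return rule(name)
```

Here 'cat' has no rule of its own and so picks up the 'pet' rule, while
'dog' overrides it, which mirrors method inheritance and overriding.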

>Microsoft's XML-Data allows me to inherit data and type in a manner very
>similar to OO languages. For instance, their description of XML-Data at
>their XML standards page gives the following example:
>
><xml:schema>
>  <elementType id="animalFriends">
>    <elt href="#pet" occurs="PLUS"/>
>  </elementType>
>
>  <elementType id="pet">
>    <any/>
>    <attribute id='name'/>
>    <attribute id='owner'/>
>  </elementType>
>
>  <elementType id="cat" extends="#pet">
>    <elt href='#kittens'/>
>    <attribute id='lives' type='NMTOKEN'/>
>  </elementType>
>
>  <elementType id="dog" extends="#pet">
>    <elt href='#puppies'/>
>    <attribute id='breed'/>
>  </elementType>
></xml:schema>
>
>Now I can use this type declaration to create an animalFriends element,
>which is a list of pets:
>
><animalFriends>
>  <cat name="Fluffy" lives='9'/>
>  <pet name="Diego"/>
>  <dog name="Gromit" owner='Wallace' breed='mutt'/>
></animalFriends>
>
>So the pet hrefs can point to pets, cats, or dogs.
>
>How would I create this schema using architectural forms?

I see a one-level schema hierarchy from which the document in the example
is derived:

superclass animalFriends 
   contains pet+
superclass pet
   contains ANY
   attribute owner
   attribute name 

To duplicate this using architectures, I create a meta-DTD that defines the
two supertypes and a document that derives its element types from the
supertypes.  

First the derived document, which declares its derivation from the
architecture (schema):

<!DOCTYPE animalFriends [
<!-- Animal Friends DTD -->
<!NOTATION animalFriends PUBLIC "-//ME//DTD Animal Friends Architecture//EN"
>
<!ATTLIST #NOTATION animalFriends
    arcDTD   CDATA #FIXED "animalFriends.meta-DTD"
    ArcFormA NAME  #FIXED "anfriend"
>
<!ENTITY animalFriends.meta-DTD SYSTEM "animalfriends.mtd" >

<!ATTLIST (cat | dog) 
    anfriend  NAME #FIXED "pet"
>
<!-- NOTE: No other declarations necessary when using XML syntax. -->
]>
<animalFriends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animalFriends>

Now the architectural meta-DTD, which defines the types:

<!-- animalFriends architecture meta-DTD -->
<!ELEMENT animalFriends - - (pet+) >

<!ELEMENT pet               ANY >
<!ATTLIST pet
   name -- The name of the pet --
     CDATA #IMPLIED -- Not clear from example if this is required --
   owner -- The name of the pet's owner --
     CDATA #IMPLIED
>
<!-- End of animalFriends architectural meta-DTD -->

The relationship of the types in the document to the types in the meta-DTD
is clear and machine processible (because of the architecture notation and
meta-DTD entity).  The relationship of the individual elements to their
supertypes is clear, either through the automatic mapping (names in the
document automatically map to the same name in the architecture, e.g.,
'animalFriends' in the document maps to 'animalFriends' in the meta-DTD) or
through the explicit mapping as for the types cat and dog.  The 'extends'
semantic is inherent in architectural derivation.  The architecture conveys
no less information than the example and takes about the same amount of
characters in this case (the verbosity of the XML-Data syntax offset by the
need for the architecture notation and entity declaration in the document).
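
To make the two mapping rules concrete, here is a small Python sketch. It is
not an architecture engine: it assumes the #FIXED "anfriend" values have been
copied by hand into a dictionary (ElementTree does not read them from the
declarations above), and the names EXPLICIT_FORM, META_DTD_TYPES, arch_form
and architectural_view are all invented for the example:

```python
import xml.etree.ElementTree as ET

# Assumption: the #FIXED "anfriend" values from the ATTLIST, copied by hand.
EXPLICIT_FORM = {"cat": "pet", "dog": "pet"}
# Element types declared in the meta-DTD.
META_DTD_TYPES = {"animalFriends", "pet"}

DOC = """<animalFriends>
  <cat name="Fluffy" lives="9"/>
  <pet name="Diego"/>
  <dog name="Gromit" owner="Wallace" breed="mutt"/>
</animalFriends>"""


def arch_form(gi):
    """Explicit mapping first, then automatic same-name mapping."""
    if gi in EXPLICIT_FORM:
        return EXPLICIT_FORM[gi]
    return gi if gi in META_DTD_TYPES else None


def architectural_view(xml_text):
    """Pair each element type in the document with its architectural form."""
    root = ET.fromstring(xml_text)
    return [(el.tag, arch_form(el.tag)) for el in root.iter()]
```

An element type that is neither mapped explicitly nor named in the meta-DTD
gets None, meaning it is simply not architectural.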

With the architecture approach, architecture-unaware processors can
process the document without any specialized support, and
architecture-aware processing can be added easily, either through ad hoc
means in style sheets or transforms or by using more complete architecture
engines (e.g., SP, GroveMinder, etc.).

Note that neither the XML-Data nor the architectural meta-DTD are complete
definitions of the schema--you still need human-understandable definitions
of all the parts (what is a "pet"? What are the rules for pet names? What
are the rules for owner names? What, if any, is the significance of pet
element content? etc.).  You also need to define the expected behavior for
the types in various contexts: formatting, transformation, online display,
etc.  Neither the XML-Data nor the architecture formalism will or can
provide these--they must be provided by other means, mostly
non-standardized and relying heavily on prose to communicate ideas to
humans, not processing to computers.

The only really important part of the schema discussion is how is a schema
associated with its documentation and definitions and how are things
associated with that schema.  That's why the architecture mechanism
requires that you declare the notation for the architecture--that is the
pointer to the authoritative definition of what the architecture rules are.
 The meta-DTD for the architecture is just a convenience that makes it
easier to do processing and validation, but the presence of it doesn't give
you that much and the lack of it doesn't preclude doing architecture-based
processing.  The same will be true of any other formal syntax for defining
the meta-syntax rules for documents.  At least architectures use an
existing syntax that is well understood by all SGML tools.

Given that most XML tools will need to be able to deal with DTDs anyway, I
can see no compelling reason in the short term to define an alternative
syntax for DTDs.  Rethinking how document schemas are created and managed
over the long term needs doing, no doubt, but that is a project that will
take years of careful study and thought and must be done in conjunction
with a major revision to SGML, one in which many different ideas and
requirements can be brought to bear.

In my opinion, none of the name-space requirements and none of the
DTD-editing requirements require a change to existing mechanisms in order
to be satisfied in a reasonable way.  Given that, there can be no good
reason for trying to reinvent the DTD mechanism at this time and trying to
do so is a waste of time that is better spent on more pressing issues.
Certainly people are free to invent whatever document types they want for
representing schemas, but to suggest that any such definition should be
used as standard within XML or SGML is premature, unwise, and unwarranted.
If Microsoft (or anybody else) wants to build tools to support such a
system and see if people will use or buy them, let them do so.  Let the
marketplace decide.  But this is not an area of SGML or XML for which the
standards need to change at this time and we should not attempt to change
them.

Cheers,

E.



From ricko at allette.com.au  Mon Sep 29 20:25:09 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <199709291829.EAA23860@jawa.chilli.net.au>


> From: Jonathan Robie <jwrobie@mindspring.com>
 
> Of course, all existing SGML and XML tools know how to deal with DTDs,
> and this is a rather major departure from traditional SGML. It has not
> been blessed by any standardization committee. Given the way Microsoft
> has approached Java, insisting that it need not implement the portable
> libraries everyone else is using, and encouraging people to use their
> platform-specific libraries instead, it is easy to wonder what will
> happen to the SGML world if Microsoft is in control of an alternative
> method of specifying content models.
 
XML-data would probably fail, that's what.

Their form of schema is so complicated and verbose to read
that you will need browsing tools to manipulate it.  This in turn
gives schemas (even though they are written in XML) the nature
of binary objects rather than textual objects.  It seems the weight
of experience is against people making successful schema languages
in non-textual forms.  

For example, Bento and the OpenDoc storage system included API-driven 
routines for decorating cleverly stored objects with all sorts of 
interesting type information, including type conversion, and so it 
can be considered -- in part -- a schema system.  Failing to
have a text form, the thing failed to thrive.  The XML-data 
system does have a text form, but it complicates matters so much by
not having a simple text form (e.g. a separate declaration
syntax) that it seems to be unreadable.

In my view, declarations are actually a kind of processing instruction,
targeted at the parser or entity manager, which also may be of
interest to the application (sorry for using SGML jargon). 
The XML-data view seems to be that they are, more essentially, 
data rather than processing instructions. Tim Bray has said
frequently "metadata is data", to which I would say 
"processing instructions are sometimes data, sometimes not".

Have the XML-data people ever made any requests to ISO for
suggested improvements to the declaration syntax to give
them the functionality they need? (This is unfair really,
since I think XML-data is an experimental system, and 
therefore a good place to generate user requirements for
a less verbose syntax.) Have they proved that
a single-tag language is easier to use than one with multiple
types of tags?  

I am certainly in 100% favour of schema systems and stronger typing
and abstracting interesting information about data into 
header elements. I proposed the SEEALSO parameter in the 
current WebSGML TC specifically to allow richer declarations 
of syntax using any kind of exotic notations including natural 
language, so I am the last person to say that SGML declarations
are enough for all uses.

But I am simply not convinced that XML-data represents a 
usable alternative to the standard declarations (in the
same market), and I think XML-data should not compete 
(or be talked about as competing!) with the standard 
declarations. Their purposes are, I hope, quite
different.


Rick Jelliffe




From ddb at criinc.com  Mon Sep 29 20:45:57 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19970929114620.009b3590@mailhost.criinc.com>

At 01:28 PM 9/29/97 -0400, Jonathan Robie wrote:
>At 12:43 PM 9/29/97 -0400, Peter Newcomb wrote:
>>> [snip Jonathan Robie's original post]
>>Could you elaborate upon this distinction between architectural form
>>inheritance and "true OO inheritance"?  What about XML-data makes it
>>capable of supporting "truer" inheritance than architectural forms?
>
>[snip]
>In C++, Java, Smalltalk, and other OO languages, if I say that "a duck is an
>animal", that means: (1) a duck always has all the data associated with an
>animal, (2) a duck has the behavior associated with an animal (unless you
>specifically say that a duck does certain things differently), and (3)
>references to generic animals can also point to ducks.  To put this in
>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>SGML, it can't inherit behavior, but it can inherit data and type.
>[snip]

One thing which Henry Thompson's presentation at HyTime '97 brought forth
in my mind was SGML's lack of support for (3) above.  Architectural forms
do little or nothing to rectify this, although AF could provide a solution
if used in an environment which supports simultaneous views of the source and
AF instances with links between the two.  Part of the problem is that AFs
do little, if anything, to make life easier when I want to build a DTD which
extends an existing DTD.  I have to copy the existing DTD and modify it and
then add the AF meta-info which maps the new DTD back to the old.  But now
I have a completely different DTD, from the point of view of _all_ existing
SGML software.  Sure I can map my documents to the original, but I can not
see it as both... I must either remove all value added by my modified DTD,
or abandon existing options based on the original DTD, since the new
document is not conforming to the original DTD.  Obviously, since I put the
time into building the new DTD, I think there is some significant value
added, but I can not leverage the value added while at the same time
leveraging the use of the existing DTD as a base architecture.

This is exactly what OO Inheritance allows a programmer to do.  You need
an extra attribute? Easy!  With AF's I either see the document as the new
DTD or I can not see the attribute... value lost either way.

I want to be able to treat it as the original DTD until that special moment
when I can test to see if this has my extended info.. and perform extra
processing based on that...
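
In code terms, the wish is roughly the following. This Python sketch borrows
the pet and dog types from the example earlier in the thread; the function
names and the output format are made up for illustration:

```python
def process_pet(attrs):
    """Base processing: uses only what the original 'pet' type declares."""
    return f"{attrs.get('name', '?')} (owner: {attrs.get('owner', 'unknown')})"


def process(gi, attrs):
    """Treat every element as a pet, then add the extension if present."""
    line = process_pet(attrs)              # works for pet, cat and dog alike
    if gi == "dog" and "breed" in attrs:   # the "special moment" test
        line += f", breed: {attrs['breed']}"
    return line
```

The base routine never needs to know that 'dog' exists, and the extended
attribute is still visible to code that asks for it.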

-derek

     Derek E. Denny-Brown II      ||   ddb@criinc.com
     "Reality is that which,      ||   Seattle, WA USA
  when you stop believing in it,  ||  WWW/SGML/HyTime/XML
 doesn't go away."  -- P. K. Dick || Java/Perl/Scheme/C/C++



From ricko at allette.com.au  Mon Sep 29 21:03:11 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data: advantages over DTD syntax?)
Message-ID: <199709291907.FAA24375@jawa.chilli.net.au>


> From: Jonathan Robie <jwrobie@mindspring.com>
 
> <xml:schema>
>   <elementType id="animalFriends">
>     <elt href="#pet" occurs="PLUS"/>
>   </elementType>
> 
>   <elementType id="pet">
>     <any/>
>     <attribute id='name'/>
>     <attribute id='owner'/>
>   </elementType>
> 
>   <elementType id="cat" extends="#pet">
>     <elt href='#kittens'/>
>     <attribute id='lives' type='NMTOKEN'/>
>   </elementType>
> 
>   <elementType id="dog" extends="#pet">
>     <elt href='#puppies'/>
>     <attribute id='breed'/>
>   </elementType>
> </xml:schema>
> 
> Now I can use this type declaration to create an animalFriends element,
> which is a list of pets:
> 
> <animalFriends>
>   <cat name="Fluffy" lives='9'/>
>   <pet name="Diego"/>
>   <dog name="Gromit" owner='Wallace' breed='mutt'/>
> </animalFriends>
> 
> So the pet hrefs can point to pets, cats, or dogs.
> 
> How would I create this schema using architectural forms?

And you do not even need architectural forms. Here is a very
simple pattern for doing everything your example does using
a single DTD and standard SGML! (The suffixes "-content"
and "-attributes" are reserved for use in patterns. The
attribute "is-a" is reserved to allow inheritance labelling.)

<!DOCTYPE animal-friends
[

<!-- Handle animal friends ================================= -->
<!ENTITY % animal-friends-content 
	" ( pet | cat | dog )+">
<!ENTITY % animal-friends-attributes
	" ">
<!ELEMENT  animal-friends
	( %animal-friends-content; )>
	<!ATTLIST animal-friends
		%animal-friends-attributes;
	>

<!-- Handle pets =========================================== -->
<!ENTITY % pet-content 
	"ANY" >
<!ENTITY % pet-attributes 
	" name ID #IMPLIED
	owner CDATA #IMPLIED 
	is-a CDATA #FIXED 'pet' " >  <!-- does not handle multiple inheritance! -->
<!ELEMENT pet
	( %pet-content; ) >
	<!ATTLIST pet
		%pet-attributes;
	>

<!-- Handle cats =========================================== -->
<!ENTITY % cat-contents
	" (kittens)? " >
<!ENTITY % cat-attributes
	" lives NMTOKEN #IMPLIED ">
<!ELEMENT cat
	( %pet-content;, %cat-contents; ) >
 	<!ATTLIST cat
		%pet-attributes;
		%cat-attributes;
	>

<!-- Handle dogs =========================================== -->
<!ENTITY % dog-contents
	" (puppies)? " >
<!ENTITY % dog-attributes
	" breed CDATA #IMPLIED ">
<!ELEMENT dog
	( %pet-content;, %dog-contents; ) >
 	<!ATTLIST dog
		%pet-attributes;
		%dog-attributes;
	>
]>

<animal-friends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animal-friends>


If you want multiple inheritance, then you can just 
define a different suffix, and search through attributes
based on that to collect the inheritance tree. I can
provide an example if anyone is interested.

Anyone who is aware of the pattern can see this and implement
it just as easily as they could using XML-data's syntax,
but without breaking SGML compatibility, which generating
new element types outside declarations does.

Rick Jelliffe



From jwrobie at mindspring.com  Mon Sep 29 21:07:12 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:
  advantages over DTD syntax?)
Message-ID: <1.5.4.32.19970929190623.00a56820@pop.mindspring.com>

At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
 
>If you want multiple inheritance, then you can just 
>define a different suffix, and search through attributes
>based on that to collect the inheritance tree. I can
>provide an example if anyone is interested.
 
Please!

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From eliot at isogen.com  Mon Sep 29 21:17:13 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19971129141415.00acab48@swbell.net>

At 11:46 AM 9/29/97 -0700, Derek Denny-Brown wrote:

>>specifically say that a duck does certain things differently), and (3)
>>references to generic animals can also point to ducks.  To put this in
>>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>>SGML, it can't inherit behavior, but it can inherit data and type.
>>[snip]
>
>One thing which Henry Thompson's presentation at HyTime '97 brought forth
>in my mind was SGML's lack of support for (3) above.  Architectural forms
>do little or nothing to rectify this, although AF could provide a solution
>if used in an environment which supports simultaneous views of the source and
>AF instances with links between the two.  

I'm not sure I follow you.  If you have an architecture-aware search
engine, then you should be able to do a query of the form "find all
elements derived from the form 'animal'", which will include both 'animal'
elements and 'duck' elements.  How is this not 3?  Or do I misunderstand
Henry's requirement?

Something in the system has to know that a duck is a kind of
animal--architectures convey this information as clearly as any other
method, so I don't see how they can't satisfy the requirement.
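
Such a query could look like the following Python sketch. The derivation
table is written out by hand here, standing in for what an
architecture-aware engine would read from the declarations; DERIVED_FROM,
find_derived and the sample document are all hypothetical:

```python
import xml.etree.ElementTree as ET

# Hypothetical derivation table: element type -> architectural form.
DERIVED_FROM = {"duck": "animal", "animal": "animal"}


def find_derived(root, form):
    """Return every element whose type is derived from the given form."""
    return [el for el in root.iter() if DERIVED_FROM.get(el.tag) == form]


farm = ET.fromstring("<farm><animal/><duck/><tractor/></farm>")
```

Asking find_derived(farm, "animal") returns both the 'animal' and the 'duck'
element and skips 'tractor', which is requirement (3) expressed as a search.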

>                                          Part of the problem is that AF's
>do little, if anything to make life easier when I want to build a DTD which
>extends an existing DTD.  I have to copy the existing DTD and modify it and
>then add the AF meta-info which maps the new DTD back to the old.  But now
>I have a completely different DTD, from the point of view of _all_ existing
>SGML software.  Sure I can map my documents to the original, but I can not
>see it as both... I must either remove all value added by my modified DTD,
>or abandon existing options based on the original DTD, since the new
>document is not conforming to the original DTD.  Obviously, since I put the
>time into building the new DTD, I think there is some significant value
>added, but I can not leverage the value added while at the same time
>leveraging the use of the existing DTD as a base architecture.

Again, I don't follow you.  Either you really have a completely new DTD and
you have to define the processing for it completely or you have a DTD
derived from an architecture *and* you have architecture-aware processors
that let you apply the architecture-specific processing to your new
documents, leaving only the new stuff to be defined.  How do architectures
not do this? How would the XML-Data proposal do this any better? In both
cases, it's a function of the processing code both providing the methods
for the base classes and the processing system understanding the derivation
hierarchy.

You can also use the trick of defining the architecture such that its
declarations (and in particular, the parameter entities used to configure
and modularize it) can be also used to create declarations for documents
derived from the architecture.  In essence you combine architectural
derivation with the sort of clever modularization typified by the TEI and
Docbook declaration sets.

Your comments suggest that you are confusing *parsing* with *processing*.
Parsing is not an issue, because the document is either valid to its DTD or
it isn't, and is either valid with respect to the governing schema or isn't.
Whether or not the document is valid doesn't affect how it is *processed*
after parsing, which is purely a function of methods applied to types, not
parsing, and is entirely independent of how the type information got
associated with the data (whether by the architecture syntax or the
interpretation of some XML-Data document).

>This is exactly what OO Inheritance allows a programmer to do.  You need
>an extra attribute? Easy!  With AF's I either see the document as the new
>DTD or I can not see the attribute... value lost either way.

This is only true if you define your processing in terms of architectural
instances derived from documents, but clearly, that is not the way
architectures are intended to be used in the general case.  The
architecture provides part of the processing and an architecture-aware
processor must be able to associate architecture-specific processing with a
document, but it's not an all-or-nothing proposition.  I must always be
aware of the document's architectural nature as well as its base nature
unless the only processing I care about at the moment is that defined by
the architecture.

The XML-Data proposal (to the degree I understand it) and architectures
appear to convey exactly the same information about a schema and a
document's derivation from it.  The fact that the XML-Data syntax appears
to be more "object-oriented" must be a red herring because in both cases
you are providing a purely declarative data description, not the definition
of active methods.  The only way in which XML-Data might appear to be
object-oriented is XML-Data-specific semantics for generating complete
declarations from XML-Data specifications based on implication rules, but
these will either be effectively identical to features in the AFDR syntax,
such as multiple attlists for the same element type, or facilities of
limited utility, such as content model implication (which can be managed
pretty well with parameter entities).  In other words, I don't see that
it's possible for anything like XML-Data to provide significantly more
assistance in creating and managing declaration sets and meta-DTDs than you
already get with the AFDR and normal SGML facilities.

This is why confusing architectures with object-oriented programming
approaches is so dangerous: they are not the same thing and thinking that
they are leads to erroneous conclusions and unrealistic expectations (such
as that content models can be somehow inherited in any but the most trivial
ways).

Note too that when you have DTD-less documents, problems of DTD syntax
munging go away because you don't have any DTD syntax to mung.  Any munging
is managed by the creators of derived schemas.  This is one of the beauties
of XML--it frees us from the need to conflate schema definition with the
definition of the parsing rules for document instances.  

Cheers,

E.



From ricko at allette.com.au  Mon Sep 29 21:53:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
Message-ID: <199709291958.FAA24998@jawa.chilli.net.au>


----------
> From: Jonathan Robie <jwrobie@mindspring.com>
> To: ricko@allette.com.au
 
> At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
>  
> >If you want multiple inheritance, then you can just 
> >define a different suffix, and search through attributes
> >based on that to collect the inheritance tree. I can
> >provide an example if anyone is interested.
>  
> Please!
 
Here is a version which allows multiple inheritance.
(Some parenthesis problems fixed too.)
I have put in even empty attribute values, to make
the pattern uniform in every case, so please do not
confuse this simplicity for elaborateness!

To extract the inheritance tree, collect all attributes
with "-inherit" suffix.  I think the only novel thing
is that people are not used to wildcard searches on 
attribute names, but this is only prejudice.

Also, I think because some tools require precompiled
DTDs, there is a general view in some circles that
DTDs are always compiled, and always made prior
to the generation of the instance. But that is
not intrinsic to SGML.

The PATTERN
-----------

This pattern reserves the suffixes:
	-content	 for a parameter entity with the 
                       element type's contents
	-attributes  for a parameter entity with the 
                       element type's attributes
	-inherit     for a fixed attribute with the 
                       element type's immediate inheritance

The pattern is
	<!ENTITY % {GI}-content
		" {CONTENT-MODEL} ">
	<!ENTITY % {GI}-attributes
		" {ATTRIBUTE-DECLARATIONS} 
		{GI}-inherit CDATA #FIXED '' ">
	<!ELEMENT {GI}
		( %{GI}-content;,  {INHERITED-CONTENT-MODELS} ) >
	<!ATTLIST {GI}
		%{GI}-attributes;
		{INHERITED-ATTRIBUTE-DECLARATIONS}
		>
Where the delimiters {} indicate parameters of the template
which you or your application edit in.  
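
As a rough idea of how an application might "edit in" those parameters, here
is a Python sketch that expands the template for one element type. The
function name declare and its arguments are invented for this example. It
assumes the parents' -content and -attributes entities are declared earlier
in the DTD, and it always wraps the content model in a group, so a model
such as ANY would need separate handling:

```python
def declare(gi, content, attributes, parents=()):
    """Fill in the template for element type gi.

    parents lists the element types gi inherits from; their -content and
    -attributes parameter entities are assumed to exist already.
    """
    inherit = " ".join(parents)
    models = ", ".join(
        [f"%{gi}-content;"] + [f"%{p}-content;" for p in parents])
    attlists = " ".join(
        [f"%{gi}-attributes;"] + [f"%{p}-attributes;" for p in parents])
    return "\n".join([
        f'<!ENTITY % {gi}-content "{content}">',
        f'<!ENTITY % {gi}-attributes "{attributes} '
        f'{gi}-inherit CDATA #FIXED \'{inherit}\'">',
        f"<!ELEMENT {gi} ( {models} ) >",
        f"<!ATTLIST {gi} {attlists} >",
    ])
```

For instance, declare("cat", "(kittens)?", "lives NMTOKEN #IMPLIED",
["pet"]) yields four declarations for 'cat' that pull in the 'pet' content
model and attributes by entity reference.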

The EXAMPLE
-----------

<!DOCTYPE animal-friends
[
<!-- Handle animal friends ================================= -->
<!ENTITY % animal-friends-content 
	" ( pet | cat | dog )* " >
<!ENTITY % animal-friends-attributes
	"  animal-friends-inherit CDATA #FIXED '' ">
<!ELEMENT  animal-friends
	( %animal-friends-content; )>
<!ATTLIST animal-friends
	%animal-friends-attributes;
	>

<!-- Handle pets =========================================== -->
<!ENTITY % pet-content 
	" ANY " >
<!ENTITY % pet-attributes 
	" name ID #IMPLIED
	owner CDATA #IMPLIED 
	pet-inherit CDATA #FIXED '' "
	 >   
<!ELEMENT pet
	%pet-content;  >
<!ATTLIST pet
	%pet-attributes;
	>

<!-- Handle cats =========================================== -->
<!ENTITY % cat-contents
	" (kittens)? " >
<!ENTITY % cat-attributes
	" lives NMTOKEN #IMPLIED 
	cat-inherit CDATA #FIXED 'pet' ">
<!ELEMENT cat
	( %pet-content;, %cat-contents; ) >
 <!ATTLIST cat
	%pet-attributes;
	%cat-attributes;
	>

<!-- Handle dogs =========================================== -->
<!ENTITY % dog-contents
	" (puppies)? " >
<!ENTITY % dog-attributes
	" breed CDATA #IMPLIED
       dog-inherit CDATA #FIXED 'pet' ">
<!ELEMENT dog
	( %pet-content;, %dog-contents; ) >
 <!ATTLIST dog
	%pet-attributes;
	%dog-attributes;
	>
]>

<animal-friends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animal-friends>


Please note that I am not saying that this form is always
preferable to using AFs or XML-data.  But it can be done
in XML as it stands now, keeping valid SGML declarations.
And, as has been mentioned, there should be interconversion
possible between the three forms, since they give the
same information.  If XML-data requires the use of specialist
tools to manipulate, since it is so verbose, then this pattern
cannot be regarded as excessively verbose either, 
since the same kind of tools can be constructed to simplify
creating new objects.
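
The wildcard search on attribute names mentioned above might look like this
in Python. The sketch assumes the attribute values are those a DTD-aware
parser would report after applying the #FIXED defaults; here they are simply
written out by hand, and ELEMENTS and inheritance_tree are hypothetical
names:

```python
# Assumption: attributes as reported after #FIXED defaults are applied.
# A cat carries pet-inherit too, because its ATTLIST includes %pet-attributes;.
ELEMENTS = [
    ("cat", {"name": "Fluffy", "lives": "9",
             "cat-inherit": "pet", "pet-inherit": ""}),
    ("pet", {"name": "Diego", "pet-inherit": ""}),
    ("dog", {"name": "Gromit", "breed": "mutt",
             "dog-inherit": "pet", "pet-inherit": ""}),
]


def inheritance_tree(elements):
    """Map each type named by a *-inherit attribute to its direct parents."""
    suffix = "-inherit"
    tree = {}
    for _gi, attrs in elements:
        for attr, value in attrs.items():
            if attr.endswith(suffix):
                parents = tree.setdefault(attr[: -len(suffix)], set())
                parents.update(value.split())
    return tree
```

A space-separated value in a -inherit attribute would give several parents
for one type, which is one way the pattern could carry multiple inheritance.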


Rick Jelliffe



From srn at techno.com  Mon Sep 29 22:15:42 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <3.0.1.32.19970929080238.0083c5c0@aimnet.com> (message from
	Michael Leventhal on Mon, 29 Sep 1997 08:02:38 +0200)
Message-ID: <199709291827.OAA01640@bruno.techno.com>

[Paul Madsen:]

> Do not Architectural forms provide the traditional DTD syntax just that
> ability [to extend object types so that one class of object is a
> specialization of another more general class]?

[Michael Leventhal:]

> So say some but not really.

I'm one of those who say so.  How "not really"?

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From ddb at criinc.com  Mon Sep 29 22:37:33 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:58:31 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19970929133641.009a9100@mailhost.criinc.com>

At 02:14 PM 11/29/97 -0600, W. Eliot Kimber wrote:
>At 11:46 AM 9/29/97 -0700, Derek Denny-Brown wrote:
>I'm not sure I follow you.  If you have an architecture-aware search
>engine, then you should be able to do a query of the form "find all
>elements derived from the form 'animal'", which will include both 'animal'
>elements and 'duck' elements.  How is this not 3?  Or do I misunderstand
>Henry's requirement?

This requires an AF-aware search engine.  In addition, all current AF
systems can only view the instance as either the source or the AF.  If the
search engine reports where it found the match, it would report it relative
to the AF, not the source document.  As I implied in my original post:
>> although AF could provide a solution if used in
>> an environment which supports simultaneous view 
>> of the source and AF instances with links between
>> the two.
A number of things start to change when you add an environment where you
can easily map back and forth between the two views.

>Again, I don't follow you.  Either you really have a completely new DTD and
>you have to define the processing for it completely or you have a DTD
>derived from an architecture *and* you have architecture-aware processors
>that let you apply the architecture-specific processing to your new
>documents, leaving only the new stuff to be defined.  How do architectures
>not do this? How would the XML-Data proposal do this any better? In both
>cases, it's a function of the processing code both providing the methods
>for the base classes and the processing system understanding the derivation
>hierarchy.

I want to build on tools which assume you are using an existing DTD, say a
custom editor environment. (note: this is not based on a real
implementation, but rather a mental exercise)  From the point of view of
that tool I either am using a new DTD (since I can not have a nice PUBLIC
reference to the "standard" DTD, and the DTD is different in any case,
because I added elements to some content models) or I only give it the AF
and I have lost my value added elements.  I am talking about today and
tomorrow, not next year.  Next year there may be tools which allow better
use of AFs.  I am not in a position where I have enough information to
really know what vendors plan to release next year.  I am in a situation
where if it can not be done today, I can not use it, since my deadlines are
too tight to wait on future releases for most of the software. (note: if
you want grey hair at an early age, this is an excellent recipe.  Managers
who do not want their staff to have grey hair should either take note or
buy lots of hair dye...)

I have never said that XML-Data provides anything better, since I do not
know enough about it to even compare it to AFs, which I do have a
reasonable understanding of, I think.

>You can also use the trick of defining the architecture such that its
>declarations (and in particular, the parameter entities used to configure
>and modularize it) can be also used to create declarations for documents
>derived from the architecture.  In essence you combine architectural
>derivation with the sort of clever modularization typified by the TEI and
>Docbook declaration sets.

This requires that the original be well designed.  A common request, which
is often ignored ;}


>Your comments suggest that you are confusing *parsing* with *processing*.
(Hopefully) no more than current tools force me to co-relate them.  They
should be separate, but are, more often than not, virtually synonymous.
Groves are setting the stage for a day when parsing and processing are
separated.  At times I dream of that day, interspersed with my nightmares
imposed by current tools and requirements...

>Parsing is not an issue, because the document is either valid to its DTD or
>it isn't, and is either valid with respect to the governing schema or isn't.
>Whether or not the document is valid doesn't affect how it is *processed*
>after parsing, which is purely a function of methods applied to types, not
>parsing, and is entirely independent of how the type information got
>associated with the data (whether by the architecture syntax or the
>interpretation of some XML-Data document).

The problem is that a number of tools/environments define a document's
model/style/environment by the DTD.  If I have a special setup for editing
DocBook documents, that setup needs to make some assumptions about your
instance.  It does not work when I hand it an instance which violates those
assumptions (because it is conformant to a DTD which uses DocBook as a base
architecture, rather than actually being conformant to the DocBook DTD).
If I have access to the source, I could go in and tweak it, but I would
have to do this either specifically for the new DTD or spend the time to
make the environment work with anything which remotely resembles
DocBook....more work than I want.

>>This is exactly what OO Inheritance allows a programmer to do.  You need
>>an extra attribute? Easy!  With AF's I either see the document as the new
>>DTD or I can not see the attribute... value lost either way.
>
>This is only true if you define your processing in terms of architectural
>instances derived from documents, but clearly, that is not the way
>architectures are intended to be used in the general case.  The
>architecture provides part of the processing and an architecture-aware
>processor must be able to associate architecture-specific processing with a
>document, but it's not an all-or-nothing proposition.  I must always be
>aware of the document's architectural nature as well as its base nature
>unless the only processing I care about at the moment is that defined by
>the architecture.

To an extent what I am asking for is an environment where I could build
tools using a traditional OO inheritance model applied to the SGML AF
model: for example, a DSSSL style sheet where I would only have to define
rules for new elements (or changed elements).

>This is why confusing architectures with object-oriented programming
>approaches is so dangerous: they are not the same thing and thinking that
>they are leads to erroneous conclusions and unrealistic expectations (such
>as that content models can be somehow inherited in any but the most trivial
>ways).

I agree that AFs should definitely not be equated with OO programming.  I do
see two things which any attempt to equate them does bring out.

1) DTD extension mechanisms which provide for simple type inheritance would
be very useful.  AFs provide a limited solution, which presents new
difficulties.  This is a problem with SGML.  AFs are an excellent
workaround which stays within the system, and deserve considerable credit
for that.  My real frustration is with SGML and the limits it imposes, not
AFs.

2) Tools which allow OOP-inheritance-style defaulting behaviour for
processing of elements based on element type or architectural type.  AFs may
not map to OOP, but they make OOP-based processing tools easier...
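
As a rough illustration of the defaulting behaviour described in point 2,
here is a minimal Python sketch.  It is not taken from any existing AF or
DSSSL tool: the names find_handler, process, PARENTS and HANDLERS are
hypothetical, and the derivation table is assumed to have been extracted
from the DTD or architecture declarations by some other means.

```python
from collections import deque

# Hypothetical derivation table: element type -> types it is derived from.
PARENTS = {"pet": [], "cat": ["pet"], "dog": ["pet"]}

# Processing rules exist only for the base type and for one changed type.
HANDLERS = {
    "pet": lambda name: "pet rule applied to " + name,
    "dog": lambda name: "dog rule applied to " + name,
}


def find_handler(gi, parents, handlers):
    """Return the rule for gi, or the nearest rule found on a base type."""
    queue = deque([gi])
    seen = set()
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        if current in handlers:
            return handlers[current]
        # No rule of its own: fall back to the types it is derived from.
        queue.extend(parents.get(current, []))
    return None


def process(gi, name):
    """Apply the most specific available rule to one element."""
    handler = find_handler(gi, PARENTS, HANDLERS)
    if handler is None:
        raise LookupError("no rule for element type " + gi)
    return handler(name)
```

Here process("cat", "Fluffy") falls back to the pet rule because cat has no
rule of its own, while process("dog", "Gromit") uses the dog rule.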

>Note too that when you have DTD-less documents, problems of DTD syntax
>munging go away because you don't have any DTD syntax to mung.  Any munging
>is managed by the creators of derived schemas.  This is one of the beauties
>of XML--it frees us from the need to conflate schema definition with the
>definition of the parsing rules for document instances.  

But this puts an added burden on the tools, since all bets are off as to what
the structure looks like.  AFs at least provide a set mechanism for mapping
to a known structure.

-derek

     Derek E. Denny-Brown II      ||   ddb@criinc.com
     "Reality is that which,      ||   Seattle, WA USA
  when you stop believing in it,  ||  WWW/SGML/HyTime/XML
 doesn't go away."  -- P. K. Dick || Java/Perl/Scheme/C/C++



From digitome at iol.ie  Tue Sep 30 00:26:19 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:31 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <199709292226.XAA06786@GPO.iol.ie>

[Rick Jelliffe]
>
>Because their form of schemas are so complicated and verbose to read
>that you will need browsing tools to manipulate them.  This in turn
>gives schemas (even though they are written in XML) the nature
>of binary objects rather than textual objects.
>
A good point. I have fond memories of being able to understand Make
files, for example! These days, with "advanced" tools, they are still
"text only" but they are pretty impenetrable and effectively locked in to
particular tools :-(

On the other hand, in the specific case of XML-Data I would have to say
I am in favour. DTDs are perfectly good "documents".  XML's reputation as a
meta-language is, I think, positively served by its use to describe "itself" in
this way.

The approach obviously has its practical limits though. The further one gets
from "data" and the closer one gets towards "algorithm", the less *practical*
a tagged representation becomes. Full scale Scheme would be pretty
impenetrable in XML but it would be possible! The fact that it is entirely
possible is the important thing. It means (doesn't it????) that XML can be
viewed as the bed-rock on which all the other required syntactic "short
hands" can be based.

So XML could have 8879 DTDs. It could also have a DTD for 8879 DTDs.
Core XML could interpret the latter directly, supporting the 8879 syntax via
a transformation. Future syntaxes, methods etc. for achieving what 8879 DTDs
achieve could then be cleanly layered on top.


Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From ricko at allette.com.au  Tue Sep 30 06:56:20 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:31 2004
Subject: revised Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
Message-ID: <199709300500.PAA07205@jawa.chilli.net.au>

Someone has pointed out that the colonized syntax would be
appropriate and clearer.  Here it is again (sorry!) with
colons.  (I have also cleaned up the inheritance to bundle
things more, so please delete the previous version.)

Actually, this following fragment is illegal, because 
you cannot use ANY inside a content model. I am not sure how
to read the XML-data format here, but I think this exposes
a flaw in their example:  if pet can contain any subelements,
what use is it to say it can also contain a kitten subelement?
Duplicate paths are a little worrying, if that is what they
have done.

If it were desired to use ANY in this way (i.e. different
to how SGML uses it), then it could be coped with by
parameterising includes and excludes in a similar fashion.
(Again I can provide example if needed, but I hope not.)

----------
> From: Jonathan Robie <jwrobie@mindspring.com>
> To: ricko@allette.com.au
 
> At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
>  
> >If you want multiple inheritance, then you can just 
> >define a different suffix, and search through attributes
> >based on that to collect the inheritance tree. I can
> >provide an example if anyone is interested.
>  
> Please!
 
Here is a version which allows multiple inheritance.
(Some parenthesis problems fixed too.)
I have put in even empty attribute values, to make
the pattern uniform in every case, so please do not
confuse this simplicity for elaborateness!

To extract the inheritance tree, collect all attributes
with ":inherit" suffix.  I think the only novel thing
is that people are not used to wildcard searches on 
attribute names, but this is only prejudice.
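
The wildcard collection described above can be sketched in a few lines of
Python.  This is only an illustration under stated assumptions: the element's
attributes (including #FIXED defaults supplied by the DTD) are assumed to be
already available as a dictionary, a type with several parents is assumed to
list them space-separated in its inherit attribute, and the function names
inheritance_edges and ancestors are hypothetical.

```python
SUFFIX = ":inherit"


def inheritance_edges(attributes):
    """Map each element type to its immediate parents.

    Only attributes whose names end in ':inherit' are considered; an
    empty value means the type has no parent.
    """
    edges = {}
    for name, value in attributes.items():
        if name.endswith(SUFFIX):
            gi = name[: -len(SUFFIX)]
            edges[gi] = value.split()
    return edges


def ancestors(gi, edges):
    """List every type gi derives from, nearest first, without repeats."""
    found = []
    pending = list(edges.get(gi, []))
    while pending:
        parent = pending.pop(0)
        if parent not in found:
            found.append(parent)
            pending.extend(edges.get(parent, []))
    return found


# Attributes as they might be reported for <cat name="Fluffy" lives='9'/>
# once the fixed attributes from the DTD have been filled in.
cat_attributes = {
    "name": "Fluffy",
    "lives": "9",
    "pet:inherit": "",
    "cat:inherit": "pet",
}
cat_edges = inheritance_edges(cat_attributes)
```

With these values cat_edges records that cat derives from pet and that pet
derives from nothing, so ancestors("cat", cat_edges) yields just pet.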

Also, I think because some tools require precompiled
DTDs, there is a general view in some circles that
DTDs are always compiled, and always made prior
to the generation of the instance. But that is
not intrinsic to SGML.

The PATTERN
-----------

This pattern reserves the suffixes:
	contents    for a parameter entity with the
	            element type's contents
	attributes  for a parameter entity with the
	            element type's attributes
	inherit     for a fixed attribute with the
	            element type's immediate inheritance

The pattern is
	<!ENTITY % {GI}:contents
		" {CONTENT-MODEL}
		{INHERITED-CONTENT-MODELS} ">
	<!ENTITY % {GI}:attributes
		" {ATTRIBUTE-DECLARATIONS} 
		{INHERITED-ATTRIBUTE-DECLARATIONS}
		{GI}:inherit CDATA #FIXED '' ">
	<!ELEMENT {GI}
		( %{GI}:contents; ) >
	<!ATTLIST {GI}
		%{GI}:attributes;
		>
Where the delimiters {} indicate parameters of the template
which you or your application edit in.  
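
To show how an application might edit the parameters in, here is a small
Python sketch that fills in the template for one element type.  It is an
assumption-laden illustration, not part of the pattern itself: the function
name pattern_declarations is hypothetical, the caller is assumed to write any
inherited content-model references into content_model directly, and several
parents are assumed to be written space-separated in the inherit value.

```python
def pattern_declarations(gi, content_model, attribute_decls, parents=()):
    """Return the four declarations of the pattern for element type gi."""
    # Own attribute declarations first, then the inherited bundles.
    attr_lines = list(attribute_decls)
    attr_lines.extend("%" + parent + ":attributes;" for parent in parents)
    # The fixed attribute recording the immediate inheritance.
    parent_list = " ".join(parents)
    attr_lines.append(gi + ":inherit CDATA #FIXED '" + parent_list + "'")
    attr_body = "\n\t".join(attr_lines)
    return (
        "<!ENTITY % " + gi + ":contents\n"
        '\t" ' + content_model + ' ">\n'
        "<!ENTITY % " + gi + ":attributes\n"
        '\t" ' + attr_body + ' ">\n'
        "<!ELEMENT " + gi + "\n"
        "\t( %" + gi + ":contents; ) >\n"
        "<!ATTLIST " + gi + "\n"
        "\t%" + gi + ":attributes;\n"
        "\t>\n"
    )


cat_declarations = pattern_declarations(
    "cat",
    "%pet:contents;, kittens?",
    ["lives NMTOKEN #IMPLIED"],
    parents=["pet"],
)
```

Printing cat_declarations gives the two entity declarations, the element
declaration and the attribute list for cat in the layout shown above.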

The EXAMPLE
-----------

<!DOCTYPE animal-friends
[
<!-- Handle animal friends ================================= -->
<!ENTITY % animal-friends:contents 
	" ( pet | cat | dog )* " >
<!ENTITY % animal-friends:attributes
	"  animal-friends:inherit CDATA #FIXED '' ">
<!ELEMENT  animal-friends
	( %animal-friends:contents; )>
<!ATTLIST animal-friends
	%animal-friends:attributes;
	>

<!-- Handle pets =========================================== -->
<!ENTITY % pet:contents 
	" ANY " >
<!ENTITY % pet:attributes 
	" name ID #IMPLIED
	owner ID #IMPLIED 
	pet:inherit CDATA #FIXED '' "
	 >   
<!ELEMENT pet
	%pet:contents;  >
<!ATTLIST pet
	%pet:attributes;
	>

<!-- Handle cats =========================================== -->
<!ENTITY % cat:contents
	" ( %pet:contents;, kittens)? " >
<!ENTITY % cat:attributes
	" lives NMTOKEN #IMPLIED 
	%pet:attributes;
	cat:inherit CDATA #FIXED 'pet' ">
<!ELEMENT cat
	( %cat:contents; ) >
 <!ATTLIST cat
	%cat:attributes;
	>

<!-- Handle dogs =========================================== -->
<!ENTITY % dog:contents
	" ( %pet:contents;, puppies?) " >
<!ENTITY % dog:attributes
	" breed CDATA #IMPLIED
	 %pet:attributes;
       dog:inherit CDATA #FIXED 'pet' ">
<!ELEMENT dog
	( %dog:contents; ) >
 <!ATTLIST dog
	%dog:attributes;
	>
]>

<animal-friends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animal-friends>


Please note that I am not saying that this form is always
preferable to using AFs or XML-data.  But it can be done
in XML as it stands now, keeping valid SGML declarations.
And, as has been mentioned, there should be interconversion
possible between the three forms, since they give the
same information.  If XML-data requires the use of specialist
tools to manipulate, since it is so verbose, then this pattern
cannot be regarded as excessively verbose either,
since the same kind of tools can be constructed to simplify
creating new objects.


Rick Jelliffe

 


From ht at cogsci.ed.ac.uk  Tue Sep 30 10:35:37 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:31 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
In-Reply-To: "Rick Jelliffe"'s message of Tue, 30 Sep 1997 05:54:19 +1000
References: <199709291958.FAA24998@jawa.chilli.net.au>
Message-ID: <715.199709300835@grogan.cogsci.ed.ac.uk>

Note that as written Rick's solution lacks a feature of the XML-Data
proposal, namely that e.g. in the internal subset I can add a new
declaration

<elementType id="stick-insect" extends="#pet">
  <empty/>
  <attribute id='colour'/>
</elementType>

and non-intrusively extend the content model of animal-friends.  To
cover this Rick's solution would need place-holding empty parameter
entities in most of his existing entities, e.g.

<!ENTITY % animal-friends-content 
	" ( pet | cat | dog %extra-animal-friends-content; )* ">

so that you could do

<!ENTITY % extra-animal-friends-content '| stick-insect'>

[Note this is not valid XML, I don't think]
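
To make the place-holder mechanism concrete, here is a toy Python sketch of
parameter-entity expansion.  It is a deliberate simplification and not a
conforming XML processor: references are replaced textually with no padding
spaces, the names expand, content_model and DEFAULTS are hypothetical, and
the empty default for extra-animal-friends-content is assumed to be declared
somewhere after the point where a user could override it.

```python
import re

# A parameter-entity reference such as %extra-animal-friends-content;
PE_REFERENCE = re.compile(r"%([A-Za-z_:][A-Za-z0-9_.:-]*);")


def expand(text, entities, depth=0):
    """Replace every %name; in text with its (recursively expanded) value."""
    if depth > 20:
        raise ValueError("parameter entities nested too deeply")

    def substitute(match):
        return expand(entities[match.group(1)], entities, depth + 1)

    return PE_REFERENCE.sub(substitute, text)


def content_model(entities):
    """Expanded animal-friends content model with whitespace normalised."""
    return " ".join(expand("%animal-friends-content;", entities).split())


# Default declarations: the hook expands to nothing.
DEFAULTS = {
    "extra-animal-friends-content": "",
    "animal-friends-content":
        "( pet | cat | dog %extra-animal-friends-content; )*",
}

# A user who redefines only the hook extends the model non-intrusively.
EXTENDED = dict(DEFAULTS)
EXTENDED["extra-animal-friends-content"] = "| stick-insect"
```

With DEFAULTS the model stays ( pet | cat | dog )*, and with EXTENDED it
becomes ( pet | cat | dog | stick-insect )* without touching the declaration
of animal-friends-content itself.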

This I think completes the reductio -- the point is not that you can
do things with schemata that you can't do in XML, but that you can do
them in ways which are vastly more transparent and maintainable.  Just
because we CAN write all logical formulae using only Sheffer stroke
and constants doesn't mean we SHOULD do so.  Occam didn't say "Don't
proliferate", he said "Don't proliferate beyond necessity".

Note also that I argued at the XML day in Montreal that to avoid the
dangers of multiple incompatible approaches to schemata, we should
always provide a semantics in terms of vanilla XML, which is how I'd
describe what Rick has shown is possible!

ht



From jwrobie at mindspring.com  Tue Sep 30 13:16:16 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:31 2004
Subject: Animal-friends implemented as a pattern (Re:
  XML-Data:advantages over DTD syntax?)
Message-ID: <1.5.4.32.19970930111016.009ead94@pop.mindspring.com>

At 09:35 AM 9/30/97 BST, Henry S. Thompson wrote:

So now we have all the players!

Henry, could I ask you to list all the main advantages you see for XML-Data
over XML with architectural forms? Yesterday's traffic makes me think that
this would be a great place to discuss the issues in some depth. One side of
the debate seems to say that XML-Data adds no new functionality, and the
other says that it adds significant new functionality. At this point, I am
not convinced that I know enough to say one way or another.

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From zwang at pstat.ucsb.edu  Tue Sep 30 21:25:55 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun  7 16:58:31 2004
Subject: msxml contentmodel
Message-ID: <Pine.GSO.3.95.970930122041.29595A-100000@fisher>

 Hello,

 We are trying to write an editor application that uses XML via the
 MSXML parser. What we plan to do is to let the editor read the DTD and
 then provide users with an interactive environment that they use to
 fill out the content of the xml document. 

 The problem we have is that MSXML does not provide access to the
 content model of the DTD through the Document class. The API it
 provides is mainly through the Document class. We are not sure whether
 Microsoft intended that the interface to the DTD content model not be
 available (directly or indirectly) to the application. Could anyone
 shed light on how to use MSXML to access the DTD content model, or
 does anyone know if some of the other parsers (e.g., NXP, LARK) 
 provide an interface to the DTD content model?  Also, how does this
 relate to SGML groves as I have seen discussed on XML-DEV at various
 times? 
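
 For comparison only, here is a sketch of what programmatic access to
 content models can look like.  It uses the expat binding in Python's
 standard library, not MSXML, NXP or Lark, so it says nothing about what
 those parsers expose; the sample document and the helper names
 content_models and child_names are made up for the illustration.

```python
import xml.parsers.expat as expat

DOCUMENT = """<!DOCTYPE animal-friends [
<!ELEMENT animal-friends (pet | cat | dog)*>
<!ELEMENT pet ANY>
<!ELEMENT cat (kittens)?>
<!ELEMENT dog (puppies)?>
<!ELEMENT kittens EMPTY>
<!ELEMENT puppies EMPTY>
]>
<animal-friends><pet/></animal-friends>
"""


def content_models(document):
    """Collect the content model reported for each element declaration."""
    models = {}
    parser = expat.ParserCreate()

    def on_element_decl(name, model):
        # model is a tuple: (type, quantifier, name, children)
        models[name] = model

    parser.ElementDeclHandler = on_element_decl
    parser.Parse(document, True)
    return models


def child_names(model):
    """Names of the immediate children in a content model tuple."""
    return [child[2] for child in model[3]]


MODELS = content_models(DOCUMENT)
```

 An editor could walk such tuples to decide which elements to offer at a
 given point; the type and quantifier codes are the constants defined in
 xml.parsers.expat.model.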

 Thanks

 Zheng and Matt,
 NCEAS, UCSB

