From ht at cogsci.ed.ac.uk Mon Sep 1 18:09:22 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:23 2004 Subject: New release of LT XML toolkit, including Windows95/Windows NT binaries Message-ID: <4005.199709011605@grogan.cogsci.ed.ac.uk> The HCRC Language Technology Group is pleased to announce a new release of LT XML, the first high-performance publicly available XML toolset written in C. For further information and access to the software distribution, see http://www.ltg.ed.ac.uk/software/xml/ The LT XML tool-kit includes stand-alone tools for a wide range of processing of well-formed XML documents, including searching and extracting, down-translation (e.g. report generation, formatting), tokenising and sorting. If you've been waiting for high throughput XML tools with simple command-line interfaces to explore the potential of XML, LT XML is just what you need to get started. Basic throughput is under 3 seconds/megabyte on a Pentium 133, fast enough to make processing substantial XML datasets feasible. LT XML is an integrated set of XML tools and a developers' tool-kit, including a C-based API. As well as sources, this release includes executable images for a range of platforms, including Windows 95 and Windows NT, FreeBSD, Linux and Solaris. A preliminary partial Macintosh version is also available. This release is restricted to 8-bit character input/output, and does NOT do validation, although it does process and make use of DTDs in documents which include them. Sequences of LT XML tool applications can be pipelined together to achieve complex results. 
Tools included in this release include:

* sggrep -- extract sub-parts of XML documents, using patterns over element structure and text content;
* textonly -- extract text content only;
* sgsort -- reorder sub-elements within specified elements;
* sgmltrans -- pattern+action down-translation tool;
* sgrpg -- structure-based transformation tool;
* simple, simpleq -- event- and fragment-based examples of API use.

For special purposes beyond what the pre-constructed tools can achieve, extending their functionality and/or creating new tools is easy using the LT XML API, which provides both event-oriented and tree-fragment oriented access to the input document stream. Minimal applications require less than half a page of C code.

LT XML is available to anyone free of charge for non-commercial purposes.

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

From tfj at apusapus.demon.co.uk Tue Sep 2 11:22:01 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun 7 16:58:23 2004
Subject: Other whitespace problems was Re: Whitespace rules (v2)
In-Reply-To: <3.0.32.19970818162238.00902760@pop.intergate.bc.ca>
Message-ID: <199709020007.tfj.2207@apusapus.demon.co.uk>

> At 09:52 PM 18/08/97 +0000, Trevor Jenkins wrote:
> > I'm
> >convinced that as they stand the separator rules in XML are
> >ambiguous.
>
> Yes; Michael Sperberg-McQueen and I both agree that these need
> some more work.

Only "some". ;-)

> If it weren't for the $#*!@#%#!ing Parameter Entities, ...

These do seem to be allowed in some very odd places. Even for compatibility I see no reason to allow them in element declarations where %Name occurs. In SGML these were a useful feature; in XML they are obscurantist.
> all this would be simple and straightforward - designing a grammar
> for the SGML element declaration language is not exactly rocket
> science.

But it is computing science. I know some adherents of this list despise computing scientists (I heard one of you say so publicly a few months ago) but we can fix this problem.

> But when you try to pollute the grammar by saying where you can
> and can't replace chunks of it with PE references, it all of a
> sudden gets hideously difficult.

I've been on holiday since my original posting and relaxed by trying to define an equivalent grammar to describe XML that does not have the convolutions of the existing BNF one.

> ... SGML gets around this with the clever device ...

I get around this with the cunning plan of using a W-grammar rather than BNF. Some may recall W-grammars as the formalism used to define the Algol-68 programming language.

> ...
> Anyhow, further grammar engineering is in order. One thing to
> think about is simply to drop the 'S' (space) nonterminal, write
> a couple of simple tokenization rules, and take it that way. CMSMcQ
> has investigated this at length, but it has problems too.

My equivalent W-grammar for XML does not have any S nonterminals at all. The number of rules is roughly the same as the "official" BNF set. I think that mine are simpler and correct. However, I did add some meta-productions and hyper-rules to accommodate the parameter entity problem and to enforce the quoting rules. This increase in size is justified as I also made the grammar LL(1), which the official one is not.

> Pardon me for whining; I'm sure we'll figure out something. -Tim

Anyone interested in my version of the grammar should email me and I'll gladly send you a copy. Be warned, though: you have to be a computing scientist to understand it. :-) If there's enough interest I'll post it to the list.

Regards, Trevor.
-- "Real Men don't Read Instruction Manuals" Tim Allen, Home Improvement

From tfj at apusapus.demon.co.uk Wed Sep 3 18:43:21 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun 7 16:58:23 2004
Subject: Parameter Entity Reference Considered Harmful
Message-ID: <199709031539.tfj.2212@apusapus.demon.co.uk>

In making one more pass through the official grammar for XML, before I despatch my alternative version to the 5 people who've requested copies, I spotted a real dumb error in the doctype declaration. The existing definition says:

doctypedecl ::= ''

Now the notational device of prefixing a production name with %, and I quote, "specifies that in the external DTD subset..." (emphasis copied from the definition). But notice that this %markupdecl is NOT in the external DTD subset at all! Also the definition of the % device introduces another set of ambiguities from white space.

Methinks that the existing official grammar is in desperate need of a re-write.

Regards, Trevor.

-- "Real Men don't Read Instruction Manuals" Tim Allen, Home Improvement

From gannon at commerce.net Wed Sep 3 20:34:29 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun 7 16:58:23 2004
Subject: Internet Week Article on CommerceNet & XML
Message-ID: <01BCB85C.8DC17860@arrow-d86.sierra.net>

XML Grabs Markup Baton -- CommerceNet pilot aims push enabler at EDI, Web catalogs.
You can read the Internet Week article at: http://www.techweb.com/se/directlink.cgi?INW19970901S0087

A good overview of XML and CommerceNet's activities using XML. Enjoy!

Patrick Gannon, Executive Director
Information Access Portfolio, CommerceNet
http://www.commerce.net
-----------------------------------------
President & CEO
Internet Shopping Directory, Inc.
865 Tahoe Blvd., Suite 211
Incline Village, NV 89451
702-831-2251
702-831-3925 (Fax)
mailto://patrick@shoppingdirect.com
http://www.shoppingdirect.com

From tbray at textuality.com Wed Sep 3 21:54:40 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:23 2004
Subject: Character classification
Message-ID: <3.0.32.19970903125120.00796e20@pop.intergate.bc.ca>

I've been working on making Lark really do Unicode. JDK 1.1 is supposed to have, unlike 1.0, a usable input method; thus the problem is to check, when you're reading a GI or Attribute name, whether the characters are legal namestart/name characters.

It turns out to be quite a lot of work, so this is an offer to share. I wrote a program (based on Lark) that pulls the relevant character classes out of the XML spec, picks apart the markup, and writes another Java class that has some static arrays and offers two methods:

  package textuality.lark;
  public class CharClasses
  {
    public static boolean isNameC(char c)
    public static boolean isNameStart(char c)
  }

It needs about 4k of tables (which it binary-searches); it might be faster with 128k of byte-addressable tables or 16K of bitmaps, neither of which would be hard to implement.

(a) is this a waste of time, i.e. are there Unicode library calls that do it?
(b) if not, has everyone else already done this?
(c) if not, if I'm going to publish this, is the API above OK?
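[Editorial sketch] The bitmap option mentioned above could look roughly like the following: one bit per 16-bit character, so 8K bytes per class and 16K for the two classes together. The class name NameBitmap and the tiny SAMPLE_RANGES table are hypothetical and purely illustrative; they are not the real XML character classes and not part of Lark.

```java
// Hypothetical sketch: expand inclusive range pairs into a 64K-bit bitmap.
final class NameBitmap {
    // 65536 bits / 8 = 8192 bytes per character class.
    private final byte[] bits = new byte[65536 / 8];

    // ranges holds inclusive (start, end) pairs, laid out like the
    // pair arrays described in the message.
    NameBitmap(char[] ranges) {
        for (int i = 0; i + 1 < ranges.length; i += 2) {
            // int loop variable so a range ending at 65535 cannot wrap around
            for (int c = ranges[i]; c <= ranges[i + 1]; c++) {
                bits[c >> 3] |= (byte) (1 << (c & 7));
            }
        }
    }

    // One shift, one mask, no searching.
    boolean contains(char c) {
        return (bits[c >> 3] & (1 << (c & 7))) != 0;
    }

    // Illustrative ranges only: A-Z, a-z, and code points 192 to 214.
    static final char[] SAMPLE_RANGES = { 65, 90, 97, 122, 192, 214 };

    public static void main(String[] args) {
        NameBitmap m = new NameBitmap(SAMPLE_RANGES);
        System.out.println("x in class: " + m.contains('x'));
        System.out.println("0 in class: " + m.contains('0'));
    }
}
```

The trade-off is the one stated in the message: the bitmap costs more memory than the pair arrays but replaces the binary search with a constant-time test.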
I've attached the current Java source file for those who find the explanation above insufficiently clear. -------------- next part -------------- // Synthetically generated; do not edit! // package textuality.lark; import java.util.*; public class CharClasses { static final char[] sNameStart = { 170,170, 181,181, 186,186, 192,214, 216,246, 248,501, 506,535, 592,680, 688,696, 699,705, 736,740, 890,890, 902,902, 904,906, 908,908, 910,929, 931,974, 976,982, 986,986, 988,988, 990,990, 992,992, 994,1011, 1025,1036, 1038,1103, 1105,1116, 1118,1153, 1168,1220, 1223,1224, 1227,1228, 1232,1259, 1262,1269, 1272,1273, 1329,1366, 1369,1369, 1377,1415, 1488,1514, 1520,1522, 1569,1594, 1601,1610, 1649,1719, 1722,1726, 1728,1742, 1744,1747, 1749,1749, 1765,1766, 2309,2361, 2365,2365, 2392,2401, 2437,2444, 2447,2448, 2451,2472, 2474,2480, 2482,2482, 2486,2489, 2524,2525, 2527,2529, 2544,2545, 2565,2570, 2575,2576, 2579,2600, 2602,2608, 2610,2611, 2613,2614, 2616,2617, 2649,2652, 2654,2654, 2674,2676, 2693,2699, 2701,2701, 2703,2705, 2707,2728, 2730,2736, 2738,2739, 2741,2745, 2749,2749, 2784,2784, 2821,2828, 2831,2832, 2835,2856, 2858,2864, 2866,2867, 2870,2873, 2877,2877, 2908,2909, 2911,2913, 2949,2954, 2958,2960, 2962,2965, 2969,2970, 2972,2972, 2974,2975, 2979,2980, 2984,2986, 2990,2997, 2999,3001, 3077,3084, 3086,3088, 3090,3112, 3114,3123, 3125,3129, 3168,3169, 3205,3212, 3214,3216, 3218,3240, 3242,3251, 3253,3257, 3294,3294, 3296,3297, 3333,3340, 3342,3344, 3346,3368, 3370,3385, 3424,3425, 3585,3630, 3632,3632, 3634,3635, 3648,3653, 3713,3714, 3716,3716, 3719,3720, 3722,3722, 3725,3725, 3732,3735, 3737,3743, 3745,3747, 3749,3749, 3751,3751, 3754,3755, 3757,3758, 3760,3760, 3762,3763, 3773,3773, 3776,3780, 3804,3805, 3904,3911, 3913,3945, 4256,4293, 4304,4342, 4352,4441, 4447,4514, 4520,4601, 7680,7835, 7840,7929, 7936,7957, 7960,7965, 7968,8005, 8008,8013, 8016,8023, 8025,8025, 8027,8027, 8029,8029, 8031,8061, 8064,8116, 8118,8124, 8126,8126, 8130,8132, 8134,8140, 
8144,8147, 8150,8155, 8160,8172, 8178,8180, 8182,8188, 8319,8319, 8450,8450, 8455,8455, 8458,8467, 8469,8469, 8472,8477, 8484,8484, 8486,8486, 8488,8488, 8490,8497, 8499,8504, 8544,8578, 12295,12295, 12321,12329, 12353,12436, 12449,12538, 12549,12588, 12593,12686, 19968,40869, 44032,55203, 63744,64045, 64256,64262, 64275,64279, 64287,64296, 64298,64310, 64312,64316, 64318,64318, 64320,64321, 64323,64324, 64326,64433, 64467,64829, 64848,64911, 64914,64967, 65008,65019, 65136,65437, 65440,65470, 65474,65479, 65482,65487, 65490,65495, 65498,65500 }; static final char[] sNameC = { 170,170, 181,181, 183,183, 186,186, 192,214, 216,246, 248,501, 506,535, 592,680, 688,696, 699,705, 720,721, 736,740, 768,837, 864,865, 890,890, 902,906, 908,908, 910,929, 931,974, 976,982, 986,986, 988,988, 990,990, 992,992, 994,1011, 1025,1036, 1038,1103, 1105,1116, 1118,1153, 1155,1158, 1168,1220, 1223,1224, 1227,1228, 1232,1259, 1262,1269, 1272,1273, 1329,1366, 1369,1369, 1377,1415, 1425,1441, 1443,1465, 1467,1469, 1471,1471, 1473,1474, 1476,1476, 1488,1514, 1520,1522, 1569,1594, 1600,1618, 1632,1641, 1648,1719, 1722,1726, 1728,1742, 1744,1747, 1749,1768, 1770,1773, 1776,1785, 2305,2307, 2309,2361, 2364,2381, 2385,2388, 2392,2403, 2406,2415, 2433,2435, 2437,2444, 2447,2448, 2451,2472, 2474,2480, 2482,2482, 2486,2489, 2492,2492, 2494,2500, 2503,2504, 2507,2509, 2519,2519, 2524,2525, 2527,2531, 2534,2545, 2562,2562, 2565,2570, 2575,2576, 2579,2600, 2602,2608, 2610,2611, 2613,2614, 2616,2617, 2620,2620, 2622,2626, 2631,2632, 2635,2637, 2649,2652, 2654,2654, 2662,2676, 2689,2691, 2693,2699, 2701,2701, 2703,2705, 2707,2728, 2730,2736, 2738,2739, 2741,2745, 2748,2757, 2759,2761, 2763,2765, 2784,2784, 2790,2799, 2817,2819, 2821,2828, 2831,2832, 2835,2856, 2858,2864, 2866,2867, 2870,2873, 2876,2883, 2887,2888, 2891,2893, 2902,2903, 2908,2909, 2911,2913, 2918,2927, 2946,2947, 2949,2954, 2958,2960, 2962,2965, 2969,2970, 2972,2972, 2974,2975, 2979,2980, 2984,2986, 2990,2997, 2999,3001, 3006,3010, 
3014,3016, 3018,3021, 3031,3031, 3047,3055, 3073,3075, 3077,3084, 3086,3088, 3090,3112, 3114,3123, 3125,3129, 3134,3140, 3142,3144, 3146,3149, 3157,3158, 3168,3169, 3174,3183, 3202,3203, 3205,3212, 3214,3216, 3218,3240, 3242,3251, 3253,3257, 3262,3268, 3270,3272, 3274,3277, 3285,3286, 3294,3294, 3296,3297, 3302,3311, 3330,3331, 3333,3340, 3342,3344, 3346,3368, 3370,3385, 3390,3395, 3398,3400, 3402,3405, 3415,3415, 3424,3425, 3430,3439, 3585,3630, 3632,3642, 3648,3662, 3664,3673, 3713,3714, 3716,3716, 3719,3720, 3722,3722, 3725,3725, 3732,3735, 3737,3743, 3745,3747, 3749,3749, 3751,3751, 3754,3755, 3757,3758, 3760,3769, 3771,3773, 3776,3780, 3782,3782, 3784,3789, 3792,3801, 3804,3805, 3864,3865, 3872,3881, 3893,3893, 3895,3895, 3897,3897, 3902,3911, 3913,3945, 3953,3972, 3974,3979, 3984,3989, 3991,3991, 3993,4013, 4017,4023, 4025,4025, 4256,4293, 4304,4342, 4352,4441, 4447,4514, 4520,4601, 7680,7835, 7840,7929, 7936,7957, 7960,7965, 7968,8005, 8008,8013, 8016,8023, 8025,8025, 8027,8027, 8029,8029, 8031,8061, 8064,8116, 8118,8124, 8126,8126, 8130,8132, 8134,8140, 8144,8147, 8150,8155, 8160,8172, 8178,8180, 8182,8188, 8204,8207, 8234,8238, 8298,8303, 8319,8319, 8400,8412, 8417,8417, 8450,8450, 8455,8455, 8458,8467, 8469,8469, 8472,8477, 8484,8484, 8486,8486, 8488,8488, 8490,8497, 8499,8504, 8544,8578, 12293,12293, 12295,12295, 12321,12335, 12337,12341, 12353,12436, 12441,12446, 12449,12538, 12540,12542, 12549,12588, 12593,12686, 19968,40869, 44032,55203, 63744,64045, 64256,64262, 64275,64279, 64286,64296, 64298,64310, 64312,64316, 64318,64318, 64320,64321, 64323,64324, 64326,64433, 64467,64829, 64848,64911, 64914,64967, 65008,65019, 65056,65059, 65136,65470, 65474,65479, 65482,65487, 65490,65495, 65498,65500 }; public static boolean isNameC(char c) { return find(c, sNameC); } public static boolean isNameStart(char c) { return find(c, sNameStart); } // binary-search to find out if C is in one of the ranges in the // map. 
  // Remember that the map consists of pairs, not individuals.
  // If this turns into a horrible performance bottleneck, we could
  // put the maps into a 64k byte array or as a compromise 2 * 8k bitmaps; the
  // pair-array trick uses about 4k for both, at the cost of all this
  // binary searching
  private static boolean find(char c, char[] map)
  {
    int high, low, probe;

    low = -1;
    high = map.length/2;
    while ((high - low) > 1)
    {
      // invariant (modulo division by 2):
      // map[high] is strictly greater than c
      probe = (high + low) / 2;
      if (c < map[probe * 2])
        high = probe;
      else
        low = probe;
    }
    return (low != -1 && c >= map[low*2] && c <= map[(low*2) + 1]);
  }
}
-------------- next part --------------
Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

From andrewl at microsoft.com Wed Sep 3 22:39:59 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:58:24 2004
Subject: Character classification
Message-ID: <7BB61B44F197D011892800805FD4F79201436095@RED-03-MSG.dns.microsoft.com>

JDK 1.1 is still broken for Unicode. Take a look at the code in the Microsoft XML Parser (http://www.microsoft.com/standards/xml) to see our work-arounds.

--Andrew Layman
AndrewL@microsoft.com

> -----Original Message-----
> From: Tim Bray [SMTP:tbray@textuality.com]
> Sent: Wednesday, September 03, 1997 12:51 PM
> To: xml-dev@ic.ac.uk
> Subject: Character classification
>
> I've been working on making Lark really do Unicode. JDK 1.1 is
> supposed
> to have, unlike 1.0, a usable input method; thus the problem is to
> check,
> when you're reading a GI or Attribute name, whether the characters are
> legal namestart/name characters.
>
> It turns out to be quite a lot of work, so this is an offer to share.
> I wrote a program (based on Lark) that pulls the relevant character > classes out of the XML spec, picks apart the markup, and writes > another > Java class that has some static arrays and offers two methods: > > package textuality.lark; > public class CharClasses > { > public static boolean isNameC(char c) > public static boolean isNameStart(char c) > } > > It needs about 4k of tables (which it binary-searches); it might be > faster > with 128k of byte-addressable tables or 16K of bitmaps, neither of > which > would be hard to implement. > > (a) is this a waste of time, i.e. are there Unicode library calls that > do it? > (b) if not, has everyone else already done this? > (c) if not, if I'm going to publish this, is the API above OK? > > I've attached the current Java source file for those who find the > explanation above insufficiently clear. > > Cheers, Tim Bray > tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 << > File: CharClasses.java.txt >> xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From istvanc at microsoft.com Thu Sep 4 00:24:57 1997 From: istvanc at microsoft.com (Istvan Cseri) Date: Mon Jun 7 16:58:24 2004 Subject: Character classification Message-ID: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG> For better speed I would suggest an alternative solution: use a quick array lookup for characters below 256 and go to the more expensive method above... It will do wonders with your parser. Istvan > ---------- > From: Tim Bray[SMTP:tbray@textuality.com] > Reply To: Tim Bray > Sent: Wednesday, September 03, 1997 12:51 PM > To: xml-dev@ic.ac.uk > Subject: Character classification > > <> > I've been working on making Lark really do Unicode. 
JDK 1.1 is > supposed > to have, unlike 1.0, a usable input method; thus the problem is to > check, > when you're reading a GI or Attribute name, whether the characters are > legal namestart/name characters. > > It turns out to be quite a lot of work, so this is an offer to share. > I wrote a program (based on Lark) that pulls the relevant character > classes out of the XML spec, picks apart the markup, and writes > another > Java class that has some static arrays and offers two methods: > > package textuality.lark; > public class CharClasses > { > public static boolean isNameC(char c) > public static boolean isNameStart(char c) > } > > It needs about 4k of tables (which it binary-searches); it might be > faster > with 128k of byte-addressable tables or 16K of bitmaps, neither of > which > would be hard to implement. > > (a) is this a waste of time, i.e. are there Unicode library calls that > do it? > (b) if not, has everyone else already done this? > (c) if not, if I'm going to publish this, is the API above OK? > > I've attached the current Java source file for those who find the > explanation above insufficiently clear. > > Cheers, Tim Bray > tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From colds at nwlink.com Thu Sep 4 01:56:21 1997 From: colds at nwlink.com (Chris Olds) Date: Mon Jun 7 16:58:24 2004 Subject: Character classification References: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG> Message-ID: <340DF8CE.FCD49856@nwlink.com> How are people dealing with UTF-8 vs. unicode vs. Latin-1? I have been working on a lexer (using Flex) that assumes the input stream is either Latin-1 or UTF-8 and returns byte strings to the caller. 
Since Java chars are Unicode, I assume that the Java XML parsers are doing the opposite, right? Is there any consensus on what form PCDATA or GI names should take when they are returned to the application? On a related note, when do character entities get replaced - in the lexer or later on? My reading of the draft is that the scanner must do the replacement if the examples of rescanning are to work. /cco Chris Olds colds@nwlink.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From istvanc at microsoft.com Thu Sep 4 17:01:36 1997 From: istvanc at microsoft.com (Istvan Cseri) Date: Mon Jun 7 16:58:24 2004 Subject: Character classification Message-ID: <91B7E292027DCF1195CD08002BB690B00245740C@RED-93-MSG> The Java parser is using Java char-s and Strings for storage so it is using Unicode. The GI-s are actually 'atomized' for memory savings and returned that way. PCDATA is stored in String chunks. The entities are preserved in special nodes but can be made transparent to the reader (user) of the parsed tree. Istvan > ---------- > From: Chris Olds[SMTP:colds@nwlink.com] > Reply To: Chris Olds > Sent: Wednesday, September 03, 1997 4:54 PM > To: xml-dev@ic.ac.uk > Cc: 'Tim Bray'; Istvan Cseri > Subject: Re: Character classification > > How are people dealing with UTF-8 vs. unicode vs. Latin-1? I have > been > working on a lexer (using Flex) that assumes the input stream is > either > Latin-1 or UTF-8 and returns byte strings to the caller. Since Java > chars are Unicode, I assume that the Java XML parsers are doing the > opposite, right? Is there any consensus on what form PCDATA or GI > names > should take when they are returned to the application? On a related > note, when do character entities get replaced - in the lexer or later > on? 
My reading of the draft is that the scanner must do the
> replacement
> if the examples of rescanning are to work.
>
> /cco
>
> Chris Olds colds@nwlink.com
>

From tbray at textuality.com Thu Sep 4 17:33:34 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:24 2004
Subject: Character classification
Message-ID: <3.0.32.19970904083026.008f1a90@pop.intergate.bc.ca>

At 04:54 PM 03/09/97 -0700, Chris Olds wrote:
> Is there any consensus on what form PCDATA or GI names
>should take when they are returned to the application? On a related
>note, when do character entities get replaced - in the lexer or later
>on? My reading of the draft is that the scanner must do the replacement
>if the examples of rescanning are to work.

Like Istvan says, Java chars and Strings. However, you have to do lazy evaluation; if you foolishly make every little chunk of text you read into a String, you'll spend all your time in the Java String class implementation, and none doing useful work.

Character entities have to be replaced in two places: when you find them in an entity definition and when you find them in free text or an attribute value.

-T.
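[Editorial sketch] One way to read the lazy-evaluation advice above: keep each chunk of parsed text as an offset and length into the parser's character buffer, and build a String only when the application actually asks for one. The TextSpan class below is hypothetical; it is not taken from Lark or from any other parser discussed in this thread.

```java
// Hypothetical sketch of lazy String creation for parsed text.
final class TextSpan {
    private final char[] buf;   // shared parser buffer, not copied
    private final int start;
    private final int length;
    private String cached;      // built on first request only

    TextSpan(char[] buf, int start, int length) {
        this.buf = buf;
        this.start = start;
        this.length = length;
    }

    int length() { return length; }

    char charAt(int i) { return buf[start + i]; }

    // A cheap check that needs no String at all.
    boolean isWhitespace() {
        for (int i = 0; i < length; i++) {
            char c = buf[start + i];
            if (c != ' ' && c != '\t' && c != '\n' && c != '\r') return false;
        }
        return true;
    }

    // The String is allocated only here, and only once.
    @Override
    public String toString() {
        if (cached == null) cached = new String(buf, start, length);
        return cached;
    }

    public static void main(String[] args) {
        char[] buf = "<p>hello</p>".toCharArray();
        TextSpan text = new TextSpan(buf, 3, 5);
        System.out.println(text.length() + " chars: " + text);
    }
}
```

A span like this is only valid while the underlying buffer is left untouched, so a real parser would have to materialise or copy it before refilling the buffer.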
From tfj at apusapus.demon.co.uk Thu Sep 4 17:54:06 1997
From: tfj at apusapus.demon.co.uk (Trevor Jenkins)
Date: Mon Jun 7 16:58:24 2004
Subject: Parameter Entity Reference Considered Harmful
In-Reply-To: <199709031539.tfj.2212@apusapus.demon.co.uk>
Message-ID: <199709040153.tfj.2217@apusapus.demon.co.uk>

> In making one more pass through the official grammar for XML, before
> I despatch my alternative version to the 5 people who've requested
> copies, I spotted a real dumb error in the doctype declaration.

Since posting this I did something even more useful. :-) I went back to ISO 8879 in which, of course, the use of parameter entity references is allowed in both the "internal" and "external" subsets. As a programmer I reckon that the different handling of parameter entities between the internal and external subsets makes things MORE complicated rather than simpler. I knew there was something wrong with the avowed claim that XML was a subset of SGML. If they're to be allowed (and they are allowed in some odd places in XML) then let them occur wherever SGML lets them occur.

Regards, Trevor.

-- "Real Men don't Read Instruction Manuals" Tim Allen, Home Improvement

From Patrice.Bonhomme at loria.fr Thu Sep 4 21:38:05 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:58:24 2004
Subject: parameter entity with msxml
Message-ID: <199709041935.VAA00751@chimay.loria.fr>

Hi there,

I am using msxml to read/parse my XML documents within a Java application.
I wondered if msxml takes care of parameter entity. This example doesn't work : %ISOtech; %ISOlat1; %ISOlat2; %ISOgrk1; %ISOgrk2; %ISOgrk3; %ISOgrk4; ]>

J'ai décide d'écrire un livre sur l'Espace et le Temps à l'intention du grand public après les conférences Loeb que j'ai données à Harvard en 1982.

I've got this exception : Error: test-ent.xml(21,17) Context: - - -

- com.ms.xml.ParseException: Missing entity eacute at com.ms.xml.Parser.error(Parser.java:110) at com.ms.xml.Parser.scanEntityRef(Parser.java:440) at com.ms.xml.Parser.scanText(Parser.java:395) at com.ms.xml.Parser.parseText(Parser.java:1223) at com.ms.xml.Parser.parseElement(Parser.java:1081) at com.ms.xml.Parser.parseDocument(Parser.java:643) at com.ms.xml.Parser.parse(Parser.java:47) at com.ms.xml.Document.load(Document.java:171) at msxml.main(msxml.java:50) Thanks for any help... Pat. ============================================================== bonhomme@loria.fr | Office : B.228 http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37 -------------------------------------------------------------- * Projet Aquarelle http://aqua.inria.fr * Serveur Silfide http://www.loria.fr/Projet/Silfide * Multilingual Concordancing http://www.loria.fr/~bonhomme/lingua/ ============================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Fri Sep 5 01:40:56 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:24 2004 Subject: Lark 0.91 available Message-ID: <3.0.32.19970904163748.00838210@pop.intergate.bc.ca> Hi - Lark 0.91 is now available at http://www.textuality.com/Lark Only one real difference - it now does Unicode. It reads the BOM and thus UCS-2/UTF-16 (even byte-swaps); if there's no BOM, reads and tries to use the encoding declaration, boots it if it says anything but "UTF-8" or "UTF8". Successfully parses Murata-san's translation of the XML spec, would love to get my hands on some more internationalized XML; in particular with non-ASCII markup. Another 6K of .class files for I18n, sigh. Lots of bug-fixes in the event-stream module. 
I had to write a significant event-stream Lark application to pull the character classes out of the XML spec in order to build the CharClasses.java file, and ran across a few bodacious bugs in end-tag handling.

It's a bit bogus because it really doesn't do UTF-8 yet, just ASCII masquerading as such. UTF-8 Real Soon Now.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

From jjc at jclark.com Fri Sep 5 07:10:54 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:58:24 2004
Subject: Character classification
References: <91B7E292027DCF1195CD08002BB690B002457407@RED-93-MSG>
Message-ID: <340F92EB.E84772CB@jclark.com>

Istvan Cseri wrote:
>
> For better speed I would suggest an alternative solution: use a quick
> array lookup for characters below 256 and go to the more expensive
> method above... It will do wonders with your parser.

Except of course when you're parsing non-Latin scripts.

There's another technique which exploits the fact that characters on the same page often have similar properties, and this is true even more so for characters in the same column.

The idea is to have a three-level table, the first level with 256 entries, the second and third levels with 16 entries. The entries for the first and second levels are a (possibly null) pointer to a sub-table plus a value; the entries for the third level are just values. To look up the value for a character, you use the high 8 bits to index into the first-level table; if the pointer part of the entry is null, then return the value part of the entry; otherwise use the sub-table addressed by the pointer; use the next 4 bits to index into that in a similar way, and, if necessary, the bottom 4 bits to index into the bottom table.
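[Editorial sketch] The three-level lookup described above can be sketched in Java roughly as follows. This is not the SP CharMap.h code; the class CharTable, its nested Page and Column types, and the choice of boolean values are all assumptions made for illustration (the technique as described stores arbitrary values).

```java
// Hypothetical sketch of a three-level character property table.
final class CharTable {
    // Second-level entry: covers one column of 16 characters.
    private static final class Column {
        boolean[] cells;   // third level: 16 per-character values, or null
        boolean value;     // applies to the whole column when cells == null
    }

    // First-level entry: covers one page of 256 characters.
    private static final class Page {
        Column[] columns;  // second level: 16 columns, or null
        boolean value;     // applies to the whole page when columns == null
    }

    private final Page[] pages = new Page[256];

    CharTable() {
        for (int i = 0; i < 256; i++) pages[i] = new Page();
    }

    // High 8 bits, then the next 4 bits, then the bottom 4 bits.
    boolean get(char c) {
        Page p = pages[c >> 8];
        if (p.columns == null) return p.value;
        Column col = p.columns[(c >> 4) & 0xF];
        if (col.cells == null) return col.value;
        return col.cells[c & 0xF];
    }

    // Set one character, splitting a uniform page or column on demand.
    void set(char c, boolean v) {
        Page p = pages[c >> 8];
        if (p.columns == null) {
            if (p.value == v) return;
            p.columns = new Column[16];
            for (int i = 0; i < 16; i++) {
                p.columns[i] = new Column();
                p.columns[i].value = p.value;
            }
        }
        Column col = p.columns[(c >> 4) & 0xF];
        if (col.cells == null) {
            if (col.value == v) return;
            col.cells = new boolean[16];
            java.util.Arrays.fill(col.cells, col.value);
        }
        col.cells[c & 0xF] = v;
    }

    // Set an inclusive range; whole pages are stored at the first level.
    void setRange(int lo, int hi, boolean v) {
        int c = lo;
        while (c <= hi) {
            if ((c & 0xFF) == 0 && c + 255 <= hi) {
                Page p = pages[c >> 8];
                p.columns = null;
                p.value = v;
                c += 256;
            } else {
                set((char) c, v);
                c++;
            }
        }
    }

    public static void main(String[] args) {
        CharTable t = new CharTable();
        t.setRange('a', 'z', true);
        t.setRange(19968, 40869, true);  // one of the large ranges in the posted tables
        System.out.println(t.get('q') + " " + t.get('Q'));
    }
}
```

In this sketch a page whose characters all share one value costs a single first-level entry, which is why large uniform blocks stay cheap. It never merges sub-tables back together once split; a production table would probably want to.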
This is I believe quite a well-known technique; I got it from Glenn Adams. You can use this to implement case-folding by storing the difference between a character and its upper-case equivalent modulo 2^16. There's a C++ implementation of this in SP in include/CharMap.h. James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From crism at ora.com Fri Sep 5 16:15:09 1997 From: crism at ora.com (Chris Maden) Date: Mon Jun 7 16:58:25 2004 Subject: DSSSL Digest now publicly available In-Reply-To: <998.199709051356@grogan.cogsci.ed.ac.uk> (ht@cogsci.ed.ac.uk) Message-ID: <199709051417.KAA01715@geode.ora.com> The announcement of the DSSSL Digest (or reference) at sparked me to get around to announcing my SGML reference on comp.text.sgml. For those of you who don't read c.t.s, it's at . I find it very useful day-to-day, especially when checking that XML remains valid HTML. As posted to c.t.s, this information is copyright ISO, and is intended only as a supplement to ISO 8879. (You won't find it very useful without the accompanying text anyhoo.) -Chris -- http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From istvanc at microsoft.com Fri Sep 5 18:02:59 1997 From: istvanc at microsoft.com (Istvan Cseri) Date: Mon Jun 7 16:58:25 2004 Subject: Character classification Message-ID: <91B7E292027DCF1195CD08002BB690B00245740F@RED-93-MSG> You are right, it is a well known technique, Java JDK1.1 in fact uses very similar code for character classification. 
I replaced that with the simple 256 member array lookup (for characters in that range) and it sped up the parser ~10%. Istvan > ---------- > From: James Clark[SMTP:jjc@jclark.com] > Reply To: James Clark > Sent: Thursday, September 04, 1997 10:04 PM > To: xml-dev@ic.ac.uk > Subject: Re: Character classification > > Istvan Cseri wrote: > > > > For better speed I would suggest an alternative solution: use a > quick > > array lookup for characters below 256 and go to the more expensive > > method above... It will do wonders with your parser. > > Except of course when you're parsing non-Latin scripts. > > There's another technique which exploits the fact that characters on > the > same page often have similar properties, and this is true even more so > for characters in the same column. > > The idea is to have a three-level table, the first level with 256 > entries, the second and third levels with 16 entries. The entries for > the first and second levels are a (possibly null) pointer to a > sub-table > plus a value; the entries for the third level are just values. To look > up the value for a character, you use the high 8 bits to index into > the > first-level table; if the pointer part of the entry is null, then > return > the value part of entry; otherwise use the sub-table table addressed > by > the pointer; use the next 4 bits to index into that in a similar way, > and, if necessary, the bottom 4 bits to index into the bottom table. > > This is I believe quite a well-known technique; I got it from Glenn > Adams. > > You can use this to implement case-folding by storing the difference > between a character and its upper-case equivalent modulo 2^16. > > There's a C++ implementation of this in SP in include/CharMap.h. 
> > James > > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Fri Sep 5 18:47:47 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:25 2004 Subject: Character classification Message-ID: <3.0.32.19970905094447.008fcad0@pop.intergate.bc.ca> At 12:04 PM 05/09/97 +0700, James Clark wrote: >There's another technique Of course, then there's the space/time trade-off. In particular, in XML, the proportion of times when you're going to change parsing state based on whether something's a NameChar/NameStart is not that high; so how much table space & traversal code is it worth investing in speeding up that case? Maybe a lot, maybe not. What we need is a truly good profiler. Anyone with a good Java profiler experience to share? I speeded Lark up substantially with 0.91, just by code-walking and guessing. This is not the right way to do it. -T. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Support at EpiphanySoftware.com Fri Sep 5 20:37:52 1997 From: Support at EpiphanySoftware.com (Andrew Cogan) Date: Mon Jun 7 16:58:25 2004 Subject: Resolving links Message-ID: <341050F4.2932184C@EpiphanySoftware.com> Introductory apology: I'm a newcomer to XML, so forgive me if this topic has already been covered. 
Does/will XML include a way to resolve links by using a mapping table external to the originating document, or alternatively by calling a process?

In this scenario, there would either be a "library manager" process, or a registry file containing a list of symbolic document names along with their physical locations. This would enable a link in document "A" to refer to document "B" without concern for whether document B's location is on a CD-ROM, a hard disk, or the Web. It would also allow document B's location to change over time without invalidating the link in document A.

--
Andy Cogan
Epiphany Software
E-mail: andrew@EpiphanySoftware.com
Voice: (408) 378-6145
Web: http://www.EpiphanySoftware.com

From jwrobie at mindspring.com Fri Sep 5 21:30:10 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:25 2004
Subject: SGML/XML Developer's Group, Research Triangle area, North Carolina
Message-ID: <1.5.4.32.19970905192951.009cdb5c@pop.mindspring.com>

I am interested in starting a user's group for people developing SGML and XML applications in the Research Triangle area of North Carolina. I would like this group to be oriented towards developers rather than end users.

The goal of the group would be to learn from each other about the emerging XML-based standards and APIs, new design techniques using architectural forms and components, and tools; to discuss various program architectures and document designs; and to get to know the other people who are working on SGML and XML projects in our local area.

If anybody would be interested in such a group, please contact me via email.
Jonathan

***************************************************************************
Jonathan Robie jwrobie@mindspring.com http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703 http://www.poet.com
***************************************************************************

From jwrobie at mindspring.com Fri Sep 5 22:51:40 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:58:25 2004
Subject: SGML/XML Developer's Group, Research Triangle area, North Carolina
Message-ID: <1.5.4.32.19970905205128.009fc340@pop.mindspring.com>

Very interesting - I'll contact you in private email. I'd be interested in having a presentation on XML/EDI, and a focus group could develop out of that. Let's do the rest offline (and over beer!)

Jonathan

***************************************************************************
Jonathan Robie jwrobie@mindspring.com http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703 http://www.poet.com
***************************************************************************

From zwang at pstat.ucsb.edu Fri Sep 5 23:07:54 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 16:58:25 2004
Subject: Developer's Group
Message-ID:

I have followed the discussion for a period of time. I think it is time to summarize the discussion up to now and let the developers begin to work. This group will be of much help in this respect.
Zheng Wang
Department of Statistics and Applied Probability
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang

From eliot at isogen.com Sat Sep 6 01:00:55 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:58:25 2004
Subject: Resolving links
Message-ID: <3.0.32.19970905175827.00a1dbf4@swbell.net>

At 11:35 AM 9/5/97 -0700, Andrew Cogan wrote:
>Introductory apology: I'm a newcomer to XML, so forgive me if this topic
>has already been covered.
>
>Does/will XML include a way to resolve links by using a mapping table
>external to the originating document, or alternatively by calling a
>process?
>
>In this scenario, there would either a "library manager" process, or a
>registry file containing a list of symbolic document names along with
>their physical locations. This would enable a link in document "A" to
>refer to document "B" without concern for whether document B's location
>is on a CD-ROM, a hard disk, or the Web. It would also allow document
>B's location to change over time without invalidating the link in
>document A.

This problem, that of changes in the resource pointed to requiring changes in the documents that point to it, is one of the fundamental weaknesses of URLs as a form of address. You cannot have "industrial strength" addressing without some form of indirection that lets you isolate references in A from changes in B.

SGML provides one fundamental form of indirect address, the entity reference, which when used with public IDs (rather than system IDs) protects the entity declarations from changes in the system identifiers of storage objects.
However, entities alone cannot protect you from changes inside a storage object, so you must have some way of indirecting references to objects inside storage objects.

The current XML Link spec does not allow entity references as a form of resource address. It also does not provide any other form of indirection. However, you're not limited to using only XML Link with XML documents--you can use anything you want, including normal SGML mechanisms and other public addressing architectures, such as the TEI and the HyTime architecture.

Here's how you do entity-based indirection:

]>

Somewhere else, you'd have a mapping for the public ID to the system ID:

-- SGML Open catalog --
PUBLIC "-//You//DOCUMENT Your Document//EN" "/home/you/docs/mydoc.xml"
-- End of catalog --

You could imagine a service analogous to DNS that would resolve public IDs to storage IDs (or rather, would resolve owner IDs to public ID servers; that is, "-//You" would be associated with your public ID server, which then takes the rest of the public ID and resolves it to a storage object).

XML Lang, of course, does allow you to declare ENTITY attributes, as I've done above; it's just that XML Link does not associate any particular semantic with ENTITY attributes. So you can do the above, but you can't depend on systems that only support XML Lang and XML Link to help you (but any existing SGML system should handle the above).

Both the TEI spec and the HyTime architecture provide indirect addresses that you can use to isolate a reference from the ultimate location of the target.
For example, using HyTime indirect addresses, you could have a separate document that provided the mapping of persistent object names to URLs for those objects: ]> http://www.me.com/docs/mydoc1.xml http://www.me.com/docs/mydoc2.xml You could then use the mapping by making references to the URLloc elements: ]> Click here The HREF in my document points to a URLloc in the URL map document, which then gets us to the real URL, which may change at any time. One advantage of the entity approach is that you can use different catalogs without changing any of the documents involved (because the entity declaration and public ID provide an additional level of indirection, which is outside of any documents, namely in the public ID mapping catalog). As the XML Link specification is not yet finalized, its possible that we may include a way to address entities as resources of links and do indirect addressing. It should be clear from the above that the mechanism at work is pretty simple: given a two part address (storage object and ID within that object), use it to look up the next stage in the address (i.e., the URL in the content of the URLloc elements). That's all there is to it, and the above is 100% HyTime conforming (and if you implemented the above, you could call your system a conforming HyTime application). Cheers, Eliot xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Dick_Hardt at ActiveState.com Sat Sep 6 02:16:03 1997 From: Dick_Hardt at ActiveState.com (Dick Hardt) Date: Mon Jun 7 16:58:25 2004 Subject: Perl utilities for XML Message-ID: <3.0.1.32.19970905153037.016c4fac@pop3.activestate.com> Hello all, I searched the archive and it looks like there is some Perl development re: XML but I have not seen anything specific. Does anyone have anything or interested in a Perl module for XML? 
-- Dick xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From matt at wdi.disney.com Sat Sep 6 02:25:03 1997 From: matt at wdi.disney.com (Matthew Fuchs) Date: Mon Jun 7 16:58:25 2004 Subject: Perl utilities for XML In-Reply-To: Dick Hardt "Perl utilities for XML" (Sep 5, 3:30pm) References: <3.0.1.32.19970905153037.016c4fac@pop3.activestate.com> Message-ID: <9709051727.ZM4032@scrumpox.rd.wdi.disney.com> I've been hacking away desperately, but I'm not sure if I can put anything in the public domain. I'll have to check and see. Matthew On Sep 5, 3:30pm, Dick Hardt wrote: > Subject: Perl utilities for XML > Hello all, > > I searched the archive and it looks like there is some Perl development re: > XML but I have not seen anything specific. Does anyone have anything or > interested in a Perl module for XML? > > -- Dick > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > >-- End of excerpt from Dick Hardt -- ----------------------------------------------------- Matthew Fuchs matt@wdi.disney.com http://cs.nyu.edu/phd_students/fuchs ----------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Sat Sep 6 23:10:00 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:25 2004 Subject: On Case and Performance Message-ID: <3.0.32.19970906140647.008f1830@pop.intergate.bc.ca> Recently, I whined: >What we need is a truly good profiler. 
Anyone with a good Java profiler
>experience to share? I speeded Lark up substantially with 0.91, just by
>code-walking and guessing. This is not the right way to do it. -T.

Disgusted with myself, I went and found the Java Workshop Beta from java.sun.com, downloaded it (16M!) and ran its profiler. Well well, surprise, Lark was spending 91% of its time in this little routine that looks up a GI to see if we've seen it before. And in that routine, it was spending most of its time in Character.toUpperCase. Ouch.

The code used to be:

  for (i = 0; i < name.length; i++)
    name[i] = sToUpper[name[i]];

Now it says

  for (i = 0; i < name.length; i++)
    if (name[i] < 127)
      name[i] = sToUpper[name[i]]; // 127-entry upcasing table
    else
      name[i] = Character.toUpperCase(name[i]);

Note that toUpperCase is called only in the case when non-ASCII characters show up in GI/Attribute/Entity names.

Resulting performance improvement in Lark, in processing the XML spec: a factor of 11.9.

The Sun profiler is not quite as slick as gprof of yore, but it's not bad at all. -T.

From Jon.Bosak at eng.Sun.COM Sat Sep 6 23:36:50 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:25 2004
Subject: On Case and Performance
In-Reply-To: <3.0.32.19970906140647.008f1830@pop.intergate.bc.ca> (message from Tim Bray on Sat, 06 Sep 1997 14:06:53 -0700)
Message-ID: <199709062134.OAA03801@boethius.eng.sun.com>

[Tim Bray:]

| Resulting performance improvement in Lark, in processing the XML spec:
| a factor of 11.9.

Now you've made me too curious to resist asking. What's the performance difference if you just compare codes directly and don't bother with case folding?
Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Sat Sep 6 23:53:35 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:25 2004 Subject: On Case and Performance Message-ID: <3.0.32.19970906145033.008f17e0@pop.intergate.bc.ca> At 02:34 PM 06/09/97 -0700, Jon Bosak wrote: >| Resulting performance improvement in Lark, in processing the XML spec: >| a factor of 11.9. > >Now you've made me too curious to resist asking. What's the >performance difference if you just compare codes directly and don't >bother with case folding? To test that I'd have to go regularize the case of all the tags in the XML spec which seems like an unreasonable amount of work. Anyhow, the routine that checks whether we've seen a GI (where this stuff is) is taking 8.7% of the total time. So the gain from skipping the monocasing entirely is not going to be dramatic. In fact, it's now spending more time in BufferedInputStream.read() (oh for a good old-fashioned getc() macro). 
-Tim xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From phxsoft at ibm.net Sun Sep 7 05:01:05 1997 From: phxsoft at ibm.net (phxsoft@ibm.net) Date: Mon Jun 7 16:58:25 2004 Subject: unsubscribe phxsoft@ibm.net Message-ID: <9709070306.AA0224@slip166-72-179-153.or.us.ibm.net> unsubscribe phxsoft@ibm.net xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tikvas at agentsoft.com Sun Sep 7 12:55:17 1997 From: tikvas at agentsoft.com (Tikva Schmidt) Date: Mon Jun 7 16:58:25 2004 Subject: Examples for new XML demo Message-ID: <34127982.3142@agentsoft.com> AgentSoft Ltd. is about to put out a demo XML application in the next week or two. What we are missing is real valid xml content (dtd & xml files) that we can use with our demo. If you have an idea of examples I can use please let me know. I'll let you know when you can actually see the demo. Thanks. Tikva Schmidt. -------------------------------------------------------------------- Tikva Schmidt. email: tikvas@agentsoft.co.il corp: Agentsoft Ltd. 
http://www.agentsoft.co.il Phone: 972-2-6480573 --------------------------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Mon Sep 8 17:11:02 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:25 2004 Subject: Examples for new XML demo In-Reply-To: <34127982.3142@agentsoft.com> (tikvas@agentsoft.com) Message-ID: <199709081508.IAA04251@boethius.eng.sun.com> [Tikva Schmidt:] | What we are missing is real valid xml content (dtd & xml files) | that we can use with our demo. | | If you have an idea of examples I can use please let me know. These chestnuts are still available: http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.01.xml.zip http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.01.xml.zip The collections aren't complex enough to test parser features against (there are no attributes and no empty elements), but what they lack in complexity they make up for in size, so they're good for certain kinds of benchmarking. Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From dcarlson at ontogenics.com Mon Sep 8 21:31:16 1997 From: dcarlson at ontogenics.com (Dave Carlson) Date: Mon Jun 7 16:58:25 2004 Subject: parsing xml-data schemas Message-ID: <2.2.32.19970908192629.00bd9900@pop.dimensional.com> I'm new to XML, but have been going through all the specs, papers, and old mail list archives that I can find. I am especially interested in the metadata proposals, which seem to be centered around MCF and XML-Data. 
Apparently, these are being combined into the Reference Data Framework, and a secret meeting was held in Redmond 2 weeks ago. Well, at least secret to those of us who are not allowed access to the W3C working group :-( So, I'm left to guess. I want to start building a prototype using XML-Data, and probably Microsoft's XML Java parser. Am I wasting my time building something to this spec, or is the current RDF completely different? Assuming that it's useful to proceed... I've created a small schema according to xml-data and successfully parsed it using the DTD from Appendix A. My question (finally) is how can I use this schema to validate an XML file that conforms to it? After parsing the schema, should I convert it within the XML processor to DTD objects, then proceed _as_if_ the schema really originated from a DTD file? It seems wrong to duplicate the existing validation logic that works for DTD's and create another for schemas. This is probably part of the argument in support of those who say that the xml-data schema is a bad idea, and we should write all schemas directly in DTD syntax. However, coming from an artificial intelligence background, the idea of a metadata representation language appeals to me. Thanx for any thoughts or advice! Dave Carlson Ontogenics Corp. Boulder, CO xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Tue Sep 9 08:28:23 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:25 2004 Subject: Lark 0.92 available Message-ID: <3.0.32.19970908232250.007a8ad0@pop.intergate.bc.ca> Hi - Lark 0.92 is now available at http://www.textuality.com/Lark Pardon the quick releases, but thanks to Sun's JWS profiler, Lark 0.92 is now 11.9 times faster than 0.91. 
Secondly, the accompanying "xh" application, which formats the XML spec and related documents (including what you get at the URL above) has been upgraded so that it now can process the Japanese version of the XML specification and produce beautiful UCS-2 Japanese HTML output. (Go to www.bitstream.com and download their Cyberbit font if you want to see some damn nice-looking stuff on your screen - Netscape can do it, but be warned that Communicator 4 + Cyberbit between them will use all your memory, no matter how much you have). When you can have a few tens of K of code do this kind of transformation on two violently different character sets, it bespeaks, I think, a couple of standards (Java and XML) in pleasing harmony. The process of getting the Japanese formatting working would have been completely impossible without all sorts of support and question-answering and double-checking and pointing-to-useful-resources from Murata Makoto of FXIS; many thanks to him. Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Tue Sep 9 10:23:30 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:25 2004 Subject: parsing xml-data schemas In-Reply-To: Dave Carlson's message of Mon, 08 Sep 1997 13:26:29 -0600 References: <2.2.32.19970908192629.00bd9900@pop.dimensional.com> Message-ID: <1683.199709090823@grogan.cogsci.ed.ac.uk> For what it's worth, my view is that translation into vanilla DTD IS the right way to go in the short term, if for no other reason than to forestall rapid incompatible divergence of schema DTDs and semantics. If and when we get to the point of standardising a schema DTD, then direct implementation makes sense. 
ht xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu Sep 11 08:59:57 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:25 2004 Subject: XML-DEV Jewels Message-ID: <9906@ursus.demon.co.uk> XML-DEV has been active for about 7 months and generated around 1000 postings. This information is searchable thanks to Henry Rzepa. However there are some postings which I feel are of lasting value and are not easy to locate by keywords and other places where the thread has been useful (and perhaps re-usable by newcomers to XML). I have therefore created a page of links to the archived postings which is at: http://www.vsms.nottingham.ac.uk/vsms/xml/jewels.html This does NOT attempt to duplicate the other XML resources such as the FAQ and Robin Cover's comprehensive analysis. If you fail there and on the keyword search it may then be worth browsing this list. Enjoy.. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Sat Sep 13 07:40:16 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:26 2004 Subject: Recent XML WG decisions Message-ID: <199709130538.WAA07282@boethius.eng.sun.com> While it is not our usual policy to post decisions of the XML Working Group to xml-dev, the last three WG meetings have seen a number of issues decided that bear directly on current experimental XML implementations. Following are reports prepared by C. M. 
Sperberg-McQueen and Tim Bray detailing recent decisions that will be incorporated into the next working draft. Jon ---------------------------------------------------------------------- Jon Bosak, Online Information Technology Architect, Sun Microsystems 901 San Antonio Road, MPK17-101, Palo Alto, California 94303 ---------------------------------------------------------------------- ISO/IEC JTC1/SC18/WG8::NCITS V1::Davenport::SGML Open::W3C XML WG It is earlier than we think. -- Vannevar Bush ---------------------------------------------------------------------- From: "C. M. Sperberg-McQueen" Subject: XML WG decisions of 27 August 1997 The XML Work Group discussed the following questions, and made the decisions indicated, in the meeting of 27 August 1997. Present: Jon Bosak, James Clark, Steve DeRose, Eliot Kimber, Eve Maler, Makoto Murata, Peter Sharpe, C. M. Sperberg-McQueen. 1. A decision on case folding was postponed. Background: The current draft XML spec requires that most names (i.e. generic identifiers, attribute names, IDs, IDREFs, name tokens in attribute values PI targets, notation names, and document type names) be case-folded, while entity names are case sensitive. It has been repeatedly urged that this be changed and that all names be case-sensitive. The arguments are familiar: For case folding: since the reference concrete syntax requires case folding, many current users of SGML and HTML are familiar with and have come to expect this behavior. For case sensitivity: since SGML parsers are required to fold up, rather than down, the XML spec is inconsistent with recommended Unicode practice. (Unicode recommends folding down rather than up since there are slightly fewer unpleasant surprises and inconsistencies that way.) 
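As a small illustration of why case folding is hard to specify once and for all, the following Java sketch upper-cases the same letter under two locales. It is not part of the WG report; the class name CaseFoldDemo is hypothetical, and it assumes a Java 7 or later runtime (for Locale.forLanguageTag).

```java
import java.util.Locale;

/** Hypothetical demo: the upper-case form of "i" depends on the locale. */
final class CaseFoldDemo {
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // English rules: i -> I (U+0049)
        System.out.println(upper("i", Locale.ENGLISH).equals("I"));
        // Turkish rules: i -> capital I with dot above (U+0130)
        System.out.println(upper("i", turkish).equals("\u0130"));
        // Turkish rules: I -> dotless small i (U+0131)
        System.out.println(lower("I", turkish).equals("\u0131"));
    }
}
```

A name folded under one locale therefore need not compare equal to the same name folded under another, which is one reason comparing names without folding is simpler to define.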
There is *no* rule for case folding which works in the culturally expected manner for all speakers of all alphabetic languages: a lower-case e with acute accent is (correctly) uppercased one way in Quebec and a different way in metropolitan France. Lowercase I (with a dot) is uppercased one way in Turkish and another way in other languages using the Latin alphabet. A strong majority of those participating felt that we should make XML case sensitive and drop case folding, but in view of the sensitive nature of the decision, it was decided to postpone the decision until a larger fraction of the work group was present. 2. XML characters range from #x0 to #x10FFFF. Decision: Legal XML characters are those representable in UTF-16 / Unicode 2.0, i.e. those in the first seventeen planes of ISO/IEC 10646. Unanimous. Rationale: The current spec says that XML characters may include any character defined by ISO/IEC 10646. Currently, that standard defines characters only within the Basic Multilingual Plane, each of which can be represented by a string of 16 bits; in principle, however, ISO/IEC 10646 defines a 31-bit character space, and production 2 accordingly defines Character as covering the range #x0 to #x7FFFFFFF, with some gaps for forbidden characters. XML processors, however, are not required to support the flat 32-bit character encoding UCS-4, only the 16- and 8-bit encodings of UCS-2 and UTF-8. (The latter can represent all the characters of the 31-bit character space, but UCS-2 cannot.) In many places, the XML spec suggests, or at least allows incautious readers to believe, that XML characters are only 16 bits wide. Either way, it's important to eliminate the ambiguity in the spec. In favor of restricting XML characters to 16 bits: it simplifies life for users of Java and other tools. It seems clear that the full 31-bit space of 10646 will not be needed, even for extremely specialized applications, in the foreseeable future. 
In favor of defining XML characters to be 31 bits wide: 16 bits is manifestly too few for anyone working with historical texts in Han characters. Politically, it would be unwise to give the impression that only the Basic Multilingual Plane is of importance. The surrogate method, while clever, is clearly a hack which demonstrates that the original Unicode claim (16 bits is enough to build an absolutely flat character space which will last for all time) has fallen apart under the pressure of fact; the surrogate method abandons the flat character space which is one of the most important advantages of Unicode. The compromise (BMP plus the next 16 planes) appears - well understood - compatible with Java and other tools which assume 16-bit characters - sufficient for realistic expectations (even the most extensive of known collections of historical Chinese characters is unlikely to take much more than one of the additional planes; even the user area is sufficiently large, with 131,072 character positions) 3. Processors must support UTF-16, not just UCS-2. Background: the current draft spec says (4.3.3): "All XML processors must be able to read entities in either UTF-8 or UCS-2." It has been proposed to change this to require support for UTF-8 and UTF-16 (which is UCS-2 plus support for the surrogate-character mechanism by which characters outside the Basic Multilingual Plane may be encoded). Decision: (i) XML processors must support 16-bit data streams (i.e. UTF-16) for input. (ii) They must not corrupt surrogate characters. (iii) If the processor uses a 16-bit buffer or a 16-bit interface to the downstream application, it must correctly represent numeric character references to non-BMP characters as pairs of surrogate characters. Unanimous. Rationale: since all name characters in XML are in the Basic Multilingual Plane, characters outside the BMP can only appear in XML documents as data. 
Since an XML processor is required to do nothing more to data than store it and pass it to the downstream application without corrupting it, no special handling is required for surrogate characters. The only new requirement is that processors understand the surrogate-character mechanism for characters outside the BMP, and use it, when necessary, to handle numeric character references correctly. 4. XML will refer to Unicode 2.0 and ISO/IEC 10646 with Am. 1-7. The current draft spec refers to Unicode 2.0 and ISO/IEC 10646 with Amendments 1 through 5. It has been suggested (a) that XML should refer *only* to Unicode, and (b) that the reference should be to "the current version" of Unicode, so that as Unicode is revised, XML automatically accepts the revisions. Decision: refer to 10646 with Amendments 1 through 7, but otherwise retain the current reference. I.e. do not drop the reference to ISO/IEC 10646, and do not phrase the reference so as to incorporate changes to Unicode automatically. Unanimous. Rationale: the agreement between ISO/IEC JTC1/SC2 and the Unicode Consortium to keep Unicode and 10646 synchronized is extremely important to all users. A joint reference to both standards makes clear to both parties that we, as users, wish them to honor that agreement. A reference solely to Unicode would imply clearly that XML would follow Unicode even if Unicode were to diverge from ISO/IEC 10646. The joint reference makes clear our intent: if the Unicode Consortium and SC2 fail to keep the two standards in synch, then XML is not guaranteed to follow either of them. Reference to as yet unpublished standards (which is what reference to "the most recent version" amounts to) is unwise because there is and can be no guarantee that revisions in Unicode and 10646 will not require corresponding revisions to the XML spec. 5. Encoding of external text entities is kept as is. 
It has been suggested that by allowing external entities to be in different character encodings, XML is incompatible with ISO 8879, which does not allow this. The WG unanimously reaffirmed its belief that the current draft spec is in fact compatible with ISO 8879 under what is sometimes called the 'new' character model. SGML documents must have a single document character set declaration and thus a single document character set, but this reflects the output from, not the input to, the entity manager, and is thus independent of the character encoding encountered in the actual data stream of the external text entity.

6. Ideographic space is not white space.

Decision (unanimous): ideographic space (#x3000) will be removed from the non-terminals S and PubidCharacter.

Rationale: Ideographic space corresponds more closely to the no-break space (#xA0, &nbsp;) than to the standard space character (#x20). #xA0 is not allowed in S, and neither should ideographic space be. It is unlikely, with current standard input methods for kanji, that any operator would unintentionally or accidentally insert an ideographic (#x3000) rather than a Latin (#x20) space within a tag.

7. Binding sources of information for character encodings will be specified.

The current draft spec says nothing about the priority of various sources of information regarding character encodings. Some participants (notably Gavin Nicol and Makoto Murata) have argued that this should be specified.

Decision: The spec should include wording to the following effect:

If an XML document or entity is in a file, the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery.

If an XML document is delivered via the HTTP protocol with a MIME type of text/xml, then the HTTP header determines the character encoding method; all other heuristics and sources of information are solely for error recovery.
If an XML document is delivered via the HTTP protocol with a MIME type of application/xml, then the Byte-Order Mark and encoding-declaration PI are used (if present) to determine the character encoding. All other heuristics and sources of information are solely for error recovery. -C. M. Sperberg-McQueen From: "C. M. Sperberg-McQueen" Subject: XML WG decisions of 3 September 1997 The XML Work Group met today (3 Sept 1997) and made the decisions described below. Present were Jon Bosak (JB), Tim Bray (TB), James Clark (JC), Dan Connolly (DC), Steve DeRose (SJD), Paul Grosso (PG), Dave Hollander (DH), Eliot Kimber (EK), Murray Maloney (MMa), Makoto Murata (MMu), Joel Nava (JN), Jean Paoli (JP), Peter Sharpe (PS), and Michael Sperberg-McQueen (MSM). 1. Procedures for determination of character encoding to be described in an appendix. Background: last week's report of decisions (31 August, posting from U35395@UICVM.UIC.EDU), included as item 7 a decision regarding "Binding sources of information for character encodings". The WG revisited the issue, noted that in fact no formal vote on it had been taken (error in the report), and discussed whether such rules belong in the XML language spec or not. Against inclusion: the rules really apply to the delivery of XML in very specific protocol environments, and should be included in the specification of the protocol. XML will be delivered by many protocols, some of them not yet invented; the language spec should not have to be revised every time a new protocol is deployed or invented. For inclusion: such conventions are important for encouraging interoperability of XML software. Conforming processors reading the same material in the same environment should make the same decisions about the character encoding. Decision: The rules for locating binding information about the character encoding of XML entities (reported last week) will be described in an appendix. 
They will be accompanied by a note making clear that the rules about http service properly belong in the RFC defining the Mime types text/xml and application/xml, and that when those RFCs are available their text will supersede the recommendations of the appendix. The wording given in the posting of 31 August will be changed by replacing the phrases 'XML document or entity' and 'XML document' with the phrase 'XML entity'. (It has been argued that the term 'entity' is not currently well defined in the XML spec; if the usage of the term is later revised, this occurrence may be changed.) In favor: all present. 2. A decision on case-folding was postponed again. A summary of the issues and a request for discussion by the SIG will be posted shortly. 3. XML processors to normalize CR, LF, and CRLF to LF. Background: the current draft XML spec says nothing about whether or how XML processors or applications should normalize the common line-break sequences CR, LF, and CRLF. For normalization: since the three sequences are intended, in practice, to have the same meaning, they can be normalized without loss of useful information. If the XML processor does not normalize these sequences, every single downstream XML application will be forced to do so; experience shows that relying on them to do so will result in broken applications and inconsistent behavior. Against normalization: right now the spec has no concept of line or line break; there is no need to introduce one, so for the sake of economy (and clarity) none should be introduced. For normalizing to LF: thanks to C's standard IO model, it's what most program libraries provide, and thus what most programs and most programmers expect. For normalizing to CRLF: it's more consistent with the specifications governing the Web. Last time anybody looked at the ASCII spec, CRLF was the preferred form of this information. Against CRLF: specifications? On the Web? 
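[Editorial illustration] The normalization to LF argued for under item 3 is small enough to sketch. This is an illustration only: the function name `normalize_line_breaks` is an assumption of the sketch, not wording from the draft spec, and it works on a complete string rather than on a stream.

```python
def normalize_line_breaks(text: str) -> str:
    """Map each of CR LF, bare CR, and bare LF to a single LF."""
    # Replace the two-character sequence first; otherwise CR LF would
    # become two LFs (one for the CR and one for the LF).
    return text.replace("\r\n", "\n").replace("\r", "\n")


sample = "Mac line\rDOS line\r\nUnix line\n"
print(repr(normalize_line_breaks(sample)))
# 'Mac line\nDOS line\nUnix line\n'
```

A processor that reads its input in blocks would also have to remember a CR seen at the very end of one block, since the LF that completes the pair may be the first character of the next block.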
Decision: When an XML processor encounters any of the character sequences CR (UTF-16 x000D), LF (UTF-16 x000A), or CR LF (UTF-16 x000D x000A), the processor must pass a single LF character to the downstream application.

(Note: this formulation of the decision presupposes that the set of information which XML processors may or must make visible to downstream applications will be described more fully than it is in the current draft spec. If the WG decides against such a description, this substantive decision will need to be expressed in some other form. If the processor disappears from the XML language specification, as has been proposed, this decision may be expressed as a constraint on whether the differences among line-break sequences in the input stream are 'visible' or 'significant'.)

-C. M. Sperberg-McQueen
University of Illinois at Chicago
tei@uic.edu

From: Tim Bray
Subject: XML WG decisions of Wed. Sep. 10

The XML WG met on Wed. Sep. 10th. Present: Bosak, Kimber, Murata, Clark, Sperberg-McQueen, Wood, Nava, Bos, Maler, Bray, Tigue, Maloney, Paoli, DeRose. Errors in discussion summaries are, as usual, mine.

1. Discussion of case sensitivity

Few new arguments arose in the discussion of case sensitivity, aside from Steve DeRose's observation that disallowing case folding will, by removing the possibility that attribute values are case-folded, reduce the number of instances where the results of parsing can be affected by the presence/absence of a DTD. (Note that the handling of white space can still be affected in the case where attribute values are known to be tokenized, so the problem hasn't entirely gone away).

This is a summary of points made in a brief last-chance-to-speak-your-mind go-around:

For Case Sensitivity:
- XML will rarely be created by hand and when it happens, it'll be by experts.
- This is a chance to do the right thing early in XML's history and avoid living with a compromise forever.
- Case folding is very easy to specify and to understand.
- It would be nice to be able to map case-sensitive objects, for example DSSSL flow objects, to element types.
- Internationalization experts are unanimously against folding.
- Pleasant experiences with case-sensitive programming languages.
- Casefolding problems are truly vile.
- It will be easy to make XML processors recognize typical user errors and provide helpful error messages.

For Case Folding:
- It would be the right thing to do if we were starting from scratch, but it's too late now.
- There will be serious difficulties dealing with the XML-in-HTML scenario.
- It will make it impossible for HTML ever to be specified as an application of XML as opposed to SGML.
- The XML spec has been out for nine months now; it's late in the game to be making this change.

The Question: Modify the XML specification to achieve the effect of NAMECASE GENERAL NO in SGML.

Yes: Bosak Kimber Murata Clark Sperberg-McQueen Nava Bos Bray Tigue Maloney Paoli DeRose
No: Wood
Abstain: Maler

So XML is now case-sensitive.

1a: Since XML is case sensitive, we must specify the case of our keywords, i.e.

[From Jon Bosak]
> 3. Discussion of the proposition that the XML spec should say
> more about what the processor passes the App. John Tigue has
> volunteered to write an XML Grove Plan; while there is little
> sentiment that this should be made normative, it might serve
> usefully as either a separate application note or an appendix.

I raised this issue a long time ago and I am delighted to see it is being considered for inclusion in XML. Having a grove plan gives developers a sanity checker for their parsers. Having a grove plan with a syntactic form that can be output from a parser's internal tree representation provides a mechanism for testing and comparing parsers. Having a grove plan allows apps to be developed that process post-parse data-structures as opposed to using an API.

From my perspective the importance of this merits normative inclusion in the spec.
I am reminded of that well thumbed quintet of pages in 8879. Annex G of appendix B, attachment 1. Otherwise known as ESIS - the starting point for many an SGML structure controlled application.

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From Peter at ursus.demon.co.uk Sat Sep 13 13:28:21 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:26 2004
Subject: Recent XML WG decisions
Message-ID: <9971@ursus.demon.co.uk>

In message <199709130538.WAA07282@boethius.eng.sun.com> Jon.Bosak@eng.Sun.COM (Jon Bosak) writes:
> While it is not our usual policy to post decisions of the XML Working
> Group to xml-dev, the last three WG meetings have seen a number of
> issues decided that bear directly on current experimental XML
> implementations. Following are reports prepared by C. M.
> Sperberg-McQueen and Tim Bray detailing recent decisions that will be
> incorporated into the next working draft.

I would like to thank the XML-WG for posting the results of these decisions and for providing so much of the detail. [Note that the records are in chronological order, so that the final decision on case-folding comes towards the end :-)]. I am sure that all xml-dev readers are aware that XML is still at draft stage so that decisions which alter the current draft spec are still possible.

As someone privileged to be part of the XML-SIG discussion group I can confirm that the discussion on these issues has been extremely constructive. The decision-making on the XML project is an impressive achievement in itself. Whilst there is not, and will not be, any formal transmission from XML-DEV to XML-WG, it is carefully scanned by members of the WG and issues discussed here constructively are taken note of.
Readers will note John Tigue's very generous offer to develop an API for the Grove Plan, and that this may accompany the spec in the future. I hope that members of XML-DEV will help in this endeavour where appropriate.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Sat Sep 13 13:28:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:26 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
Message-ID: <9975@ursus.demon.co.uk>

In message <199709130538.WAA07282@boethius.eng.sun.com> Jon.Bosak@eng.Sun.COM (Jon Bosak) writes:
[... decision of XML-WG omitted...]
>
> 2. Chris Maden's suggestion that NOTATION System Identifiers
> should be mime types. The WG liked the idea, but declined to
> modify the spec to achieve this effect; among other things,
> URLs and mime types are not syntactically distinguishable. It
> was the feeling of the group that it would be desirable that a
> new URL scheme be created to allow a URL to locate a mime type.

I am not wanting to re-open this discussion/decision, but I'd be very grateful for clarification as to how a SystemID is used to identify the type of a NOTATION. If I wish to identify it as 'image/gif', how do I do this in practice? Is there a set of URLs that map onto current MIME types, or is it impossible in XML to state what the MIME type of a NOTATION is? [If so this is a pity, especially since HTTP, Java, etc. support MIME types.]

If it *is* impossible, how is a URL used with a NOTATION in practice, other than simply holding a textual description relating to it?
Does the last sentence mean that the XML-WG hopes to come up with such a scheme or that some other body (e.g. IETF) may/might do so?

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Jon.Bosak at eng.Sun.COM Sat Sep 13 19:02:39 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:26 2004
Subject: Religion.1.02.xml and Shakespeare.1.02.xml
Message-ID: <199709131700.KAA07483@boethius.eng.sun.com>

I've updated my Religion and Shakespeare collections to be in what I *hope* is accordance with the new case sensitivity rules. (I am firmly in favor of case sensitivity, but I'm the first to admit that it will take some getting used to.) I would appreciate it if the parser-builders would check out these collections as soon as they've incorporated case sensitivity and tell me whether I've got it right.

http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/religion.1.02.xml.zip
http://sunsite.unc.edu/pub/sun-info/standards/xml/eg/shakespeare.1.02.xml.zip

As usual, I note that these collections don't really exercise very many XML features, but they are useful for benchmarking and certain kinds of stress testing. In addition to being interesting reading, of course.

Jon

From eliot at isogen.com Sat Sep 13 22:17:15 1997
From: eliot at isogen.com (W.
Eliot Kimber) Date: Mon Jun 7 16:58:26 2004 Subject: XML Grove Plan Message-ID: <3.0.32.19970913150653.00bc0140@swbell.net> Note that a grove plan is not a property set: a grove plan is simply a statement of which classes and properties are included in the property set used by a particular processor or process. For example, the HyTime default grove plan is: HyTime Default SGML Grove Plan Removes processing instructions (pi) from and adds pseudo-elements (pelement) to the default SGML grove plan defined in the SGML property set. pelement pi Which is itself a delta on the SGML default grove plan (indicated by the presence of the "default" attribute on those modules, classes, and properties included in the SGML default grove plan). The discussion of grove plans can be found at: http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-7.1.html#clause-7.1.4.2 and http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.1.html#clause-A.4.1.6 At 09:44 AM 9/13/97 +0100, Sean Mc Grath wrote: >I raised this issue a long time ago and I am delighted to see it is being >considered for inclusion in XML. Having a grove plan gives developers >a sanity checker for their parsers. Having a grove plan with a syntactic form >that can be output from a parsers internal tree representation provides a >mechanism >for testing and comparing parsers. Having a grove plan allows apps to be >developed >that process post-parse data-structures as opposed to using an API. There is a defined syntactic representation for *groves* (as opposed to grove plans, which is what I think Sean meant), called the "canonical grove representation" (CGR) document, described in http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.5.html CGR documents are designed such that two groves that are identical should produce exactly the same CGR documents, character for character. 
They are designed specifically to enable the comparison of the groves produced by different tools, which is useful both for checking tools and for doing comparisons of documents by comparing their CGR documents (this allows documents to be compared meaningfully without regard to their original markup syntax as long as the groves used for comparison do not include any markup properties). CGR documents are also designed to be easy to process with text processing tools like Perl so that they can be used much as you would use the output of NSGMLS.

I'm in the process of creating a DSSSL spec to generate CGR documents using Jade--I'll post something about it to comp.text.sgml when I get it working.

Cheers,

Eliot

From Jon.Bosak at eng.Sun.COM Sun Sep 14 01:06:31 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:58:26 2004
Subject: Recent XML WG decisions
In-Reply-To: <341AAE35.5C0B583A@technologist.com> (message from Paul Prescod on Sat, 13 Sep 1997 11:16:05 -0400)
Message-ID: <199709132304.QAA08273@boethius.eng.sun.com>

Memo to Paul Prescod:

1. Please do not put me down as the author of everything you quote from something that I've forwarded to the group. Everything that you have attributed to me was in fact written by C. M. Sperberg-McQueen or Tim Bray.

2. Please do not mindlessly copy me when replying to messages I happen to post to the list. I am not interested in receiving two copies of everything you say.

3. Please do not post to the w3c-xml-wg list.
Jon

From arnaud21 at club-internet.fr Sun Sep 14 01:52:28 1997
From: arnaud21 at club-internet.fr (Arnaud Le Taillanter)
Date: Mon Jun 7 16:58:26 2004
Subject: Whitespace
Message-ID: <341B277D.3EFD@club-internet.fr>

Hello,

Still about white space, sorry :-)

First part : comments on the XML draft approach to WS handling.
Second part : comments on Neil Bradley's five rules for WS handling (version 1).

**First part**

In the current draft, I see 3 rules concerning WS :

*Rule 1* : all WS is preserved and fed to the application.

A very simple rule indeed, in accordance with XML design goals. But Neil Bradley's five rules are simple to implement too (though incorrect). On the contrary, consider parameter entities: the committee members acknowledged they had some difficulty designing a grammar for DTD declarations, because of PEs. So implementing such a grammar won't be trivial (BTW, someone said he had designed a W grammar. It could be interesting to see what it looks like. Please post!), far less trivial than replacing CR, LF, CRLF by a single character! (NB: the WG agreed a few days ago on that rule :-) So the simplicity argument doesn't hold.

The real issue is that the application must be fed with a credible tree structure. Take a document without a DTD:
CR CR

foo

CR CR
What kind of tree structure will the processor offer us? A root node "DOC". So far, so good. But everybody expects now a single child node (the "PART" element). The processor gives us *three* for the same price: the very useful "CR" element. The "PART" element. And another "CR" node. What kind of ridiculous tree is that? A Chernobyl tree I guess.

*Rule 2*: a validating parser must distinguish WS in element content and signal to the application that such WS is not significant.

I observe that it is not said how the parser will tell the application about such insignificant WS. A minor point, I concede. Whether the parser is validating or not, a solution should be found where WS in element content is *discarded* : this is the important point. No node with only WS in it : it is completely against the philosophy of SGML/XML: (well)*structured* content. If the parser is able to distinguish what is element content and what is not (the hard part without a DTD), it should discard those completely useless WSs (the easy part).

*Rule 3*: A special attribute may be inserted in documents to signal an intention that the element to which this attribute applies requires all white space to be treated as significant by applications. The value DEFAULT signals that applications' default white-space processing modes are acceptable for this element; the value PRESERVE indicates the intent that applications preserve all the white space.

As someone observed, this is contradictory with the position "the application should manage WS issues, the parser doesn't intervene". BTW, the attribute is hardly useful: suppose I put on the web a document, with a "FOO" element with the attribute "XML-SPACE" set to "DEFAULT". Application A normalizes WS by default. Application B does nothing with WS by default. As a result, an attribute set to "DEFAULT" conveys absolutely no information. It will be the same as "PRESERVE" with some applications.
Basically, it will be a mess :-) But we are used to that :-)) What is strange too, is that there is no default value for this attribute by default. Those SGML guys are really subtle :-)) A default value of "DEFAULT" would seem to be natural, but in that case the application does anything it wants to, so who cares :-) **Second part** Neil Bradley proposed some simple rules (this is "version 1", a second version, a little more complex, but simple enough, was proposed). I really like the approach, even if it doesn't work for the moment. *Rule 1*: standardization of input from different OSs. CR, LF, CRLF are translated to a line end code. OBVIOUS!!!!! *Rule 2*: line end codes after a start tag or before an end tag are discarded. A simple rule. For usual elements, it is exactly what you expect :

blabla

becomes

blabla

for PRE-like elements:
SPSPblabla
becomes
SPSPblabla
, so two line ends are discarded. It seems nevertheless natural that these line ends are dropped. BTW, this rule was in the first (11/14/96) XML draft. There is a first problem with this approach: in default content (preserved content will be examined later):

Two words

becomes

Twowords

The space between "Two" and "words" evaporated. Same thing with:

Two words

I don't think this particular problem is important: the encoding is not natural. It should be an error! I think everybody would write:

Two words

, or

Two words

, etc... Inside a preserved element, line end codes are wrongly discarded after element start tags and before element end tags:
         blabla 
         bloblo
         blublu
The coding in this case is natural: bla, blo and blu are very aesthetically aligned! But: a line end code is discarded after "", it shouldn't be. So: preserved elements need a special rule. It seems quite natural they need a special rule concerning line end codes (and space codes). A possibility: the parser closes a "default" (not preserved) element, and opens a "preserved" element: the line end codes after the start tag and before the end tag are discarded. But for a preserved element directly embedded in a preserved element, line end codes are left intact. *Rule3*: WS in element content is discarded. WS space in element content *must* be discarded. The problem is: without a DTD, one doesn't know if an element contains only other elements. Suppose we have :

blablaSPbloblo

We could choose a rule like: an element in which the parser finds only other elements and WS (no characters) is an element content element. But as the above example shows, it doesn't work. If we follow this rule, we have a tree with a root node "P" and two child nodes "EM". And what we want is a root node with three child nodes: two "EM" elements and between the two a "PCDATA" element (the space between "blabla" and "bloblo"). So a different method must be found.

A radical constraint put on the user would be: don't input a single space character in element content. With this rule the parser will be able to easily recognize element content. But you can forget about indentation in that case. The rule for the user would be: "when you type a space, you mean a space". BTW, this is always the case, except for indentation. If the semantic overloading for the space character is removed (a space is either a "real" space or an indentation space), things are so much easier.

*Rule 4*: Except in preserved elements (elements with a space attribute set to "PRESERVE") line end codes are discarded when preceded by a hard or soft hyphen (in the process, a soft hyphen is also discarded) and remaining line end codes are treated as space.

The rule concerning hyphens is not necessary. If it's a hard hyphen, don't put it at line end (who would do that?) Moreover, there is no use in an XML source file to put a soft hyphen at line end. Who would do that? In my poor life, I have no occa-
sion to see some text with hyphens at line end.

There is a possible problem with the replacement of line end codes in default (that is, not preserved) elements by a space character. Suppose we have a text coded with Unicode (that could happen :-)), with Chinese ideographs. In Chinese, there is no concept of a word (sequence of letters): each ideograph is a "word". I don't know how in fact the Chinese encode their texts, but there is obviously no utility in putting a space after each ideograph.
The chinese must use nevertheless the end of line character. And one shouldn't replace such a character by a space, which would be an error, but simply discard it. Depending on the class of characters, there could be a different treatment of line end codes. But this becomes complex :-( Another approach: simply ignore line end codes. But you have to put a space at the end of a line. The idea is quite natural: line end codes are there for our eyes, they don't add anything to the meaning of a text. The XML tree should reflect the substance of a text, not the particular way it was input:

We should get rid of line end codes

and

We should get rid of line end codes

should give the same node in the document tree. If line end codes must be preserved: use a preserved element, or an empty element (
). *Rule 5*: except in preserved elements, consecutive WS characters are reduced to a single space. I don't like this rule. If I put two spaces after a point, I mean two spaces. It's a typographic decision. Rule 5 is meant to allow some indentation:

He said: I need some indentation.SPSPIndentation is needed.

In the above example, it is necessary to get rid of spaces caused by indentation. But the two spaces marked "SP" should be retained. So the new rule would be: SPs at the beginning of a line should be discarded. This rule must be applied before line end codes are discarded, i.e. before rule 2. What a headache :-) Perhaps a simple rule could be: don't use indentation in XML files, or you'll get burned. More generally, if we want the parser to produce a clean data structure out of an XML file, some burden will have to be put on the user's shoulders. The contract could be: the user accepts some limitations on the way to input the source code. He could have to write instead of the above something like: He said: I need some indentation.SPSPIndentation is needed.

The reward (invaluable) will be: a clean data structure available for applications.

Thanks for your attention!

Regards,

Arnaud

From dgd at cs.bu.edu Sun Sep 14 02:33:49 1997
From: dgd at cs.bu.edu (David Durand)
Date: Mon Jun 7 16:58:26 2004
Subject: Arnaud Le Taillanter on whitespace
Message-ID: <199709140033.UAA08971@csb.bu.edu>

I'll be very brief. There's little chance that there will be any new whitespace ignoring rules in XML. Everyone involved has read (and written!) literally hundreds of messages on the topic. Every variation you discussed has been gone over and they all were either:

1. unworkably complex (like the current SGML rules, which few remember and even fewer remember correctly).

2. Not compatible with SGML, or unworkably ugly like the proposal to quote all literal text.

3. Failed to work without a DTD. This is the kicker, and it's required by XML because you don't always have the DTD, and different results in the has-DTD/doesn't-have-DTD cases are unacceptable.

The recent change (to normalize all line ends) fills the one hole the previous proposal had -- because it was nearly certain that some processes would blindly change CRLF and their ilk anyhow.

My advice: don't waste your bytes complaining about this -- we've heard it _all_ before -- and the solution that works best is to leave it to the application.

Aside: XML-SPACE doesn't affect this -- it's along the lines of a "standard hint" that will allow applications like web-crawlers and full-text indexers to make more sense out of markup according to DTDs about which they lack special knowledge. So it doesn't contradict the "pass all space" philosophy, but rather supplements it, to enhance document re-use.
-- David

From neil at bradley.co.uk Sun Sep 14 10:26:11 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:58:26 2004
Subject: Whitespace
Message-ID: <199709140826.JAA29099@andromeda.ndirect.co.uk>

> Reply-to: Arnaud Le Taillanter
> Neil Bradley proposed some simple rules (this is "version 1", a second
> version, a little more complex, but simple enough, was proposed). I
> really like
> the approach, even if it doesn't work for the moment.

I agree they are inadequate, but I think my second attempt was more accurate than my first, so I am surprised that you now dissect the first attempt. Still, I am happy to see this issue continue to be aired.

> *Rule 1*: standardization of input from different OSs.
> CR, LF, CRLF are translated to a line end code.
> OBVIOUS!!!!!

Absolutely, but perhaps not to some programmers unfamiliar with, for example, the Mac line-end conventions.

> *Rule 2*: line end codes after a start tag or before an end tag are
> discarded. A simple rule. For usual elements, it is exactly what you
> expect :
>

Two > words

> becomes >

Twowords

> The space between "Two" and "words" evaporated. > Same thing with: >

> Two > words

> I don't think this particular problem is important: the encoding > is not natural. It should be an error! > I think everybody would write: >

Two words

, or >

> Two words >

, etc... I have long thought that 'some' formatting options should simply be made illegal, and that we should then ensure widespread knowledge of restrictions to future document authors. This is the main example I had already considered. > Inside a preserved element, line end codes are wrongly discarded > after element start tags and before element end tags: >
>          blabla 
>          bloblo
>          blublu
> 
Again, I think this coding is very unnatural.

> *Rule 4*: Except in preserved elements (elements
> with a space attribute set to "PRESERVE") line end codes are
> discarded when preceded by a hard or
> soft hyphen (in the process, a soft hyphen is also discarded) and
> remaining line end codes are treated as space.
>
> The rule concerning hyphens is not necessary. If it's a hard hyphen,
> don't put it at line end (who would do that?)

It is in fact a very natural action, which I have seen many times.

> Moreover, there is no use in an XML source file to put a soft
> hyphen at line end. Who would do that? In my poor life, I have no occa-
> sion to see some text with hyphens at line end.

I have. Many times.

> *Rule 5*: except in preserved elements, consecutive WS characters
> are reduced to a single space.
>
> I don't like this rule. If I put two spaces after a point, I mean two
> spaces.
> It's a typographic decision.
> Rule 5 is meant to allow some indentation:
>
>

> He said: > > I need some > indentation.SPSPIndentation is needed. > >

NO IT WAS NOT! I have never said this, and I did not intend to imply this. The reason for this rule was purely to remove surplus spaces generated by the effect of previous rules.

> Arnaud

I am more than happy for people to pull apart my proposed rules. That is what I put them here for. But please refer to the second attempt, not the first.

Neil.
-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk

From digitome at iol.ie Sun Sep 14 11:00:12 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:58:26 2004
Subject: XML Grove Plan
Message-ID: <199709140859.JAA08220@GPO.iol.ie>

[Eliot Kimber]
>
>There is a defined syntactic representation for *groves* (as opposed to
>grove plans, which is what I think Sean meant), called the "canonical grove
>representation" (CGR) document, described in
>http://www.ornl.gov/sgml/wg8/docs/n1920/html/clause-A.4.5.html
>

Thanks for the correction + the pointer Eliot.

>CGR documents are designed such that two groves that are identical should
>produce exactly the same CGR documents, character for character.

Wonderful.

> They are
>designed specifically to enable the comparison of the groves produced by
>different tools,

Wonderful++.

> CGR documents are also designed to be easy to process
>with text processing tools like Perl so that they can be used much as you
>would use the output of NSGMLS.

pow(Wonderful,10)

>
>I'm in the process of creating a DSSSL spec to generate CGR documents using
>Jade--I'll post something about it to comp.text.sgml when I get it working.

Thanks again Eliot. Can I ask John Tigue if he is thinking of CGR as part of his XML grove work? Can XML-DEVers do anything to help???
Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From Peter at ursus.demon.co.uk Sun Sep 14 11:27:43 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:26 2004
Subject: Whitespace
Message-ID: <10005@ursus.demon.co.uk>

In message <199709140826.JAA29099@andromeda.ndirect.co.uk> "Neil Bradley" writes:
> > Reply-to: Arnaud Le Taillanter
>
> > Neil Bradley proposed some simple rules (this is "version 1", a second
> > version, a little more complex, but simple enough, was proposed). I
> > really like
> > the approach, even if it doesn't work for the moment.
>
> I agree they are inadequate, but I think my second attempt was more
> accurate than my first, so I am surprised that you now dissect the
> first attempt. Still, I am happy to see this issue continue to be
> aired.

Any constructive discussion on this subject is appropriate for XML-DEV. As we have archives, it's important that posters read them beforehand, especially on this subject.

[...]

>
> I am more than happy for people to pull apart my proposed rules. That
> is what I put them here for. But please refer to the second attempt,
> not the first.

Two procedural points (I am not commenting on the content):
- the postings are all referenceable by URLs on Henry Rzepa's archive, so please use these if there is a chance of confusion. [David D]
- please try to keep the same subject for the thread so that it can later be read in hypermailed form more easily.

P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ricko at allette.com.au Sun Sep 14 12:26:38 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:58:26 2004 Subject: NOTATION/MIME (was Re: Recent XML WG decisions) Message-ID: <199709141028.UAA14599@jawa.chilli.net.au> > From: Peter Murray-Rust > > I am not wanting to re-open this discussion/decision, but I'd be very > grateful for clarification as to how a SytemID is used to identify the > type of a NOTATION. If I wish to identify it as 'image/gif', how do I > do this in practice? Peter asked me to on-post this. The standard way to stick a MIME type into a system identifier is given as part of HyTime '97. First we have a notation declaration (which is really only for documentation, so you don't need it if you don't want it). This notation declaration allows us to use "mimetype" in Formal System Identifiers, which are system identifiers with little pseudo-start tags giving the notation used in the rest of the string. So we can then declare the notation "gif" to be the mime type "image/gif" by Content-Type=image/gif"> A full form for this with both public and system identifiers would be Content-Type=image/gif"> Presumably you could also stick other MIME parameters in also, after semicolons, e.g. Content-Type=multipart/mixed;boundary="--@QQQ@--"'> (There is also provision of a notation called simply "mime", which can be used for burrowing into a MIME file for specific parts. 
) Rick Jelliffe xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sun Sep 14 13:43:08 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:27 2004 Subject: NOTATION/MIME (was Re: Recent XML WG decisions) Message-ID: <10011@ursus.demon.co.uk> Thanks very much Rick, In message <199709141028.UAA14599@jawa.chilli.net.au> "Rick Jelliffe" writes: [...] > The standard way to stick a MIME type into a system identifier is > given as part of HyTime '97. First we have a notation declaration > (which is really only for documentation, so you don't need it > if you don't want it). > > FSISM PORTABLE > MIME Content Type//EN"> Being picky, this is not valid XML since prod [74] requires a SystemLiteral as well as the PubidLiteral. > > This notation declaration allows us to use "mimetype" in > Formal System Identifiers, which are system identifiers with > little pseudo-start tags giving the notation used in the rest > of the string. So we can then declare the notation "gif" > to be the mime type "image/gif" by > > Content-Type=image/gif"> This is fine for my purposes, but I'm not clear how it fits with the XML spec. 4.3.2 says: 'The SystemLiteral that follows the keyword SYSTEM [...] is a URL, ...' It says nothing about SystemLiterals which follow the PubidLiteral (your example is clearly not a URL). So my reading of the XML spec is that your code above is invalid XML :-). If so, it would be useful if the WG had some way that it was allowed. [...] P. 
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From arnaud21 at club-internet.fr Sun Sep 14 18:21:34 1997 From: arnaud21 at club-internet.fr (Arnaud Le Taillanter) Date: Mon Jun 7 16:58:27 2004 Subject: whitespace References: <199709140033.UAA08971@csb.bu.edu> Message-ID: <341C0F4B.5FDB@club-internet.fr> David Durand wrote: > > I'll be very brief. There's little chance that there will be any new > whitespace ignoring rules in XML. Everyone involved has read (and > written!) literally hundreds of messages on the topic. Inside the XML WG mailing list the WS issue was surely extensively discussed, but I don't have access to the archive of this discussion. I know it's already a favor that the XML draft is made public (all drafts and standards of W3C are public, I think this helps) and that XML WG members are participating in the xml-dev mailing list (they could avoid it). Well, I ask for another favor: could you please make the discussion about WS that led to the WG decision available on line? After such a reading, everybody could become convinced of the appropriate nature of the WG decision. Please! > Every variation > you discussed has been gone over and they all were either: > 1. unworkably complex (like the current SGML rules, whihc few > remember and even fewer remember correctly)). Agreed. > 2. Not compatible with SGML, or unworkably ugly like the proposal to > quote all literal text. If SGML rules concerning WS are to be discarded, any other rule adopted is incompatible, including the draft rule. > 3. Failed to work without a DTD. 
This is the kicker, and it's > required by XML because you don't always have the DTD, and different > results in the has-DTD/doesn't-have-DTD cases are unacceptable. I agree. The tree structures must be exactly the same in either case. Some constraint regarding WS is necessary on the way to input an XML text I assume. > > The recent change (to normalize all linends) fills the one hole the > previous proposal had -- because it was nearly certain that some > processes would blindly change CRLF and their ilk anyhow. > > My advice: don't waste you're bytes complaining about this -- we've > heard it _all_ before -- and the solution that works best is to leave > it to the application. I am sure I will get convinced when I read the WG discussion :-) Or I fear the WG members will have to hear it all (and more) again :-)) > > Aside: > XML-SPACE doesn't affect this -- it's in the lines of a "standard > hint" that will allow applications like web-crawlers and full-text > indexers to make more sense out of markup according to DTDs about > which they lack special knowledge. So it doesn't contradict the "pass > all space" philosophy, but rather supplements it, to enhance document > re-use. > > -- David > Arnaud xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ricko at allette.com.au Sun Sep 14 18:24:56 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:58:27 2004 Subject: NOTATION/MIME (was Re: Recent XML WG decisions) Message-ID: <199709141627.CAA26463@jawa.chilli.net.au> This is off topic for XML-DEV. Apologies. > From: Peter Murray-Rust > In message <199709141028.UAA14599@jawa.chilli.net.au> "Rick Jelliffe" writes: > [...] > > The standard way to stick a MIME type into a system identifier is > > given as part of HyTime '97. 
Sorry, maybe I should have capitalized "standard" to be clearer. XML is certainly neither standard (common) nor Standard (adopted by a reputable open not-for-profit body whose job is to set standards without undue proprietary influence) at the moment. > Being picky, this is not valid XML since prod [74] requires a SystemLiteral > as well as the PubidLiteral. Yep. And do the < and > have to be entity references too in XML? > > Content-Type=image/gif"> > > This is fine for my purposes, but I'm not clear how it fits with the XML spec. Yep, XML does not support "formal" system identifiers as I understand it. I think it is a shame, since there are things that are not URLs that would be nice as identifiers, even in web systems. But support for FSIs can be retrofitted at some later stage to XML. I hope there is no chance of them being added to XML 1.0. But I hope people keep FSIs in mind as a good way to ramp up the power of URIs and other identifiers in the near future, in particular for selecting particular system identifier notations (schemas). For example, assuming hrefs could be FSIs, you could have in which data about the transfer and unpacking of the resource (e.g. here a public key for encryption) is also marked up as a part of the system identifier. Rick Jelliffe xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tikvas at agentsoft.com Mon Sep 15 12:13:12 1997 From: tikvas at agentsoft.com (Tikva Schmidt) Date: Mon Jun 7 16:58:27 2004 Subject: New AgentSoft XML demo Message-ID: <341CFB78.658D@agentsoft.com> New AgentSoft XML demo is now available on the Web at http://www.agentsoft.com/xml/. In a nutshell, the demo reads an XML file along with its associated DTD file and uses the information in the DTD file to guide the user in specifying a semantically meaningful query. 
The XML file is then searched for elements matching the query. While the system will work on any valid XML and DTD files, Java applet security limits it to files on our own server, which now consist of CDF files and an act from a Shakespeare play. We would be happy to add any valid XML samples to the demo. The demo has been developed as part of AgentSoft's initiative to integrate XML support into our LiveAgent Pro system. LiveAgent Pro allows users to record agents that automate Web access and interaction. For more information on LiveAgent Pro see our main Web page at http://www.agentsoft.com. Feel free to send any comments you have on our demo to xml@agentsoft.com. Tikva Schmidt. -------------------------------------------------------------------- Tikva Schmidt. email: tikvas@agentsoft.co.il corp: Agentsoft Ltd. http://www.agentsoft.co.il Phone: 972-2-6480573 --------------------------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From russc at livepage.com Mon Sep 15 15:27:18 1997 From: russc at livepage.com (Russell Chamberlain) Date: Mon Jun 7 16:58:27 2004 Subject: Microsoft's XSL Proposal Message-ID: <3.0.1.32.19970915092937.007a73d0@livepage.com> Hi, I'm _extremely_ happy that someone (Microsoft) has put forward an XML-formatting proposal (XSL - Extensible Style Language) to the W3C that: 1) Is represented in XML This is _absolutely_ necessary if XML is to have any mass appeal. Using a non-XML format (eg. DSSSL) flies in the face of what XML is hoping to accomplish. I confess that I laughed out loud when I heard that DSSSL was the chosen processing environment for XML. (Purely for the fact that DSSSL isn't represented in XML, and not for any other reason!) 
A major advantage of XML representation is, of course, that you can use your favourite XML editor as a stylesheet editor. The daunting task of matching braces and finding syntax errors is greatly reduced. 2) Is complementary to DSSSL The proposal states explicitly that it is _complementary_ to DSSSL, with the same "principles and processing model". This will help to ensure consistent processing, regardless of its representation. 3) Is (predominantly) declarative The programmatic nature of DSSSL is something that can severely limit its appeal. Remember the "Is DSSSL Hard?" thread in comp.text.sgml? My impression was that most of the folks who answered "No" to the above question were people who were hard-core programmers. I am such a person, but I would answer a loud "Yes!". Maybe I've interacted with more non-programmers and/or users. Nevertheless, since formatting is what most novices start with, it is best that their tools be easier. The common processing model should allow for easy migration to DSSSL, if its greater power is desired. 3.1) ...while retaining programmatic features Power is a good thing, so long as its presence doesn't prevent simple things from staying simple. I think that the scripting features of XSL are nicely unobtrusive. 4) Lets you reorder and restructure the elements This is a _big_ plus. Most (all?) of the declarative formatting environments that I've been exposed to don't let you change the structure or sequence of the elements during formatting. When the chapter number came after the title, you could never put it in front of the title. The lack of such power may have kept a few of us from using/designing declarative processing environments. Not any more. 5) Has inline styles This mechanism lets you specify formatting properties on the element itself. This is a remarkably simple way of formatting that _one_ element that has to be different from all the rest, but whose context is too complicated or difficult to express. 
Here's an example from the proposal: Note that this is from the source document itself, and not from an XSL stylesheet. Neat. 6) Supports named modes A mode is simply a named formatting scheme. Only those rules, etc. that apply to the current mode are used. This lets you store rules for different presentations in the same stylesheet. In their example, the "toc-mode" mode is used for a Table of Contents presentation only, and the default mode is used for the usual presentation. This should also cut down on the duplication that usually occurs when different stylesheets are used for different presentations. It'll cut down on duplication errors, too, because it is possible for most of the rules, etc. to be centralized and shared. 7) Has a clearly-defined conflict-resolution mechanism Some formatting environments specify that the "first" applicable style in the stylesheet is always the one to be applied. A stylesheet's behaviour should not change based on the location of a style in the stylesheet's source file. XSL will let authors organize their styles in any way they see fit, with no effect on behaviour. Some environments also allow _multiple_ styles to be applied. Which ones, and in what order? Yuck! XSL explicitly states that at most a single pattern will be chosen. Good idea. So much for my praise of a terrific standards initiative. I have a few questions regarding the proposal itself, though: - Some XSL tags seem to be mutable, in that they can be empty or non-empty. The tag, in particular, is used both ways in the examples, eg: . . . and later: Is this proper XML? Am I wrong in thinking that is reserved for tags that are _always_ empty? Is this just a notational convenience within the proposal? - The DTD contains (gasp!) an exclusion rule. What's going on here? The fact that exactly one should appear per rule is something that the XSL application must enforce. I recommend using a comment, instead, so that the DTD can eventually be valid XML. 
All in all, I'm much more excited about the future of XML. Sorry if this isn't the place to discuss this. As usual, I can't end without an XSLvely bad pun, - Russ PS - You can get the XSL spec at: http://www.microsoft.com/standards/xsl/xslspec.htm ------------------------------------------------------ Russ Chamberlain - Software Developer INFORIUM (The Information Atrium Inc) Waterloo, Ontario, Canada xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From crism at ora.com Mon Sep 15 17:02:22 1997 From: crism at ora.com (Chris Maden) Date: Mon Jun 7 16:58:27 2004 Subject: NOTATION/MIME (was Re: Recent XML WG decisions) In-Reply-To: <199709141627.CAA26463@jawa.chilli.net.au> (ricko@allette.com.au) Message-ID: <199709151505.LAA26058@geode.ora.com> [Rick Jelliffe] > [Peter Murray-Rust] > > Being picky, this is not valid XML since prod [74] requires a > > SystemLiteral as well as the PubidLiteral. > > Yep. And do the < and > have to be entity references too in XML? > > Yep, XML does not support "formal" system identifiers as I > understand it. I think it is a shame, since there are things that > are not URLs that would be nice as identifiers, even in web systems. > But support for FSIs can be retrofitted at some later stage to XML. > I hope there is no chance of them being added to XML 1.0. But I > hope people keep FSIs in mind as a good way to ramp up the power of > URIs and other identifiers in the near future, in particular for > selecting particular system identifier notations (schemas). FSIs were discussed at the beginning. A decision was made that they were better left for later, and I agree. A decision was also made that all system identifiers would have an implicit FSI identifier of , which I also think is usually a good idea. 
This allows FSIs to be added later, and any unlabeled system ID is implied to have . What I was suggesting on the SIG was that for system identifiers in notation declarations, the assumed FSI label would be . As Rick pointed out, this is legal HyTime 2 FSI notation, and would be very useful. However, the WG has made its decision. I believe that XML authors are largely going to refer to images simply by URLs instead of entities; in that case, file system associations or HTTP headers can be used to ascertain the entity's type. In cases where NDATA entities are used, I would recommend that XML implementors ignore the system identifier of the notation, and make their decision based on the entity itself. -Chris -- http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Mon Sep 15 18:15:00 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:27 2004 Subject: Microsoft's XSL Proposal In-Reply-To: Russell Chamberlain's message of Mon, 15 Sep 1997 09:29:37 -0400 References: <3.0.1.32.19970915092937.007a73d0@livepage.com> Message-ID: <446.199709151614@grogan.cogsci.ed.ac.uk> Thanks for all your kind words. All the better, your queries are easily answered: 1) The August 7th draft of XML-lang introduced the use of NET for contingently, as well as declared, empty elements. Until the NetSGML TC is passed, use of this feature will produce XML documents which are NOT valid SGML. 2) The DTD The DTD in the appendix was simply intended to clarify a few points about the structure of patterns and actions. We should have been clearer that it is NOT a constitutive part of the proposal. 
A more complete (and XML conformant) DTD should be forthcoming soon.

ht
-----------
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/

From russc at livepage.com Mon Sep 15 18:18:08 1997
From: russc at livepage.com (Russell Chamberlain)
Date: Mon Jun 7 16:58:27 2004
Subject: Microsoft's XSL Proposal
In-Reply-To: <199709151605.MAA06963@nathaniel.eps.inso.com>
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <3.0.1.32.19970915121630.007c1c60@livepage.com>

Hi,

At 12:05 PM 97/09/15 -0400, you [Gavin Nicol] wrote:
>>I'm _extremely_ happy that someone (Microsoft) has put forward an
>>XML-formatting proposal (XSL - Extensible Style Language) to the W3C that:
>
>This is NOT a Microsoft proposal... others were (heavily) involved.

Thousands of apologies!!!!!!!

Here is a full list of the folks who deserve credit (as mentioned in the proposal itself):

Sharon Adler, Inso Corporation
Anders Berglund, Inso Corporation
James Clark
Istvan Cseri, Microsoft Corporation
Paul Grosso, ArborText
Jonathan Marsh, Microsoft Corporation
Gavin Nicol, Inso Corporation
Jean Paoli, Microsoft Corporation
David Schach, Microsoft Corporation
Henry S. Thompson, University of Edinburgh
Chris Wilson, Microsoft Corporation

Good work!
- Russ

From russc at livepage.com Mon Sep 15 20:04:55 1997
From: russc at livepage.com (Russell Chamberlain)
Date: Mon Jun 7 16:58:27 2004
Subject: Microsoft's XSL Proposal
In-Reply-To: <341D708C.26E27765@EpiphanySoftware.com>
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <3.0.1.32.19970915140339.007c2800@livepage.com>

Hi Andy (& XML-DEVers),

At 10:29 AM 97/09/15 -0700, you [Andy Cogan] wrote:
>Hi Russell,
>
>Russell Chamberlain wrote:
>> I'm _extremely_ happy that someone (Microsoft) has put forward an
>> XML-formatting proposal (XSL - Extensible Style Language) to the W3C that:
>>
>> 1) Is represented in XML
>[...snip...]
>
>First, I agree with your points in your original mail message. Well
>said! I've only recently started following the development of XML, and
>DSSSL-O looked pretty intimidating. I like the direction of XSL.
>
>How did you find out about the XSL initiative? It seems like a major new
>development, and I hate the feeling of being blindsided by being
>ignorant of such important efforts.

It came from the "XML/EDI Group Mailing List". The subject line was "More good news for XML/EDI !!!". I must confess that it was forwarded to me by a co-worker (thanks, Rich!) who subscribes to the list. I wouldn't have heard of it, otherwise. That's why I posted to XML-DEV, since I thought it was important, yet nobody had mentioned it.

>Finally, I've gotten the impression that XML formatting can happen via
>CSS, or XSL, or DSSSL-O. Can that be right? It seems odd to offer
>three distinct formatting languages. Or am I just completely confused (a
>likely alternative!)?

I certainly have heard all three mentioned in an XML context. Would anyone care to clarify this?
I know that there may not be an answer yet, as the XML style issues are still in draft form. Last I heard, the deadline was around December of this year. I'm not involved with the XML-format (XML-style?) discussions, so I don't know the answer.

I am willing to _guess_ that one reason for including all three might be that various organizations have investments in one, but not the other(s), so restricting to one just might upset a few people (big understatement).

Also, I can see a definite trend in ease-of-use and power that goes CSS-->XSL-->DSSSL. Which one you want may depend on where your needs lie on the spectrum.

There's also an existing application that has already been targeted for XML: the WWW. This already has CSS defined.

>
>--
> Andy Cogan
> Epiphany Software
>****************************************
>* E-mail: support@EpiphanySoftware.com *
>* Voice: (408) 378-6145 *
>* Web: http://www.EpiphanySoftware.com *
>****************************************

Your guess speaker,
- Russ

From peter at techno.com Mon Sep 15 21:15:34 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun 7 16:58:27 2004
Subject: NOTATION/MIME (was Re: Recent XML WG decisions)
In-Reply-To: <199709151505.LAA26058@geode.ora.com> (message from Chris Maden on Mon, 15 Sep 1997 11:05:12 -0400)
Message-ID: <199709151859.OAA13746@exocomp.techno.com>

> Date: Mon, 15 Sep 1997 11:05:12 -0400
> From: Chris Maden
>
> I believe that XML authors are largely going to refer to images simply
> by URLs instead of entities; in that case, file system associations or
> HTTP headers can be used to ascertain the entity's type. In cases
> where NDATA entities are used, I would recommend that XML implementors
> ignore the system identifier of the notation, and make their decision
> based on the entity itself.

I would caution against ignoring the declared notation for an entity, since it may be used to specify an interpretation other than the default interpretation that would be made by the system. By associating notations with chunks of data, entity declarations allow the same chunk of data to be viewed in different ways. The "classic" example of this is an XML document that is treated as XML in some places and as plain text in others (possibly as an example in a book about XML).

It is true that most near-term applications can probably ignore declared notations, since the web community is already used to the limitations involved. This may change, however, as documents become increasingly object-oriented, providing different views of themselves for different audiences (as is done with SGML architectures).

-peter
--
Peter Newcomb TechnoTeacher, Inc.
peter@petes-house.rochester.ny.us peter@techno.com
http://www.petes-house.rochester.ny.us http://www.techno.com

From mrc at allette.com.au Tue Sep 16 00:42:02 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:58:27 2004
Subject: Microsoft's XSL Proposal
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <341DB990.AA5A35AD@allette.com.au>

Russell Chamberlain wrote:
> I'm _extremely_ happy that someone has put forward an XML-formatting proposal
> (XSL - Extensible Style Language) to the W3C...

Can you direct us to the draft proposal?
-- Regards Marcus Carr email: mrc@allette.com.au _______________________________________________________________ Allette Systems (Australia) email: info@allette.com.au Level 10, 91 York Street www: http://www.allette.com.au Sydney 2000 NSW Australia phone: +61 2 9262 4777 fax: +61 2 9262 4774 _______________________________________________________________ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From donpark at quake.net Tue Sep 16 02:39:09 1997 From: donpark at quake.net (Don Park) Date: Mon Jun 7 16:58:27 2004 Subject: Microsoft's XSL Proposal Message-ID: <199709160038.RAA18714@gw.quake.net> http://www.microsoft.com/standards/xml/ -----Original Message----- From: Marcus Carr To: xml-dev@ic.ac.uk Date: Monday, September 15, 1997 3:42 PM Subject: Re: Microsoft's XSL Proposal >Russell Chamberlain wrote: > >> I'm _extremely_ happy that someone has put forward an XML-formatting proposal >> (XSL - Extensible Style Language) to the W3C... > >Can you direct us to the draft proposal? 
> >-- >Regards > >Marcus Carr email: mrc@allette.com.au >_______________________________________________________________ >Allette Systems (Australia) email: info@allette.com.au >Level 10, 91 York Street www: http://www.allette.com.au >Sydney 2000 NSW Australia phone: +61 2 9262 4777 > fax: +61 2 9262 4774 >_______________________________________________________________ > > >xml-dev: A list for W3C XML Developers >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To unsubscribe, send to majordomo@ic.ac.uk the following message; >unsubscribe xml-dev >List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jtauber at jtauber.com Tue Sep 16 03:06:44 1997 From: jtauber at jtauber.com (James K. Tauber) Date: Mon Jun 7 16:58:27 2004 Subject: Microsoft's XSL Proposal Message-ID: <01BCC280.5F482FA0.jtauber@jtauber.com> On Monday, 15 September 1997 11:04, Russell Chamberlain [SMTP:russc@livepage.com] wrote: > At 10:29 AM 97/09/15 -0700, you [Andy Cogan] wrote: > >How did you find out about the XSL initiative? It seems like a major new > >development, and I hate the feeling of being blindsided by being > >ignorant of such important efforts. > I must confess that it was forwarded to me by a co-worker (thanks, Rich!) > who subscribes to the list. I wouldn't have heard of it, otherwise. I'll try to make information like this available as soon as possible on http://www.jtauber.com/xml/ If you fill out the form at the bottom of the page, Netmind will email you whenever the page has been updated. James -- James K. 
Tauber / jtauber@jtauber.com
Perth, Western Australia
XML Pages: http://www.jtauber.com/xml/


From jjc at jclark.com  Tue Sep 16 07:37:40 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun  7 16:58:27 2004
Subject: Microsoft's XSL Proposal
References: <3.0.1.32.19970915092937.007a73d0@livepage.com>
Message-ID: <341E1692.5FF1A8B2@jclark.com>

Russell Chamberlain wrote:
> 7) Has a clearly-defined conflict-resolution mechanism
>
>    Some formatting environments specify that the "first"
>    applicable style in the stylesheet is always the one to
>    be applied. A stylesheet's behaviour should not change
>    based on the location of a style in the stylesheet's
>    source file. XSL will let authors organize their styles
>    in any way they see fit, with no effect on behaviour.
>
>    Some environments also allow _multiple_ styles to be
>    applied. Which ones, and in what order? Yuck!
>    XSL explicitly states that at most a single pattern
>    will be chosen. Good idea.

Only one construction rule can apply, but multiple style rules can
apply. However, XSL does have a (hopefully) well-defined conflict
resolution mechanism for dealing with this, and it doesn't depend on
the order of the rules in the stylesheet.

James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jjc at jclark.com Tue Sep 16 10:24:19 1997 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 16:58:27 2004 Subject: XSL requests for clarification/suggestions for enhancement References: <5044147A23FED01195BF00609712EB6B5FA1@FLPS-NTSERVER1> Message-ID: <341E414E.819E6E71@jclark.com> Daniel Rivers-Moore wrote: > What is the best place to get information about just what can go into a > script? Is there a publicly available specification of the ECMAScript > language? You can get the ECMAScript spec from: http://developer.netscape.com/library/documentation/javascript.html James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From north at synopsys.com Tue Sep 16 11:50:04 1997 From: north at synopsys.com (Simon North) Date: Mon Jun 7 16:58:27 2004 Subject: XSL requests for clarification/suggestions for enhancement In-Reply-To: <341E414E.819E6E71@jclark.com> Message-ID: <199709160951.LAA00415@cadis.de> You can get the official ECMA-262 (ECMAScript) spec in either MS-Word or Adobe Acrobat (PDF) form directly from the ECMA for free from: http://www.ecma.ch/stand/ecma-262.htm Simon. Simon North north@synopsys.com COSSAP Technical Writer, Aachen, Germany To be or not to be, those are the parameters. 


From David.Rosenborg at uab.ericsson.se  Tue Sep 16 15:19:04 1997
From: David.Rosenborg at uab.ericsson.se (David Rosenborg)
Date: Mon Jun  7 16:58:27 2004
Subject: Recent XML WG decisions
In-Reply-To: <199709130538.WAA07282@boethius.eng.sun.com>
References: <199709130538.WAA07282@boethius.eng.sun.com>
Message-ID: <199709161318.PAA11663@uabs19c25.eua.ericsson.se>

Tim Bray wrote:
> So XML is now case-sensitive.

Sounds good, but what is the general opinion about case-sensitivity in
XML applications? My own feeling is that it might be appropriate to have
case insensitivity when you, for example, do a structural search in an
XML browser or editor. It could also be useful when specifying patterns
in XSL and the like. These things may of course fail if the document
designer has chosen to distinguish elements only by case, but I think
that's unlikely to happen.

I also have the feeling that the problem of case-insensitive string
comparison is not as difficult as the one of case folding. Case folding
is a one-to-one mapping that might not be the same for different
languages, but when comparing strings you can treat groups of characters,
differentiated only in case and diacritics, as the same. For example the
characters i, ?, ?, ?, I, ? etc could be treated as being equal in this
situation. Is this a correct assumption or am I missing something?

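The kind of loose comparison described above (treating characters that differ only in case or diacritics as equal) could be sketched roughly as follows. This is only an illustration in Python using the standard `unicodedata` module; the names `loose_key` and `loosely_equal` are made up for the example, and it assumes that stripping combining marks after canonical decomposition is an acceptable notion of "ignoring diacritics", which will not suit every language.

```python
import unicodedata

def loose_key(s):
    """Build a comparison key that ignores case and combining diacritics."""
    # Decompose accented characters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    # Drop the combining marks, keeping only the base characters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # casefold() is a comparison-oriented lowercase, not a display form.
    return stripped.casefold()

def loosely_equal(a, b):
    """True if a and b differ at most in case and diacritics."""
    return loose_key(a) == loose_key(b)

# Element names that differ only by case compare equal under this key.
print(loosely_equal("TITLE", "Title"))
# So do an accented capital and a plain lowercase letter.
print(loosely_equal("\u00cd", "i"))
```

Note that nothing is folded in the document itself; the key is used only at comparison time, so a case-sensitive parser and a case-insensitive search can coexist.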
Cheers,

______________________________________________________________________________
David.Rosenborg@uab.ericsson.se Ericsson Utvecklings AB (UAB/K/UG)


From David.Rosenborg at uab.ericsson.se  Tue Sep 16 16:21:02 1997
From: David.Rosenborg at uab.ericsson.se (David Rosenborg)
Date: Mon Jun  7 16:58:28 2004
Subject: Case sensitivity (Clarification)
In-Reply-To: <199709161337.IAA32194@mcconnel.ac.sil.org>
References: <199709161337.IAA32194@mcconnel.ac.sil.org>
Message-ID: <199709161420.QAA11840@uabs19c25.eua.ericsson.se>

robin@mcconnel.ac.sil.org writes:
> I think the discussion was just related to NAMECASE GENERAL NO
> in the SGML declaration, having therefore to do only with SGML
> names. Your post to XML-DEV made me wonder if you were talking
> about character text in content...

No, I was thinking of the SGML names. As far as I can understand,
case-sensitivity is only for the XML language itself, i.e. start and end
tags should match in case and also match the case of a possible element
declaration etc. This implies that the parser also is case sensitive.
But the actual application accessing the resulting grove (if one is
built) could be case insensitive even about SGML names. My question was
what people think of this and also if my assumptions about comparing
strings case insensitively were right.

Thanks

______________________________________________________________________________
David.Rosenborg@uab.ericsson.se Ericsson Utvecklings AB (UAB/K/UG)


From dgd at cs.bu.edu  Tue Sep 16 19:16:44 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: whitespace
In-Reply-To: <341C0F4B.5FDB@club-internet.fr>
References: <199709140033.UAA08971@csb.bu.edu>
Message-ID: 

At 11:22 AM -0500 9/14/97, Arnaud Le Taillanter wrote:
>Inside the XML WG mailing list the WS issue was surely
>extensively discussed, but I don't have access to
>the archive of this discussion. I know it's already
>a favor that the XML draft is made public (all drafts
>and standards of W3C are public, I think this
>helps) and that XML WG members are participating
>in the xml-dev mailing list (they could avoid it).

I agree that it's rather unfair of me to make a reference to a discussion
that I can't produce.

>Well, I ask for another favor: could you please make the
>discussion about WS that led to the WG decision
>available on line? After such a reading, everybody
>could become convinced of the appropriate nature
>of the WG decision. Please!

Well, it's up to the W3C, not me -- as a member of the SIG (not even the
decision-making part of the working group) I have no power to do this.
There were some public archives of some parts of the discussion -- I think
this is no longer allowed for the current discussions, under the W3C's
confidentiality rules. You could try an Altavista search for my name -- it
used to come up with a WWW archive of the old mailing list, and the URL may
still work.

I do doubt that people will want to re-read that discussion, however, once
they have seen it. I was not exaggerating when I put the count at hundreds
of messages.
Most of these were repetitive, because the total list of factors involved,
in the end, is the short list in my mail.

The desire for simple rules, the need to work without DTDs the same way as
with DTDs, and the desire for SGML compatibility all needed to be balanced.
In fact, they were incompatible -- SGML as it stands has complicated rules,
which we finally asked the ISO to relax. And _any_ solution that
differentiates element content from mixed content requires a DTD or other
declaration (under SGML rules or even new ones). The proposal to add a new
declaration for element content was abandoned because it's redundant with a
DTD, and confusing without -- a likely source of errors rather than a
convenience.

>> Every variation
>> you discussed has been gone over and they all were either:
>>   1. unworkably complex (like the current SGML rules, which few
>> remember and even fewer remember correctly).
>
>Agreed.

So we have point 1 nailed down.

>>   2. Not compatible with SGML, or unworkably ugly like the proposal to
>> quote all literal text.
>
>If SGML rules concerning WS are to be discarded, any
>other rule adopted is incompatible, including the draft rule.

Yes, but the ISO was willing to add the pass-all-whitespace rule to SGML,
and it will be official in a few months. No other proposal also solved the
very real problems of SGML->SGML transformation caused by parsers hiding
whitespace, and so there was little independent reason to add them into
SGML. That nails down point 2.

>
>>   3. Failed to work without a DTD. This is the kicker, and it's
>> required by XML because you don't always have the DTD, and different
>> results in the has-DTD/doesn't-have-DTD cases are unacceptable.
>
>I agree.

So that nails down point 3. And we really agree! :) .... oh:

> The tree structures must be exactly the same in either case.
>Some constraint regarding WS is necessary on the way to input an
>XML text I assume.

I'm not sure what you mean, here.

Any method for ignoring whitespace must enable:

  1. Explicit whitespace to be possible wherever it is wanted (including
     near element boundaries).

  2. Line-breaks to be preserved for some (verbatim, or <pre>-style)
     elements.

  3. Can't depend on the DTD or other declarations to control it.

The simplest proposal that does this is to pass all whitespace.

The only real drawback is that _some_ applications (like table formatters)
may have to explicitly ignore whitespace in _some_ contexts where a
traditional SGML parser would have been able to do it for them. Linking
applications must deal with (count), and can't ignore whitespace chunks
that in some cases may have little meaning to a user.
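
As a rough illustration of an application doing this whitespace-ignoring itself, here is a minimal Python sketch using the standard library's `xml.etree.ElementTree`, which hands whitespace-only text through to the caller. The helper name `drop_blank_text`, the sample document, and the set of element names treated as "element content" are all assumptions made up for the example; without a DTD the application has to supply that knowledge on its own.

```python
import xml.etree.ElementTree as ET

XML_WS = " \t\r\n"  # the four XML whitespace characters

def drop_blank_text(elem, element_only):
    """Remove whitespace-only text directly inside elements named in element_only."""
    if elem.tag in element_only:
        # Text before the first child.
        if elem.text is not None and not elem.text.strip(XML_WS):
            elem.text = None
        # Text following each child, still inside this element.
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WS):
                child.tail = None
    for child in elem:
        drop_blank_text(child, element_only)

doc = ET.fromstring(
    "<table>\n  <row><cell> a </cell>\n  <cell>b</cell></row>\n</table>"
)
# The application, not the parser, decides that table and row hold no text.
drop_blank_text(doc, {"table", "row"})
print(ET.tostring(doc, encoding="unicode"))
```

Whitespace inside `cell` is left alone, since only the application can say whether it is significant there.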

The benefits are the "simplest possible rule", easy XML->XML transduction
that preserves the original formatting, and a dependable way to count
character data in documents that contain whitespace, regardless of whether
you have a DTD.

>> The recent change (to normalize all linends) fills the one hole the
>> previous proposal had -- because it was nearly certain that some
>> processes would blindly change CRLF and their ilk anyhow.

Note that this is the only data normalization permitted in XML, and that it
only warrants processes like the changing of line-ending conventions (e.g.
from PC to Mac) -- which we all know would have taken place anyway, causing
errors, even if they were explicitly prohibited by the standard.
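
For concreteness, line-end normalization of this kind might look like the following Python sketch. It assumes the target convention is a single line feed and that only CRLF pairs and bare CRs need mapping; the function name `normalize_line_ends` is invented for the example.

```python
def normalize_line_ends(text):
    """Map CRLF and bare CR to a single LF, leaving all other characters alone."""
    # Replace the two-character sequence first so a CRLF does not become two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")

pc_style = "<p>one\r\ntwo</p>\r\n"
mac_style = "<p>one\rtwo</p>\r"
# Both conventions end up as the same character data.
print(normalize_line_ends(pc_style) == normalize_line_ends(mac_style))
```

Spaces and tabs are untouched, so this is the only change made to the data.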

>> My advice: don't waste your bytes complaining about this -- we've
>> heard it _all_ before -- and the solution that works best is to leave
>> it to the application.
>
>I am sure I will get
>convinced when I read the WG discussion :-)
>Or I fear the WG members will have to hear it all (and more)
>again :-))

    My advice was just advice about what expectations you could have of
_results_ from whatever discussion ensues. Feel free to discuss whitespace
to your heart's content. But don't expect XML to change.

I'll see if there's any way the archives of the whitespace debate can be
made available, but I can honestly say that they're painful rather than
enlightening reading. Expect to devote several days to the reading, too, if
they do become public.

I was a chief proponent of the current approach, even at the beginning,
when most in the group did not want to do anything so radical, so I agree
that explanations of the decision are worthwhile -- and I've tried to
contribute such -- but I'm certainly not going to read an extended rehash
on the issue. I've devoted my pound(s) of flesh to whitespace already.

  -- David

RE delenda est!

David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________





From digitome at iol.ie  Thu Sep 18 16:13:55 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
Message-ID: <199709181413.PAA06449@mail.iol.ie>


Sorry for the lateness of this reply. It got a bit lost in my out-box for a
while!

[Sean Mc Grath]
>>Throw out that grep, that text editor, that fgets(), that diff,sort,uniq
>>utility There all busted for XML use.
>
[David Durand]
>gets is of course Broken As Designed, as the cause of most security bugs in
>Unix systems.

Sorry David, I cannot let you get away with that one. I said *fgets()* which
is an entirely different function to gets(). It takes three parameters, one
of which is the maximum number of characters to read.
It is not Broken As Designed.

>
>Again, they are broken for XML use with files created a particular way.
>They are also broken for HTML files created the same way, and I don't hear
>the weeping and wailing.

No weeping and wailing required because it is typically possible to splice
line-ends into HTML *without affecting the content*. This is not the case
with XML.

>Can you suggest any solution to the "grep" problem other than requiring a
>fixed line-max in XML.

Yes. Ignore all line ends. I know this presents its own set of difficult
problems but I'd prefer to tackle these - and maintain compatibility with a
decade's worth of tools - rather than break the tools.

> Do you think that that hideous hack to accomodate
>defective (if very useful) tools is really worth it.
Yes. Line oriented text processing has been a hugely popular paradigm for
many years now. I don't think of these tools as "defective" at all. I dare
say many wielders of these tools are of the same opinion. These people will
be rightly miffed at the suggestion that they are defective by virtue of the
use of a line oriented paradigm. They will also be rightly miffed that they
cannot bring their tools/skills to bear in the XML world.

>Can you suggest how we
>would determine that buffer size?
Question is Broken As Designed. No need for a silly fixed limit. Just a
recognition of the existence *of* limits and a standardised mechanism for
dealing with them.

Sean Mc Grath
sean@digitome.com
www.digitome.com





From dgd at cs.bu.edu  Thu Sep 18 18:09:41 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
In-Reply-To: <199709181413.PAA06449@mail.iol.ie>
Message-ID: 

>Sorry for the lateness of this reply. It got a bit lost in my out-box for a
>while!
>
>[Sean Mc Grath]
>>>Throw out that grep, that text editor, that fgets(), that diff,sort,uniq
>>>utility There all busted for XML use.
>>
>[David Durand]
>>gets is of course Broken As Designed, as the cause of most security bugs in
>>Unix systems.
>
>Sorry David, I cannot let you get away with that one. I said *fgets()* which
>is an entirely different function to gets(). It takes
>three paramaters one of which is the maximum number of characters to read.
>It is not Broken As Designed.

No, but fgets (unlike gets) can deal with long lines --- you have to
recognize that you overflowed and make accommodations, but you can do the
right thing. I was giving you the benefit of the doubt, since gets, at
least, has the problem that you are raising, while fgets does not.
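
The "recognize that you overflowed" loop can be sketched as below. This is a Python analogue rather than C: `readline(size)` on a text stream, like fgets, returns at most `size` characters and stops early at a newline. The function name `read_lines_bounded` and the tiny buffer size are assumptions chosen to make the overflow visible.

```python
import io

def read_lines_bounded(stream, bufsize):
    """Yield complete lines while never asking for more than bufsize characters at once."""
    parts = []
    while True:
        chunk = stream.readline(bufsize)
        if chunk == "":               # end of input
            break
        parts.append(chunk)
        if chunk.endswith("\n"):      # the line is complete
            yield "".join(parts)
            parts = []
        # Otherwise the buffer filled before a newline was seen: keep reading.
    if parts:                         # final line without a trailing newline
        yield "".join(parts)

sample = io.StringIO("short\n" + "x" * 20 + "\nlast")
lines = list(read_lines_bounded(sample, 8))
print([len(line) for line in lines])
```

In C the equivalent test is whether the buffer filled by fgets ends in a newline; if not, the line continues in the next call.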

>>
>>Again, they are broken for XML use with files created a particular way.
>>They are also broken for HTML files created the same way, and I don't hear
>>the weeping and wailing.
>
>No weeping and wailing required because it is typically possible to splice in
>line-ends into HTML *without affecting the content*. This is not the case
>with XML.

Just try that in tables. You have to know the meaning of the markup, even
in HTML, if you want to do this. Now you can claim that table markup is
broken, and you might be right, but HTML does not support your argument.

Similarly for pre elements: You can't do anything to line ends in there --
maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
rhetorical reason.

>>Can you suggest any solution to the "grep" problem other than requiring a
>>fixed line-max in XML.
>
>Yes. Ignore all line ends. I know this presents its own set of difficult
>problems
>but I'd prefer to tackle these - and maintain compatability with a decades
>worth
>of tools - rather than break the tools.

But this creates worse problems: lack of <pre>-style elements, inability to
write XML filters that preserve linespace just from generic XML parsers.
No way to use string offsets in linking.

>> Do you think that that hideous hack to accomodate
>>defective (if very useful) tools is really worth it.
>Yes. Line oriented text processing has been a hugely popular paradigm for
>many years now. I don't think of these tools as "defective" at all. I dare
>say many wielders of these tools are of the same opinion. These people will
>be rightly miffed at the suggestion that they are defective by virtue of the
>use of a line oriented paradigm. They will also be rightly miffed that they
>cannot bring their tools/skills to bear in the XML world.

But they can, they just need to limit their files to correspond to the
limitations of their tools. People do this all the time, without difficulty.
Of course if the world at large decides to abandon the "line paradigm" then
those who stick to it will be inconvenienced. But then if "the world" makes
the shift, then there's still not a very big problem, is there?

Even in that case, with some (usually minimal) human intervention, such
linend conversion/insertion is trivial in practice.

I'm sorry I still don't see how this is _worse_ than what we have with text
files today. And compared to HTML and SGML, I think XML's rules are more
consistent, and useful for more things.

I deal with the Mac (where line == paragraph), as well as Unix, all the
time. This problem is not usually of more than 10 seconds concern on the
few times in a month that it comes to mind. On occasion, of course, I find
myself spending 1-10 minutes in an editor fixing things (usually by
invoking a "wrap" command of some sort).

>>Can you suggest how we
>>would determine that buffer size?
>Question is Broken As Designed. No need for a silly fixed limit. Just a
>recognition
>of the existence *of* limits and a standardised mechanism for dealing with
>them.

I can't imagine what such a mechanism is: IBM text editors for decades had
an 80-character limit. Some still work best with 72 column files. If XML is
supposed to require lines no longer than some limit, we need to specify
that limit in the standard. Otherwise all we can say is that any XML
processor is free to reject any document if the lines are "too long for
that tool". That's an even worse prescription for interoperability.

If there are limits, a standard has to tell you how to be safe and not
break any of those limits. At least, a good standard should.

 -- David

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________





From digitome at iol.ie  Thu Sep 18 20:40:28 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
Message-ID: <199709181840.TAA04606@mail.iol.ie>

[Sean Mc Grath]
>>
>>Sorry David, I cannot let you get away with that one. I said *fgets()* which
>>is an entirely different function to gets(). It takes
>>three paramaters one of which is the maximum number of characters to read.
>>It is not Broken As Designed.
>
[David Durand]
>No, but fgets (unlike gets) can deal with long lines --- you have to
>recognize that you overflowed and make accomodations, but you can do the
>right thing. iw as giving you the benefit of the doubt, since gets, at
>least, has the problem that you are raising, while fgets does not.
>
[Sean Mc Grath]
You mentioned gets(). I didn't. How your insertion of an irrelevant reference
to gets() can be construed as giving me "the benefit of the doubt" I don't know.

[Sean Mc Grath]
>>No weeping and wailing required because it is typically possible to splice in
>>line-ends into HTML *without affecting the content*. This is not the case
>>with XML.
>
[David Durand]
>Just try that in tables. You have to know the meaning of the markup, even
>in HTML, if you want to do this. Now you can claim that table markup is
>broken, and you might be right, but HTML does not suport your argument.

[Sean Mc Grath]
Why not? Why cannot I replace say, "" with "\n" everywhere?
The problem then reduces to long data chunks such as...
pre elements:-

[David Durand]
>
>Similarly for pre elements: You can't do anything to line ends in there --
>maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
>rhetorical reason.

[Sean Mc Grath]
Absolutely agreed. The <pre> case is fundamentally different.
These line-ends are truly part of the data and a processor that adds new ones
is blowing the integrity of the data. Thus the plausible argument in favour
of not using line-end as data content.

[David Durand]
>
>>>Can you suggest any solution to the "grep" problem other than requiring a
>>>fixed line-max in XML.
>>
[Sean Mc Grath]
>>Yes. Ignore all line ends. I know this presents its own set of difficult
>>problems
>>but I'd prefer to tackle these - and maintain compatability with a decades
>>worth
>>of tools - rather than break the tools.
>

[David Durand]
>But this creates worse problems: 

[Sean Mc Grath]

Worse?

[David Durand]
>lack of <pre>-style elements

Broken As Designed. If something has to give I think <pre> elements should
be first to go.
Alternatively the problem can alway be "arcformed" away. We use 
      DIGITOME CDATA #FIXED "PREFORM">
all the time. Our pretty printing, word wrapping SGML processing tools use
this to
avoid adding extraneous WS that would blow the data content.

[David Durand]
>, inability to write XML filters that preserve linespace just from generic
>XML parsers.

[Sean Mc Grath]
Line ends (at least those) tipping up to start-end tags would *not* be part
of the data. They could thus be added/dropped without affecting the data. The
CGR output of the grove would be the final arbiter on "equivalence" and the
launching pad for offsets used in addressing.

>No way to use string offsets in linking.

If it ain't got a representation in the grove it ain't in the data and thus
is not counted when totting up offsets.

[David Durand]
>
>>> Do you think that that hideous hack to accomodate
>>>defective (if very useful) tools is really worth it.

[Sean Mc Grath]
>>Yes. Line oriented text processing has been a hugely popular paradigm for
>>many years now. I don't think of these tools as "defective" at all. I dare
>>say many wielders of these tools are of the same opinion. These people will
>>be rightly miffed at the suggestion that they are defective by virtue of the
>>use of a line oriented paradigm. They will also be rightly miffed that they
>>cannot bring their tools/skills to bear in the XML world.

[David Durand]
>But they can, they just need to limit their files to crrespond to the
>limitation of their tools. People do this all the time, without difficulty.


[Sean Mc Grath]
No difficulty?

Problem: I receive an XML file from a user who works with lines of <1024
characters in his tools.

I use <512. How do I munge his file to suit my tools? I can't without
blowing the data. If tag-tipping line ends were transient I could make
a stab at it. I would still have to address the ""
case. But hey! I never said this was simple! I just said that the alternate
set of problems this presents has the benefit of not throwing out our
existing line oriented tools and techniques.

[David Durand]
>Of course if the world at large decides to abandon the "line paradigm" then
>those who stick to it will be inconvenienced. But then if "the world" make
>the shift, then there's still not a very big problem, is there?

[Sean Mc Grath]
That is one-helluva shift IMHO! I am not sure to what extent the world is
   a) aware of this aspect of XML
   b) willing to bite that bullet.
 
[David Durand]
>if XML is
>supposed to require lines no longer than some limit, we need to specify
>that limit in the standard.

[Sean Mc Grath]
No we don't! We need to have a well defined mechanism whereby a tool with
a line length limit of N can work with XML with line length > N without
blowing the integrity of the data.

[David Durand]
>Otherwise all we can say is that any XML
>processor is free to reject any document if the lines are "too long for
>that tool". That's en even worse prescription for interoperability.
>
See above.

[David Durand]
>If there are limits, a standard has to tell you how to be safe and not
>break any of those limits. At least, a good standard should.
>

[Sean Mc Grath]
The standard does not have to establish a limit. It could help users
of "legacy" tools to *cope* with limits though. "Buy/build better tools" is one
line that can be taken but it is not the only one.




Sean Mc Grath
sean@digitome.com
www.digitome.com





From dgd at cs.bu.edu  Thu Sep 18 21:38:33 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun  7 16:58:28 2004
Subject: Re  Whitespace
In-Reply-To: <199709181840.TAA04606@mail.iol.ie>
Message-ID: 

At 1:40 PM -0500 9/18/97, Sean Mc Grath wrote:
>[David Durand]
>>No, but fgets (unlike gets) can deal with long lines --- you have to
>>recognize that you overflowed and make accomodations, but you can do the
>>right thing. iw as giving you the benefit of the doubt, since gets, at
>>least, has the problem that you are raising, while fgets does not.
>>
>[Sean Mc Grath]
>You mentioned gets(). I didn't. How your insertion of an irrelevant reference
>to gets() can be construed as giving me "the benefit of the doubt" I don't
>know.


Well, as fgets does not support your argument that "long lines cause
problems" I thought it might be a typo for gets (which does have serious
problems with long lines, but is of course a canonical example of bad
design, and not something we want to accommodate).

As to fgets, I confess that I don't see that it should have any problem
with any file, newline-containing or not. Am I clear now?

>[David Durand]
>>Just try that in tables. You have to know the meaning of the markup, even
>>in HTML, if you want to do this. Now you can claim that table markup is
>>broken, and you might be right, but HTML does not suport your argument.
>
>[Sean Mc Grath]
>Why not? Why cannot I replace say, "" with "\n" everywhere?
>The problem then reduces to long data chunks such as...
>pre elements:-

Well, because people use tables to format, and that extra space queers the
pitch, inducing funny spacing behavior. Agreed that a better table model
could avoid this.

>[David Durand]
>>
>>Similarly for pre elements: You can't do anything to line ends in there --
>>maybe I'm using a 20K line in <pre> to force horizontal scrolling for a
>>rhetorical reason.
>
>[Sean Mc Grath]
>Absolutely agreed. The <pre> case is fundamentally different.
>These line-ends are truly part of the data and a processor that adds new ones
>is blowing the integrity of the data. Thus the plausible argument in favour
>of not using line-end as data content.

I confess to not understanding why a line end cannot occur at the beginning
of an element. Even SGML never proposed to remove more than _1_ such line
break.

So you want to take them all away, so that grep won't break.

>[David Durand]
>>
>>>>Can you suggest any solution to the "grep" problem other than requiring a
>>>>fixed line-max in XML.
>>>
>[Sean Mc Grath]
>>>Yes. Ignore all line ends. I know this presents its own set of difficult
>>>problems but I'd prefer to tackle these - and maintain compatibility with
>>>a decades worth of tools - rather than break the tools.

Well, it makes data rather unrevealing.

And of course, the tools are only broken if common practice leads to the
use of long lines -- and if that becomes the case, then it will only have
been because the tools are _not_ actually that important.

This is a social argument that you have not addressed yet, but it cuts to
the core of why we should not do this... We get a simpler easier model, and
there is nothing to stop people from any self-imposed discipline their
tools require.

And if people are _not_ following such a discipline, then there's no reason
to worry about the tools, because it can only happen if people are not
using those tools for XML.

>[David Durand]
>>lack of <pre>-style elements
>
>Broken As Designed. If something has to give I think <pre> elements should
>be first to go.

Well, theoretically there's a lot of reasonableness to using explicit markup
for such line breaks. But, the pragmatist in me has to note that there has
been _no_ successful markup or document processing language without such a
feature (except for word-processors, but the case there is complicated
because the user never _sees_ the relevant representation).

>Alternatively the problem can always be "arcformed" away. We use
>      DIGITOME CDATA #FIXED "PREFORM">
>all the time. Our pretty printing, word wrapping SGML processing tools use
>this to avoid adding extraneous WS that would blow the data content.

Doesn't solve the problem you raised. That data has a long line in it and
grep crashes. You have to split the line, and take the consequences, or not
use grep.

If you don't allow arbitrary line-break introduction anywhere, you haven't
solved the legacy tool problem, which weakens your argument somewhat. If
you do, you've made it impossible to count on line-breaks _ever_ being
significant. The XML committee considered this and rejected it as too
divergent from current practice (that people did not want to give up).

>[David Durand]
>>, inability to write XML filters that preserve linespace just from generic
>>XML parsers.
>
>[Sean Mc Grath]
>Line ends (at least those) tipping up to start-end tags would *not* be part
>of the data. They could thus be added/dropped without affecting the data.
>The CGR output of the grove would be the final arbiter on "equivalence" and
>the launching pad for offsets used in addressing.

Yes, and the "looks the same in my editor" arbiter of equivalence would
fail. This has long been felt unacceptable by those who use such
transformations. If any hand-editing is involved it is unacceptable
behaviour to change all the line-ends.

>[Sean Mc Grath]
>>>Yes. Line oriented text processing has been a hugely popular paradigm for
>>>many years now. I don't think of these tools as "defective" at all. I dare
>>>say many wielders of these tools are of the same opinion. These people will
>>>be rightly miffed at the suggestion that they are defective by virtue of the
>>>use of a line oriented paradigm. They will also be rightly miffed that they
>>>cannot bring their tools/skills to bear in the XML world.

>[David Durand]
>>But they can, they just need to limit their files to correspond to the
>>limitation of their tools. People do this all the time, without difficulty.

Yes, if your editor and tools have a 72 character line limit, you don't
create files with long lines. Then your tools always work. If you want
everyone's tools to always work, and you admit a maximum line-length for
tools, you need to pick that number so I can make files that won't toast
your software. Either that, or someone with different software will exceed
the limits of your software, of whose existence she has never even heard!

>
>[Sean Mc Grath]
>No difficulty?
>
>Problem : I receive an XML file from a user who works with <1024 lines in
>his tools.
>
>I use <512. How do I munge his file to suit my tools? I can't without
>blowing the data. If tag-tipping line ends were transient I could make
>a stab at it. I would still have to address the ""
>case. But hey! I never said this was simple! I just said that the alternate
>set of problems this presents have the benefit of not throwing out our
>existing line oriented tools and techniques.

Look, we have a solution. Proposing a new solution based on a new problem
(grep and other tools with hard line-length limitations) requires that the
new solution actually _solve_ the problem. Your solution does not solve the
problem you yourself pose, so it's hard for me to take seriously.

>[David Durand]
>>Of course if the world at large decides to abandon the "line paradigm" then
>>those who stick to it will be inconvenienced. But then if "the world" makes
>>the shift, then there's still not a very big problem, is there?
>
>[Sean Mc Grath]
>That is one-helluva shift IMHO! I am not sure to what extent the world is
>   a) aware of this aspect of XML
>   b) willing to bite that bullet.

In that case, they create files with short lines, and there is no bullet to
bite. The only way this problem can become common is if long lines become
very popular. I don't see how long lines can become popular if they create
fatal tool problems with popular tools. Either long lines will not be
common, or tools that cope with long lines will be common along with the
long lines themselves.

It's a simple feedback loop. No need to change the standard, just let
people's desire to share data feed back into the general knowledge of what
data is shareable.

>[David Durand]
>>if XML is
>>supposed to require lines no longer than some limit, we need to specify
>>that limit in the standard.
>
>[Sean Mc Grath]
>No we don't! We need to have a well defined mechanism whereby a tool with
>a line length limit of N can work with XML with line length > N without
>blowing the integrity of the data.

How do we do this for legacy tools like grep with a hard-compiled limit
(that is not documented, and varies from vendor to vendor)?

If files that work with arbitrary tools are to be possible, we need to know
the constraints that those tools impose.

>[David Durand]
>>Otherwise all we can say is that any XML
>>processor is free to reject any document if the lines are "too long for
>>that tool". That's an even worse prescription for interoperability.
>>
>See above.

I saw. I didn't see how you're going to fix grep (for your data\ndata
case). Or rather the "40K of data with no \n" case which is the real killer.

>[David Durand]
>>If there are limits, a standard has to tell you how to be safe and not
>>break any of those limits. At least, a good standard should.
>>
>
>[Sean Mc Grath]
>The standard does not have to establish a limit. It could help users
>of "legacy" tools to *cope* with limits though. "Buy/build better tools"
>is one line that can be taken but it is not the only one.

Well, how could the standard do that?

Actually, since the standard is almost certainly not going to change, I
don't really care how it could do it. My sense is that people won't do
without <pre> equivalents -- so you can never get total freedom to
remove/add line ends. So since the problem is unsolvable, let's not waste
time, and complicate the standard to get a partial solution (i.e. a solution
that fails to solve the problem) at the cost of a popular feature.

  -- David

I think that's it for me.

_________________________________________
David Durand              dgd@cs.bu.edu  \  david@dynamicDiagrams.com
Boston University Computer Science        \  Sr. Analyst
http://www.cs.bu.edu/students/grads/dgd/   \  Dynamic Diagrams
--------------------------------------------\  http://www.dynamicDiagrams.com/
MAPA: mapping for the WWW                    \__________________________





From Peter at ursus.demon.co.uk  Fri Sep 19 01:58:21 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:28 2004
Subject: XML-WG and XML-SIG deliberations
Message-ID: <10145@ursus.demon.co.uk>

Two postings on XML-DEV have explicitly or implicitly referred to the 
discussion of XML-SIG and XML-WG. The formal position is that the
discussions of XML-WG (the current W3C-appointed decision-making body) and
XML-SIG (a group of experts who offer advice to XML-WG) are confidential to
W3C member organisations (and the invited experts). This confidentiality
is important as it represents part of the value of being a member of W3C.

There is potential confusion about the archives, since the XML discussion group
was originally called the 'WG' and its archives were (and are) public. They
ended about June 1997 (any precise dates and current URLs for these?) They
are of historical interest and there *might* be some useful discussion there
but there is a huge amount to read through. Maybe some of the whitespace 
discussion is in the public archives, though I wouldn't rush.
The archives of XML-WG since June 1997 (?) are not publicly available. Nor
are those of XML-SIG.

However the discussion on this list, and the publicly reported developments
contributed by posters/readers of this list are valued by the XML-groups. 
For example the recent WG posting emphasised the value of APIs and their
possible co-publication with XML specs. 

The proposal for XSL (XML-STYLE) *is* publicly visible and URLs have been
posted on this list. Unfortunately for XML-DEVers, any XML-SIG and XML-WG 
discussion on this is confidential.  I leave it to any XML-WG readers of this
list to keep XML-DEV aware of what is happening. Perhaps it could be useful
to remind us of the proposed milestones/timescales for the various XML 
components to be published/accepted.

	P. 

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From jeremy at allaire.com  Fri Sep 19 06:15:15 1997
From: jeremy at allaire.com (Jeremy Allaire)
Date: Mon Jun  7 16:58:28 2004
Subject: Custom Tags
Message-ID: <34220BAC.2F83@allaire.com>

For anyone interested in CFML custom tags:

http://www.allaire.com/TagGallery/



From Alice.Portillo at PSS.Boeing.com  Tue Sep 23 23:05:47 1997
From: Alice.Portillo at PSS.Boeing.com (Portillo, Christina)
Date: Mon Jun  7 16:58:28 2004
Subject: Use of Character Escape Codes
Message-ID: 

Thought I would share Peter Flynn's response on escape codes with you all.

Christina Portillo
Product Definition and Image Technology

The Boeing Company               Phone: 425.237.3351
PO Box 3707   M/S 6H-AF        Fax: 425.237.3428
Seattle, WA  98124-2207            christina.portillo@boeing.com


> ----------
> From: 	Peter Flynn[SMTP:pflynn@imbolc.ucc.ie]
> Sent: 	Monday, September 22, 1997 7:15 PM
> To: 	Christina Portillo
> Subject: 	Use of Escape Codes and Characters
> 
> At 20:13 22/05/97 +0100, you wrote:
> >Q == "Question
> >How do you encode in your XML document references to
> >characters above 126 in the ISO646 character set.
> >
> >So of the character classes defined in the standard: space, char, letter,
> >Base Char, Ideographic, CombiningChar, Letter, Digit, Ignorable, and
> >Extender which of these has to be escaped to be used in a document. OR from
> >what index value down must escape codes be used."
> 
> I'm sorry to have delayed answering this but the character set question
> became rather vexed :-)
> 
> The simple answer is you escape any code you can't type as a character
> or byte combination. In other words, if you are working in ASCII, but
> you can generate an e-acute with the correct code (ie ISO 10646, not
> Windows :-) then you should be able to do so, and embed that byte in
> the file. If you need a Hangul glyph and you can't type it, then you
> need to use the escaped code: presumably users on Hangul systems can
> generate all their own characters at the keyboard. 
> 
> But in practice I think we'll need to see how/if the browsers implement
> non-Latin character repertoires.
> 
> ///Peter
> 
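
The rule of thumb in the quoted reply (embed what your encoding can carry,
escape the rest) can be sketched in Python. This assumes that the "escaped
code" takes the form of a numeric character reference such as &#233;, and it
relies on Python's built-in "xmlcharrefreplace" error handler, which emits
decimal references. The function name is hypothetical, and markup characters
such as & and < would still need separate handling.

```python
def escape_for_encoding(text, encoding="ascii"):
    """Replace characters the target encoding cannot carry with &#N; references."""
    # Characters that fit the encoding pass through unchanged; all others
    # are rewritten as decimal numeric character references.
    return text.encode(encoding, errors="xmlcharrefreplace").decode(encoding)


if __name__ == "__main__":
    print(escape_for_encoding("caf\u00e9"))             # ASCII target
    print(escape_for_encoding("caf\u00e9", "latin-1"))  # e-acute fits as-is
    print(escape_for_encoding("\ud55c", "latin-1"))     # a Hangul syllable
```

With an ASCII target the e-acute has to be escaped; with ISO 8859-1 it can be
embedded directly, while the Hangul syllable is escaped in both cases.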

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


From gannon at commerce.net  Thu Sep 25 01:58:25 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:28 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <01BCC907.BE1F2F00@arrow-d83.sierra.net>

CommerceNet XML iMarket Project Team,

The XML iMarket Project Planning Team will meet on Monday, October 6, 1997, 9:00am to 12:00pm PDT.  

The meeting location will be the CommerceNet offices, 4005 Miranda Ave, Suite 175, Palo Alto, CA 94304 (650-858-1930) unless otherwise notified.  

I will arrange for 800# conference call facilities for those unable to attend in person and send the 800# information to those who have replied and confirmed their interest in participating.

If you can attend, please reply confirming whether you will be able to attend in person or whether you will attend via the 800# conference call.  Please note that attendance in person or by phone is limited to members of CommerceNet's Information Access Portfolio only.

The goal of the meeting is to develop a detailed project plan and Request For Proposal (if needed) to identify companies or consultants with expertise required to help on the project.  The iMarket Project is designed to build on the XML catalog files and Document Type Definition files produced during the recently completed XML Catalog project.  The general plan is to build a demonstration virtual marketplace which utilizes the multiple vendor XML catalogs with standard DTDs and allows shoppers to search for products across vendors by specifying product and merchant attributes.  Another goal of this project is to demonstrate how the use of XML stylesheets will allow vendors/merchants to maintain "brand equity" while using common description templates (DTDs).

The XML Catalog tutorial and sample XML/DTD files are available for members at:
http://members.commerce.net/pw/portfolios/access/xml/xml-demo.html

CommerceNet IA Portfolio Members, please review these XML documents and let me know if you or someone else in your company is interested in participating.

Non-members, please reply if you are interested in becoming members or being put on the RFP list.

Thank you for your continued support.

Patrick Gannon, Executive Director
Information Access Portfolio, CommerceNet
http://www.commerce.net/services/portfolios/
------------------------------------------------------
President & CEO, Internet Shopping Directory, Inc.
865 Tahoe Blvd., Suite 211, Incline Village, NV  89451
702-831-2251   702-831-3925 (Fax)
mailto://patrick@shoppingdirect.com
http://www.shoppingdirect.com






From Jon.Bosak at eng.Sun.COM  Thu Sep 25 17:53:42 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:28 2004
Subject: XML iMarket Project Planning Meeting
In-Reply-To:  (message from Arthur Keller on Thu, 25 Sep 97 6:02:38 PDT)
Message-ID: <199709251550.IAA13057@boethius.eng.sun.com>

| The requirement of standard DTDs by all vendors and participants
| presumes that these are adequate to satisfy the differentiation needs
| of the various participants.  "Brand equity" is not sufficient
| differentiation.  Rather, one company may use more detailed
| characteristics than another company in order to differentiate their
| products.

I think you're missing the point.

What I as a consumer want to be able to do is quite simple.  I want to
be able to say, "Hey, I need a new jacket," sit down at my computer,
call up my find-a-product robot, enter my jacket parameters, and then
come back a while later to find all the jackets that fit those
parameters offered by all the vendors whose products I'm interested in
considering.  If the catalog scheme isn't standardized enough to
support this, then I as a consumer am not interested in using it.  If
one of the vendors differentiates itself by adopting a scheme of data
representation that doesn't allow this kind of transparent direct
comparison, then it differentiates itself right out of the class of
vendors I'm interested in, because if all it's giving me is the
ability to cruise its catalog in isolation, I can get the same
functionality from the printed version; it no longer participates in a
way that allows the net to add value to me as a consumer.

I'm not denying that vendors will want to differentiate their
offerings, but if they can't do it in a way that supports detailed
direct comparisons based on the differentia that I am interested in
*as a consumer* then they are simply not in the game at all.

Jon




From srn at techno.com  Thu Sep 25 22:12:40 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
In-Reply-To: <199709251550.IAA13057@boethius.eng.sun.com>
	(Jon.Bosak@eng.Sun.COM)
Message-ID: <199709252008.QAA01199@bruno.techno.com>

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

Andrew Layman (whom I do not know, but would like to) has laid out a
list of requirements for the implementation of namespaces which, if
used as guidance in the development of XML's namespace features, will
create a need for authorities who give "standard" names to such things
as product characteristics.  The concentration of power in such
authorities will hinder innovation, by making it difficult to compare
products regarded as "out of category" for some authority's set of
defined names.  I quote from Andrew's "Universal Names" posting of 23
September 1997 on the w3c-xml-sig@w3.org list:

  [Andrew Layman:]

  I've agreed to summarize the set of requirements that I have
  championed in the past under the term "namespaces." Because this
  word has also meant several alternate sets of requirements, I'm
  temporarily using an entirely different term, "universal names," so
  that we can understand this set of requirements without being
  confused by other useful, but different, goals.  ...

  [Here] I'm going to describe one set of requirements, as best I
  understand it, in my own words. The name is not important. This set
  of requirements is.  ...

  Let me mention a few things that are not requirements of this
  facility.  They may be useful features in some other context, but
  they are not needed in order to have universal names, and should not
  be confused with universal names:

  We do not require an ability to rename elements, so that they can be
  called one thing in a schema and something else in a document instance.
  We do not require the ability to associate multiple semantic meanings
  with a single name.

  In short, what we need, and all that we need, is a facility that
  gives every element's type a universal name, and allows a single
  element type to be known by the same name across disparate
  documents, where the documents have different "document types" or
  where there is no specific document type.


When Andrew Layman says, "We do not require an ability to rename
elements, so that they can be called one thing in a schema and
something else in a document instance," he is backhandedly stating a
requirement that conflicts with the evolutionary process of defining
and marketing new products.  How will the catalog of everything that
is for sale handle a case where the same product characteristic, or
even the same entire product, arises from multiple industries
simultaneously, and each of those industries already uses its own
authoritative schema?  Will the contents of documents have to be
duplicated and translated so as to conform with multiple schemas, so
that different comparisons can be made?  If so, that will cause much
of the value of making the comparisons in the first place to be lost;
features regarded by authorities as "out of category" will simply
disappear.  Imagine a single device that is a fax machine, a
telephone, a copier, a computer, and a stereo sound system.  Should it
appear in a list of telephones?  Maybe.  Should the output wattage of
its amplifier be listable in a comparison with the output wattage of
other telephones?  Maybe.  Should the people who figure out what are
the interesting characteristics of telephones anticipate that output
wattage may be an important characteristic of telephones?  It's
completely unrealistic to expect those people to anticipate that.
And, yet, it's an interesting and relevant statistic and it may be
important to some consumers.

The ugly truth is that we can't predict whether information that is
now thought to be irrelevant to other information (or, maybe we don't
even know about the existence of the other information yet) will turn
out to be semantically identical or semantically mappable.  In my own
mind, anyway, the real justification for the existence of businesses
that provide "yellow pages on steroids" in support of internet
commerce is to provide the added value of mapping semantics to each
other in such a way that they can be directly compared, just as Jon
says.  That mapping can be expressed in some proprietary fashion, or
it can be done using SGML documents that inherit from multiple SGML
architectures, or, if XML supports it, it can be done with XML
documents that inherit from multiple XML architectures, with no limit
on the number of XML architectures that can be inherited, and no
limits on the number of architectures that can usefully be fielded by
old and new industries.  If Andrew Layman's much more limited
requirements govern the design of XML, though, XML documents that
represent such semantic mappings will be more costly to create and
maintain.  (I guess you'd have to do it all with hyperlinks.  Anything
can be done with hyperlinks, but that doesn't mean that everything
*should* be done with hyperlinks.  In general, hyperlinks are best
regarded by information managers as a last resort because they cost
more to maintain and their structure is arbitrary and external.  It's
better if the information, in effect, maps itself.  Inheritable SGML
architectures allow information to map itself in complex ways.  Why
shouldn't it be possible to accomplish the same end in XML, without
requiring the use of hyperlinks?)

So, I continue to harp on the importance of allowing a single element
to inherit multiple semantics (and/or the _same_ semantic differently
named or named within different namespaces).  Andrew Layman says, "We
do not require the ability to associate multiple semantic meanings
with a single name."  But, in my own mind, anyway, this really *is* a
requirement for cataloging companies to extract maximum value from
their listings at minimum information management cost in a dynamic,
non-authoritarian market environment.  It would allow internet catalog
providers to map each new DTD into their existing DTDs simply by
tweaking their existing DTDs.  For example, in the DTD for their
catalog of telephone products, when the output wattage issue first
arises (i.e., when a telephone appears on the market that lists an
output wattage), a declaration is added that allows the
characteristics listed in the DTD for the manufacturer's product
description document to be inherited.  In the same declaration, the
features of the product, such as its "colour", can be mapped to the
things that are the same that are already in the DTD, (such as
"color").  The new feature, "outputWattage", can be made to appear
with a default value of "not applicable", so now all the existing
telephone product listings have this feature, and they can all respond
meaningfully (if uninterestingly) to queries about it.  No need to
create and maintain (!) any hyperlinks.  No need to write or maintain
any extra documents.  One change in one place updates all telephone
products listed in the catalog, regardless of how many there are.  The
amount of information stored hardly increases at all, but the value of
the information increases quite a lot.  Essentially the same change
can be applied to the DTDs for stereo systems (now they can have a
redial feature, yes or no), the DTD for copiers, etc.  Cheap and very
powerful, no?  The catalog provider gets to add a terrific amount of
value at very little cost.  New products can be found by consumers
even if they didn't know the hybrid category existed.  ("I want a very
loud telephone.  Hmmm.")  New products for untried niches can be
usefully listed in multiple catalogs.  Innovation is not penalized for
being unanticipated by the authorities who created DTDs for product
listings in various categories, or by the failure to recognize a
viable category.  Indeed, there is no need for such authorities at
all.  There is only a need for catalogers who can read and understand
incoming DTDs and perform these cheap semantic mapping tricks.
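
The "one change in one place" idea can be sketched with plain Python
dictionaries. This is only an illustration of the mapping step, not of SGML
architectural forms themselves: the table names NAME_MAP and DEFAULTS and
the function normalize are hypothetical, while "colour", "color" and
"outputWattage" come from the example above.

```python
# Hypothetical catalog-side tables: edited once, applied to every listing.
NAME_MAP = {"colour": "color"}                   # vendor name -> catalog name
DEFAULTS = {"outputWattage": "not applicable"}   # newly introduced feature


def normalize(listing, name_map=NAME_MAP, defaults=DEFAULTS):
    """Return a listing expressed in the catalog's own feature names."""
    merged = dict(defaults)  # every listing can answer queries on new features
    for name, value in listing.items():
        # Rename vendor features that mean the same thing as a catalog one.
        merged[name_map.get(name, name)] = value
    return merged


if __name__ == "__main__":
    old_phone = {"colour": "black"}
    loud_phone = {"color": "red", "outputWattage": "5"}
    for listing in (old_phone, loud_phone):
        print(normalize(listing))
```

Existing listings are not edited; they pick up the new feature with its
default value when they are read through the mapping.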

You can do all this now with SGML (as of August 1, 1997; see
http://www.ornl.gov/sgml/wg8/document/1920.htm).  The only question is
whether XML will be able to do it.  Maybe it doesn't matter; providers
of internet shopping directories can always maintain their source
information in SGML and simply deliver it in XML form, if they like.
(Or in HTML form, for that matter.)

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From jwrobie at mindspring.com  Thu Sep 25 22:34:16 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <1.5.4.32.19970925200436.01683f9c@pop.mindspring.com>

At 10:14 AM 9/25/97 -0700, ark@DB.Stanford.EDU wrote:

>I certainly agree with your goal, but I don't agree with the means.
>The experience I have is that standards do not work well in this area.
>What we need is an approach that allows the cross-comparison that you
>want, and yet allows for differentiation, experimentation, and
>evolution.

Perhaps the standards could describe architectural forms which would be the
basis for more individual DTDs created by each vendor. This allows searches
to be done for anything in the architectural forms, but still allows each
vendor to have additional information. Because each vendor has a DTD,
documents can still be validated when they are authored, even though they
have vendor-specific information. Because the DTDs are based on common
architectures, searches can be done across vendors.
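
A rough sketch of that arrangement in Python, assuming each vendor's catalog
has already been parsed into dictionaries. The shared field set
COMMON_FIELDS stands in for the common architecture and is hypothetical, as
are the function name and the sample data; vendor-specific fields simply
ride along with each item.

```python
# Hypothetical fields guaranteed by the shared architecture.
COMMON_FIELDS = {"category", "size", "price"}


def search(catalogs, **criteria):
    """Find items across all vendors that match criteria on common fields."""
    unknown = set(criteria) - COMMON_FIELDS
    if unknown:
        # Only fields from the shared architecture are comparable everywhere.
        raise ValueError("not a common field: " + ", ".join(sorted(unknown)))
    return [
        (vendor, item)
        for vendor, items in catalogs.items()
        for item in items
        if all(item.get(field) == wanted for field, wanted in criteria.items())
    ]


if __name__ == "__main__":
    catalogs = {
        "VendorA": [{"category": "jacket", "size": "M", "price": 80, "lining": "wool"}],
        "VendorB": [{"category": "jacket", "size": "L", "price": 95}],
    }
    for vendor, item in search(catalogs, category="jacket", size="M"):
        print(vendor, item)
```

The search only ever touches the common fields, so a vendor can add fields
such as "lining" without affecting cross-vendor comparison.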

Jonathan


***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From gannon at commerce.net  Fri Sep 26 00:24:09 1997
From: gannon at commerce.net (Patrick Gannon)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <01BCC9C3.094BCEA0@sphynx-d105.sierra.net>

Steven,

Nice to hear from someone who "gets it" regarding the impact of XML on future usage & searchability of internet catalogs.

Since this topic has spilled over from the original meeting posting and generated significant interest, I will request a listserv be established for xml-catalog.  This will allow for application-oriented discussions of XML that are not related to development (XML-DEV) or EDI (XML-EDI) issues that have their own listserv.

Patrick Gannon


----------
From: 	Steven R. Newcomb[SMTP:srn@techno.com]
Sent: 	Thursday, September 25, 1997 1:08 PM
To: 	Jon.Bosak@eng.sun.com
Subject: 	Re: XML iMarket Project Planning Meeting

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

. . . 





From peat at erols.com  Fri Sep 26 01:17:12 1997
From: peat at erols.com (peat)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <199709252308.TAA11756@smtp1.erols.com>

Before you do this, we need to ask ourselves: is there, or should there be, a
significant difference in namespace and other mechanisms depending on the use
of the object? Is there that much of a difference in how we describe an
article, say a "red sweater", if the item is in a catalog, stored in an object
repository, or exchanged in a Purchase Order? Significant enough to split the
group? 

Let me propose we introduce a collaborative means of keeping the collection
of people (which is still relatively small) together. The XML/EDI Group will
soon have this capability through its subgroups and via a generous donation
from an outside corporation.  It should be up and running in a few weeks. Just
a thought, before splintering off the main path.

- Bruce
 

----------
Steven,

Nice to hear from someone who "gets it" regarding the impact of XML on future
usage & searchability of internet catalogs.

Since this topic has spilled over from the original meeting posting and
generated significant interest, I will request a listserv be established for
xml-catalog.  This will allow for application oriented discussions of XML
that are now related to development (XML-DEV) or EDI (XML-EDI) issues that
have their own listserv.

Patrick Gannon


----------
From: 	Steven R. Newcomb[SMTP:srn@techno.com]
Sent: 	Thursday, September 25, 1997 1:08 PM
To: 	Jon.Bosak@eng.sun.com
Subject: 	Re: XML iMarket Project Planning Meeting

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

. . . 





From Peter at ursus.demon.co.uk  Fri Sep 26 01:22:26 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
Message-ID: <10312@ursus.demon.co.uk>

In message <01BCC9C3.094BCEA0@sphynx-d105.sierra.net> Patrick Gannon writes:
> Steven,
> 
> Nice to hear from someone who "gets it" regarding the impact of XML on 
> future usage & searchability of internet catalogs.
> 
> Since this topic has spilled over from the original meeting posting and 
> generated significant interest, I will request a listserv be established 
> for xml-catalog.  This will allow for application oriented discussions 

I think there is potential confusion in the word 'catalog', because of the
SGML Open Catalog.  Some XML software such as NXP supports such Catalogs,
although at present (I think) it is not formally part of XML.

If possible I would hope that 'XML Catalog' and xml-catalog (if they exist
at all) were reserved for this usage - otherwise there could be a lot of
confusion. 

A general point is the use of the XML-* prefix. Within XML itself it is
reserved (e.g. xml-space, xml-link) and I think we should avoid pre-empting
possible uses of XML-*.  Of course 'XML-DEV' falls into the same trap... :-)

I'm assuming that this is not a request for Henry and me to set up another
listserv, because one is about our limit :-).

	P.

> of XML that are now related to development (XML-DEV) or EDI (XML-EDI) 
> issues that have their own listserv.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From elm at arbortext.com  Fri Sep 26 01:35:54 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun  7 16:58:29 2004
Subject: XML iMarket Project Planning Meeting
Message-ID: <3.0.32.19970925193549.00ab5490@village.doctools.com>

(I just posted this directly to xml-dev; if any of the iMarket folks wants
to post this to the original recipients of the thread, be my guest...)

At the Montreal face-to-face XML WG meeting, Eliot Kimber mentioned a cool
idea: Schemas can be in the business of providing synonyms for semantics
published in other schemas.  Schemas can also be in the business of
providing mappings from names to multiple schemas.

Thus, if you want to use your own name for something, you can create a
schema (why not even use AF syntax?) that does nothing but map your name to
the "standard" one or to several "standard" ones.  So my personal schema
can map eve:gazorninplat to both dc:subject and docbook:subject if I want
it to.

This could have some interesting consequences:

  o You could chain schemas as much as necessary to get your desired effect.

  o An interesting market in derivative schemas could develop.

  o XML-only documents wouldn't require full AFDR functionality.
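
A small Python sketch of the chaining described above. The dictionary stands
in for a set of mapping-only schemas; "shop:topic" is an invented name for a
derivative schema, and the rule that any unmapped name counts as "standard"
is an assumption made for the sketch.

```python
# Each hypothetical "mapping schema" maps a local name to one or more names
# in other schemas. Names that no schema maps are treated as "standard".
MAPPINGS = {
    "eve:gazorninplat": ["dc:subject", "docbook:subject"],
    "shop:topic": ["eve:gazorninplat"],  # derivative schema chained onto Eve's
}


def resolve(name, seen=None):
    """Follow mapping chains and return the set of standard names reached."""
    seen = set() if seen is None else seen
    if name in seen:          # guard against cyclic mappings
        return set()
    seen.add(name)
    targets = MAPPINGS.get(name)
    if targets is None:       # nothing maps it further: treat as standard
        return {name}
    result = set()
    for target in targets:
        result |= resolve(target, seen)
    return result


print(sorted(resolve("shop:topic")))
```

A query phrased against dc:subject could then match content tagged with either
of the private names, however many mapping schemas sit in between.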

So Jonathan's suggestion below could be seen as a suggestion to create a
base schema using AFDR syntax, which others could use directly, or in
modified form by inserting another schema.

I don't know, maybe all this is obvious to everybody else, but seeing the
problem this way blows my mind.  It makes me think that (ironically?) the
first obvious candidate for "non-DTD" schema syntax is AFDRs.

	Eve

At 04:04 PM 9/25/97 -0400, Jonathan Robie wrote:
>Perhaps the standards could describe architectural forms which would be the
>basis for more individual DTDs created by each vendor. This allows searches
>to be done for anything in the architectural forms, but still allows each
>vendor to have additional information. Because each vendor has a DTD,
>documents can still be validated when they are authored, even though they
>have vendor-specific information. Because the DTDs are based on common
>architectures, searches can be done across vendors.



From srn at techno.com  Fri Sep 26 05:11:18 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: Retraction and apology
In-Reply-To: <199709252008.QAA01199@bruno.techno.com> (srn@techno.com)
Message-ID: <199709260306.XAA01444@bruno.techno.com>

Some of you who received the note I sent to you earlier today should
not have received the material written by Andrew Layman that I quoted
and which was previously distributed only within the confines of W3C.
I should not have quoted it in a note that was being publicly
distributed.

In my own (pretty weak) defense: I didn't notice that, for example,
the xml-dev list was in the address list; I merely scanned the list of
addresses to verify that, in fact, it was a list with a lot of
insiders.  I should have verified that the list contained no
*outsiders*, but I inexplicably failed to do that, blithely assuming
from the list's provenance, insider topic, insider tenor, and
recognizable insider addressees that it was a discussion taking place
within the family.  I should have been more careful; this was
definitely a poor algorithm.

I must ask you folks who were not supposed to see the Layman material
to destroy it and forget it.  Anyway, it's an internal discussion,
and, therefore, you can't know the context.

W3C people: I would not blame you for withdrawing my access to the
discussion.  My blunder has caused some pain, and I regret that.

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From Jon.Bosak at eng.Sun.COM  Fri Sep 26 17:33:38 1997
From: Jon.Bosak at eng.Sun.COM (Jon Bosak)
Date: Mon Jun  7 16:58:29 2004
Subject: XML & Catalogs
In-Reply-To: <01BCC9C3.094BCEA0@sphynx-d105.sierra.net> (message from Patrick Gannon on Thu, 25 Sep 1997 14:55:12 -0700)
Message-ID: <199709261530.IAA13761@boethius.eng.sun.com>

| Since this topic has spilled over from the original meeting posting
| and generated significant interest, I will request a listserv be
| established for xml-catalog.  This will allow for application oriented
| discussions of XML that are now related to development (XML-DEV) or
| EDI (XML-EDI) issues that have their own listserv.

Thanks, Patrick.  Like Steve Newcomb, I didn't notice that this thread
was being copied to xml-dev when I posted to it.  We should start over
on the new list server.

Jon




From Peter at ursus.demon.co.uk  Fri Sep 26 21:18:34 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun  7 16:58:29 2004
Subject: Retraction and apology
Message-ID: <10329@ursus.demon.co.uk>

In message <199709260306.XAA01444@bruno.techno.com> "Steven R. Newcomb" writes:
> 
> I must ask you folks who were not supposed to see the Layman material
> to destroy it and forget it.  Anyway, it's an internal discussion,
> and, therefore, you can't know the context.

Mailings to xml-dev are not only posted to subscribers, but also hypermailed.
I have no idea what people or robots copy material from this list, but I expect
that this happens. The messages are stored in a mail box, regenerated into 
hypertext at regular intervals and it isn't feasible to delete messages from
the archive without a great deal of work. The moving finger writes... sorry.

	P.

-- 
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/



From tbray at textuality.com  Sat Sep 27 00:30:46 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun  7 16:58:29 2004
Subject: First XML Book?
Message-ID: <3.0.32.19970926152707.00944510@pop.intergate.bc.ca>

Just got my copy in the mail of "Presenting XML", mostly by Richard Light,
from SamsNet.   400 pages, suffers from being a snapshot of a moving target,
but, I think, a worthy first volume in the soon-to-be-large XML library.
ISBN 1-57521-334-6. -Tim



From srn at techno.com  Mon Sep 29 04:35:49 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:29 2004
Subject: please consider whether
Message-ID: <199709290233.WAA01182@bruno.techno.com>


[Patrick Gannon:]

> Since this topic has spilled over from the original meeting posting
> and generated significant interest, I will request a listserv be
> established for xml-catalog.  This will allow for application
> oriented discussions of XML that are now related to development
> (XML-DEV) or EDI (XML-EDI) issues that have their own listserv.

Patrick -- Here is a note to post on the listserv. -- Steve

**********************************************************************

This note asks those in the online product catalog business to
consider whether they need XML to support SGML Architectures --
multiple architectural inheritance.  (Others may also find it
interesting.)

The designers of XML want to know why multiple architectural
inheritance is a feature that should remain unsupported, at least
temporarily.

If you want to use and benefit from the "SGML Architectures" notion
outlined in my earlier note (attached below), I believe you should now
consider (while you still have an option in the matter) whether you
want to be able to use XML for your company-internal "information
source code" for all the information that is the essence of your
company's value.  An ISO standard alternative, SGML/HyTime, is also
available for that purpose.

On the one hand, SGML/HyTime is one helluva strong set of paradigms,
of which XML and all the things currently present in or planned for
XML (linking, addressing, metadata) are a proper subset.  Together,
these paradigms put the information manager and owner in maximum
control of the cost of creating and maintaining information about
information.

On the other hand, XML will have a wider audience.  XML data will flow
across the internet to an awful lot of users (or so we think, anyway)
who won't have full SGML/HyTime capabilities in their systems any time
soon.

If, because your internal databases are limited in functionality to
the representational power of XML, your internal applications cannot
deliver the cost-cutting power of SGML/HyTime for creating and
maintaining massive amounts of n-dimensional (and n-dimensionally
interrelated) information, maybe that's ok because the potential for
higher code maintenance costs is worth the convenience of being able
to dump copies of sections of your metadata source code directly out
to the internet.  (Somehow the latter doesn't seem to me a very good
business idea, but that's for you to decide.)

You might be able to avoid having to make this decision early by
letting the w3c-xml-sig group know that your business applications
expect to benefit from multiple architectural inheritance a la SGML
Architectures, so you'd like to have XML support SGML Architectures
sooner, rather than later.

I'm not particular about whatever reason you may have for expressing
to the w3c-xml-sig group your interest (if any) in SGML Architectures;
I just think the online product catalog industry should consider doing
so, and very soon indeed.

I've already made clear my own reasons for bringing this issue up in
my earlier note.  For your convenience, I'm attaching it below (sans
some stuff I shouldn't have put in in the first place because it was
from an unpublished W3C discussion about XML).

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA

********************************************************************************

*** Not as originally posted.  Unpublished W3C material has been deleted. ***

Date: Thu, 25 Sep 1997 16:08:44 -0400 
Message-Id: <199709252008.QAA01199@bruno.techno.com>
From: "Steven R. Newcomb" 
To: Jon.Bosak@eng.Sun.COM
CC: ark@DB.Stanford.EDU, gannon@commerce.net, brucek@agentsoft.com,
         btait@mercantec.com, caallen@webmethods.com,
         claire_celeste_carnes@ccm.jf.intel.com, dmarquis@kinetoscope.com,
        f.deschamps@bull.com, harvey@eccnet.eccnet.com, jmt@commerce.net,
        Jon.Bosak@eng.Sun.COM, jonathan@poet.com, jonlewis@cngroup.com,
         marthao@icat.com, Michael.Leventhal@grif.fr, paul@arbortext.com,
        pjordan@microstar.com, ptrevithick@bitstream.com, rcw@commerce.net,
         smith@adobe.com, tbadger@kodak.com, trung@ondisplay.com,
         weld@cs.washington.edu, xml-dev@ic.ac.uk, andrewl@microsoft.com,
         higginsc@lanepowell.com
In-reply-to: <199709251550.IAA13057@boethius.eng.sun.com>
	(Jon.Bosak@eng.Sun.COM)
Subject: Re: XML iMarket Project Planning Meeting

[Jon Bosak:]

> What I as a consumer want to be able to do is quite simple.  I want to
> be able to say, "Hey, I need a new jacket," sit down at my computer,
> call up my find-a-product robot, enter my jacket parameters, and then
> come back a while later to find all the jackets that fit those
> parameters offered by all the vendors whose products I'm interested in
> considering.  If the catalog scheme isn't standardized enough to
> support this, then I as a consumer am not interested in using it.  If
> one of the vendors differentiates itself by adopting a scheme of data
> representation that doesn't allow this kind of transparent direct
> comparison, then it differentiates itself right out of the class of
> vendors I'm interested in, because if all it's giving me is the
> ability to cruise its catalog in isolation, I can get the same
> functionality from the printed version; it no longer participates in a
> way that allows the net to add value to me as a consumer.
> 
> I'm not denying that vendors will want to differentiate their
> offerings, but if they can't do it in a way that supports detailed
> direct comparisons based on the differentia that I am interested in
> *as a consumer* then they are simply not in the game at all.

There is a very serious problem here that bears strikingly on an
ongoing discussion in XML-land: the discussion of so-called
"namespaces".  The idea that there will be consortia of vendors, or
any other sort of authority who will determine some list of names of
characteristics of each sort of product, so that characteristics can
be directly and automatically compared, is dangerous to innovation,
competition, and commerce, and it is totally unnecessary, too.  It
will open the door for existing businesses to use such architectures
as weapons against upstarts in niche markets and in unusual or new
market combinations.  Moreover, the use of information architectures
as weapons will always seem like perfectly reasonable business
practices, so it will be nobody's fault when new concepts fail to be
accepted in the marketplace, because the internet failed to live up to
its promise of helping people find what they are looking for and make
informed purchasing decisions.  The macroeconomy will be damaged.

*** Mr. (or Ms.) X *** (whom I do not know, but would like to) has
laid out a list of requirements for the implementation of namespaces
which, if used as guidance in the development of XML's namespace
features, will create a need for authorities who give "standard" names
to such things as product characteristics.  The concentration of power
in such authorities will hinder innovation, by making it difficult to
compare products regarded as "out of category" for some authority's
set of defined names.

*** [To say that there is no industrial requirement for XML to support
multiple architectural inheritance is to place the design of
XML in conflict] *** with the evolutionary process of defining
and marketing new products.  How will the catalog of everything that
is for sale handle a case where the same product characteristic, or
even the same entire product, arises from multiple industries
simultaneously, and each of those industries already uses its own
authoritative schema?  Will the contents of documents have to be
duplicated and translated so as to conform with multiple schemas, so
that different comparisons can be made?  If so, that will cause much
of the value of making the comparisons in the first place to be lost;
features regarded by authorities as "out of category" will simply
disappear.  Imagine a single device that is a fax machine, a
telephone, a copier, a computer, and a stereo sound system.  Should it
appear in a list of telephones?  Maybe.  Should the output wattage of
its amplifier be listable in a comparison with the output wattage of
other telephones?  Maybe.  Should the people who figure out what are
the interesting characteristics of telephones anticipate that output
wattage may be an important characteristic of telephones?  It's
completely unrealistic to expect those people to anticipate that.
And, yet, it's an interesting and relevant statistic and it may be
important to some consumers.
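
A rough Python sketch of the hybrid-device case, assuming invented product
and property names and three hypothetical architectures. It shows one product
description being read through several vocabularies without being duplicated
or translated; it is an illustration only, not SGML Architectures.

```python
# One product description. Property names are the manufacturer's own, and
# the product itself is invented for this sketch.
PRODUCT = {
    "name": "OmniDesk 5",
    "redial": "yes",
    "ampWatts": "40",
    "pagesPerMinute": "8",
}

# Each hypothetical architecture maps its own property names onto the
# manufacturer's names. A product may be read through any number of them.
ARCHITECTURES = {
    "telephone": {"redial": "redial"},
    "stereo": {"outputWattage": "ampWatts"},
    "copier": {"speed": "pagesPerMinute"},
}


def view(product, architecture):
    """Project a product onto one architecture's vocabulary."""
    mapping = ARCHITECTURES[architecture]
    return {arch_name: product[own_name]
            for arch_name, own_name in mapping.items()
            if own_name in product}


def listed_in(product):
    """Names of the architectures under which the product shows any data."""
    return [arch for arch in ARCHITECTURES if view(product, arch)]


print(listed_in(PRODUCT))
print(view(PRODUCT, "stereo"))
```

The telephone vocabulary here still knows nothing about wattage, yet the same
record answers a wattage query through the stereo vocabulary.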

The ugly truth is that we can't predict whether information that is
now thought to be irrelevant to other information (or, maybe we don't
even know about the existence of the other information yet) will turn
out to be semantically identical or semantically mappable.  In my own
mind, anyway, the real justification for the existence of businesses
that provide "yellow pages on steroids" in support of internet
commerce is to provide the added value of mapping semantics to each
other in such a way that they can be directly compared, just as Jon
says.  That mapping can be expressed in some proprietary fashion, or
it can be done using SGML documents that inherit from multiple SGML
architectures, or, if XML supports it, it can be done with XML
documents that inherit from multiple XML architectures, with no limit
on the number of XML architectures that can be inherited, and no
limits on the number of architectures that can usefully be fielded by
old and new industries.  *** [Without multiple architectural
inheritance, XML documents that represent such semantic mappings will
be more costly to create and maintain.  (I guess you'd have to do it
all with hyperlinks.  Anything can be done with hyperlinks, but that
doesn't mean that everything *should* be done with hyperlinks.  In
general, hyperlinks are best regarded by information managers as a
last resort because they cost more to maintain and their structure is
arbitrary and external.  It's better if the information, in effect,
maps itself.  Inheritable SGML architectures allow information to map
itself in complex ways.  Why shouldn't it be possible to accomplish
the same end in XML, without requiring the use of hyperlinks?)

So, I continue to harp on the importance of allowing a single element
to inherit multiple semantics (and/or the _same_ semantic differently
named or named within different namespaces).  *** [Other opinions
notwithstanding,] *** in my own mind, anyway, this really *is* a
requirement for cataloging companies to extract maximum value from
their listings at minimum information management cost in a dynamic,
non-authoritarian market environment.  It would allow internet catalog
providers to map each new DTD into their existing DTDs simply by
tweaking their existing DTDs.  For example, in the DTD for their
catalog of telephone products, when the output wattage issue first
arises (i.e., when a telephone appears on the market that lists an
output wattage), a declaration is added that allows the
characteristics listed in the DTD for the manufacturer's product
description document to be inherited.  In the same declaration, the
features of the product, such as its "colour", can be mapped to the
things that are the same that are already in the DTD, (such as
"color").  The new feature, "outputWattage", can be made to appear
with a default value of "not applicable", so now all the existing
telephone product listings have this feature, and they can all respond
meaningfully (if uninterestingly) to queries about it.  No need to
create and maintain (!) any hyperlinks.  No need to write or maintain
any extra documents.  One change in one place updates all telephone
products listed in the catalog, regardless of how many there are.  The
amount of information stored hardly increases at all, but the value of
the information increases quite a lot.  Essentially the same change
can be applied to the DTDs for stereo systems (now they can have a
redial feature, yes or no), the DTD for copiers, etc.  Cheap and very
powerful, no?  The catalog provider gets to add a terrific amount of
value at very little cost.  New products can be found by consumers
even if they didn't know the hybrid category existed.  ("I want a very
loud telephone.  Hmmm.")  New products for untried niches can be
usefully listed in multiple catalogs.  Innovation is not penalized for
being unanticipated by the authorities who created DTDs for product
listings in various categories, or by the failure to recognize a
viable category.  Indeed, there is no need for such authorities at
all.  There is only a need for catalogers who can read and understand
incoming DTDs and perform these cheap semantic mapping tricks.
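
The "one change in one place" argument can be sketched in Python as follows.
The listings, model names, and the SYNONYMS and DEFAULTS tables are invented
for illustration; they stand in for the single declaration added to the
catalog DTD and are not SGML syntax.

```python
# Existing catalogue entries, written against the catalogue's own vocabulary.
LISTINGS = [
    {"model": "T-100", "color": "black"},
    {"model": "T-200", "color": "white"},
]

# The single "declaration" added when a manufacturer's description arrives:
# synonyms for names the catalogue already has, plus defaults for new features.
SYNONYMS = {"colour": "color"}
DEFAULTS = {"outputWattage": "not applicable"}


def normalize(record):
    """Rename synonymous features, then fill in defaults for missing ones."""
    renamed = {SYNONYMS.get(key, key): value for key, value in record.items()}
    return {**DEFAULTS, **renamed}


incoming = {"model": "LoudPhone", "colour": "red", "outputWattage": "40"}
catalogue = [normalize(r) for r in LISTINGS + [incoming]]

# Every listing can now answer a query about output wattage.
for entry in catalogue:
    print(entry["model"], entry["color"], entry["outputWattage"])
```

The stored listings are untouched; only the two small tables changed, which is
the cost argument being made above.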

You can do all this now with SGML (as of August 1, 1997; see
http://www.ornl.gov/sgml/wg8/document/1920.htm).  The only question is
whether XML will be able to do it.  Maybe it doesn't matter; providers
of internet shopping directories can always maintain their source
information in SGML and simply deliver it in XML form, if they like.
(Or in HTML form, for that matter.)

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA


xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)


From paul_madsen at qmail.newbridge.com  Mon Sep 29 16:11:21 1997
From: paul_madsen at qmail.newbridge.com (Paul Madsen)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: 

                                          9:31 AM             29/09/97

Hi, I posted this to comp.text.sgml but didn't get much response (thanks J.R.)
_________

The XML-Data specification from Microsoft
(http://www.sil.org/sgml/xml-data9706223.htm) proposes that the logic
traditionally expressed in the DTD (content models, attribute lists,
entity definitions, etc.) be expressed using the syntax of XML instances
instead.

For instance, instead of the DTD element declaration 

 

the XML-Data scheme rule would be something like 

 
      
 

I'm attracted to the idea if only because it seems "cool".

But what does this gain us? What deficiencies with the DTD formalism does it
address? 

Is it the ability to extend object types so that one class of object is a
specialization of another more general class?

Do not Architectural forms provide the traditional DTD syntax just that
ability? 

Thanks for any insight. 

Paul 




From RMcDouga at JetForm.com  Mon Sep 29 16:26:38 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: 

If I remember correctly, the advantages are listed in the spec.  The main 
advantage being that you can include the XML-Data definition within the XML 
file itself, so that you now can send a completely self-describing file 
that can be read by a single (XML) parser.

Rob
=======================================================
Rob McDougall            Phone:  (613)751-4800 ext.5232
JetForm Corporation      Fax:    (613)594-8886
http://www.jetform.com   mailto:rmcdouga@jetform.com
=======================================================

-----Original Message-----
From:	Paul Madsen [SMTP:paul_madsen@qmail.newbridge.com]
Sent:	Monday, September 29, 1997 9:46 AM
To:	XML DEV
Subject:	XML-Data: advantages over DTD syntax?

                                          9:31 AM             29/09/97

Hi, I posted this to comp.text.sgml but didn't get much response (thanks 
J.R.)
_________

The XML-Data specification from Microsoft
(http://www.sil.org/sgml/xml-data9706223.htm) proposes
that the logic traditionally expressed in the DTD (content models, 
attribute
lists, entity definitions,
etc.) be expressed using the syntax of XML instances instead.

For instance, instead of the DTD element declaration



the XML-Data scheme rule would be something like


     


I'm attracted to the idea if only because it seems "cool".

But what does this gain us? What deficiencies with the DTD formalism does 
it
address?

Is it the ability to extend object types so that one class of object is a
specialization of another more
general class?

Do not Architectural forms provide the traditional DTD syntax just that
ability?

Thanks for any insight.

Paul







From michael at textscience.com  Mon Sep 29 17:41:18 1997
From: michael at textscience.com (Michael Leventhal)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: 
Message-ID: <3.0.1.32.19970929080238.0083c5c0@aimnet.com>

At 09:46 AM 9/29/97 -0400, Paul Madsen wrote:
>But what does this gain us? What deficiencies with the DTD formalism does it
>address? 
>
>Is it the ability to extend object types so that one class of object is a
>specialization of another more general class? 

IMHO, this is a strong reason to chuck DTDs as they now exist.  But not
a goal of XML-DATA.

>Do not Architectural forms provide the traditional DTD syntax just that
>ability? 

So say some but not really.

Michael Leventhal

______________________________________________________________________
  Michael Leventhal           Internet  : http://www.grif.com
  G R I F , S. A.             Email     : Michael.Leventhal@grif.fr
  VP, Technology              Telephone : 510-444-2962
  1800 Lake Shore Ave Ste 14  Fax       : 510-444-1672
  Oakland, California  94606  France    : (011) 33 1 30121430 (fr US)
______________________________________________________________________



From jwrobie at mindspring.com  Mon Sep 29 17:51:05 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <1.5.4.32.19970929154902.00a202a4@pop.mindspring.com>

XML-Data adds several features that hard-core object oriented folks
appreciate:

1. True inheritance, with semantics more similar to that of OO
languages than indirect mechanisms used to simulate inheritance when
using architectural forms. Architectural forms do not really give us
what OO folks call inheritance.

2. Reflection - the ability to modify the content model at run-time.

3. The syntax for the content model is the same as the syntax for
data, making it easier to write code to manipulate both.

Of course, all existing SGML and XML tools know how to deal with DTDs,
and this is a rather major departure from traditional SGML. It has not
been blessed by any standardization committee. Given the way Microsoft
has approached Java, insisting that it need not implement the portable
libraries everyone else is using, and encouraging people to use their
platform-specific libraries instead, it is easy to wonder what will
happen to the SGML world if Microsoft is in control of an alternative
method of specifying content models.

According to MS representatives, there *will* be tools to transform
XML-Data content models into DTDs, but still, the "real" content model
is in the XML-Data. Is it worth it in order to gain true inheritance
and reflection? For some applications, it may well be. If Microsoft
controls XML-Data, and some vendors support it but others do not, will
we have the same kind of market fragmentation that we have in the Java
world today, where Microsoft is refusing to support the Java standard
libraries, and instead insists that developers should use their own
libraries, which run only on Windows operating systems?

Who knows!

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From gray at interlog.com  Mon Sep 29 18:08:46 1997
From: gray at interlog.com (Graydon Hoare)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: 
Message-ID: 


> I'm attracted to the idea if only because it seems "cool".

I think the general reasoning behind xml-data and XSL (shiver of horror) 
is that if we settle on a uniform representation for graph-structured data
in transit then we can (soon) live in a world where nobody has to write a
parser for the stuff ever again. I mean, a scheme parser isn't exactly
brain surgery so I'm less inclined to enjoy this argument when used in
favour of XSL, but XSL has other reasons for existing. writing a DTD
parser with architectural forms support is just another stumbling block to
wide deployment of XML, and xml-data nicely circumvents the question. You
can just write an XML parser (in a shoddy one-off proof of concept as many
people are busy writing) and write your validator in terms of the objects
the tried and true parser hands you.  Given that those objects have really
simple property-querying methods, it makes your code better encapsulated,
less likely to mix validating with the parsing of architectural forms.

at least that's the principal advantage I see. 
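
A minimal sketch of that idea, in Python, assuming the schema is itself
an XML document read by the same parser as the data. The schema
vocabulary here ("schema", "elementType", "child") is invented for the
example and is NOT the XML-Data syntax; the function names and the pet
names are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema vocabulary, not XML-Data: each elementType lists
# the child element names it allows.
SCHEMA = """
<schema>
  <elementType name="animalFriends"><child name="pet"/></elementType>
  <elementType name="pet"/>
</schema>
"""

DOC = """
<animalFriends>
  <pet name="Fido"/>
  <pet name="Tom"/>
</animalFriends>
"""

def load_rules(schema_text):
    """Map each declared element name to the set of child names it allows."""
    root = ET.fromstring(schema_text)
    return {
        et.get("name"): {c.get("name") for c in et.findall("child")}
        for et in root.findall("elementType")
    }

def validate(elem, rules):
    """Return a list of problems found under elem (empty list means valid)."""
    problems = []
    if elem.tag not in rules:
        problems.append("undeclared element: " + elem.tag)
        return problems
    for child in elem:
        if child.tag not in rules[elem.tag]:
            problems.append(child.tag + " not allowed in " + elem.tag)
        else:
            problems.extend(validate(child, rules))
    return problems

rules = load_rules(SCHEMA)
print(validate(ET.fromstring(DOC), rules))
```

The validator never touches raw text: both the rules and the document
arrive as objects from the one parser, which is the separation of
parsing from validating argued for above. A real content model would
also need order and occurrence constraints, which this sketch ignores.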

cool side note: you can use a DSSSL engine to customize an XML-DATA grove
and dump out a new document type ;) or at very least typeset the metadata
in a nice way..

-graydon 
______________________
peccatum poena peccati 





From peter at techno.com  Mon Sep 29 18:46:19 1997
From: peter at techno.com (Peter Newcomb)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <1.5.4.32.19970929154902.00a202a4@pop.mindspring.com> (message
	from Jonathan Robie on Mon, 29 Sep 1997 11:49:02 -0400)
Message-ID: <199709291643.MAA29767@exocomp.techno.com>

[Jonathan Robie  on Mon, 29 Sep 1997 11:49:02 -0400]
> XML-Data adds several features that hard-core object oriented folks
> appreciate:
> 
> 1. True inheritance, with semantics more similar to that of OO
> languages than indirect mechanisms used to simulate inheritance when
> using architectural forms. Architectural forms do not really give us
> what OO folks call inheritance.

Could you elaborate upon this distinction between architectural form
inheritance and "true OO inheritance"?  What about XML-data makes it
capable of supporting "truer" inheritance than architectural forms?

-peter

--
Peter Newcomb                           TechnoTeacher, Inc.
peter@petes-house.rochester.ny.us       peter@techno.com
http://www.petes-house.rochester.ny.us  http://www.techno.com



From jwrobie at mindspring.com  Mon Sep 29 19:29:20 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <1.5.4.32.19970929172831.0098672c@pop.mindspring.com>

At 12:43 PM 9/29/97 -0400, Peter Newcomb wrote:
>[Jonathan Robie  on Mon, 29 Sep 1997 11:49:02 -0400]
>> XML-Data adds several features that hard-core object oriented folks
>> appreciate:
>> 
>> 1. True inheritance, with semantics more similar to that of OO
>> languages than indirect mechanisms used to simulate inheritance when
>> using architectural forms. Architectural forms do not really give us
>> what OO folks call inheritance.
>
>Could you elaborate upon this distinction between architectural form
>inheritance and "true OO inheritance"?  What about XML-data makes it
>capable of supporting "truer" inheritance than architectural forms?

Let me preface this by saying that I am fairly new to both XML-data and
architectural forms, and I am perfectly willing to be shown wrong on this
statement. Let me explain some properties I see in XML-Data which I have not
yet been able to mirror completely using architectural forms. Since you know
much more about architectural forms than I do, I'll let you tell me if there
is an exact equivalent using architectural forms. In fact, this could be a
great opportunity to do a better comparison than I can do by myself.

In C++, Java, Smalltalk, and other OO languages, if I say that "a duck is an
animal", that means: (1) a duck always has all the data associated with an
animal, (2) a duck has the behavior associated with an animal (unless you
specifically say that a duck does certain things differently), and (3)
references to generic animals can also point to ducks.  To put this in
traditional OO terms, Duck inherits data, behavior, and type from Animal. In
SGML, it can't inherit behavior, but it can inherit data and type.
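
For readers less familiar with OO languages, the three properties can be
shown in a few lines of Python. The class names follow the duck/animal
wording above; the method names and the sample duck are made up for
illustration.

```python
class Animal:
    def __init__(self, name):
        self.name = name              # data every animal has

    def describe(self):               # behaviour every animal has
        return self.name + " is an animal"

    def sound(self):
        return "..."

class Duck(Animal):
    def sound(self):                  # a duck does this one thing differently
        return "quack"

def introduce(animal):
    """Written for generic animals; works unchanged for ducks."""
    return animal.describe() + " that says " + animal.sound()

donald = Duck("Donald")
print(donald.name)                    # (1) inherited data
print(introduce(donald))              # (2) inherited and overridden behaviour
print(isinstance(donald, Animal))     # (3) a Duck is accepted as an Animal
```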

Microsoft's XML-Data allows me to inherit data and type in a manner very
similar to OO languages. For instance, their description of XML-Data at
their XML standards page gives the following example:


  
    
  

  
    
    
    
  

  
    
    
  

  
    
    
  


Now I can use this type declaration to create an animalFriends element,
which is a list of pets:


  
  
  


So the pet hrefs can point to pets, cats, or dogs.

How would I create this schema using architectural forms?

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From eliot at isogen.com  Mon Sep 29 20:21:12 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19971129131853.00b5a2c8@swbell.net>

At 01:28 PM 9/29/97 -0400, Jonathan Robie wrote:

>                                                             To put this in
>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>SGML, it can't inherit behavior, but it can inherit data and type.

In fact, you can inherit behavior if your processor is architecture aware
such that you can write rules that will apply the architecture-specific
behavior in the absence of element-specific behavior.  This could either be
indirectly through object-oriented processors where the implementing
element-specific objects inherit from architecture-specific objects or
explicitly through scripts that embody the architecture derivation rules,
e.g., something like this in DSSSL (here using a 'query' element rule):

(query (case (arch-form-of (current-node) 'myarch')
        (('foo')
         (make paragraph ...))
        (('bar')
         (make sequence ...))))

Behavior is simply processing code associated with types--the only question
is how is the binding done.  With SGML, the binding is [almost] always
loose and indirect and architecture-based binding is just another level of
indirection, similar to, if not identical to, the indirection you get by
inheriting methods from supertypes.
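
The same fallback can be sketched outside DSSSL. The Python below is a
rough illustration under stated assumptions: the lookup tables, the
function names and the pet names are hypothetical, and in a real system
the element-to-form table would come from the architecture declarations
rather than from a dictionary.

```python
# Hypothetical table: which architectural form each element type derives from.
ARCH_FORM = {"pet": "pet", "cat": "pet", "dog": "pet"}

def format_pet(name):
    return "pet: " + name

def format_dog(name):
    return "dog (woof): " + name

ARCH_RULES = {"pet": format_pet}        # behaviour attached to the form
ELEMENT_RULES = {"dog": format_dog}     # element-specific override

def process(element_type, name):
    """Use the element's own rule if it has one, else its form's rule."""
    rule = ELEMENT_RULES.get(element_type)
    if rule is None:
        rule = ARCH_RULES[ARCH_FORM[element_type]]
    return rule(name)

print(process("dog", "Fido"))   # element-specific behaviour
print(process("cat", "Tom"))    # falls back to the architecture's behaviour
```

The extra dictionary lookup plays the role of the extra level of
indirection mentioned above.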

>Microsoft's XML-Data allows me to inherit data and type in a manner very
>similar to OO languages. For instance, their description of XML-Data at
>their XML standards page gives the following example:
>
>
>  
>    
>  
>
>  
>    
>    
>    
>  
>
>  
>    
>    
>  
>
>  
>    
>    
>  
>
>
>Now I can use this type declaration to create an animalFriends element,
>which is a list of pets:
>
>
>  
>  
>  
>
>
>So the pet hrefs can point to pets, cats, or dogs.
>
>How would I create this schema using architectural forms?

I see a one-level schema hierarchy from which the document in the example
is derived:

superclass animalFriends 
   contains pet+
superclass pet
   contains ANY
   attribute owner
   attribute name 

To duplicate this using architectures, I create a meta-DTD that defines the
two supertypes and a document that derives its element types from the
supertypes.  

First the derived document, which declares its derivation from the
architecture (schema):








]>

  
  
  


Now the architectural meta-DTD, which defines the types:








The relationship of the types in the document to the types in the meta-DTD
is clear and machine processible (because of the architecture notation and
meta-DTD entity).  The relationship of the individual elements to their
supertypes is clear, either through the automatic mapping (names in the
document automatically map to the same name in the architecture, e.g.,
'animalFriends' in the document maps to 'animalFriends' in the meta-DTD) or
through the explicit mapping as for the types cat and dog.  The 'extends'
semantic is inherent in architectural derivation.  The architecture conveys
no less information than the example and takes about the same amount of
characters in this case (the verbosity of the XML-Data syntax offset by the
need for the architecture notation and entity declaration in the document).
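
As a rough Python sketch of the two mapping routes just described
(automatic by identical name, explicit for cat and dog): the form names
are taken from the hierarchy listed above, while the function names, the
sample document and its pet names are hypothetical. Real architectural
processing has more rules than this, for example for elements that
derive from no form; the sketch simply raises an error there.

```python
import xml.etree.ElementTree as ET

FORMS = {"animalFriends", "pet"}           # forms defined in the meta-DTD
EXPLICIT = {"cat": "pet", "dog": "pet"}    # explicit mapping for cat and dog

def arch_form_of(element_type):
    """Explicit mapping wins; otherwise a name maps to the same-named form."""
    if element_type in EXPLICIT:
        return EXPLICIT[element_type]
    return element_type if element_type in FORMS else None

def architectural_instance(elem):
    """Copy elem, renaming each element to its architectural form."""
    form = arch_form_of(elem.tag)
    if form is None:
        # Simplification: a real engine would not simply fail here.
        raise ValueError("no architectural form for " + elem.tag)
    copy = ET.Element(form, dict(elem.attrib))
    for child in elem:
        copy.append(architectural_instance(child))
    return copy

doc = ET.fromstring(
    '<animalFriends><pet name="Rex"/><cat name="Tom"/><dog name="Fido"/></animalFriends>'
)
view = architectural_instance(doc)
print([child.tag for child in view])
```

The source document is left untouched, so a processor that knows
nothing about the architecture still sees cat and dog elements.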

The architecture approach requires no specialized processors in order to
process the document by architecture-unaware processors and
architecture-aware processing can be added easily through either ad-hoc
means in style sheets or transforms or using more complete architecture
engines (e.g., SP, GroveMinder, etc.).

Note that neither the XML-Data nor the architectural meta-DTD are complete
definitions of the schema--you still need human-understandable definitions
of all the parts (what is a "pet"? What are the rules for pet names? What
are the rules for owner names? What, if any, is the significance of pet
element content? etc.).  You also need to define the expected behavior for
the types in various contexts: formatting, transformation, online display,
etc.  Neither the XML-Data nor the architecture formalism will or can
provide these--they must be provided by other means, mostly
non-standardized and relying heavily on prose to communicate ideas to
humans, not processing to computers.

The only really important part of the schema discussion is how is a schema
associated with its documentation and definitions and how are things
associated with that schema.  That's why the architecture mechanism
requires that you declare the notation for the architecture--that is the
pointer to the authoritative definition of what the architecture rules are.
The meta-DTD for the architecture is just a convenience that makes it
easier to do processing and validation, but the presence of it doesn't give
you that much and the lack of it doesn't preclude doing architecture-based
processing.  The same will be true of any other formal syntax for defining
the meta-syntax rules for documents.  At least architectures use an
existing syntax that is well understood by all SGML tools.

Given that most XML tools will need to be able to deal with DTDs anyway, I
can see no compelling reason in the short term to define an alternative
syntax for DTDs.  Rethinking how document schemas are created and managed
over the long term needs doing, no doubt, but that is a project that will
take years of careful study and thought and must be done in conjunction
with a major revision to SGML, one in which many different ideas and
requirements can be brought to bear.

In my opinion, none of the name-space requirements and none of the
DTD-editing requirements require a change to existing mechanisms in order
to be satisfied in a reasonable way.  Given that, there can be no good
reason for trying to reinvent the DTD mechanism at this time and trying to
do so is a waste of time that is better spent on more pressing issues.
Certainly people are free to invent whatever document types they want for
representing schemas, but to suggest that any such definition should be
used as standard within XML or SGML is premature, unwise, and unwarranted.
If Microsoft (or anybody else) wants to build tools to support such a
system and see if people will use or buy them, let them do so.  Let the
marketplace decide.  But this is not an area of SGML or XML for which the
standards need to change at this time and we should not attempt to change
them.

Cheers,

E.



From ricko at allette.com.au  Mon Sep 29 20:25:09 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <199709291829.EAA23860@jawa.chilli.net.au>


 
> From: Jonathan Robie 
 
> Of course, all existing SGML and XML tools know how to deal with DTDs,
> and this is a rather major departure from traditional SGML. It has not
> been blessed by any standardization committee. Given the way Microsoft
> has approached Java, insisting that it need not implement the portable
> libraries everyone else is using, and encouraging people to use their
> platform-specific libraries instead, it is easy to wonder what will
> happen to the SGML world if Microsoft is in control of an alternative
> method of specifying content models.
 
XML-data would probably fail, that's what.

Because their form of schema is so complicated and verbose to read
that you will need browsing tools to manipulate them.  This in turn
gives schemas (even though they are written in XML) the nature
of binary objects rather than textual objects.  It seems the weight
of experience is against people making successful schema languages
in non-textual forms.  

For example, Bento and the OpenDoc storage system included API-driven 
routines for decorating cleverly stored objects with all sorts of 
interesting type information, including type conversion, and so it 
can be considered -- in part -- a schema system.  Failing to
have a text form, the thing failed to thrive.  The XML-data 
system does have a text form, but it complicates matters so much by
not having a simple text form (e.g. a separate declaration
syntax) that it seems to be unreadable.

In my view, declarations are actually a kind of processing instruction,
targeted at the parser or entity manager, which also may be of
interest to the application (sorry for using SGML jargon). 
The XML-data view seems to be that they are, more essentially, 
data rather than processing instructions. Tim Bray has said
frequently "metadata is data", to which I would say 
"processing instructions are sometimes data, sometimes not".

Have the XML-data people ever made any requests to ISO for
suggested improvements to the declaration syntax to give
them the functionality they need? (This is unfair really,
since I think XML-data is an experimental system, and 
therefore a good place to generate user requirements for
a less verbose syntax.) Have they proved that
a single-tag language is easier to use than one with multiple
types of tags?  

I am certainly in 100% favour of schema systems and stronger typing
and abstracting interesting information about data into 
header elements. I proposed the SEEALSO parameter in the 
current WebSGML TC specifically to allow richer declarations 
of syntax using any kind of exotic notations including natural 
language, so I am the last person to say that SGML declarations
are enough for all uses.

But I am simply not convinced that XML-data represents a 
usable alternative to the standard declarations (in the
same market), and I think XML-data should not compete 
(or be talked about as competing!) with the standard
declarations. Their purposes are, I hope, quite
different.



Rick Jelliffe




From ddb at criinc.com  Mon Sep 29 20:45:57 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19970929114620.009b3590@mailhost.criinc.com>

At 01:28 PM 9/29/97 -0400, Jonathan Robie wrote:
>At 12:43 PM 9/29/97 -0400, Peter Newcomb wrote:
>>> [snip Jonathan Robie's original post]
>>Could you elaborate upon this distinction between architectural form
>>inheritance and "true OO inheritance"?  What about XML-data makes it
>>capable of supporting "truer" inheritance than architectural forms?
>
>[snip]
>In C++, Java, Smalltalk, and other OO languages, if I say that "a duck is an
>animal", that means: (1) a duck always has all the data associated with an
>animal, (2) a duck has the behavior associated with an animal (unless you
>specifically say that a duck does certain things differently), and (3)
>references to generic animals can also point to ducks.  To put this in
>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>SGML, it can't inherit behavior, but it can inherit data and type.
>[snip]

One thing which Henry Thompson's presentation at HyTime '97 brought forth
in my mind was SGML's lack of support for (3) above.  Architectural forms
do little or nothing to rectify this, although AF could provide a solution
if used in an environment which supports simultaneous view of the source and
AF instances with links between the two.  Part of the problem is that AF's
do little, if anything, to make life easier when I want to build a DTD which
extends an existing DTD.  I have to copy the existing DTD and modify it and
then add the AF meta-info which maps the new DTD back to the old.  But now
I have a completely different DTD, from the point of view of _all_ existing
SGML software.  Sure I can map my documents to the original, but I can not
see it as both... I must either remove all value added by my modified DTD,
or abandon existing options based on the original DTD, since the new
document is not conforming to the original DTD.  Obviously, since I put the
time into building the new DTD, I think there is some significant value
added, but I can not leverage the value added while at the same time
leveraging the use of the existing DTD as a base architecture.

This is exactly what OO Inheritance allows a programmer to do.  You need
an extra attribute? Easy!  With AF's I either see the document as the new
DTD or I can not see the attribute... value lost either way.

I want to be able to treat it as the original DTD until that special moment
when I can test to see if this has my extended info.. and perform extra
processing based on that...

-derek

     Derek E. Denny-Brown II      ||   ddb@criinc.com
     "Reality is that which,      ||   Seattle, WA USA
  when you stop believing in it,  ||  WWW/SGML/HyTime/XML
 doesn't go away."  -- P. K. Dick || Java/Perl/Scheme/C/C++



From ricko at allette.com.au  Mon Sep 29 21:03:11 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data: advantages over DTD syntax?)
Message-ID: <199709291907.FAA24375@jawa.chilli.net.au>


 
> From: Jonathan Robie 
 
> 
>   
>     
>   
> 
>   
>     
>     
>     
>   
> 
>   
>     
>     
>   
> 
>   
>     
>     
>   
> 
> 
> Now I can use this type declaration to create an animalFriends element,
> which is a list of pets:
> 
> 
>   
>   
>   
> 
> 
> So the pet hrefs can point to pets, cats, or dogs.
> 
> How would I create this schema using architectural forms?

And you do not even need architectural forms. Here is a very
simple pattern for doing everything your example does using
a single DTD and standard SGML! (The suffixes "-content"
and "-attributes" are reserved for use in patterns. The
attribute "is-a" is reserved to allow inheritance labelling.)




	



  

	








]>


  
  
  



If you want multiple inheritance, then you can just
define a different suffix, and search through attributes
based on that to collect the inheritance tree. I can
provide an example if anyone is interested.

Anyone who is aware of the pattern can see this and implement
it just as easily as they could using XML-data's syntax,
but without breaking SGML compatibility, which generating
new element types outside declarations does.

Rick Jelliffe



From jwrobie at mindspring.com  Mon Sep 29 21:07:12 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:
  advantages over DTD syntax?)
Message-ID: <1.5.4.32.19970929190623.00a56820@pop.mindspring.com>

At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
 
>If you want multiple inheritance, then you can just
>define a different suffix, and search through attributes
>based on that to collect the inheritance tree. I can
>provide an example if anyone is interested.
 
Please!

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From eliot at isogen.com  Mon Sep 29 21:17:13 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19971129141415.00acab48@swbell.net>

At 11:46 AM 9/29/97 -0700, Derek Denny-Brown wrote:

>>specifically say that a duck does certain things differently), and (3)
>>references to generic animals can also point to ducks.  To put this in
>>traditional OO terms, Duck inherits data, behavior, and type from Animal. In
>>SGML, it can't inherit behavior, but it can inherit data and type.
>>[snip]
>
>One thing which Henry Thompson's presentation at HyTime '97 brought forth
>in my mind was SGML's lack of support for (3) above.  Architectural forms
>do little or nothing to rectify this, although AF could provide a solution
>if used in an environment which supports simultaneous view of the source and
>AF instances with links between the two.

I'm not sure I follow you.  If you have an architecture-aware search
engine, then you should be able to do a query of the form "find all
elements derived from the form 'animal'", which will include both 'animal'
elements and 'duck' elements.  How is this not 3?  Or do I misunderstand
Henry's requirement?
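As a rough illustration of the kind of query meant here, the following is a minimal sketch, not a real architecture engine. It assumes that derivation is recorded on each element by an attribute named after the architecture (the name `zoo` is made up for this example) whose value names the architectural form. The function `find_derived` is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the attribute name "zoo" stands in for an
# architecture name, and its value names the architectural form.
DOC = """
<farm>
  <animal zoo="animal">generic beast</animal>
  <duck zoo="animal">Donald</duck>
  <tractor>not an animal</tractor>
</farm>
"""

def find_derived(root, arch_attr, form):
    """Return every element whose architectural form attribute names `form`."""
    return [el for el in root.iter() if el.get(arch_attr) == form]

root = ET.fromstring(DOC)
matches = find_derived(root, "zoo", "animal")
print([el.tag for el in matches])  # prints ['animal', 'duck']
```

Note that the matches in this sketch are elements of the source document itself, so a hit on 'duck' is reported in terms of the source rather than of a separate architectural instance.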

Something in the system has to know that a duck is a kind of
animal--architectures convey this information as clearly as any other
method, so I don't see how they can't satisfy the requirement.

>                                          Part of the problem is that AF's
>do little, if anything to make life easier when I want to build a DTD which
>extends an existing DTD.  I have to copy the existing DTD and modify it and
>then add the AF meta-info which maps the new DTD back to the old.  But now
>I have a completely different DTD, from the point of view of _all_ existing
>SGML software.  Sure I can map my documents to the original, but I can not
>see it as both... I must either remove all value added by my modified DTD,
>or abandon existing options based on the original DTD, since the new
>document is not conforming to the original DTD.  Obviously, since I put the
>time into building the new DTD, I think there is some significant value
>added, but I can not leverage the value added while at the same time
>leveraging the use of the existing DTD as a base architecture.

Again, I don't follow you.  Either you really have a completely new DTD and
you have to define the processing for it completely or you have a DTD
derived from an architecture *and* you have architecture-aware processors
that let you apply the architecture-specific processing to your new
documents, leaving only the new stuff to be defined.  How do architectures
not do this? How would the XML-Data proposal do this any better? In both
cases, it's a function of the processing code both providing the methods
for the base classes and the processing system understanding the derivation
hierarchy.

You can also use the trick of defining the architecture such that its
declarations (and in particular, the parameter entities used to configure
and modularize it) can be also used to create declarations for documents
derived from the architecture.  In essence you combine architectural
derivation with the sort of clever modularization typified by the TEI and
Docbook declaration sets.

Your comments suggest that you are confusing *parsing* with *processing*.
Parsing is not an issue, because the document is either valid to its DTD or
it isn't, and is either valid with respect to the governing schema or isn't.
Whether or not the document is valid doesn't affect how it is *processed*
after parsing, which is purely a function of methods applied to types, not
parsing, and is entirely independent of how the type information got
associated with the data (whether by the architecture syntax or the
interpretation of some XML-Data document).

>This is exactly what OO Inheritance allows a programmer to do.  You need
>an extra attribute? Easy!  With AF's I either see the document as the new
>DTD or I can not see the attribute... value lost either way.

This is only true if you define your processing in terms of architectural
instances derived from documents, but clearly, that is not the way
architectures are intended to be used in the general case.  The
architecture provides part of the processing and an architecture-aware
processor must be able to associate architecture-specific processing with a
document, but it's not an all-or-nothing proposition.  I must always be
aware of the document's architectural nature as well as its base nature
unless the only processing I care about at the moment is that defined by
the architecture.

The XML-Data proposal (to the degree I understand it) and architectures
appear to convey exactly the same information about a schema and a
document's derivation from it.  The fact that the XML-Data syntax appears
to be more "object-oriented" must be a red herring because in both cases
you are providing a purely declarative data description, not the definition
of active methods.  The only way in which XML-Data might appear to be
object-oriented is XML-Data-specific semantics for generating complete
declarations from XML-Data specifications based on implication rules, but
these will either be effectively identical to features in the AFDR syntax,
such as multiple attlists for the same element type, or facilities of
limited utility, such as content model implication (which can be managed
pretty well with parameter entities).  In other words, I don't see that
it's possible for anything like XML-Data to provide significantly more
assistance in creating and managing declaration sets and meta-DTDs than you
already get with the AFDR and normal SGML facilities.

This is why confusing architectures with object-oriented programming
approaches is so dangerous: they are not the same thing and thinking that
they are leads to erroneous conclusions and unrealistic expectations (such
as that content models can be somehow inherited in any but the most trivial
ways).

Note too that when you have DTD-less documents, problems of DTD syntax
munging go away because you don't have any DTD syntax to mung.  Any munging
is managed by the creators of derived schemas.  This is one of the beauties
of XML--it frees us from the need to conflate schema definition with the
definition of the parsing rules for document instances.  

Cheers,

E.



From ricko at allette.com.au  Mon Sep 29 21:53:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:30 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
Message-ID: <199709291958.FAA24998@jawa.chilli.net.au>



----------
> From: Jonathan Robie 
> To: ricko@allette.com.au
 
> At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
>  
> >If you want multiple inheritance, then you can just
> >define a different suffix, and search through attributes
> >based on that to collect the inheritance tree. I can
> >provide an example if anyone is interested.
>  
> Please!
 
Here is a version which allows multiple inheritance.
(Some parenthesis problems fixed too.)
I have put in even empty attribute values, to make
the pattern uniform in every case, so please do not
confuse this simplicity for elaborateness!

To extract the inheritance tree, collect all attributes
with "-inherit" suffix.  I think the only novel thing
is that people are not used to wildcard searches on 
attribute names, but this is only prejudice.
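The extraction step described above can be sketched as follows. This is only an illustration under stated assumptions: the instance is made up, the "-inherit" attributes are written out explicitly (in the pattern they would be fixed defaults supplied by the DTD), and multiple parents are assumed to be given as a space-separated list. The helper names `collect_parents` and `ancestors` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical instance. In the pattern the "-inherit" attributes would be
# #FIXED defaults from the DTD; they are written out here so that no DTD
# processing is needed. Multiple parents are assumed to be space-separated.
DOC = """
<animal-friends>
  <pet pet-inherit=""/>
  <cat cat-inherit="pet"/>
  <dog dog-inherit="pet"/>
  <kitten kitten-inherit="cat toy"/>
</animal-friends>
"""

def collect_parents(root, suffix="-inherit"):
    """Map each element type to the parents named in its *-inherit attributes."""
    parents = {}
    for el in root.iter():
        for name, value in el.attrib.items():
            if name.endswith(suffix):  # the "wildcard" match on attribute names
                parents.setdefault(el.tag, set()).update(value.split())
    return parents

def ancestors(parents, tag):
    """Return all direct and indirect parents of `tag`, sorted by name."""
    seen, todo = set(), list(parents.get(tag, ()))
    while todo:
        current = todo.pop()
        if current not in seen:
            seen.add(current)
            todo.extend(parents.get(current, ()))
    return sorted(seen)

parents = collect_parents(ET.fromstring(DOC))
print(ancestors(parents, "kitten"))  # prints ['cat', 'pet', 'toy']
```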

Also, I think because some tools require precompiled
DTDs, there is a general view in some circles that
DTDs are always compiled, and always made prior
to the generation of the instance. But that is
not intrinsic to SGML.

The PATTERN
-----------

This pattern reserves the suffixes:
    -content     for a parameter entity with the
                 element type's contents
    -attributes  for a parameter entity with the
                 element type's attributes
    -inherit     for a fixed attribute with the
                 element type's immediate inheritance

The pattern is

[The pattern declarations were not preserved in the archived copy of this message.]

Where the delimiters {} indicate parameters of the template
which you or your application edit in.  
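To show how an application might "edit in" the {} parameters, here is a minimal sketch of a generator. The template text is a guess based only on the three reserved suffixes listed above, not the author's actual declarations; the function name `declare` and the way parents are written into the fixed attribute are assumptions made for illustration.

```python
# Guessed template: only the three reserved suffixes are known from the
# message, so the exact declarations below are an assumption.
TEMPLATE = """\
<!ENTITY % {name}-content "{content}">
<!ENTITY % {name}-attributes "{attributes}">
<!ELEMENT {name} (%{name}-content;)>
<!ATTLIST {name}
    %{name}-attributes;
    {name}-inherit CDATA #FIXED "{parents}">
"""

def declare(name, content, attributes="", parents=()):
    """Fill the template for one element type and return the declarations."""
    return TEMPLATE.format(
        name=name,
        content=content,
        attributes=attributes,
        parents=" ".join(parents),
    )

# A base type, then a type that reuses the base type's content and attributes.
print(declare("pet", "#PCDATA", "name CDATA #IMPLIED"))
print(declare("cat", "%pet-content;", "%pet-attributes;", parents=["pet"]))
```

A tool of this kind is what would keep the verbosity of the pattern manageable: the author of a new type supplies a name, a content model and a list of parents, and the declarations are generated.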

The EXAMPLE
-----------

[The example DTD and instance were not preserved in the archived copy of this message.]


Please note that I am not saying that this form is always
preferable to using AFs or XML-data.  But it can be done
in XML as it stands now, keeping valid SGML declarations.
And, as has been mentioned, there should be interconversion
possible between the three forms, since they give the
same information.  If XML-data requires the use of specialist
tools to manipulate, since it is so verbose, then this pattern
cannot be regarded as excessively verbose either,
since the same kind of tools can be constructed to simplify
creating new objects.


Rick Jelliffe



From srn at techno.com  Mon Sep 29 22:15:42 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun  7 16:58:30 2004
Subject: XML-Data: advantages over DTD syntax?
In-Reply-To: <3.0.1.32.19970929080238.0083c5c0@aimnet.com> (message from
	Michael Leventhal on Mon, 29 Sep 1997 08:02:38 +0200)
Message-ID: <199709291827.OAA01640@bruno.techno.com>

[Paul Madsen:]

> Do not Architectural forms provide the traditional DTD syntax just that
> ability [to extend object types so that one class of object is a
> specialization of another more general class]?

[Michael Leventhal:]

> So say some but not really.

I'm one of those who say so.  How "not really"?

-Steve

--
             Steven R. Newcomb   President
         voice +1 716 271 0796   TechnoTeacher, Inc.
           fax +1 716 271 0129   (courier: 23-2 Clover Park,
      Internet: srn@techno.com    Rochester NY 14618)
           FTP: ftp.techno.com   P.O. Box 23795
    WWW: http://www.techno.com   Rochester, NY 14692-3795 USA




From ddb at criinc.com  Mon Sep 29 22:37:33 1997
From: ddb at criinc.com (Derek Denny-Brown)
Date: Mon Jun  7 16:58:31 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <3.0.32.19970929133641.009a9100@mailhost.criinc.com>

At 02:14 PM 11/29/97 -0600, W. Eliot Kimber wrote:
>At 11:46 AM 9/29/97 -0700, Derek Denny-Brown wrote:
>I'm not sure I follow you.  If you have an architecture-aware search
>engine, then you should be able to do a query of the form "find all
>elements derived from the form 'animal'", which will include both 'animal'
>elements and 'duck' elements.  How is this not 3?  Or do I misunderstand
>Henry's requirement?

This requires an AF-aware search engine.  In addition, all current AF
systems can only view the instance as either the source or the AF.  If the
search engine reports where it found the match, it would report it relative
to the AF, not the source document.  As I implied in my original post:
>> although AF could provide a solution if used in
>> an environment which supports simultaneous view
>> of the source and AF instances with links between
>> the two.
a number of things start to change when you add an environment where you
can easily map back and forth between the two views.

>Again, I don't follow you.  Either you really have a completely new DTD and
>you have to define the processing for it completely or you have a DTD
>derived from an architecture *and* you have architecture-aware processors
>that let you apply the architecture-specific processing to your new
>documents, leaving only the new stuff to be defined.  How do architectures
>not do this? How would the XML-Data proposal do this any better? In both
>cases, it's a function of the processing code both providing the methods
>for the base classes and the processing system understanding the derivation
>hierarchy.

I want to build on tools which assume you are using an existing DTD, say a
custom editor environment. (note: this is not based on a real
implementation, but rather a mental exercise)  From the point of view of
that tool I either am using a new DTD (since I can not have a nice PUBLIC
reference to the "standard" DTD, and the DTD is different in any case,
because I added elements to some content models) or I only give it the AF
and I have lost my value added elements.  I am talking about today and
tomorrow, not next year.  Next year there may be tools which allow better
use of AFs.  I am not in a position where I have enough information to
really know what vendors plan to release next year.  I am in a situation
where if it can not be done today, I can not use it, since my deadlines are
too tight to wait on future releases for most of the software. (note: if
you want grey hair at an early age, this is an excellent recipe.  Managers
who do not want their staff to have grey hair should either take note or
buy lots of hair dye...)

I have never said that XML-Data provides anything better, since I do not
know enough about it to even compare it to AFs, which I do have a
reasonable understanding of, I think.

>You can also use the trick of defining the architecture such that its
>declarations (and in particular, the parameter entities used to configure
>and modularize it) can be also used to create declarations for documents
>derived from the architecture.  In essence you combine architectural
>derivation with the sort of clever modularization typified by the TEI and
>Docbook declaration sets.

This requires that the original be well designed.  A common request, which
is often ignored ;}


>Your comments suggest that you are confusing *parsing* with *processing*.
(Hopefully) no more than current tools force me to correlate them.  They
should be separate, but are, more often than not, virtually synonymous.
Groves are setting the stage for a day when parsing and processing are
separated.  At times I dream of that day, interspersed with my nightmares
imposed by current tools and requirements...

>Parsing is not an issue, because the document is either valid to its DTD or
>it isn't, and is either valid with respect to the governing schema or isn't.
>Whether or not the document is valid doesn't affect how it is *processed*
>after parsing, which is purely a function of methods applied to types, not
>parsing, and is entirely independent of how the type information got
>associated with the data (whether by the architecture syntax or the
>interpretation of some XML-Data document).

The problem is that a number of tools and environments define a document's
model/style/environment by the DTD.  If I have a special setup for editing
DocBook documents, that setup needs to make some assumptions about your
instance.  It does not work when I hand it an instance which violates those
assumptions (because it conforms to a DTD which uses DocBook as a base
architecture, rather than actually conforming to the DocBook DTD).
If I have access to the source, I could go in and tweak it, but I would
have to do this either specifically for the new DTD or spend the time to
make the environment work with anything which remotely resembles
DocBook....more work than I want.

>>This is exactly what OO Inheritance allows a programmer to do.  You need
>>an extra attribute? Easy!  With AF's I either see the document as the new
>>DTD or I can not see the attribute... value lost either way.
>
>This is only true if you define your processing in terms of architectural
>instances derived from documents, but clearly, that is not the way
>architectures are intended to be used in the general case.  The
>architecture provides part of the processing and an architecture-aware
>processor must be able to associate architecture-specific processing with a
>document, but it's not an all-or-nothing proposition.  I must always be
>aware of the document's architectural nature as well as its base nature
>unless the only processing I care about at the moment is that defined by
>the architecture.

To an extent what I am asking for is an environment where I could build
tools using a traditional OO inheritance model applied to the SGML AF
model.  A DSSSL Style sheet where I would only have to define rules for new
elements (or changed elements).
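The defaulting behaviour asked for here can be sketched as a rule lookup that falls back along the derivation chain. This is only an illustration: the derivation map, the rules and the function `find_rule` are all hypothetical and do not correspond to any existing DSSSL or AF tool.

```python
# Hypothetical derivation map and rules; neither comes from a real DSSSL engine.
DERIVED_FROM = {"duck": "animal", "mallard": "duck"}

RULES = {
    "animal": lambda text: f"[animal] {text}",
    "mallard": lambda text: f"[mallard, green head] {text}",
}

def find_rule(element_type):
    """Walk up the derivation chain until some type has a rule."""
    current = element_type
    while current is not None:
        if current in RULES:
            return RULES[current]
        current = DERIVED_FROM.get(current)
    raise LookupError(f"no rule for {element_type!r} or its base types")

print(find_rule("duck")("Donald"))    # prints [animal] Donald
print(find_rule("mallard")("Drake"))  # prints [mallard, green head] Drake
```

Only 'mallard' needed a rule of its own; 'duck' has none and picks up the rule written for 'animal'.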

>This is why confusing architectures with object-oriented programming
>approaches is so dangerous: they are not the same thing and thinking that
>they are leads to erroneous conclusions and unrealistic expectations (such
>as that content models can be somehow inherited in any but the most trivial
>ways).

I agree that AFs should definitely not be equated with OO programming.  I do
see two things which any attempt to equate them does bring out.

1) DTD extension mechanisms which provide for simple type inheritance would
be very useful.  AFs provide a limited solution, which presents new
difficulties.  This is a problem with SGML.  AFs are an excellent
workaround which stays within the system, and deserve considerable credit
for that.  My real frustration is with SGML and the limits it imposes, not
AFs.

2) Tools which allow OOP-inheritance-style defaulting behaviour for
processing of elements based on element type or architectural type.  AFs may
not map to OOP, but they make OOP-based processing tools easier...

>Note too that when you have DTD-less documents, problems of DTD syntax
>munging go away because you don't have any DTD syntax to mung.  Any munging
>is managed by the creators of derived schemas.  This is one of the beauties
>of XML--it frees us from the need to conflate schema definition with the
>definition of the parsing rules for document instances.  

But this puts added burden on the tools since all bets are off as to what
the structure looks like.  AFs at least provide a set mechanism for mapping
to a known structure.

-derek

     Derek E. Denny-Brown II      ||   ddb@criinc.com
     "Reality is that which,      ||   Seattle, WA USA
  when you stop believing in it,  ||  WWW/SGML/HyTime/XML
 doesn't go away."  -- P. K. Dick || Java/Perl/Scheme/C/C++



From digitome at iol.ie  Tue Sep 30 00:26:19 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun  7 16:58:31 2004
Subject: XML-Data: advantages over DTD syntax?
Message-ID: <199709292226.XAA06786@GPO.iol.ie>

[Rick Jelliffe]
>
>Because their form of schemas are so complicated and verbose to read
>that you will need browsing tools to manipulate them.  This in turn
>gives schemas (even though they are written in XML) the nature
>of binary objects rather than textual objects.
>
A good point. I have fond memories of being able to understand Make
files, for example! These days, with "advanced" tools, they are still
"text only" but they are pretty impenetrable and effectively locked in to
particular tools :-(

On the other hand, in the specific case of XML-Data I would have to say
I am in favour. DTDs are perfectly good "documents".  XML's reputation as a
meta-language is, I think, positively served by its use to describe "itself"
in this way.

The approach obviously has its practical limits though. The further one gets
from "data" and the closer one gets towards "algorithm", the less *practical*
a tagged representation becomes. Full-scale Scheme would be pretty
impenetrable in XML, but it would be possible! The fact that it is entirely
possible is the important thing. It means (doesn't it????) that XML can be
viewed as the bed-rock on which all the other required syntactic "short
hands" can be based.

So XML could have 8879 DTDs. It could also have a DTD for 8879 DTDs.
Core XML could interpret the latter directly, supporting the 8879 syntax via
a transformation. Future syntaxes, methods etc. for achieving what 8879 DTDs
achieve could then be cleanly layered on top.



Sean Mc Grath

sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com




From ricko at allette.com.au  Tue Sep 30 06:56:20 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun  7 16:58:31 2004
Subject: revised Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
Message-ID: <199709300500.PAA07205@jawa.chilli.net.au>

Someone has pointed out that the colonized syntax would be
appropriate and clearer.  Here it is again (sorry!) with
colons.  (I have also cleaned up the inheritance to bundle
things more, so please delete previous version.)

Actually, the following fragment is illegal, because
you cannot use ANY inside a content model. I am not sure how
to read the XML-data format here, but I think this exposes
a flaw in their example:  if pet can contain any subelements,
what use is it to say it can also contain a kitten subelement?
Duplicate paths are a little worrying, if that is what they
have done.

If it were desired to use ANY in this way (i.e. differently
from how SGML uses it), then it could be coped with by
parameterising includes and excludes in a similar fashion.
(Again I can provide an example if needed, but I hope not.)

----------
> From: Jonathan Robie 
> To: ricko@allette.com.au
 
> At 05:02 AM 9/30/97 +1000, Rick Jelliffe wrote:
>  
> >If you want multiple inheritance, then you can just
> >define a different suffix, and search through attributes
> >based on that to collect the inheritance tree. I can
> >provide an example if anyone is interested.
>  
> Please!
 
Here is a version which allows multiple inheritance.
(Some parenthesis problems fixed too.)
I have put in even empty attribute values, to make
the pattern uniform in every case, so please do not
confuse this simplicity for elaborateness!

To extract the inheritance tree, collect all attributes
with ":inherit" suffix.  I think the only novel thing
is that people are not used to wildcard searches on 
attribute names, but this is only prejudice.

Also, I think because some tools require precompiled
DTDs, there is a general view in some circles that
DTDs are always compiled, and always made prior
to the generation of the instance. But that is
not intrinsic to SGML.

The PATTERN
-----------

This pattern reserves the suffixes:
    contents    for a parameter entity with the
                element type's contents
    attributes  for a parameter entity with the
                element type's attributes
    inherit     for a fixed attribute with the
                element type's immediate inheritance

The pattern is

[The pattern declarations were not preserved in the archived copy of this message.]

Where the delimiters {} indicate parameters of the template
which you or your application edit in.  

The EXAMPLE
-----------

[The example DTD and instance were not preserved in the archived copy of this message.]


Please note that I am not saying that this form is always
preferable to using AFs or XML-data.  But it can be done
in XML as it stands now, keeping valid SGML declarations.
And, as has been mentioned, there should be interconversion
possible between the three forms, since they give the
same information.  If XML-data requires the use of specialist
tools to manipulate, since it is so verbose, then this pattern
cannot be regarded as excessively verbose either,
since the same kind of tools can be constructed to simplify
creating new objects.


Rick Jelliffe

 



From ht at cogsci.ed.ac.uk  Tue Sep 30 10:35:37 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun  7 16:58:31 2004
Subject: Animal-friends implemented as a pattern (Re: XML-Data:advantages over DTD syntax?)
In-Reply-To: "Rick Jelliffe"'s message of Tue, 30 Sep 1997 05:54:19 +1000
References: <199709291958.FAA24998@jawa.chilli.net.au>
Message-ID: <715.199709300835@grogan.cogsci.ed.ac.uk>

Note that as written Rick's solution lacks a feature of the XML-Data
proposal, namely that e.g. in the internal subset I can add a new
declaration

[The declaration was not preserved in the archived copy of this message.]


and non-intrusively extend the content model of animal-friends.  To
cover this Rick's solution would need place-holding empty parameter
entities in most of his existing entities, e.g.

[The example entity declaration was not preserved in the archived copy of this message.]

[Note this is not valid XML, I don't think]
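The place-holder idea can be simulated in a few lines. The sketch below relies on one rule shared by SGML and XML: when an entity is declared more than once, the first declaration is binding, and the internal subset is read before the external one. Everything else is hypothetical: the entity names, the content model and the helper `expand` are made up for illustration and are not Rick's actual declarations.

```python
import re

def expand(declarations, text):
    """Expand %name; references, letting the first declaration of a name win."""
    entities = {}
    for name, value in declarations:
        entities.setdefault(name, value)  # later redeclarations are ignored
    pattern = re.compile(r"%([A-Za-z][\w.-]*);")
    while pattern.search(text):
        text = pattern.sub(lambda m: entities[m.group(1)], text)
    return text

# Hypothetical names. The base DTD declares an empty hook...
base = [
    ("animal-friends-extra", ""),
    ("animal-friends-content", "pet | cat | dog %animal-friends-extra;"),
]
# ...and an internal subset, read first, can pre-empt it.
override = [("animal-friends-extra", "| hamster")]

print(expand(base, "(%animal-friends-content;)*"))
# prints (pet | cat | dog )*
print(expand(override + base, "(%animal-friends-content;)*"))
# prints (pet | cat | dog | hamster)*
```

The cost is visible even in this toy: every content model that might ever be extended needs its own empty hook declared in advance.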

This I think completes the reductio -- the point is not that you can
do things with schemata that you can't do in XML, but that you can do
them in ways which are vastly more transparent and maintainable.  Just
because we CAN write all logical formulae using only the Sheffer stroke
and constants doesn't mean we SHOULD do so.  Occam didn't say "Don't
proliferate", he said "Don't proliferate beyond necessity".

Note also that I argued at the XML day in Montreal that to avoid the
dangers of multiple incompatible approaches to schemata, we should
always provide a semantics in terms of vanilla XML, which is how I'd
describe what Rick has shown is possible!

ht



From jwrobie at mindspring.com  Tue Sep 30 13:16:16 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun  7 16:58:31 2004
Subject: Animal-friends implemented as a pattern (Re:
  XML-Data:advantages over DTD syntax?)
Message-ID: <1.5.4.32.19970930111016.009ead94@pop.mindspring.com>

At 09:35 AM 9/30/97 BST, Henry S. Thompson wrote:

So now we have all the players!

Henry, could I ask you to list all the main advantages you see for XML-Data
over XML with architectural forms? Yesterday's traffic makes me think that
this would be a great place to discuss the issues in some depth. One side of
the debate seems to say that XML-Data adds no new functionality, and the
other says that it adds significant new functionality. At this point, I am
not convinced that I know enough to say one way or another.

Jonathan

***************************************************************************
Jonathan Robie   jwrobie@mindspring.com  http://www.mindspring.com/~jwrobie
POET Software, 3207 Gibson Road, Durham, N.C., 27703    http://www.poet.com
***************************************************************************




From zwang at pstat.ucsb.edu  Tue Sep 30 21:25:55 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun  7 16:58:31 2004
Subject: msxml contentmodel
Message-ID: 

 Hello,

 We are trying to write an editor application that uses XML via the
 MSXML parser. What we plan to do is to let the editor read the DTD and
 then provide users with an interactive environment that they use to
 fill out the content of the xml document. 

 The problem we have is that MSXML does not provide access to the
 content model of the DTD through the Document class. The API it
 provides is mainly through the Document class. We are not sure whether
 Microsoft intended that the interface to the DTD content model not be
 available (directly or indirectly) to the application. Could anyone
 shed light on how to use MSXML to access the DTD content model, or
 does anyone know if some of the other parsers (e.g., NXP, LARK) 
 provide an interface to the DTD content model?  Also, how does this
 relate to SGML groves as I have seen discussed on XML-DEV at various
 times? 

 Thanks

 Zheng and Matt,
 NCEAS, UCSB


