My Book

From markus.wodrich at bmw.de Mon Dec 1 00:39:11 1997
From: markus.wodrich at bmw.de (Markus Wodrich)
Date: Mon Jun 7 16:59:12 2004
Subject: AW: Entities vs #PCDATA with msxml 1.6 ?
Message-ID: <01BCFDF9.B74A11A0@WODRICM1>

-----Original Message-----
From: Patrice.Bonhomme@loria.fr [SMTP:Patrice.Bonhomme@loria.fr]
Sent: Sunday, 30 November 1997 17:57
To: xml-dev@ic.ac.uk
Subject: Entities vs #PCDATA with msxml 1.6 ?

Hi,

I have a problem with msxml 1.6. If I put only one entity within an element, this element must be able to contain some PCDATA, because msxml considers an entity as a piece of PCDATA! But if I have:

A third in a new paragraph."> ]>

~~A sentence.An another.~~

&incs;

I get this message:

% java msxml2 -i -d test-ext-ent.xml
Invalid element 'PCDATA' in content of 'P'. Expected [S]
Location: file:test-ext-ent.xml(12,5)
Context:

~~A sentence.An another.~~

&incs;

Is there something broken in the msxml kingdom?

Pat.
--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================

xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To (un)subscribe, mailto:majordomo@ic.ac.uk the following message;
(un)subscribe xml-dev
To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message;
subscribe xml-dev-digest
List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk)

From jjc at jclark.com Mon Dec 1 01:01:07 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:12 2004
Subject: Well-formedness checker available
Message-ID: <3482061D.ECF5ECD5@jclark.com>

I've enhanced my XML tokenizer to support multiple encodings and to provide enough functionality that it can be used as the basis of high performance full XML processors. As a proof of this, I've written a well-formedness checker (xmlwf) on top of the tokenizer.

The main design goal was performance. On my portable (a 133MHz Pentium running Windows NT), it can check Jon's 3.7MB ot.xml file in about 0.5sec (this compares to about 8sec for nsgmlsu and about 2sec for RXP on the same system). It seems to be about 15% slower than the original tokenizer.
On the other hand, the size of the source and object code has increased a lot. The source has also got a lot hairier.

The source code (in ANSI C) and Win32 binaries are available at:

ftp://ftp.jclark.com/pub/test/xmltok.zip

This is an alpha release. The only documentation is what you're reading now.

To use the well-formedness checker, just give xmlwf one or more filenames, and it will check that each one is a well-formed XML document entity. There's a -g option which tells it to check instead that each file is a well-formed XML external general text entity.

James

From jjc at jclark.com Mon Dec 1 01:01:39 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:12 2004
Subject: Test cases available
Message-ID: <34820B05.173C152F@jclark.com>

I've made available a collection of XML test cases at

ftp://ftp.jclark.com/pub/test/xmltest.zip

This contains 141 small files that (in my view) fail to be well-formed XML documents, and should therefore cause any conforming XML processor to report a fatal error.

James
From papresco at technologist.com Mon Dec 1 03:55:58 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:12 2004
Subject: Revelling parser writers (was Rebelling)
Message-ID: <34822CE5.C30933B9@technologist.com>

> Some people seem to use 'processor' to mean an XML parser. Others
> seem to use 'processor' as a piece of software 'after' the parser.

I do not think that the latter people have a basis in the XML standard.

> I think some
> people use 'parser' to mean a piece of software that reads in an XML
> document (and associated components and transforms them into some
> other information structure or sets of actions. the 'Parsers' at
> present appear to be able to emit event Streams and/or build trees.

I think that most software developers would build trees *from* the event stream. This separation allows you to plug in another parser (reader/event generator) without changing your tree-building software. Maybe I'm just extrapolating incorrectly from SP's design and my design of my own systems. In Jade, there is a parser (SP) that outputs events that are read by a grovebuilder (GroveBuilder.cxx) that serves as the source grove for a DSSSL process. My PyGrove uses the same system.

> >Building a grove is not the job of a
> ^^^^^^^^^^^^^^^^^
> >parser. Typically the parser outputs the events and some other process
> >builds the grove from the information. The only way a parser could be
> >not written to create groves is if the parser did not output sufficient
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> Is there a difference between 'build' and 'create'?
> I don't understand how
> a parser can 'not build a grove' and 'be not written to create groves'.

That tortuous prose is my attempt to integrate your text about parsers being "not written to create a grove." The only way I could imagine a parser being unfit to create a grove is if it did not output enough information for the grovebuilder to do so.

> Earlier on XML-DEV we discussed at length what the API to a 'parser' (or
> was it a 'processor') was. I thought that this could have included building
> a grove.

I think that the grovebuilder would be a *client* of the parser API. Then it could build groves from (e.g.) XML or full SGML or even something else, as long as the various parsers exported the same API.

> If I rephrase my statement as 'no-one has written any XML-based software
> which interfaces with the current crop of (mainly java-based) parsers to
> generate groves'.

This statement makes more sense to me than your previous one.

Paul Prescod

From ricko at allette.com.au Mon Dec 1 06:02:01 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:12 2004
Subject: Response to Simon St.L. on Entities v. XLL
Message-ID: <199712010601.RAA09705@jawa.chilli.net.au>

> From: Peter Murray-Rust

> XML(SGML) entities (NOTATION) have traditionally used PUBLIC and FPIs
> (Formal Public Identifier) for adding type information. This works if there
> is a registry of FPIs for this purpose. Without it is not much use.

(Peter Flynn had such a registry of FPIs this year.)

> My
> impression - and I'm happy to be corrected - is that there are few useful
> FPIs for Typing objects.

...
> As yet, MIME is not part of the XLL mechanism. I wish it was, and keep
> squeaking for it. If it isn't I suggest we use XDEV:MIME as a FUA
> 'frequently used attribute' in XML-LINKs.

You can make up your own FPIs *now* for all MIME types using the following pattern. The important thing about an FPI is that it does not have to be syntactically correct to work, unlike a system identifier. Of course, getting an agreed-on form will be best.

It is interesting to note that I found it very difficult to find the official site for the RFC. There is nothing I could find at IETF, IANA, or the Internet Society, and searching on websites did not help. People in this situation can use FPIs with, for example:

FPIs are a great idea because they do not have to correspond to anything in a fixed location: they can be descriptive.

Rick Jelliffe

From fussellm at alumni.caltech.edu Mon Dec 1 07:51:31 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:12 2004
Subject: MONDO: Partitioned Design document and New Addition
Message-ID:

I fixed the MONDO Design document to have a central web page that allows you to look at the TOC and download subsets of the document. The page is at:

http://www.chimu.com/projects/mondo/design/index.html

All the links are now directed here but the direct "mondoDesign.pdf" file is in the same location and still works.

The PDF file has also been broken into smaller subsets of approximately a chapter or two. Each of the subsets is ~70K instead of the full 400K document.
My apologies for not doing this the first time, especially to anyone who had problems with downloading a single large file.

The Design Document web page also contains an "additions" section which will have new document sections that have not yet been integrated into the main version. This is to keep the main document from having Chapters changing every few days and to make new additions more visible. The newest addition and its first paragraph is:

Modeling and Implementing, Objects and Recipes
----------------------------------------------
Recipes describe how to build knowledge through creating objects. An important aspect of working with knowledge is to be able to model it. So far, we have assumed the model of our knowledge preexists and only exists in the ObjectBase's DomainModel. Our other choice is to explicitly describe the Model outside of the ObjectBase and then configure the ObjectBase based on that model. With this approach, information will describe its own model (or models) and we will be provided with a lot more capabilities to automatically and universally understand that information.

--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M  info@chimu.com         Object-Oriented Information Systems
C   u www.chimu.com          Architecture, Frameworks, and Mentoring

From fussellm at alumni.caltech.edu Mon Dec 1 10:37:53 1997
From: fussellm at alumni.caltech.edu (Mark L.
Fussell)
Date: Mon Jun 7 16:59:12 2004
Subject: The MONDO Approach: Introduction
Message-ID:

I realize the MONDO Design document is a bit difficult to digest in toto, so I thought I might try to produce short examples of MONDO's approach to particular problems that have been brought up on XML-Dev, c.t.sgml, JXML, java forums, or other related areas. This may help people to see where MONDO is different, useful, or flawed compared to other approaches -- and to have a particular topic to comment on instead of a whole (approaching 100 page) document.

The most important word in the above paragraph was "short". I will try to be very brief: 1/2 - 3 pages. I have difficulty with this type of brevity (i.e. I hate leaving out details), but I will try very hard and I do have another outlet for more details: the Design Document and its additions.

This brevity means that the approach statements will not really explain anything in detail, especially not the whys. This does not mean I think the problems are trivial or the solutions easy to understand on their own (MONDO is simple at its core but complex in its implications). The fuller description of the problem will come from previous or subsequent discussions, and the MONDO solution is (or will be) more fully explained in the Design Document, the interfaces, or the code.

The brevity and the "emailness" of these approach statements also ensure I will not include any diagrams. I love diagrams and I think I produce pretty informative ones. Please look at the relevant (usually referenced) portion of the Design document to check for diagrams that may help explain how MONDO is thinking.

Most of the approach statements will be pattern-ish: A Title, A Problem, An Approach, and Tradeoffs/Comments. Because the statements are so short they will not really be patterns (and certainly not good ones), but I thought I would mention the structure.

I was planning on posting all of these to XML-Dev & JXML, and some of them to advanced-java.
I am currently undecided about c.t.sgml. If anyone has suggestions about this ("not here" or "maybe there") let me know.

==================

MONDO is a general architecture for encoding, modeling, and processing information. MONDO is especially designed for building information from human-readable text files and then doing sophisticated interactions with that information. Its first reference implementation is in Java, which will be released shortly.

More information about MONDO can be found at the main WWW site:

http://www.chimu.com/projects/mondo/

--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M  info@chimu.com         Object-Oriented Information Systems
C   u www.chimu.com          Architecture, Frameworks, and Mentoring

From digitome at iol.ie Mon Dec 1 11:15:20 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
Message-ID: <199712011115.LAA25449@mail.iol.ie>

I have read a number of articles about Data Warehousing and I *think* I know what it is but I have yet to come across any technical info about how to implement it. On the face of it though, it looks like an interesting potential app. for XML.

As I (mis)understand it, you shovel all your corporate data from a variety of sources (sales, purchasing, production, memos, R&D etc.) into one humongous repository of data with a view to asking the seething mass of data questions that benefit from the totality of information in the repository.

Prior to putting the stuff there it is "cleaned up". Presumably harmonised into a homogeneous format of some sort.
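As a rough illustration of the "clean up and harmonise" step described above, here is a minimal Python sketch. It is not taken from any data warehousing product: the source records, the field names, and the `warehouse`/`record`/`party`/`amount` element names are all hypothetical, chosen only to show records from two differently shaped sources landing in one XML vocabulary that can then be queried as a whole.

```python
import xml.etree.ElementTree as ET

# Hypothetical source records; each source system uses its own field names.
SALES = [{"cust": "ACME", "amt": "120.50"}]
PURCHASING = [{"supplier_name": "Bolt Ltd", "total": "75.00"}]

# Hypothetical mapping from each source's field names to one shared vocabulary.
FIELD_MAPS = {
    "sales": {"cust": "party", "amt": "amount"},
    "purchasing": {"supplier_name": "party", "total": "amount"},
}


def harmonise(source, rows):
    """Return one <record> element per row, using the shared field names."""
    mapping = FIELD_MAPS[source]
    records = []
    for row in rows:
        record = ET.Element("record", {"source": source})
        for old_name, value in row.items():
            ET.SubElement(record, mapping[old_name]).text = value
        records.append(record)
    return records


def build_warehouse():
    """Collect the harmonised records from every source under one root."""
    root = ET.Element("warehouse")
    root.extend(harmonise("sales", SALES))
    root.extend(harmonise("purchasing", PURCHASING))
    return root


def parties(root):
    """Ask one question across all sources: which parties appear anywhere?"""
    return [party.text for party in root.iter("party")]
```

Calling `parties(build_warehouse())` walks every record regardless of which source it came from, which is the "ask the whole repository" idea in miniature.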
Sure sounds like XML + related standards (specifically SDQL) to me.

Sean Mc Grath
sean at digitome dot com

From M.H.Kay at eng.icl.co.uk Mon Dec 1 12:21:41 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
Message-ID: <01bcfe53$7a24bd20$1e09e391@mhklaptop.bra01.icl.co.uk>

-----Original Message-----
From: Sean Mc Grath
To: xml-dev@ic.ac.uk
Date: 01 December 1997 11:16
Subject: Data warehousing and XML

>I have read a number of articles about Data Warehousing...
>it looks like an interesting potential app. for XML.
>
>As I (mis)understand it, you shovel all your corporate data from a variety
>of sources (sales, purchasing,
>production, memos, R&D etc.) into one humongous repository of data ...

Firstly, I think XML has some work to do if it is to acquire acceptance in the database community: in particular, someone needs to show how its underlying data model relates to models like UML used in the database world; one would also like to see how a DTD can be translated to/from an ODL schema. The fact that XML uses terms like "entity" and "attribute" with completely different meanings from UML or ODMG doesn't help.

Secondly, I think the "humongous repository" concept in data warehousing (sometimes ridiculed as the "data whorehouse") is going out of fashion. The modern approach is usually much more focused. In fact, the data warehouse concept has never really embraced documentary information like memos or research reports: it's all about old-fashioned "data".
I do agree that in principle XML provides a good representation of data that is in transit between heterogeneous databases. One drawback is that it provides far more features than are required for this purpose, so people may go for simpler encodings.

Mike Kay, ICL
M.H.Kay@eng.icl.co.uk

From papresco at technologist.com Mon Dec 1 13:56:48 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:12 2004
Subject: Data warehousing and XML
References: <199712011115.LAA25449@mail.iol.ie>
Message-ID: <3482C2EE.3F606F14@technologist.com>

This is probably more

Sean Mc Grath wrote:
> Prior to putting the stuff there it is "cleaned up". Presumably
> harmonised into a homogenous format of some format.

Database people typically do not store their information in any explicit format. The database handles the representation. Data warehouses are the same. I don't think that data warehouses are any more or less amenable to XML than any other relational database.

Paul Prescod

From fussellm at alumni.caltech.edu Mon Dec 1 14:17:13 1997
From: fussellm at alumni.caltech.edu (Mark L.
Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Describing the Model of Information
Message-ID:

The MONDO Approach to: Describing the Model of Information

Problem
=======
How do we describe the model we are using for our information?

Models always exist. In MONDO, the information model is represented by the DomainModel and must exist (no matter how simple) for any ObjectBase. There is always an implicit model. We can also provide a way to explicitly describe models and have MONDO use those models to understand information better and to validate whether it is being constructed (i.e. by recipes) correctly. Explicit models can also provide a common human-oriented exchange format that is known to be MONDO understandable and verifiable (i.e. it at least makes sense to MONDO).

Forces
======
Relying on implicit models provides flexibility in describing and implementing the model, but directly affords no common description and reuse.

Providing explicit models makes sharing models easier and allows the information to describe itself, but could be limiting in how information is used.

If we do use explicit models we have the choice between using the same form as all other information (i.e. recipes) or a different form that is designed especially for models.

MONDO Approach
==============
MONDO allows both implicit and explicit modeling of information depending on what the producer of the information wants to describe and what the consumer of the information would like to use. Explicit models are marked up in the same format (i.e. recipes) as all other types of information and the resulting model is simply an organized set of objects that describe another set of objects (the instances of that model).

An example model for:

end = >

might look like:

> > > > ) constructors = ( ) > > > ) constructors = ( ) > )>

Note that the model does not describe implementation in any way, just the expected Types, properties, associations (none above), behavior (e.g.
constructors), and other externally visible features of an object.

To associate a model with an instance we are just relating objects. We can do it explicitly (and singularly) in the same recipe file:

//... Model> UseModel> end = >

In two files but still with a single default interpretation:

end = >

Or in three+ files which allows multiple interpretations of the model to use with the recipe:

Because models are just objects we can also retrieve them by reference instead of direct recipe construction:

> >

MONDO supports Models as simply the same as any other type of information: objects. The only difference is their role toward other objects.

Benefits & Penalties
====================
Allowing both implicit and explicit models provides flexibility. The only tradeoff that can occur is that people assume an implicit model is OK when it would be better to make the model explicit. Forces other than technology should drive this choice.

There are very few drawbacks and a great number of benefits to having the model in the same format as all other information. It allows the two core concepts (recipes and objects) to be leveraged to understand new facilities. The new facilities can benefit from all the functionality of objects and recipes (e.g. references, encoding formats, type vs. class separation, properties and all other normal object abilities). And because we have complete closure we can then implement and model the model itself in the exact same terms (recipes and objects).

Because the models are objects, the models can be arbitrarily sophisticated and take advantage of subtyping. New modeling refinements can be extensions of existing techniques. This avoids closed-end modeling limitations (e.g. DTDs) while still having backward capabilities.

Also, the model is for the resulting DomainObjects not the recipe itself (or the parser) so it does not need to worry about, and will not constrain, irrelevant details like the actual names of recipes (e.g. "").
The model only cares about the Types of the resulting DomainObjects that are built by the recipe.

Finally, the encoding can be the same for the model as for the objects. This is important on a conceptual level (models are really, really the same things but just have a special role) and on a lower level: users only have to understand a single encoding (if they choose) and parsers can be very simple.

The only drawback might be difficulty in encoding the model in the standard MONDO recipe encoding formats. Generally this is probably not a drawback. Recipes allow flexibility that can be very useful for modeling and the encoding formats can be quite concise (plus they are inherently self-describing which is helpful for learning them).

See
===
The MONDO Design addition on "Modeling and Implementing, Objects and Recipes".
http://www.chimu.com/projects/mondo/design/index.html#additions

Classes-as-objects is part of Smalltalk, CLOS, and (in some form) many other interesting languages (e.g. SELF). Generally this meta-object capability provides a great deal of power and relative simplicity. For some references, see the OO sections of:
http://www.chimu.com/projects/mondo/links.html

SGML DTDs are somewhat at the opposite extreme from MONDO models in encoding (very limited), but they can still be treated as creating document-oriented Model objects through a different encoding format. [How SGML treats DTDs is quite different (they constrain the parser, recipe, and the model stage) but that is a different topic.]

The XML-Data model has similarities to how MONDO works with models. The approach of XML-Data and representing models as instances was discussed in:
http://www.lists.ic.ac.uk/hypermail/xml-dev/
(Search for XML-Data and/or DTD)

--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M  info@chimu.com         Object-Oriented Information Systems
C   u www.chimu.com          Architecture, Frameworks, and Mentoring
From fussellm at alumni.caltech.edu Mon Dec 1 14:20:39 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Extending Model Functionality
Message-ID:

The MONDO Approach to: Extending Model Functionality

Problem
=======
How can we extend the functionality of our information and information model without becoming language specific?

Although information can be interpreted and implemented in many ways, frequently we will want to provide possible implementations so applications can automatically extend their capabilities in interesting new ways.

Forces
======
If we put the implementation into the information we will make the information less general.

If we provide no implementation (when we have one) we make the information less knowledgeable and capable than we could have.

MONDO Approach
==============
Describe implementation details in the same knowledge form as all our other information and loosely associate/link Classes to Types through possible Implementations.

We can represent a Java class as (the MONDO recipe in OML):

>

This is readable to both Java and non-Java systems. A non-Java system may not understand the bytecodes, but it can understand everything else and work with the information usefully.

Next we can associate this class with a particular Type in our model.

language = "Java" class = >

The loose association is (relatively) complete, and a particular program can decide whether it can use and wants to use a Java implementation of the Type Period. It can also check whether the VM level is acceptable.
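The loose Type-to-Implementation association described above can be sketched in a few lines of Python. This is only an illustration of the idea, not MONDO's actual API or recipe format: the `Implementation` and `ModelType` classes, the `choose_implementation` function, the class names, and the version tuples are all hypothetical. Only the Type name Period and the notion of checking language and VM level come from the message.

```python
from dataclasses import dataclass, field


@dataclass
class Implementation:
    language: str        # e.g. "Java"
    class_name: str      # hypothetical name of the implementing class
    min_vm_level: tuple  # lowest VM/runtime version the class is assumed to need


@dataclass
class ModelType:
    name: str
    # Possible implementations are attached to the Type, not baked into it.
    implementations: list = field(default_factory=list)


def choose_implementation(model_type, language, vm_level):
    """Return the first implementation this program can run, or None."""
    for impl in model_type.implementations:
        if impl.language == language and vm_level >= impl.min_vm_level:
            return impl
    return None


# The Type itself stays language-neutral; implementations are optional extras.
period = ModelType("Period", [
    Implementation("Java", "com.example.PeriodClass", (1, 1)),
    Implementation("Smalltalk", "PeriodClass", (1, 0)),
])
```

A program that cannot use any listed implementation simply gets `None` back and can still work with the Type as plain information, which is the point of keeping the link loose.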
We can similarly provide a Smalltalk or ".dll" implementation (assuming we can move the ".dll" around).

None of this had any effect on our original instance and model:

end = > > > ) constructors = ( ) > //... )>

So they still describe "pure" general knowledge and we can still use them independently of all the language-localized implementations.

Tradeoffs
=========
Generally, it is a win-win situation. Implementation can be reasoned about and chosen without directly coupling it into the information itself. The architecture itself is also no more complicated, but only has new objects and classes to represent implementation information.

A negative might be the added complexity of the:

instance--[interpretation]--model--[implementation]--classes

associations, but the complexity can be selected as needed.

Another negative might be the required base functionality of an ObjectBuilder, but generally ObjectBuilders must be able to simply model (e.g. as an object with properties) anything they cannot understand in more detail.

See
===
The MONDO Design addition on "Modeling and Implementing, Objects and Recipes".
http://www.chimu.com/projects/mondo/design/index.html#additions

--Mark
mark.fussell@chimu.com

  i   ChiMu Corporation      Architectures for Information
 h M  info@chimu.com         Object-Oriented Information Systems
C   u www.chimu.com          Architecture, Frameworks, and Mentoring

From fussellm at alumni.caltech.edu Mon Dec 1 14:23:17 1997
From: fussellm at alumni.caltech.edu (Mark L.
Fussell)
Date: Mon Jun 7 16:59:13 2004
Subject: The MONDO Approach to: Language Independent Object Serialization
Message-ID:

The MONDO Approach to: Language Independent Object Serialization

Problem
=======
How do we serialize Objects so we can later read them back into a program of either the same language or a different language? Also, how do we allow humans to easily create, read, and modify these objects (i.e. include human languages)?

It is common to need a simple way to save a web of objects and later read them back in to the same program or a different one. Sometimes the programs will be of the same language (e.g. both Java) and sometimes they will be different. The latter is much more complicated, especially in its most general form (any object to any language).

A variation on the inter-language movement is from human writable language (through text) to a computer language. Usually this is restricted to very simple information (e.g. String properties) or document-oriented information (e.g. HTML/SGML).

Tradeoffs
=========
If we do not have a single interchange format we will have multiple ones and the complexity will be higher.

If we cannot describe objects in a cross-language format then languages will be unable to interoperate with this mechanism.

If we cannot describe object information in human-readable formats then humans will be less likely to understand the process and will be unable to participate in the general capabilities (e.g. they can only work with simple property files).

A single, general, object interchange approach would make all movements of objects easier for both computers and people. On the other hand, if we try to design a general approach that becomes too cumbersome it will not be useful for the many common needs of applications (i.e. same language serialization and simple property files).

MONDO Approach
==============
Encode information as "recipes" to build objects.
Describe the most general information first: what to build and what "ingredients" (recipes for other objects) it needs. Next describe the language independent model of that information. Finally describe the possible implementations for that model in different languages. Any of these steps (other than the first) can be left off but it results in less ability to move between languages.

Also, enable "recipes" to be easily convertible to a human readable and writeable form: usually as marked-up text files in any one of XML/SGML/OML (the last being oriented to objects and MONDO, an Object Markup Language similar to XML).

Some simple examples of recipes (in OML and no models yet) are:

----------------------
end = >
----------------------
)>
----------------------
>
----------------------
> ) !Recipe> > > > //... >
----------------------

, the company's President. The following is a summary written by Luke on what he views as the key elements to VTL's success. } P> ======================= All of the above, when by themselves, rely on the reading and "building" application to interpret and implement the information model. This might be suitable for language specific encoding of information or when the model is very standard. But we can also explicitly add model information, for example: > > ) constructors = ( ) > //... )> And loosely[1] link it to the actual information: end = > Alternatively we can use well-known models, which allow wider interchange without moving recipes: > > > > ================ The next step is to be able to connect specific implementations to the models, but this is covered in a different MONDO approach statement: "Extending Model Functionality" Benefits ======== We have a very general way to encode information so multiple applications, programming environments, and people can understand it. It is simple to parse and process for a computer and for a person. We have also cleanly separated the information from the specifics required to instantiate that information in any given programming environment. But we can encode general models for the information in the same format (complete reflectivity and closure) and take advantage of them if the application desires. We can take advantage of sharing models and information through public references to objects/recipes or by "shipping" recipes along with the information. This supports both a push and a pull model of moving information, models, and implementation between applications. Inter-language movement is inherently supported, as well as inter-version (e.g. JDK version) movement. The choice is up to the application how well the information is described vs. how much the receiving application will be responsible for interpretation. 
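The recipe idea described in this message can be illustrated with a minimal Python sketch. It is not OML and not the MONDO implementation: JSON stands in for the marked-up text form, and the `Point` and `Line` classes, the `"build"`/`"with"` keys, and the `BUILDERS` registry are all hypothetical names chosen for illustration. The registry plays the role of the language-specific implementation step, kept separate from the recipe itself.

```python
import json
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Line:
    start: Point
    end: Point


# Hypothetical registry: maps the name used in a recipe to a local
# implementation. Another language would supply its own registry.
BUILDERS = {"Point": Point, "Line": Line}


def to_recipe(obj):
    """Describe an object as a recipe: what to build and its ingredients."""
    if not is_dataclass(obj):
        return obj  # plain values are their own recipe
    return {
        "build": type(obj).__name__,
        "with": {f.name: to_recipe(getattr(obj, f.name)) for f in fields(obj)},
    }


def build(recipe, builders):
    """Recursively build an object from a recipe using local builders."""
    if not isinstance(recipe, dict):
        return recipe
    constructor = builders[recipe["build"]]
    ingredients = {
        name: build(sub, builders) for name, sub in recipe["with"].items()
    }
    return constructor(**ingredients)


line = Line(Point(0.0, 0.0), Point(3.0, 4.0))
text = json.dumps(to_recipe(line))          # human-readable interchange form
rebuilt = build(json.loads(text), BUILDERS)
```

Because the recipe names only what to build and from which ingredients, a receiving program is free to map `"Line"` to whatever class it has, which is the separation of information from implementation that the Benefits section describes.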
General document markup, objects, and information modeling are aligned and we can take advantage of the concepts, patterns, designs, and abilities of all of them.

Penalties
=========
MONDO is a newer approach that is different from industry-specific and language-specific approaches.

More overhead than language-specific binary approaches [this could be lessened or removed by using a binary recipe encoding format].

Possibly slightly more overhead than simple text property files, but much more general.

See
===
The MONDO Design document, especially Chapters 1-5.
http://www.chimu.com/projects/mondo/design/index.html

For related technology see the Smalltalk file-in format, LISP, Java serialization, CORBA and the ODMG OIF specification. Some resources for these can be found at:
http://www.chimu.com/projects/mondo/links.html

Notes
=====
[1] Actually, we can be even looser than that using a separate "interpretation" file.
[2] I ran over 3 pages by a couple paragraphs :-( I guess the DSSSL example pushed me over.

--Mark
mark.fussell@chimu.com
ChiMu Corporation      Architectures for Information
info@chimu.com         Object-Oriented Information Systems
www.chimu.com          Architecture, Frameworks, and Mentoring

From ak117 at freenet.carleton.ca Mon Dec 1 14:43:25 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:13 2004 Subject: Notation System Identifiers (was Re: Response to Simon St.L. on Entities v.
XLL) In-Reply-To: <199712010601.RAA09705@jawa.chilli.net.au> References: <199712010601.RAA09705@jawa.chilli.net.au> Message-ID: <199712011444.JAA00575@unready.microstar.com>

Rick Jelliffe writes:

> You can make up your own FPIs *now* for all MIME types using the following
> pattern.
>
> PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION
> Multipurpose Internet Mail Extensions::video/mpeg//EN">

Yes, though this is not a well-formed XML notation declaration without a system identifier.

On that point, I am still troubled about what to do with system identifiers for notations. WD-xml-971117 states that a system identifier is "a URL, which may be used to retrieve the entity" (sect.4.3.2), but we are not dealing with an entity here. Later on, when describing notations, the draft states that XML processors

  ... may additionally resolve the external identifier into the system identifier, file name, or other information needed to allow the application to call a processor for data in the notation described. (sect.4.7)

It would seem to me that a MIME type would make more sense than a URL for the system identifier of notations, but that would introduce an inconsistency into the external-identifier scheme. I imagine that this topic has already been beaten to death in the WG.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From tbray at textuality.com Mon Dec 1 15:06:33 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:13 2004 Subject: Notation System Identifiers (was Re: Response to Simon St.L. on Entities v. XLL) Message-ID: <3.0.32.19971201070724.00955930@pop.intergate.bc.ca>

At 09:44 AM 01/12/97 -0500, David Megginson wrote:
>Yes, though this is not a well-formed XML notation declaration without
>a system identifier.

No longer; the WG just voted to allow PUBLIC without SYSTEM, specifically and only for

>From a DOM perspective, EMBEDded material will almost certainly not be
>considered part of the document tree containing the EMBED element.

I very much look forward to seeing what the DOM does (or doesn't do) with the EMBEDded material. But is this an issue for the DOM in particular, or should the XML-Link spec give clearer direction about the nature of EMBEDded material? Especially as some of the replies so far have said that an application _could_ include the EMBEDded material in the document tree _if_ the developer so chose - which opens the door to multiple interpretations in a large way. And, of course, I can think of a considerable number of applications where it might be useful to be able to apply the DOM to EMBEDded content without having to cope with a separate document tree.

Sounds like fun. For the applications I'm proposing, I'd like them in the document tree, but of course that isn't appropriate for many situations. I'd really rather not see this prohibited, either - it would chop off an entire branch of XML development I'm working on. Could be the price of progress. We'll see.
I guess what I'd love to see is another XML-Link attribute specifying whether to include an EMBED in the document tree or not - it seems to be the central issue around which this discussion has focused. Failing that, I'll look into Peter's proposals for XDEV, since they seem to address the challenges of multiple application behaviors directly - if they get implemented by application developers, of course.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

From SimonStL at classic.msn.com Mon Dec 1 16:03:21 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:13 2004 Subject: Data warehousing and XML Message-ID:

>Database people typically do not store their information in any explicit
>format. The database handles the representation. Data warehouses are the
>same. I don't think that data warehouses are any more or less amenable
>to XML than any other relational database.

This may be writing it off too quickly; I think the great advantage of XML for a data warehouse would be its ability to ease the inclusion of non-relational data. A data warehouse that was capable of dealing with information in multiple formats might well take advantage of XML for storing data that wasn't necessarily in a table.

Data warehousing meets document management: love match or endless feud?

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

From tallen at sonic.net Mon Dec 1 16:26:39 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:59:13 2004 Subject: Proper use of FPI name spaces Message-ID: <199712011626.IAA07954@bolt.sonic.net>

Rick Jelliffe wrote:
|
| You can make up your own FPIs *now* for all MIME types using the following
| pattern.
|
|

Really, only the owners of the name space denoted by "IDN ds.internic.net" should be assigning such FPIs. It will not do for just anybody to be assigning names in someone else's name space. In your own name space you could name something belonging to someone else (unless legal issues prevent), but that's different.

Regards,
Terry Allen
Electronic Publishing Consultant
tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html

From RMcDouga at JetForm.com Mon Dec 1 16:27:14 1997 From: RMcDouga at JetForm.com (Rob McDougall) Date: Mon Jun 7 16:59:13 2004 Subject: EMBED and validation Message-ID:

I'm new to XML but this doesn't seem to accomplish what I would be looking for as an "include" capability.

Let's say I have a markup language (let's call it RML, "Rob's Markup Language"). I create a DTD for it and post it to my public web site. All users of RML put the URL for the DTD in the declaration.
So far so good?

Now, if one particular user of RML notices that there's a section that's common across every one of their RML documents, they might wish to separate it out into a distinct file and insert a link to it. This common piece is not a complete document unto itself so it cannot be validated on its own, yet the user may wish to have the documents that include it make sure that it is valid within the context in which it is embedded. Since this particular file is unique to this user and not shared by all RML users, it does not belong in the common DTD. This would seem to make an external text entity undesirable for this case.

Is this correct, or am I missing something? Is there any other way to accomplish this using the current XML/XLL specs?

Rob
=======================================================
Rob McDougall Phone: (613)751-4800 ext.5232
JetForm Corporation Fax: (613)751-4864
http://www.jetform.com mailto:rmcdouga@jetform.com
=======================================================

>-----Original Message-----
>From: Eve L. Maler [SMTP:elm@arbortext.com]
>Sent: November 29, 1997 10:09 AM
>To: Peter Murray-Rust
>Cc: xml-dev@ic.ac.uk
>Subject: RE: EMBED and validation
>
>
>I don't think I've seen it explicitly suggested here, so here goes. If you
>want to ensure that what's pointed to is real XML, and "belongs" in that
>location, how about using a plain old external text entity? With a
>validating XML processor, you can guarantee that (a) the entity will be
>expanded in place before it even gets to the application and that (b) it
>will be validated in context.
>
> Eve
>

From elm at arbortext.com Mon Dec 1 17:11:26 1997 From: elm at arbortext.com (Eve L. Maler) Date: Mon Jun 7 16:59:13 2004 Subject: EMBED and validation Message-ID: <3.0.32.19971201120455.009ca100@village.doctools.com>

There is a way to handle this using external text entities. The DTD for any one document is really made up of two parts (if they both exist): the external subset and the internal subset. Most people tend to think of the external subset as "the DTD" and think of the internal subset as "the place where I supply my own common text, graphics, etc." However, if you want to create your own set of text entities and put them in the internal subsets of only the documents that you own, you've effectively made a local modification to the DTD.

(There hasn't been a formal way to distinguish between "harmless" and "harmful" DTD modifications, and of course different people might draw the line in different places. In interchange of SGML today, typically it's acceptable to provide general entity declarations but not element/attribute declarations; put another way, the "markup model" isn't supposed to be changed by means of the internal subset.)

Eve

At 11:22 AM 12/1/97 -0500, Rob McDougall wrote:
>I'm new to XML but this doesn't seem to accomplish what I would be
>looking for as an "include" capability.
>
>Let's say I have a markup language (let's call it RML, "Rob's Markup
>Language"). I create a DTD for it and post it to my public web site.
>All users of RML put the URL for the DTD in the declaration.
>So far so good?
> >Now, if one particular user of RML notices that there's a section that's >common across every one of their RML documents, they might wish to >seperate it out into a distinct file and insert a link to it. This >common piece is not a complete document unto itself so it cannot be >validated, yet the user may wish to have the documents that include make >sure that it is valid within the context that it was embedded. Since >this particular file is unique to this user and not all RML users, it >does not belong in the commono DTD. This would seem to make an external >text entity undesireable for this case. > >Is this correct, or am I missing something? Is there any other way to >accomplish this using the current XML/XLL specs? > >Rob >======================================================= >Rob McDougall Phone: (613)751-4800 ext.5232 >JetForm Corporation Fax: (613)751-4864 >http://www.jetform.com mailto:rmcdouga@jetform.com >======================================================= > >>-----Original Message----- >>From: Eve L. Maler [SMTP:elm@arbortext.com] >>Sent: November 29, 1997 10:09 AM >>To: Peter Murray-Rust >>Cc: xml-dev@ic.ac.uk >>Subject: RE: EMBED and validation >> >> >>I don't think I've seen it explicitly suggested here, so here goes. If you >>want to ensure that what's pointed to is real XML, and "belongs" in that >>location, how about using a plain old external text entity? With a >>validating XML processor, you can guarantee that (a) the entity will be >>expanded in place before it even gets to the application and that (b) it >>will be validated in context. >> >> Eve >> > >xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Mon Dec 1 17:31:21 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:59:13 2004 Subject: EMBED and validation Message-ID: <3.0.32.19971201112637.00c9a84c@swbell.net> At 03:36 PM 12/1/97 UT, Simon St.Laurent wrote: >>From a DOM perspective, EMBEDded material will almost certainly not be >>considered part of the document tree containing the EMBED element. > >I very much look forward to seeing what the DOM does (or doesn't do) with the >EMBEDded material. But is this an issue for the DOM in particular, or should >the XML-Link spec give clearer direction about the nature of EMBEDded >material? Especially as some of the replies so far have said that an >application _could_ include the EMBEDded material in the document tree _if_ >the developer so chose - which opens the door to multiple interpretations in a >large way. XML (or SGML) data can be used in one of two ways: 1. Use by value (you get the data syntactically). This is what text entities are for. A text entity is, by definition, part of the *character string* of the document that references it. That means that the parser parses it at the point of reference and it must be valid or well formed (if the entire document is well formed). 
A document with a text entity reference is identical, for parsing purposes, to a document with the reference replaced by the entity's replacement text (note that in base SGML ESIS, text entity references are not communicated by the parser). 2. Use by reference (you point to the data but don't get it syntactically). This is what XML Link means by "EMBED" and what HyTime means by "value reference". The referenced data is a separate, self-contained object and the parser does not parse it at the point of reference (if at all, as it may not be XML data). For use-by-reference, it is up to the processing application to make sense of the reference, for example, presenting a referenced image according to the active style settings or presenting a referenced document as though it had occurred in line, or providing an icon you can select to see the referenced thing. As for "document trees" (groves), the initial result is *never* a single tree containing the results of parsing two documents (if the thing used by reference is another document). However, a processing application might choose to construct a *new* tree that combines the two documents in some way that makes sense *to the application*. For example, I've written several instances of a program that takes a tree of subdocuments and creates a single instance from them. Note that making the distinction between use by value and use by reference keeps separate the storage and logical organization of the data, so that data can be organized into storage objects independently of how it might be used logically by reference. For example, I might put all my chapters in a single storage object (document entity) but use individual chapters by reference (using element-level addressing). It's also important to keep in mind that, for XML and SGML, a reference to a document entity is usually taken as shorthand for reference to that document's root element (that is the HyTime default, and I assume, the TEI default). 
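The distinction between use by value and use by reference can be sketched in Python with the standard `xml.etree.ElementTree` module. This is only an illustration under stated assumptions: the `include` element and its `href` attribute are hypothetical (they are not XLL or HyTime syntax), the `STORE` dictionary stands in for real storage objects, and an internal entity is used for the by-value case because ElementTree does not fetch external entities.

```python
import copy
import xml.etree.ElementTree as ET

# Use by value: the entity's replacement text is parsed as part of the
# referencing document, so the parser hands back a single tree.
BY_VALUE = """<!DOCTYPE book [
  <!ENTITY chap1 "<chapter><title>One</title></chapter>">
]>
<book>&chap1;</book>"""

# Use by reference: the book only points at the chapter. The element
# name "include" and the attribute "href" are hypothetical.
BY_REFERENCE = '<book><include href="chap1.xml"/></book>'

# Stand-in for storage: hypothetical addresses mapped to separate documents.
STORE = {"chap1.xml": "<chapter><title>One</title></chapter>"}


def combine(book_source, store):
    """Return (original, combined): the parsed book left untouched, and a
    *new* tree in which every reference is replaced by the root element
    of the separately parsed target document."""
    original = ET.fromstring(book_source)
    combined = copy.deepcopy(original)
    for parent in list(combined.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag == "include":
                target = ET.fromstring(store[child.get("href")])
                target.tail = child.tail
                parent[index] = target
    return original, combined


by_value_tree = ET.fromstring(BY_VALUE)
original, combined = combine(BY_REFERENCE, STORE)
```

In the by-value case the parser itself delivers the chapter inside the book. In the by-reference case the parser only ever sees the `include` element; it is the application-level `combine` function that decides to construct a merged tree, and the original tree remains available unchanged.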
In HyTime's abstract processing model, use by reference is, by default, transparent to processing applications because the HyTime engine redirects the processing application to the data used by reference, making it look to the processor as though there is but a single grove. However, under the covers the groves are distinct and processors can ask to view them that way. This is probably more sophistication than most XML processors (e.g., browsers) need provide, although more sophisticated browsers and hypertext systems need this flexibility. Cheers, Eliot --

W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com

This is all required for one GIF. Every GIF requires an ENTITY. There *must* be an internal subset. There must be a registry for the FPIs, etc. In XLL I can write a complete document: (excuse the case insensitivity) >It would be nice if there was also an "inline" way of doing includes >that would allow the XML parser to validate the resulting content. Well, XLL does this ***as long as we agree on the semantics***. HREF (or IMG/SRC) is so widely used in HTML that people will certainly start doing their own thing. There are the following possibilities: - wait for a W3C body to pronounce (won't be this year, I suspect) - wait and see what commercial browsers do - invent nine-and-sixty ways of doing it - use XDEV: as at least a means of coordinating *some* people. JUMBO will start with the latter, and junk it as soon as anything official comes along... [BTW I am not very happy with the idea that FPIs are intended to be human- but not machine-readable. That makes them useless for things like image/gif.] P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 2 23:42:13 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:15 2004 Subject: Entities and XPointers In-Reply-To: Message-ID: <3.0.1.16.19971203002645.218f929c@pop3.demon.co.uk> At 16:35 02/12/97 -0500, David G. 
Durand wrote: >On Dec 2, 5:40pm, Simon St.Laurent wrote: >> Subject: Entities and XPointers >> While they don't provide the actuation flexibility or many of the other >> features of XML-Link, it may be possible to create external entities that use >> XPointers in the URL. Of course, this would require that either the >processing >> application can cope with XPointers (unlikely in this case), or that the >> server can interpret the XPointer and return only the chunk requested. > >There's no reason that you can't do this at the server side, and a client that >was so sophisticated could interpret the URL (but, pracitcally speaking, I >wouldn't expect many such to exist). I am probably missing something, but it seems fairly straightforward to extract something from another document - the question is whether it's allowed. For example, or could return a chunk of well-formed XML. (JUMBO is capable of the second form at present). The question is whether ... &chap3; is legal in an XML parser. I suspect that this is undefined - however it must not be 'application-dependent', because otherwise we get different parser behaviour. (The use of other connectors (| and ?) is presumably similar - it's the mechanics of how the entity is retrieved.) > The only argument I can see against this is that it requires all parser writers who cope with ENTITYs to resolve XLL - and that is quite a strong argument :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 3 00:05:49 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:15 2004 Subject: Validation algorithm/code wanted In-Reply-To: Message-ID: <3.0.1.16.19971203010145.223f5f34@pop3.demon.co.uk> This may come as a shock to some, but I would actually like to use DTD-based validation in JUMBO. The primary purpose is to be able to read in a document and map the content of each ELEMENT onto the DTD. This is so I can have a GUI-based authoring tool. [ATTLISTs are relatively easy and I have already done them, I think]. I would be grateful for some or all of the following: - a java-based library routine (I think this may be optimistic in 1997) - an algorithm, or a pointer to one on the WWW - some wise words about how much effort is involved in writing an algorithm. [Norbert solved this in NXP by including JACC - a java-based yacc-like beast - but it is cumbersome for just analysing single content models against instances]. The operation seems to be somewhere in between a graph matching routine (which I can do except for the optionality) and a BNF parser (e.g. yacc) which I certainly can't. My recollection of regexps is that they use a 'maximal munch' of some sort and so I would try to match as many of the early nodes and then unwind the stack repeatedly if it failed. However, yacc throws up the 'shift-reduce' conflicts which I imagine still pertain in XML. (This means there is more than one way of mapping a document onto the content model, I assume.) I'd really hate to have to hack this myself - maybe there is a mythical grad student on this list who really loves writing parsers. 
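One way to approach the content-model matching asked about here, offered as a minimal sketch rather than as what JUMBO or NXP does, is to translate an element-content model into a regular expression over the sequence of child element names and let a backtracking regex engine do the matching. The function names below are hypothetical, and the sketch handles only element content: `#PCDATA` (mixed content), `EMPTY`, and `ANY` are rejected or out of scope.

```python
import re

# A token is a simple element name or one of the content-model operators.
TOKEN = re.compile(r"\s*([A-Za-z_][\w.-]*|[(),|?*+])")


def compile_content_model(model):
    """Translate a model such as "(title, para+)" into a compiled regex
    that matches a string of child names, each followed by one space."""
    model = model.strip()
    parts = []
    pos = 0
    while pos < len(model):
        match = TOKEN.match(model, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}")
        token = match.group(1)
        pos = match.end()
        if token == "(":
            parts.append("(?:")          # groups do not need to capture
        elif token == ",":
            continue                     # sequence is plain concatenation
        elif token in {")", "|", "?", "*", "+"}:
            parts.append(token)          # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(compiled, child_names):
    """True if the list of child element names satisfies the model."""
    sequence = "".join(name + " " for name in child_names)
    return compiled.fullmatch(sequence) is not None


section = compile_content_model("(title, para+, (note | warning)*)")
```

For example, `children_match(section, ["title", "para", "note"])` is true, while `children_match(section, ["title"])` is false because `para+` requires at least one paragraph. Python's `re` engine backtracks when an early choice fails, which corresponds to the "unwind the stack" strategy mentioned above; it reports only whether a match exists, not which part of the model each child was mapped onto, so an authoring tool would need more than this.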
If so, I'll write to her supervisor with a glowing reference :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From cbullard at hiwaay.net Wed Dec 3 00:29:09 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:59:15 2004 Subject: Data warehousing and XML References: <3482FAC7.9DD9D6E4@technologist.com> <348360F2.407B@hiwaay.net> <34842224.18C52B52@technologist.com> Message-ID: <3484A795.220@hiwaay.net> Paul Prescod wrote: > I don't doubt that there are some people in the world who want to "mine" > documents, but I think that they are in the minority, and will be for a > long time. But more important, it makes little sense to me to "mine" XML > data. Even if you wanted to mine your structured document data it will > almost always make sense to load that into the mining tool's internal > data structures. Umm.. that actually was one of the often requested capabilities when I was still working on SGML systems. The problem was precisely that a great deal of the *interesting* information was not in relational databases. Comparative policy analysis, for example. > Once again, XML is great as the transfer format, but when you get down > to doing your queries, your data mining software should not be parsing > the XML syntax. Ok. Hmm? Well, what were the various proposals over the years for SGML querying systems for? > > However, let me ask a technical > > question that you can probably answer with a deeper > > technical perspective than mine? 
How well can one query
> data (or convert it for that matter) for which one
> has no rigorous schema (of some kind)?
>
> In some cases you can do sophisticated queries on data without a schema,
> but you would have to jump through AI hoops. It's not a job I would
> apply for, but neural net experts may be able to detect structure in the
> chaos. But building the schema first is definitely cheaper than trying
> to divine the structure later.

That is what I thought to be the case. I remember when we were doing the GE CASS system we bounced around the idea of using DTDs as a sort of reversed query, that is, it gave us a way to figure out what kinds of queries should be interesting. We never pursued the idea because the SGML systems of that time were fairly primitive.

len

From gmckenzi at JetForm.com Wed Dec 3 01:01:33 1997 From: gmckenzi at JetForm.com (Gavin McKenzie) Date: Mon Jun 7 16:59:15 2004 Subject: EMBED and validation Message-ID:

Just some comments on this issue of 'inclusion'. I apologize if this sounds like a ramble...

I understand the purpose and usefulness of declaring an entity in the internal DTD subset and employing this mechanism as the proper and valid way to include some (potentially marked up) text. But, echoing Rob McDougall's closing statements, for *many* applications it is simply too difficult for the application to 'predict' these inclusion points and place a corresponding declaration in the internal DTD subset. In fact, I would venture to say that most of my customers would walk away from XML based on this issue alone.
Heavens, so many data processing shops still want to continue writing data out in fixed-length COBOL-style records; and while it may be the nineties, they are resistant to change. As much as it may seem to be a stretch to bring these types of data producers into the XML world, I (naively) think it is possible.

So, after reading all the previous submissions (especially Peter's display of the overhead for setting up a GIF reference via the external entity method) I too wish to use an XLL-based mechanism for expressing an 'inclusion' linkage, and pine for some agreement on the semantics.

One thing remains unclear, though, despite the dozens of submissions I've read: is it, or is it not, acceptable for an application to choose to act upon an XLL linkage in a way that causes the target linked content to be included and validated? Put another way: may I create an XML-derived format, and document that a processor of this derived format should treat a particular usage of an XLL construct as an instruction to "retrieve and include 'inline' the target content, and validate it against the originating document's DTD as if the target content were part of the original document"?

I'd much prefer that there was a way to express this in the syntax.

Gavin.
========================================================
Gavin F. McKenzie Vox:+1(613)230-3676 ext 5277
JetForm Corporation Fax:+1(613)594-8886
http://www.jetform.com mailto:gmckenzi@jetform.com
========================================================

From elm at arbortext.com Wed Dec 3 01:28:16 1997 From: elm at arbortext.com (Eve L.
Maler) Date: Mon Jun 7 16:59:15 2004 Subject: EMBED and validation Message-ID: <3.0.32.19971202202247.00a9e690@village.doctools.com> At 07:12 PM 12/2/97 -0500, Peter Murray-Rust wrote: >At 10:29 02/12/97 -0500, Rob McDougall wrote: >>Thanks to everyone for the replies. I now (think I) understand how this >>would be used. This method does require that the person creating the >>document specify all the URLs he will be "include"ing at the top of the >>file. This is somewhat inconvenient for someone who only inserts things >>once into any given document. If the documents are being generated on >>the fly from some application, the application may have to perform two >>passes to derive a list of filenames, or else "bulk up" the document >>with lots of entities that may never be substituted. > >If you are going to 'include' binary 'files' (i.e. entities) then it gets >more complex. This is my current analysis. It's probably wrong. (Are there >any Java parsers which manage this?) > > PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION > Multipurpose Internet Mail Extensions::image/gif//EN"> > > > > > > >]> > >

> > >This is all required for one GIF. Every GIF requires an ENTITY. There >*must* be an internal subset. There must be a registry for the FPIs, etc. (One small note: XML does not currently require public IDs to be formal. This doesn't materially change your point, though...) >In XLL I can write a complete document: > >MIME="image/gif"/> > >(excuse the case insensitivity) It's true that this is another way to do basically the same thing, a way that relies on not only XML but also XLL. In practice, a lot of SGML shops don't use the "pure" way either; they just put a pathname in an attribute value, and use proprietary means to indicate that the named file should be output as a graphic or whatever. XLL is definitely an improvement on that! >>It would be nice if there was also an "inline" way of doing includes >>that would allow the XML parser to validate the resulting content. This feels a bit apples-to-oranges, because unless you're declaring XML itself as a "foreign notation" through a NOTATION declaration, you don't need a lot of the overhead you've shown above: ]> ... &mycontent; ... >Well, XLL does this ***as long as we agree on the semantics***. HREF (or >IMG/SRC) is so widely used in HTML that people will certainly start doing >their own thing. There are the following possibilities: > - wait for a W3C body to pronounce (won't be this year, I suspect) > - wait and see what commercial browsers do > - invent nine-and-sixty ways of doing it > - use XDEV: as at least a means of coordinating *some* people. > >JUMBO will start with the latter, and junk it as soon as anything official >comes along... XLL itself isn't intended to pull in content and have it validated as part of the same context in which the linking element appears. I think you'd have to use the DOM to dynamically change your document, and then reparse if you choose to. 
E.g., if you were to define a ROLE attribute value that means "parse me in context once you've pulled me in," you'd have to start another XML processor pass to do this, and it would be part of your own application semantics, not those of XLL.

>[BTW I am not very happy with the idea that FPIs are intended to be human-
>but not machine-readable. That makes them useless for things like image/gif.]
>
> P.

Eve

From dgd at cs.bu.edu Wed Dec 3 04:00:17 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:15 2004
Subject: EMBED and validation
In-Reply-To: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
References:
Message-ID:

At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>more complex. This is my current analysis. It's probably wrong. (Are there
>any Java parsers which manage this?)

Actually, I just noticed, it _is_ wrong (I removed > quoting because it's too gross for SGML examples):

>

This should be:

The notation is attached to the entity, not the citation of the entity.

]>

Finally, this is a bit overstated, the following lines could (and should) all be included in any reasonable CML DTD: So the internal subset would have to contain the following to support _one_ gif: and the document would contain:

> > >Finally, this is a bit overstated, the following lines could (and should) >all be included in any reasonable CML DTD: > > PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION > Multipurpose Internet Mail Extensions::image/gif//EN"> > > > > > The whole point here is that you *have* to have a DTD of some sort to manage this. An external DTD is yet another level of indirection for the poor DXBH. >So the internal subset would have to contain the following to support _one_ >gif: > > > >and the document would contain: > >

>
>Defining a DTD (and its associated stylesheets) generally requires careful
>thought about what external notations are required in the intended
>application. Predefined notation sets (in the form of external entities
>with Public identifiers) are common as dirt in the SGML world, for the
>reasons of interchangeability and author sanity.

I accept this - in the SGML world. But in the HTML world - whose case I am trying to present :-), 'my.gif' usually means a GIF and it works 10^7 times a day pretty well :-)

>
>The only place the FPI need appear is in the shared declaration, the
>stylesheet (used to actually render or trigger processing of the non-XML
>data), can use the notation name "gif" to detect a GIF file. No FPI is
>involved at the "browser end" (non-validating processor augmented with a
>CML stylesheet).

Since I can't sleep, let's have a little story showing what the hacker has to do to resolve the problem. Let's assume that we are looking for mentions of GIFs in an XML document.

With the XLL approach (and a hardcoded MIME attribute), we grep for 'MIME="image/gif"' - exactly.

With the FPI NOTATION approach we have:

Elephant's Child: Where are the GIFs in this document?

Parser-man (for it is he, and his hat reflects the rays of the SUN in more than Oriental splendour): Come here and be spanked for your curtiosity for it is All very Simple. Find the NOTATIONs and follow their Indirections.

EC: I have found the NOTATION, but where please (for the Elephant's Child was always polite) do I go?

PM: Your mygif must be searched in the Hashtable of ENTITYs (Parser-men always speak in long words)

EC: I have found the Hashtable of ENTITYs but I am still lost.

PM: Come here and be spanked again [and he was] Do you not see that NotationDeclaration on the ENTITY (for the parser man *always* spoke in Long Words)

EC (who saw the NotationDeclaration but didn't want to be spanked and asked ever so ever so politely) Where do I go now?

PM: You must find the NOTATION and its Formal Public Identifier (because Parser Men *always* speak in Long Words, Best Beloved)

EC: And so I want: What do I do with it? (ever so politely, but he got spanked again for his 'satiable curtiosity).

PM: You must travel to the deserts in the middle of Australia and speak to the Big God Rick.

Then ran JUMBO, poor old JUMBO, dusty in the sunshine, very much bewildered and came to the Big God Rick, and asked 'where do I go from here'?

[... to be continued ...]

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Dec 3 08:03:14 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:16 2004
Subject: EMBED and validation
In-Reply-To:
References: <3.0.1.16.19971203001246.21bffac6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971203090021.2c27b9ba@pop3.demon.co.uk>

At 22:59 02/12/97 -0500, David G. Durand wrote:
>At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
>>If you are going to 'include' binary 'files' (i.e. entities) then it gets
>>more complex. This is my current analysis. It's probably wrong. (Are there
>>any Java parsers which manage this?)
>
>Actually, I just noticed, it _is_ wrong (I removed > quoting because it's
>too gross for SGML examples):

The Elephant's child has been spanked again...:-)

> > > PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION > Multipurpose Internet Mail Extensions::image/gif//EN"> > > > > > > > > >> > > This should be: > > > > The notation is attached to the entity, not the citation of the entity.

Enlightenment has slowly come. I think we actually need an additional NOTATION as well as SRC so that the final document reads.

]>

> > > > > >Finally, this is a bit overstated, the following lines could (and should) > >all be included in any reasonable CML DTD: > > > > > PUBLIC "+//IDN ds.internic.net/rfc/rfc2046.txt//NOTATION > > Multipurpose Internet Mail Extensions::image/gif//EN"> > > > > > > > > > > > >
> The whole point here is that you *have* to have a DTD of some sort to
> manage this. An external DTD is yet another level of indirection for the
> poor DXBH.

Yes, but none of that information need be parsed except when _validating_ the document. So an author and a browser implementor need not deal with that complexity. Furthermore, an application that does read the DTD can determine _exactly_ what the notation "gif" is supposed to represent by checking the PUBLIC and SYSTEM IDs given in the DTD. However, you can write a working stylesheet that need know nothing about this stuff.

It strikes me as absolutely parallel to the case with element declarations, which are more useful in production than in simple processing by applications that know the DTD in question. The same is true of notations. So what is the problem?

> I accept this - in the SGML world. But in the HTML world - whose case I am
> trying to present :-), 'my.gif' usually means a GIF and it works 10^7 times
> a day pretty well :-)

This depends on HTTP MIME typing, and can be implemented by XLL and any sensible stylesheet language.

> >The only place the FPI need appear is in the shared declaration, the
> >stylesheet (used to actually render or trigger processing of the non-XML
> >data), can use the notation name "gif" to detect a GIF file. No FPI is
> >involved at the "browser end" (non-validating processor augmented with a
> >CML stylesheet).
>
> Since I can't sleep, let's have a little story showing what the hacker has
> to do to resolve problem. Lets' assume that we are looking for mentions of
> GIFs in an XML document.

Cute story clipped due to lack of relevance.

You can use XLL and an appropriate stylesheet, or Entities and an appropriate stylesheet. I thought JUMBO was a browser based on (at least a partial) DTD. So if you want to use NOTATION, you can declare _the notations you expect to need_. If an author needs a new notation, she can _declare_ the notations she needs. Of course, she needs to either add to your stylesheet, or create a new one that knows what that notation means, but this is not that hard. It's certainly no _harder_ than creating a new MIME-type.

We both agree that a simple set of public identifiers for MIME-types would be useful. So define one. Or, if you prefer the XLL-based mechanism (which does _not_ require declaration of the MIME-type, and is probably much better implemented _without_ such a declaration), then use it. If you are going to declare the type in your document, perhaps because it is essential that you get the correct format for a multi-format resource, then you might as well use notation, since the markup is in fact simpler and less redundant, unless you go out of your way to complicate it. My modification of your example shows you how to do this, so there's nothing stopping you.

-- David

------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu                | david@dynamicDiagrams.com
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW

From dgd at cs.bu.edu Wed Dec 3 17:23:54 1997
From: dgd at cs.bu.edu (David G. Durand)
Date: Mon Jun 7 16:59:17 2004
Subject: EMBED and validation
Message-ID:

On Dec 3, 9:00am, Peter Murray-Rust wrote:
> Subject: RE: EMBED and validation
> At 22:59 02/12/97 -0500, David G. Durand wrote:
> >At 12:12 AM -0000 12/3/97, Peter Murray-Rust wrote:
> Enlightenment has slowly come. I think we actually need an additional
> NOTATION as well as SRC so that the final document reads.

No. I will correct it again, leaving only _legal_ text in the message.

In the DTD, the following must appear (for XML-validating applications only):

The internal subset would look like this:

]>

>

The notation _does not need_ to appear on the link. At all. It's a property of the _entity_. Once you declare the entity you are done.

I _think_ that you can also turn the same instance markup into XLL markup (by using HREF instead of SRC (or can you rename it? I forget), and also changing the entity declaration). I have to reread the XLL spec., as it's been a while.

> Have I finally got there? It seems to make sense... (The same levels of
> indirection still apply, of course).

No, you're still making it too complicated. Look at my entity declarations and instance markup carefully; that's the only thing I had to change.

------------------------------------------+----------------------------
David Durand dgd@cs.bu.edu                | david@dynamicDiagrams.com
Boston University Computer Science        | Dynamic Diagrams
http://www.cs.bu.edu/students/grads/dgd/  | http://dynamicDiagrams.com/
                                          | MAPA: mapping for the WWW

From RMcDouga at JetForm.com Wed Dec 3 19:48:06 1997
From: RMcDouga at JetForm.com (Rob McDougall)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID:

I'm seeing some disturbing similarities between Peter's problem and mine. It reflects a general problem that I've seen in many other languages. There seem to be two schools of thought about where declarations should go:

(1) Declarations must be performed near the top of the file.
(2) Declarations should be performed near where they are used.
Method (1) works well for declarations that are going to be referenced many times throughout the file, and is able to accommodate the cases where a reference only occurs once. Method (2) works well for declarations that are only referenced once, but works rather poorly for ones that are referenced many times throughout the file.

Which school is right? I think the trend is to allow either. Take for example C vs C++. C required you to define all your variables at the top of a function, but C++ also allows you to define them just before you use them. I don't think anyone would argue that the additional flexibility is a bad thing.

Method (1) requires that the user be able to establish "order" in the file (i.e. make sure the declarations occur at the top). This greatly hinders creating files with declarations in them "on the fly". In order to know what declarations will be used, the user must perform a first pass on the data before writing it out. This is not always possible and is seldom desirable.

I realise this inflexibility is something that has been inherited from SGML, but I worry that this will impede XML's adoption into the marketplace. This is the second time I've had to reject using XML's entity substitution capabilities because of the need to declare all your entities at the top of the file. I originally had wished to use the entity substitution as a text substitution, but unfortunately, my users will want to "re-define" the value of an entity several times throughout the file. This cannot be done using XML entities.

The entity substitution capabilities within XML seem to get me 50% of the way to where I want to be on a couple of different issues (file inclusion and text substitution), but unfortunately, I've had to choose alternative solutions because they don't get me 100% of the way. :(

Rob
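An editorial sketch of the limitation Rob describes, not part of the original message. It assumes the rule in the published XML 1.0 Recommendation that when the same entity is declared more than once, the first declaration is binding, and it uses Python's standard `xml.etree.ElementTree` module. The entity name `title` and the element names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the entity "title" is declared twice in the
# internal subset, as a user trying to "re-define" it might attempt.
DOCUMENT = """<!DOCTYPE doc [
  <!ENTITY title "First value">
  <!ENTITY title "Second value">
]>
<doc><p>&title;</p><p>&title;</p></doc>"""


def expanded_paragraphs(xml_text):
    """Return the text of every <p> element after entity expansion."""
    root = ET.fromstring(xml_text)
    return [p.text for p in root.iter("p")]


paragraphs = expanded_paragraphs(DOCUMENT)
```

Under that assumption both references expand to the first value, so a later declaration cannot act as an assignment; a changing value has to be carried by application-level markup instead.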
From hubick at medlib.com Thu Dec 4 01:14:37 1997
From: hubick at medlib.com (Chris Hubick)
Date: Mon Jun 7 16:59:17 2004
Subject: PI, XMLDecl, and EncodingPI
Message-ID: <34860337.ACBD09C8@medlib.com>

I am writing a recursive descent XML parser in Java and have a couple of questions....

The XML Working Draft dated 17-November-1997 states:

[24] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[28] Misc ::= Comment | PI | S
[19] PI ::= '' Char*)))? '?>'
[25] XMLDecl ::= ''
[79] EncodingPI ::= ''

Within a PI is the Name "xml" reserved? If it is, should there not be a [wfc] on PI stating so?

By the current definition any XMLDecl and EncodingPI is also a valid PI. In a prolog an XMLDecl is optional, and is followed by Misc, which includes PI.

Ok, so I can have an XML file with no XMLDecl (it's optional) followed by "" which matches PI, in my Misc*. And this is legal? My parser will take this just fine as such, but I wonder about the others. It makes detecting a bad XMLDecl impossible! My parser will just say fine, that wasn't an XMLDecl, and feed it to Misc, which will most likely match (or possibly spew) it as a PI.

Shouldn't [19] PI have an S? at the end before '?>' ?

Also shouldn't PCData be:

[17] PCData ::= [^<&]+

rather than the current:

[17] PCData ::= [^<&]*
[44] content ::= (element | PCData | Reference | CDSect | PI | Comment)*

because:

This is a test

In my recursive descent parses to:

This is a test ...

And we get infinite matches on a zero length PCData.
From tbray at textuality.com Thu Dec 4 01:44:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:17 2004
Subject: PI, XMLDecl, and EncodingPI
Message-ID: <3.0.32.19971203174532.00986d90@pop.intergate.bc.ca>

At 06:11 PM 03/12/97 -0700, Chris Hubick wrote:
> Within a PI is the Name "xml" reserved? If it is, should
>there not be a [wfc] on PI stating so?

In fact, in the latest rev, we wired it right into the grammar.

> By the current definition any XMLDecl and EncodingPI is also
>a valid PI. In a prolog an XMLDecl is optional, and is followed
>by Misc, which includes PI.
> Ok, so I have can have an XML file with no XMLDecl
>(it's optional) followed by "" which
>matches PI, in my Misc*. And this is legal?

Nope. And the grammar will getcha, because this no longer matches PI.

>Shouldn't [19] PI have an S? at the end before '?>' ?

No, because Char includes S

>[17] PCData ::= [^<&]+
>rather than the current:
>[17] PCData ::= [^<&]*
>In my recursive descent parses to:

It can't be +, because the empty string must match PCData. You'll just have to figure out how to stop descending.

-Tim
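An editorial sketch, not from either message, of one way to "stop descending" while keeping production [17] as `[^<&]*`. The idea is that a zero-length PCData match is never recorded as an item, and the content loop ends as soon as an iteration consumes no input. The grammar is deliberately cut down to `content ::= (element | PCData)*` with attribute-less tags, and the function name `parse_content` is hypothetical.

```python
import re

PCDATA = re.compile(r"[^<&]*")                  # [17] as written: may match ""
START_TAG = re.compile(r"<([A-Za-z][\w.-]*)>")  # simplified: no attributes


def parse_content(text, pos=0):
    """Parse content ::= (element | PCData)* starting at offset pos.

    Returns (nodes, new_pos). An empty PCData match is not counted as an
    item, so the loop cannot spin on the empty string.
    """
    nodes = []
    while pos < len(text):
        start = pos
        # Alternative 1: PCData, possibly empty.
        end = PCDATA.match(text, pos).end()
        if end > pos:
            nodes.append(("text", text[pos:end]))
            pos = end
        # Alternative 2: a child element with matching end tag.
        tag = START_TAG.match(text, pos)
        if tag:
            name = tag.group(1)
            children, pos = parse_content(text, tag.end())
            close = "</" + name + ">"
            if not text.startswith(close, pos):
                raise ValueError("expected " + close + " at offset " + str(pos))
            pos += len(close)
            nodes.append(("element", name, children))
        if pos == start:
            break  # nothing consumed: stop instead of matching "" forever
    return nodes, pos


tree, consumed = parse_content("<P>This is a test</P>")
```

The progress check is the design choice here: the empty string still "matches" PCData in the grammar's sense, but the parser only repeats the `*` loop when some alternative actually advanced the position.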
From ricko at allette.com.au Thu Dec 4 03:20:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID: <199712040320.OAA20074@jawa.chilli.net.au>

> From: Rob McDougall
> (1) Declarations must be performed near the top of the file.
> (2) Declarations should be performed near where they are used.

There has been a proposal for inline declarations recently. They would use declaration syntax but be inside a processing instruction, e.g.

(This, I believe, will not be in WebSGML, now being finalized. But it may make it through the big SGML revision which looms.)

> I realise this inflexibility is something that has been inherited from
> SGML, but I worry that this will impede XML's adoption into the
> marketplace. This is the second time I've had to reject using XML's
> entity substitution capabilities because of the need to declare all your
> entities at the top of the file. I originally had wished to use the
> entity substitution as a text substitution, but unfortunately, my users
> will want to "re-define" the value of an entity several times throughout
> the file. This cannot be done using XML entities.

In XML, the system identifier of an entity is a URI. This can include a query. The query can trigger an update of the value.

There is no way to update the value of an external entity dynamically in XML, but that is because it is not a programming language. However, you can mark up that you want updates to take place.
For example, if the text was a running header, you could have an element like

blah

and make your software update the entity every time it was found. If you want to embed this more clearly into your document, you could use a processing instruction, for example

Use entities to bring data in and PIs to send messages out.

Rick Jelliffe

From mecom-gmbh at mixx.de Thu Dec 4 12:57:44 1997
From: mecom-gmbh at mixx.de (james anderson too)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
References: <199712040320.OAA20074@jawa.chilli.net.au>
Message-ID: <3486A9E0.CE1A923C@mixx.de>

why do inline declarations need an additional operator? what's wrong with allowing element - or, in a similar sense, entity declarations - to appear with their standard syntax?

Rick Jelliffe wrote:
> > From: Rob McDougall
>
> > (1) Declarations must be performed near the top of the file.
> > (2) Declarations should be performed near where they are used.
>
> There has been a proposal for inline declarations recently. They would
> use declaration syntax but be inside a processing instruction, e.g.
>
>
> (This, I believe, will not be in WebSGML, now being finalized. But it
> may make it through the big SGML revision which looms.)
>
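An editorial sketch of Rick Jelliffe's suggestion earlier in this thread to "use entities to bring data in and PIs to send messages out". The example markup did not survive in the archive, so everything concrete here is an assumption: the PI target `set-header`, the `header` element, and the function name are invented, and the parser used is Python's standard `xml.parsers.expat` module.

```python
import xml.parsers.expat

# Hypothetical markup: a PI tells the application to change the running
# header, and each <header/> element marks where the header is rendered.
DOCUMENT = (
    "<doc>"
    "<?set-header Chapter 1?><header/>"
    "<?set-header Chapter 2?><header/>"
    "</doc>"
)


def collect_headers(xml_text):
    """Return the header text in force at each <header/> element."""
    current = {"text": ""}
    seen = []

    def on_pi(target, data):
        # The PI is a message to the application, not document content.
        if target == "set-header":
            current["text"] = data

    def on_start(name, attrs):
        if name == "header":
            seen.append(current["text"])

    parser = xml.parsers.expat.ParserCreate()
    parser.ProcessingInstructionHandler = on_pi
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return seen


headers = collect_headers(DOCUMENT)
```

The changing value lives in the application's state rather than in an entity declaration, which is why no re-declaration is needed.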
From ricko at allette.com.au Thu Dec 4 15:48:03 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:17 2004
Subject: Problems with Entities (was re:Embed and validation)
Message-ID: <199712041547.CAA11063@jawa.chilli.net.au>

> From: james anderson too
> why do inline declarations need an additional operator? what's wrong with
> allowing element - or, in a similar sense, entity declarations - to appear with
> their standard syntax?

That all has to be discussed.

(I really shouldn't have mentioned that; I find it confusing enough trying to keep up with the latest XML draft, let alone all the suggestions being considered for SGML! There is a big back catalog of changes that WG4 has approved for the revision. The ones with a specific correlation to XML have been expedited for WebSGML [the update to SGML] so that XML and SGML will be in synch.)

Rick Jelliffe
From mtbryan at sgml.u-net.com Thu Dec 4 23:12:20 1997
From: mtbryan at sgml.u-net.com (Martin Bryan)
Date: Mon Jun 7 16:59:17 2004
Subject: Position Statement on FPIs sought
Message-ID: <01bd00f6$d5689e40$LocalHost@default>

Charles

For the record, David Durand has pointed out this week that:

ISO 9070 is very clear on the subject, and I quote:

"3.10 Owner name: the portion of a public identifier that names its owner. NOTES .... 13 The owner of a public identifier is not necessarily the owner of the object it identifies"

and from the introduction:

"... and an 'owner name', which identifies the originator of the public identifier"

ISO 8879 defines owner identifier as:

"The portion of a public identifier that identifies the owner or originator of public text"

and defines public text as:

"The text that is known beyond the context of a single document..."

There would seem to be a conflict here. 8879's two rules can be conflated to read "_identifies the owner of the text_ that is known beyond the context of a single document" whereas 9070's definitions can be conflated to read "the portion of a public identifier that _names the owner of a public identifier_, who is not necessarily the owner of the object it identifies". These definitions seem to be contradictory.

Additionally David has said:

"IDN is not in 9070 rev 2, and thus is not suitable _de jure_; it is also unsuitable _de facto_, since domain names can be reused by different organizations. Unless Internet policies and 9070 have both changed, I think this is also wrong."
and

"and 9070 is more recent, normatively cited by 8879, and edited by the same editor; so I am inclined to prefer the 9070 reading."

We need to review the relationship between these two standards.

Martin

From peter at ursus.demon.co.uk Fri Dec 5 10:27:02 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:17 2004
Subject: Vertical bar character
In-Reply-To:
Message-ID: <3.0.1.16.19971205111400.2187b142@pop3.demon.co.uk>

I am building a module to parse DTD content models and have a strange (to me) problem on java DOS with the vertical bar character in the command line. I am using W95, JDK1.02 and the DOS prompt window. I type:

java jumbo.sgml.ContentChunk (A|B)

using the 'vertical bar' character on my keyboard (the 'or' symbol in Java/C). I assume this has decimal value 124 (from 'man ascii').

Under jview, this character is created with a value of 166.
Under java it is created with a value of -90.
If I quote the argument under java (i.e. "(A|B)" ), I get a value of 65446 [corresponds to 2^16 - 90]

A - is this symptomatic of a general problem (e.g. something in Unicode)?
B - how can I 'quote' a '|' symbol in the DOS commandline?

P.

There is also a character 214 whose glyph seems to be a vertical bar with a break in it. I assume this is unrelated to the problem (even though it is what my keyboards display :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
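An editorial note in code form, not part of the original message. The three values reported above (166, -90 and 65446) are arithmetically consistent with a single 8-bit code, 166, being read as an unsigned byte, as a signed byte, and as a signed byte widened to 16 bits. The sketch below only checks that arithmetic and looks up the Unicode names of code points 124 and 166; it does not establish where in the DOS or Java chain the conversion actually happens. The function names are hypothetical.

```python
import unicodedata


def as_signed_byte(code):
    """Interpret an 8-bit value as a signed two's-complement byte."""
    return code - 256 if code >= 128 else code


def sign_extended_to_16_bits(code):
    """Widen the signed byte to 16 bits and read the result as unsigned."""
    return as_signed_byte(code) & 0xFFFF


signed = as_signed_byte(166)               # the value seen under "java"
widened = sign_extended_to_16_bits(166)    # the value seen with quoting
names = [unicodedata.name(chr(code)) for code in (124, 166)]
```

Here `signed` is -90 and `widened` is 65446, that is 2^16 - 90, matching the figures in the message; `names` holds the Unicode character names for the two code points under discussion.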
From M.H.Kay at eng.icl.co.uk Fri Dec 5 11:41:34 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:17 2004
Subject: Vertical bar character
Message-ID: <01bd0172$5fab4000$1e09e391@mhklaptop.bra01.icl.co.uk>

>I am building a module to parse DTD content models and have a strange (to
>me) problem on java DOS with the vertical bar character in the command
>line. I am using W95, JDK1.02 and the DOS prompt window.

Unicode and Latin-1 have:

VERTICAL BAR: 124
BROKEN BAR: 166

I believe that in the original ASCII, 124 was called vertical line, but many printers displayed it as a broken line. In the IBM PC-DOS code set 850, code 124 became broken line, while in Latin-1 it remained as vertical bar with the new code 166 (your minus 90) being allocated to broken bar.

This means that software that is converting files between Latin-1 (or UNICODE, or Microsoft "ANSI") and PC-DOS code page 850 ought to perform a conversion on these characters.

Mike Kay

From digitome at iol.ie Fri Dec 5 13:27:52 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:17 2004
Subject: MSXML 1.6 problem
Message-ID: <199712051327.NAA00475@mail.iol.ie>

I have just installed MSXML 1.6. I can run the applet viewer etc. from IE 4 but jview is giving me a problem:-

c:\msxml>jview msxml samples\tire.xml
ERROR: java.lang.NoSuchMethodError: com/ms/xml/om/Document: method setCaseInsenst
ive(Z)V not found

Any ideas?

Sean Mc Grath
sean at digitome dot com

From adrian at solero.force9.net Fri Dec 5 19:17:56 1997
From: adrian at solero.force9.net (Adrian Orlowski)
Date: Mon Jun 7 16:59:18 2004
Subject: Seems it's all been worthwhile :)
Message-ID: <199712051851.TAA03832@relay2.force9.net>

Some news.

In the July 1997 issue of EXE: The Software Developers' Magazine there was an article of mine on XML which I ended with this speculation:

"It's possible to look away from the small print of the XML proposal to the larger picture of real world documents perhaps sceptical of the changes being asked for [by W3C]. Arguably though XML is the best attempt yet to move on from so-called plain text as the lowest common denominator for document interchange... somewhere in my mind's dark recesses I recall that Microsoft Word is based on an implicit structured outline model of documents; what price Word 9 or 10 coming XML-enabled with a DTD to cover all documents ever produced by versions 1 through 8?"

(If you would like a copy of the article point your whatsit at http://www.dotexe.co.uk/ or email me and I will send you the SGML original. Please allow for the fact that it was written in February based on the 1st XML draft.)

The news is that this scenario might not be that far away:

"Microsoft CEO Bill Gates recently said that XML will be the data format for Office and HTML will be the display standard."
I have the following reference for this: Vendors to push XML as all-purpose Web middleware format http://www.infoworld.com/cgi-bin/displayStory.pl?97121.exml.htm Some news. If you have that killer XML app in the works, you'd better start looking to your laurels. -- adrian Adrian Orlowski adrian@solero.force9.net xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Robert.H.Dolin at kp.ORG Fri Dec 5 23:13:50 1997 From: Robert.H.Dolin at kp.ORG (Dolin,Robert H) Date: Mon Jun 7 16:59:18 2004 Subject: Message Length vs Processing Speed Message-ID: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org> Greetings XML-DEV list, We've been working on an SGML (?XML) syntax for HL7 messages, and one of the significant issues that has come us is the concern over message length. Here's a message I posted to the HL7 SGML/XML Listserver suggesting how we might try to optimize the length/speed consideration. Would appreciate any additional comments. Thanks, Bob Bob Dolin, MD Kaiser Permanente Robert.H.Dolin@kp.org ----------------------- >---------- >From: Dolin, Robert H >Sent: Tuesday, December 02, 1997 11:24 PM >To: 'HL7-SGML' >Cc: Dolin,Robert H >Subject: RE: DATATAG minimization > >I appreciate all the feedback on this 'innocent' posting of mine. > >Perhaps I should point out that just because we are looking for optimal >minimization techniques, there need not be anything in a DTD that >precludes it from being XML-compliant - well, at least this is partly >correct... Can a DTD be part XML-compliant and part non-XML-compliant, >and can the non-XML-compliant part be used only by those who need to >minimize message length?? 
> >As an aside, there are several knowledgeable and respected members of >the HL7 community who continue to feel that message length is of >significant concern. So, where length is of major concern, we can >examine minimization techniques. Where the use of XML or where message >parsing speed is of major concern, fully normalized messages/documents >can be passed. > >And as John Spinosa points out, there may be a tradeoff in message >length versus speed of parsing/validating messages. > >Here's an example DTD based on the HL7 Version 3.0 Draft to show how we >might possibly enable both - tiny messages where length is important, >and XML-compliant messages where parsing speed is important: > >(This DTD uses SHORTREF minimization, and actually can make messages >SMALLER then their ER7 representation. The specifications for the >SHORTREF can be added to an existing (XML-Compliant) DTD without >changing the portion of the DTD that was already there.) > >Example 1: A sample ER7 message (based on a draft of the HL7 Version 3 >specifications) (361 Characters) > >Example 2: A sample DTD based on the same message used for Example 1. > >Example 3: A fully normalized SGML message conveying the same >information as in Example 1, based on the DTD in Example 2. (708 >Characters). > >Example 4: The same SGML message as in Example 3, minimized using >SHORTREF. (354 characters). > >Example 5: The DTD from Example 2, along with the SHORTREF mappings >appended, which allow an SGML parser to take the minimized message in >Example 4 and convert it to the fully normalized message in Example 3. 
> > > ------------------------------------------------------------------------ >Example 1: A sample ER7 message (based on a draft of the HL7 Version 3 >specifications) (361 Characters) [there may be errors in my use of the >ER7 syntax] > >MSH|~ >PE|X703421|I||~ >BC|IPChoice|I~ >IPE|3|4~ >PADM|Emergency Dept|9708170430|BAPT|{Jones^Houston}~ >PTP|Dallas, TX|HS~ >BL|Acnt~ >PTBA|X746343|198768353|D3|{X3^Trauma}~ >NX~ >PTBA|M1|D|D4|{Martha^Steward}~ >EL|Acnt~ >PCP|ABX1234567|CONS||{Jimmie^Steward}~ >BL|PartProv~ >EP|~ >HCP|19283746X-879||D2|{DD-15264^SNM}~ >NX~ >EP|ISO 8879 SGML~ >HCP|X12-EDI-HL7-XML|1999|X12-13|{F-12345^SNM}~ >EL|Acnt~ > > > ------------------------------------------------------------------------ >Example 2: A sample DTD based on the same message used for Example 1. > > > > > > > > > > A CDATA #IMPLIED > B CDATA #IMPLIED> > > ------------------------------------------------------------------------ >Example 3: A fully normalized SGML message conveying the same >information as in Example 1, based on the DTD in Example 2. (708 >Characters). > > >X703421I > >34 > >Emergency Dept9708170430BAPTA="Jones" B="Houston"> > > > >Dallas, TXHS > >X746343198768353D3 > > >M1DD4 > > >ABX1234567CONSB="Steward"> > > > > >19283746X-879D2 > > >ISO 8879 SGML > >X12-EDI-HL7-XML1999X12-13B="SNM"> > > > ------------------------------------------------------------------------ >Example 4: The same SGML message as in Example 3, minimized using >SHORTREF. (354 characters). > > >|X703421|I|| > >|3|4| > >|Emergency Dept|9708170430|BAPT| >|Dallas, TX|HS| > >|X746343|198768353|D3| >|M1|D|D4| >|ABX1234567|CONS|| >|| > >|19283746X-879||D2| >|ISO 8879 SGML| > >|X12-EDI-HL7-XML|1999|X12-13| > > ------------------------------------------------------------------------ >Example 5: The DTD from Example 2, along with the SHORTREF mappings >appended, which allow an SGML parser to take the minimized message in >Example 4 and convert it to the fully normalized message in Example 3. 
> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> >"> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > A CDATA #IMPLIED > B CDATA #IMPLIED> > >> > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 6 10:20:25 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:18 2004 Subject: Vertical bar character In-Reply-To: <01bd0172$5fab4000$1e09e391@mhklaptop.bra01.icl.co.uk> Message-ID: <3.0.1.16.19971206110359.325f9208@pop3.demon.co.uk> At 11:38 05/12/97 -0000, Michael Kay wrote: >>I am building a module to parse DTD content models and have a strange (to >>me) problem on java DOS with the vertical bar character in the command >>line. I am using W95, JDK1.02 and the DOS prompt window. > > >Unicode and Latin-1 have: >VERTICAL BAR: 124 >BROKEN BAR: 166 > >I believe that in the original ASCII, 124 was called vertical line, but many >printers >displayed it as a broken line. In the IBM PC-DOS code set 850, code 124 >became >broken line, while in Latin-1 it remained as vertical bar with the new code >166 >(your minus 90) being allocated to broken bar. Thanks. This helps a good deal. I'm mystified as to why 166 (aka 'Broken bar') is displayed as a minute formless squiggle and 214 is displayed as a broken bar but I can survive without that knowledge > >This means that software that is converting files between Latin-1 (or >UNICODE, or >Microsoft "ANSI") and PC-DOS code page 850 ought to perform a conversion on >these characters. Yes. It performs an unwanted one :-). It looks like a problem between Java and the DOS commandline. 
What particularly worried me was that simple Java code using 'char' translated this character into 65446, which presumably has a completely different meaning in Unicode. IOW there is a danger that corruptions could take place. P. > >Mike Kay > > > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From richard at light.demon.co.uk Sat Dec 6 11:15:22 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:59:18 2004 Subject: Seems it's all been worthwhile :) In-Reply-To: <199712051851.TAA03832@relay2.force9.net> Message-ID: In message <199712051851.TAA03832@relay2.force9.net>, Adrian Orlowski writes >... >The news is that this scenario might not be that far away: > >"Microsoft CEO Bill Gates recently said that XML will be the >data format for Office and HTML will be the display >standard." I have the following reference for this: >Vendors to push XML as all-purpose Web middleware format >http://www.infoworld.com/cgi-bin/displayStory.pl?97121.exml.htm Excellent news. I make/made a similar plea in 'Presenting XML' ("XML- Based Authoring", p.44-47). Richard. 
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From ricko at allette.com.au Sat Dec 6 11:54:21 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:18 2004
Subject: Vertical bar character
Message-ID: <199712061154.WAA29576@jawa.chilli.net.au>

> From: Peter Murray-Rust
> Thanks. This helps a good deal. I'm mystified as to why 166 (aka 'Broken
> bar') is displayed as a minute formless squiggle and 214 is displayed as a
> broken bar but I can survive without that knowledge
> >
> >This means that software that is converting files between Latin-1 (or
> >UNICODE, or
> >Microsoft "ANSI") and PC-DOS code page 850 ought to perform a conversion on
> >these characters.
>
> Yes. It performs an unwanted one :-). It looks like a problem between Java
> and the DOS commandline. What particularly worried me was that simple Java
> code using 'char' translated this character into 65446, which presumably
> has a completely different meaning in Unicode. IOW there is a danger that
> corruptions could take place.

This must be a bug. 65446 = FFA6, but I figure that 166 = 00A6, which is suspiciously close. FFA6 is a naughty Korean character, so I guess someone has programmed wrong.

I don't know why 214 = D6 is displayed as a broken bar. Have a look in the keycaps application whether 214 = D6 is indeed a broken bar in the font you are using. (It is also quite possible for a font designer to decide to use a broken bar glyph where a single bar is wanted, and vice versa. If that is the case, change the font to one that isn't broken.)
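
A sketch of one way the value 65446 can arise, offered as an assumption and not as a diagnosis of the JDK 1.02 behaviour reported in this thread: if the character arrives as the signed byte 0xA6 (that is, -90) and is cast straight to char, Java sign-extends it to 0xFFA6, whereas masking with 0xFF first gives the expected 0x00A6. The class and method names below are made up for illustration.

```java
// Hypothetical demo: class and method names are illustrative only.
public class BrokenBarDemo {

    // Casting a negative byte directly to char sign-extends it first,
    // so 0xA6 (-90 as a byte) becomes 0xFFA6.
    static int castDirectly(byte b) {
        return (char) b;
    }

    // Masking with 0xFF discards the sign extension and keeps the low 8 bits.
    static int castWithMask(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte brokenBar = (byte) 0xA6; // Latin-1 code 166, stored as -90
        System.out.println(castDirectly(brokenBar)); // 65446 (0xFFA6)
        System.out.println(castWithMask(brokenBar)); // 166   (0x00A6)
    }
}
```

Whether this is what actually happened between the DOS prompt and Java is not established by the thread; the arithmetic only shows that 65446 is 65536 minus 90.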
Rick Jelliffe xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From adrian at solero.force9.net Sun Dec 7 18:35:41 1997 From: adrian at solero.force9.net (Adrian Orlowski) Date: Mon Jun 7 16:59:18 2004 Subject: Seems it's all been worthwhile :) In-Reply-To: <199712051851.TAA03832@relay2.force9.net> Message-ID: <199712071815.TAA10431@relay1.force9.net> On 5 Dec 97 at 19:39, Adrian Orlowski wrote: > In the July 1997 issue of EXE:The Software Developers' > Magazine there was an article of mine on XML > (If you would like a copy of the article point your whatsit > at http://www.dotexe.co.uk/ or email me and I will send you Apologies to anyone on a wild goose chase to the above URL. (It should have been http://www.exe.co.uk -- except you won't find it there). It is now available at http://www.solero.force9.co.uk/ -- adrian Adrian Orlowski adrian@solero.force9.net -- ------------------------------------------ -- Adrian Orlowski adrian@solero.force9.net Information Systems Software Ltd 20 Andover Road, Newbury, Berkshire RG14 6LR, UK Voice/Fax: +44(0)1635 49574 E-mail: adrian@solero.force9.net xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Mon Dec 8 12:59:50 1997 From: ht at cogsci.ed.ac.uk (Henry Thompson) Date: Mon Jun 7 16:59:18 2004 Subject: Message Length vs Processing Speed In-Reply-To: "Dolin,Robert H"'s message of Fri, 5 Dec 1997 15:12:16 -0800 References: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org> Message-ID: In the words of our former president, "We could do that, but it would be wrong." You SGML is impeccable, but without understanding why people care about message length it's very hard to address the larger issues you raise. Could you elaborate a bit on the numbers and attitudes involved, i.e. average message size now (is your example typical?), anticipated traffic volume, size of archives, etc.? ht -- Henry S. Thompson, Human Communication Research Centre, University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440 Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk URL: http://www.cogsci.ed.ac.uk/~ht/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dkuhlman at netcom.com Tue Dec 9 00:55:25 1997 From: dkuhlman at netcom.com (G. 
David Kuhlman)
Date: Mon Jun 7 16:59:19 2004
Subject: MSXML 1.6 problem
In-Reply-To: <199712051327.NAA00475@mail.iol.ie> from "Sean Mc Grath" at Dec 5, 97 01:27:42 pm
Message-ID: <199712090055.QAA17149@netcom.netcom.com>

> I have just installed MSXML 1.6. I can run the applet viewer etc. from IE 4
> but jview is giving
> me a problem:-
>
> c:\msxml>jview msxml samples\tire.xml
>
> ERROR: java.lang.NoSuchMethodError: com/ms/xml/om/Document: method setCaseInsensitive(Z)V not found
>
> Any ideas?

Add a path to the msxml classes. Something like:

jview /cp d:\msxml\classes msxml -d samples\Tire.xml

By the way, is anyone successfully running msxml under Linux? With which version of the JDK? 1.1.3? I'm interested in comments on this.

-- Dave

> Sean Mc Grath
> sean at digitome dot com

From eric at macweb.com Tue Dec 9 04:44:17 1997
From: eric at macweb.com (Eric Bickford)
Date: Mon Jun 7 16:59:19 2004
Subject: entity arrays/quotes
Message-ID: <1330508586-10638612@macweb.com>

I'm investigating the XML spec for conformance by my CGI, and I have a couple questions:

1) I was surprised to see that a single quote is valid for attribute values (as opposed to double quotes).
Is this new with XML, or does HTML also allow single quotes? 2) Is there some standard way to declare an ENTITY that includes an array of values? To be specific, I'd like to include a list of values in a document so my parser can build a SELECT menu of OPTIONS. 3) Does anyone have an opinion on how &entities; can best be used with a database application? For example, assume you declare in your DTD a list of &entities;, one for each database field name/value. If we are to expect browsers to parse an xml document with entities, how can a found table or hit list of values get substituted? Eric Bickford eric@macweb.com Web Broadcasting Corporation http://macweb.com/ Web Essentials for FileMaker Pro WEB FM, PICT FM, LOG FM, TAG FM xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Tue Dec 9 04:53:29 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:19 2004 Subject: entity arrays/quotes Message-ID: <3.0.32.19971208205259.00c54fe0@pop.intergate.bc.ca> At 08:44 PM 08/12/97 -0800, Eric Bickford wrote: >1) I was surprised to see that a single quote is valid for attribute >values (as apposed to double quotes). Is this new with XML, or does HTML >also allow single quotes? No, and yes. >2) Is there some standard way to declare an ENTITY that includes an array >of values? To be specific, I'd like to include a list of values in a >document so my parser can build a SELECT menu of OPTIONS. 
No, but you could have an entity whose value was text text text but, I want to prevent this: text text ========================================================== Mary Holstege wrote: > > Russell East writes: > > How come the following doesn't work? > > > > > > I basically want my element a to either form an hierarchy > > *or* have some text data. > > > > But it seems I'm forced to have > > > > > > which I don't really want at all. > > Try this: > > > > Yours is ambigious when you have nothing -- is it a list of a's of length zero > or is it a #PCDATA with a null string? > > //Mary -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- Russell East mailto:reast@esri.com _|_| Programmer phn: +1 (909) 793 2853 _|_| ESRI, 380 New York St fax: +1 (909) 307 3067 Redlands CA 92373-8100 http://maps.esri.com/ -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=- xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Dec 10 19:28:05 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:21 2004 Subject: Mixed content not working for me In-Reply-To: <348EC4DD.623632@esri.com> References: <348EC4DD.623632@esri.com> Message-ID: <199712101927.OAA00337@unready.microstar.com> Russell East writes: > How come the following doesn't work? > > > I basically want my element a to either form an hierarchy > *or* have some text data. > > But it seems I'm forced to have > XML bans this type of mixed content because it has been causing trouble in full SGML for over a decade. 
The problem comes with something like this: After an SGML parser reads the opening tag, it doesn't know whether the element will contain #PCDATA or subelements. The first character it reads is a linefeed -- that's character data, so the parser assumes that it is reading #PCDATA; when the parser finds the tag a few characters later it throws an error. You need to do two things: 1) submit a bug report to Microsoft; and 2) create a new subelement to hold the text: Now you can have This is some text or This is a subelement All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Wed Dec 10 20:08:16 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:21 2004 Subject: XML of Darkness Message-ID: <199712102007.PAA00841@unready.microstar.com> I have put online a rough-and-ready version of Conrad's HEART OF DARKNESS, with an XML 1.0 DTD and markup. 
You can get at the document through the following URL:

http://home.sprynet.com/sprynet/dmeggins/texts/

You may, of course, simply download the document and parse it on your local system; however, if you happen to have an active Internet connection, it's much more interesting (and much more in line with the XML philosophy) to parse the document directly from its source URL:

http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

For example, with Ælfred (http://www.microstar.com/XML/), you would type

java EventDemo http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

and watch the events roll down your screen. I have not tried this yet with other XML parsers like Lark or MSXML.

For a _really_ fun test in the future, I might put different chapters of the book on different Internet hosts (you could still parse it through a single top-level URL). This is where XML can be exciting for managing distributed information.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From clovett at microsoft.com Wed Dec 10 20:10:35 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:21 2004
Subject: msxml 1.8 questions
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F2E@red-msg-56.dns.microsoft.com>

1) I found the NullElementFactory problem. The msxml.class file in 1.8 is out of date. Recompile msxml.java and everything should work fine. NullElementFactory is inside msxml.java.
2) "Error: java.lang.NoSuchMethodError: com/ms/xml/Document: setLoadExternal(Z)V not found" definitely indicates an install or classpath problem. Looks like it isn't picking up the new stuff. Try the following:

jview /cp:p "d:\devel\msxml;d:\devel\msxml\classes;" msxml Hamlet.xml

Glad to hear that our parser works fine under Sun-JDK 1.1.5.

> -----Original Message-----
> From: Ingo Macherius [SMTP:Ingo.Macherius@TU-Clausthal.de]
> Sent: Wednesday, December 10, 1997 12:26 AM
> To: xml-dev@ic.ac.uk
> Subject: msxml 1.8 questions
>
> Here's a list with problems regarding msxml 1.1.8.
>
> 1) Fast mode
>
> Did anyone get msxml 1.8 to work with "-f" set ?
> I tried with Sun-JDK 1.1.{2,3} on Linux, Sun-JDK 1.1.5 on Win95
> and latest jview. All fail to parse any XML-document.
>
> With Sun-JDK:
> [inim@voyager samples]$ java msxml -f Hamlet.xml
> java.lang.NoClassDefFoundError: NullElementFactory
> at msxml.main(msxml.java)
>
> With jview:
> c:\temp\samples> jview msxml -f Hamlet.xml
> Error: java.lang.NoClassDefFoundError: NullElementFactory
>
> 2) jview vs. Sun-JDK on win95
>
> Called from commandline, jview fails this way:
>
> > echo %CLASSPATH%
> d:\devel\msxml;d:\devel\msxml\classes;.
> > jview msxml Hamlet.xml
> Error: java.lang.NoSuchMethodError: com/ms/xml/Document:
> setLoadExternal(Z)V not found
>
> Strange enough: Sun-JDK 1.1.5 works fine !
>
>
> Once again clueless,
> ++im
>
> --
> Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld
> mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/
> Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank
> Zappa)
>
> xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following > message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 10 21:38:13 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:21 2004 Subject: LISTRIVIA (was Re: Microsoft's JScript XML Sample) In-Reply-To: Message-ID: <3.0.1.16.19971210221359.2c0f2bdc@pop3.demon.co.uk> At 17:50 09/12/97 -0500, Craig Gingell wrote: >I am keen to exploit the potential of XML in a project I am currently >working on. Good :-) >I have visited the Microsoft website page >http://www.microsoft.com/msdn/sdk/inetsdk/help/itt/xml/overview/Sample_4 >.htm#Sample_4 >and cut and pasted the JScript to my own file. Here is my file - > Please do not include attachments to posting to XML-DEV - the mailer and the hypermail can get confused by them and they don't appear on the latter. If these are useful resources, find a permanent site for them (we have volunteers). P. (remember also that some people - certainly myself - have to pay personally for all mail they received from XML-DEV). Best of luck, P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 10 21:40:23 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:21 2004 Subject: RTF merge In-Reply-To: Message-ID: <3.0.1.16.19971210220932.2c0f4a36@pop3.demon.co.uk> At 09:42 09/12/97 -0800, you wrote: >Hi, > >I'm looking for a RTF merge utiity to merge 2 or more files. RTF header in theses files have to be same and is removed from 2nd file onwards. I know it is pretty easy to write but why to spend time if it is available. This list is essentially for those interested in developing XML applications :-) and not for general wordprocessing queries. There are better newsgroups where you are more likely to find an answer. Best of luck. P. > >Let me know if you have it? > >Thanks >Satwinder Mangat > > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 10 21:41:09 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:21 2004 Subject: LISTRIVIA (was Re: msxml 1.8 questions) In-Reply-To: <001501bd0549$ed433c80$0100007f@localhost> Message-ID: <3.0.1.16.19971210220657.2c0fb3e8@pop3.demon.co.uk> At 00:59 10/12/97 -0800, Don Park wrote: [... useful help with problem...] > > >Attachment Converted: "c:\eudora\attach\NullElem.java" > >Attachment Converted: "c:\eudora\attach\NullElem.class" > >Attachment Converted: "c:\eudora\attach\XMLStrea.java" > >Attachment Converted: "c:\eudora\attach\XMLStrea.class" > It's probably a poor idea to attach material that is going to a mailing list which is then hypermailed. I have instances where non-printables have crashed the Hypermail system on our machine, and the attachments don't go anywhere useful. We have already several volunteers for providing various XML resources and I am sure some of those would mount material if asked. P. BTW it would be extremely useful to collect together all the MSXML-related material somewhere since I think some of us are now confused by what we need to download and what to so with it :-). And is there a WORA version yet :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 10 21:53:35 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:21 2004 Subject: General comments on parsers (was [NEW] AElfred) In-Reply-To: <199712100018.TAA00263@unready.microstar.com> Message-ID: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk> At 19:18 09/12/97 -0500, David Megginson wrote: >Microstar Software Ltd. is happy to announce lfred (AElfred), a >small, fast, DTD-aware Java-based XML parser, especially suitable for >use in Java applets. Great! I have bolted support for (AE)?lfred into JUMBO and tested the last but one lfred pre-release. Many thanks to Microstar (and David) for having approached JUMBO. JUMBO now supports three parsers (in alpha order) - Lark - lfred - NXP (is MXSML WORA yet??) They are run with the commandline java jumbo.sgml.SGMLTree myfile.xml PARSER=AElfred (or whatever) It has proved relatively easy to bolt these in, but there have been significant differences in the interfaces offered and I hope that we can move towards some uniformity - at least in the terminology. I shall post more on this to XML-DEV. Specific comments: >lfred is free for both commercial and non-commercial use, and COMES ^^^^^ I am not sure whether the ligature has disappeared here or whether you have shortened it to 'lfred' (5 chars). Although I support the use of Unicode, many mailers don't (this is Eudora). Note also that I use names for Java classes as well and so do authors, so we have Lark.class, etc. I doubt whether JDK1.02 supports ligature. 
There are 3 possibilities:
7 chars (AElfred)
6 chars (Ælfred)
5 chars (lfred)
I think you need to standardise on ONE!

[... valuable design points omitted...]
>
>6. lfred must produce correct output for well-formed and valid
> documents, but need not reject every document that is not valid or
> not well-formed.
>
> STATUS: lfred is DTD-aware, and handles all current XML features,

I can see several ways a parser can treat the DTD:
- ignore external and internal subsets completely
- read and parse the internal subset and apply ATTLISTs and ENTITYs
- ditto and provide handles for the application to retrieve DTD information
- ditto, but include the external subset
- as above, but validate attribute values
- as above but also validate content

Only the last is full validation. JUMBO wants to retrieve the DTD information for its authoring process, and needs the ELEMENT and ATTLIST information. At my last attempt I was unable to extract ELEMENT information from Lark (but can get ATTLISTs) and I don't think I could get ELEMENT info from lfred. I haven't looked at NXP, and perhaps Norbert could update us.

> including CDATA and INCLUDE/IGNORE marked sections, internal and
>

Again, many thanks to Microstar and David, Tim, Norbert (and the MSXML players when we get the WORA version).

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Wed Dec 10 22:00:26 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:21 2004
Subject: General comments on parsers (was [NEW] AElfred)
In-Reply-To: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
References: <199712100018.TAA00263@unready.microstar.com> <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <199712102159.QAA00524@unready.microstar.com>

Peter Murray-Rust writes:
> There are 3 possibilities:
> 7 chars (AElfred)
> 6 chars (Ælfred)
> 5 chars (lfred)
>
> I think you need to standardise on ONE!

Just for clarification, the proper name is "Ælfred" (with an AE ligature at the start), but that will not come through older mailers; the ASCII transliteration is "AElfred", but the point of the AE ligature is that XML is not limited to ASCII (though many people's e-mail is).

The unimaginative Java class name is com.microstar.xml.XmlParser, so there's no problem with ligatures there. We could type Ælfred, but we'd scare away the Java hackers.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From donpark at quake.net Wed Dec 10 22:18:05 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:22 2004
Subject: LISTRIVIA (was Re: msxml 1.8 questions)
Message-ID: <001201bd05b8$ff3819f0$0100007f@localhost>

>It's probably a poor idea to attach material that is going to a mailing
>list which is then hypermailed. I have instances where non-printables have
>crashed the Hypermail system on our machine, and the attachments don't go
>anywhere useful.

Sorry about that. I have now uploaded XMLStreamReader.java at:

http://www.quake.net/~donpark/XMLStreamReader.java

I will keep it there until the corrected version of MSXML 1.8 is released (should be RSN). It turns out that NullElementFactory.java is not needed because it is inside msxml.java. Just recompile and you should get the NullElementFactory.class file.

>BTW it would be extremely useful to collect together all the MSXML-related
>material somewhere since I think some of us are now confused by what we
>need to download and what to do with it :-). And is there a WORA version
>yet :-)

I agree but I am short of disk space on my web site. Anyway, I am willing to take responsibility for XML example files and DTDs. MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.

Don

>
> P.
>
>Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
>net connection
>VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
>http://www.venus.co.uk/vhg
From ak117 at freenet.carleton.ca Wed Dec 10 22:20:03 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:22 2004
Subject: New AElfred Release (1.0beta2)
Message-ID: <199712102219.RAA00729@unready.microstar.com>

I have put up a new beta release of Microstar's Java-based XML parser, Ælfred (AElfred), with two minor bugs fixed:

1) When Ælfred finds " but not

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From mike at datachannel.com Wed Dec 10 22:23:04 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 16:59:22 2004
Subject: Latest news on NXP
Message-ID: <01BD0576.C7D81970@NEMO>

Peter, Norbert is currently at XML'97, and I'm not sure if he is monitoring this list right now.
Here is a press release talking about the future development efforts and availability of NXP.

http://www.datachannel.com/pressroom/releases/Press32.htm

Here is a page with links to the parsers and samples:

http://www.datachannel.com/products/xml/index.html

Mike D
DataChannel

From richard at light.demon.co.uk Wed Dec 10 22:32:23 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:22 2004
Subject:

Hi,

I notice that the current draft has switched the case of the XML declaration and its arguments to lower case:

Now that case is significant, this presumably matters. Is there a particular reason for this? Other PIs will have a PItarget where 'xml' sits, and this isn't constrained to be any particular case. Wouldn't it be kinder to make it '

Message-ID: <3.0.1.16.19971210234937.34e73996@pop3.demon.co.uk>

At 14:14 10/12/97 -0800, Don Park wrote:
[... thanks Don...]
>
>MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.

Does this mean on either or are both necessary? i.e. do I have to download the MS SDK?

Thanks,

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 10 23:28:06 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject:
Message-ID: <3.0.1.16.19971211001535.29676d76@pop3.demon.co.uk>

At 22:26 10/12/97 +0000, Richard Light wrote:
>
>Hi,
>
>I notice that the current draft has switched the case of the XML
>declaration and its arguments to lower case:
>
>
>
>Now that case is significant, this presumably matters. Is there a
>particular reason for this? Other PIs will have a PItarget where 'xml'
>sits, and this isn't constrained to be any particular case. Wouldn't it
>be kinder to make it '
>(The DTD declarations (for compatibility with what SGML systems produce.)

Maybe WG members authorised to speak about this will answer the 'why' questions :-)

The main problems now facing XML-DEV'ers are:
- to remember what the various cases are in the XML spec. Of course the parsers will remind us ungently :-) [These are Draconian bomb-out errors unless I am mistaken :-)]
- to remember what the case sensitivity is in *other people's* DTDs and documents.

The second promises to be a real problem. (BTW I support the WG's motives in introducing case sensitivity). I don't know whether we can help ameliorate it here. This sort of thing:

[bringgg, bringgg].
"Hi Sue, my XML document has bombed out with 'unknown element FOOBAR'."
"Mary, did you remember the capitals?"
"yes, I put them all in!"
"How many?"
"The whole lot."
"What? Two?"
"No, all SIX".
"Ah, you should only have two."
"Where?"
"The F and the B."
"Oh, well HTML is all caps".
"Yes, but this isn't HTML."
"Well it's a sort of extended HTML, isn't it?."
... and so on ...
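The Sue-and-Mary exchange comes down to one rule: with case now significant, FooBar and FOOBAR are two different names. A minimal sketch of that behaviour follows, assuming Python's standard xml.etree.ElementTree module rather than any of the parsers discussed on this list. The element name FooBar and the helper is_well_formed are hypothetical, chosen only to match the dialogue.

```python
import xml.etree.ElementTree as ET

MATCHING = "<FooBar>text</FooBar>"
MISMATCHED = "<FooBar>text</FOOBAR>"  # end tag uses "all SIX" capitals


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# The name reported to the application keeps its exact case.
root_name = ET.fromstring(MATCHING).tag

print(is_well_formed(MATCHING))
print(is_well_formed(MISMATCHED))
print(root_name == "FOOBAR")
```

A non-validating parser only notices the problem when a start tag and its end tag disagree. An element that is consistently written in the wrong case passes the well-formedness check and fails only when it is validated against a DTD, which is the "unknown element" error in the dialogue.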
I have no idea how to construct CML cases at present. If I follow the XML spec I get all-lower-case-with-dashes-between-words. OK, except that '-' is not a very friendly character for forming Java names from. If I follow the W3C namespace proposal I get random upper and lower case for namespaces and for elements. If I follow the RDF I get consistent namespace case and some capitalisation in names.

So: PLEA TO W3C

Please, it would help us a lot if at least the W3C could use a consistent case style in their public-facing documents. At the moment it suggests they haven't addressed this problem. [I don't believe they don't care.] If this happened, at least some of the rest of us could follow W3C style. I doubt we can convince the whole world to use one style, but languages like Java and C++ do quite a good job of gently persuading people to use a communal approach. XML/W3 could do, if they address it.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From jones at nceas.ucsb.edu Thu Dec 11 00:15:10 1997
From: jones at nceas.ucsb.edu (Matt Jones)
Date: Mon Jun 7 16:59:22 2004
Subject: General comments on parsers (was [NEW] AElfred)
References: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <348F306B.ADB07AFA@nceas.ucsb.edu>

Thanks to the parser writers!

Like Peter, I am working on a project where we are building an XML editing application in Java and therefore need access to the content model for determining allowable content.
The msxml parser currently doesn't make its internal representation of the DTD public -- Chris Lovett suggested using the XML-Data Schemas instead of trying to access the DTD info directly. When one wants access to the DTD, what is the recommended method? Is there any consensus? Do any of the available parsers (Lark, MSXML, NXP, PaxSyn, etc.) plan on offering access to the DTD through their APIs at some point?

Standardization of APIs (a la XAPI-J) would make life better as well -- are people working on this (Lark? MSXML? etc?)?

Thanks in advance,

Matt

--
******************************************************************
Matt Jones jones@nceas.ucsb.edu
http://www.nceas.ucsb.edu/ Ph: 805-892-2508 Fax: 805-892-2510
National Center for Ecological Analysis and Synthesis (NCEAS)
******************************************************************

Peter Murray-Rust wrote:
> JUMBO wants to retrieve the DTD information for its authoring process, and
> needs the ELEMENT and ATTLIST information. At my last attempt I was unable
> to extract ELEMENT information from Lark (but can get ATTLISTs) and I don't
> think I could get ELEMENT info from lfred. I haven't looked at NXP, and
> perhaps Norbert could update us.
>
> Again, many thanks to Microstar and David, Tim, Norbert (and the MSXML
> players when we get the WORA version).
>
> P.

From donpark at quake.net Thu Dec 11 01:41:33 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:22 2004
Subject: msxml 1.8 questions
Message-ID: <000f01bd05d5$6ef33f10$0100007f@localhost>

>>
>>MSXML 1.8 runs just fine on the latest JDK and MS Java SDK.
>
>Does this mean on either or are both necessary? i.e. do I have to download
>the MS SDK?

No. Either one should be just fine. Both is also fine. None would pose a little difficulty. I typically compile using MS Java SDK and run using JDK.

Have fun,

Don

From peter at ursus.demon.co.uk Thu Dec 11 01:52:45 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject: General comments on parsers
In-Reply-To: <348F306B.ADB07AFA@nceas.ucsb.edu>
References: <3.0.1.16.19971210223816.2c0f5bd6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971211024212.2d87bafc@pop3.demon.co.uk>

At 16:14 10/12/97 -0800, Matt Jones wrote:
>Thanks to the parser writers!
>
>Like Peter, I am working on a project where we are building an XML
>editing application in Java and therefore need access to the content
>model for determining allowable content. The msxml parser currently

Since my last posting I have been hacking AElfred into JUMBO and it does a nice job of getting almost everything from the DTD *except* the content. [It seems to require an external DTD for this - it complains about elements in the internal subset, although this is the pre-beta version :-)]

>doesn't make its internal representation of the DTD public -- Chris
>Lovett suggested using the XML-Data Schemas instead of trying to access

I am going to post something along these lines tomorrow (I hope).

>the DTD info directly. When one wants access to the DTD, what is the
>recommended method? Is there any consensus? Do any of the available
>parsers (Lark, MSXML, NXP, PaxSyn, etc.)
plan on offering access to the
>DTD through their APIs at some point?
>
>Standardization of APIs (a la XAPI-J) would make life better as well --
>are people working on this (Lark? MSXML? etc?)?

Yes, please. This list (especially John Tigue) worked hard to come up with Xapi-J - everyone seemed to think it was a good way forward, but no parsers implement it. Instead we have an increasing (and rather difficult) variety of approaches (and especially terminology). For example, it's clear that AElfred and Lark use 'Entity' in different ways [I'm slightly confused by Lark's use of Entity].

Parsers are NOT equivalent, and there are many reasons why an application may wish to use more than one:
- different interfaces, giving different views of the document
- different optimisations of speed, memory, etc.
- different treatment of entities
- different features

It's very tedious to have to implement different interfaces for each (AElfred has about 30 methods - and they are all valuable). So:
- Chris
- David
- James
- John
- Norbert
- Tim

any comments on a common interface :-)?

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Thu Dec 11 05:45:10 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:22 2004
Subject: LISTRIVIA
In-Reply-To: <001201bd05b8$ff3819f0$0100007f@localhost>
Message-ID: <3.0.1.16.19971211064248.17ef2e4e@pop3.demon.co.uk>

A gentle reminder to posters to clip quoted material before posting.
Including the whole text of a previous posting is rarely necessary, and means that (a) the disk space for the list gets filled up and (b) people like me who pay for mail out of their own personal pockets have to pay more.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From smith at interlog.com Thu Dec 11 08:01:03 1997
From: smith at interlog.com (Chris Smith)
Date: Mon Jun 7 16:59:22 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
Message-ID:

I'm part of a group that has decided to use XML as an encoding for documents which are effectively carrying transactions. Seeing XML make it to Proposed Recommendation is great, and makes our decision less of a concern.

Part of this work requires that these documents carry document authentication information. This, in turn, requires that some regions of an XML document must be transported *exactly*, and must be received and checked identically so that the message authentication actually works. The fact that we are considering the idea of including email as a transport mechanism doesn't help matters.

There are two questions at hand, largely directed at those creating parsers. I'd like to know if the application requirements we are proposing ("what to do with the document") are going to be incredibly difficult to manage, given what the parsers are providing. I confess I'm just getting started here - I will get to investigating the various parsers. For now the questions may be useful anyway.
The first criterion is that message authentication is applied to an element in the document. This is a start to precisely defining what is being checked. The second criterion is that the message authentication must be applied to the XML document as represented in UTF-16 encoding, with big-endian convention, AS IT IS WRITTEN. This is to prevent us having to specify a consistent *internal* representation. The XML spec itself helps define a consistent *external* representation, which we figure is easier to stick with than dealing with all the cross-platform issues. The question: can this readily be dealt with? Is it straight-forward to ask for MessageAuthentication over ..., with all the content included?

The second question is much less firm right now. We would like to make whitespace handling robust - if someone along the way uses a tool which breaks a line, we should be able to fix it rather than die.

If we add the following character entities to our DTD, then it should be possible to use these to represent 'wanted' whitespace, and thus allow for a simple rule prior to checking message authentication - that is, remove all 'native' space, tab, LF, and CR from the #PCDATA and check what remains (whitespace inside tags is handled in a more draconian fashion). (According to the previous section, "Hi&spc;there!" will be checked exactly the way you see it here - not as "Hi there!") The question - is this distinction (between e.g. the native 0x0009 and &tab; (which converts to 0x0009)) going to be difficult to keep track of?

---------------------------------------------------------------------------
Chris Smith
From peter at ursus.demon.co.uk Thu Dec 11 09:51:44 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
References:
Message-ID: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>

Thanks very much Chris,

I'm probably not going to be much practical help, but I hope your posting catalyses a practical response from the SGML experts. I'd be surprised if conventional XML-enhanced SGML tools couldn't handle this problem, but I have no idea what they would cost. [The last flier I got was 2 orders of magnitude greater than an impecunious academic could afford.]

At 03:00 11/12/97 -0500, Chris Smith wrote:
>
[... first problem punted ...]
>The second question is much less firm right now. We would like to make
>whitespace handling robust - if someone along the way uses a tool
>which breaks a line, we should be able to fix it rather than die.
>
>If we add the following character entities to our DTD,
>
>
>
>
>
>
>then it should be possible to use these to represent 'wanted'
>whitespace, and thus allow for a simple rule prior to checking message
>authentication - that is, remove all 'native' space, tab, LF, and CR
>from the #PCDATA and check what remains (whitespace inside tags is
>handled in a more draconian fashion). (According to the previous
>section, "Hi&spc;there!" will be checked exactly the way you see it
>here - not as "Hi there!") The question - is this distinction (between
>e.g. the native 0x0009 and &tab; (which converts to 0x0009)) going to be
>difficult to keep track of?
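The rule Chris proposes (drop native space, tab, LF and CR from the character data, leave the entity references alone, then authenticate what remains) can be sketched as follows. This is only an illustration under stated assumptions. It works on raw character data that no parser has touched yet, so &spc; and &tab; are still unexpanded text, which is exactly the access Chris is asking whether parsers can give. SHA-256 stands in for whatever authentication algorithm his group actually uses, and both function names are hypothetical. The UTF-16 big-endian encoding follows the second criterion in his original posting.

```python
import hashlib

# The four "native" whitespace characters named in the proposal.
NATIVE_WHITESPACE = {"\u0020", "\u0009", "\u000A", "\u000D"}


def strip_native_whitespace(pcdata):
    """Remove literal space, tab, LF and CR from unparsed character data.

    Entity references such as &spc; survive because they are still plain
    text at this stage (assumption: no parser has expanded them).
    """
    return "".join(ch for ch in pcdata if ch not in NATIVE_WHITESPACE)


def authentication_digest(pcdata):
    """Digest of the stripped text, encoded as UTF-16 big-endian.

    SHA-256 is a stand-in for the real authentication algorithm.
    """
    canonical = strip_native_whitespace(pcdata)
    return hashlib.sha256(canonical.encode("utf-16-be")).hexdigest()


sent = "Hi&spc;there!"
rewrapped = "Hi&spc;\r\n  there!"   # a mail tool broke the line and indented it
expanded = "Hi there!"              # what a parser hands over after expansion

print(authentication_digest(sent) == authentication_digest(rewrapped))
print(authentication_digest(sent) == authentication_digest(expanded))
```

The last comparison shows why the distinction matters: once a parser has expanded &spc; into a native space, the application can no longer tell wanted whitespace from accidental whitespace, and the digest no longer matches.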
As one of the few authors of a generic native XML application I have to face this problem and have repeatedly failed to get practical solutions. The main response is:

Yes, it's a problem
and
Yes, it's your problem

As I understand it, your XML document may contain two sorts of white space:
whitespace that matters
whitespace that doesn't matter
The latter may be inserted randomly by authors whose lines don't wrap.

From my very limited experience of SGML I would say your approach looks a sensible one. However the major problem is 'where is your application software going to come from?' I have argued very strongly (and shall continue to do so), that there need to be generic conventions honoured by common application programs. Otherwise you have to write your own application for your problem. At present you have only two options:
- write it yourself (and maintain it)
- pay an SGML house to solve your problem for you

I hope shortly to propose some generic whitespace problems (implemented in JUMBO) for certain types of document. I don't know whether they would solve your problems, but thanks for giving me the chance to think about a real problem. :-)

As a corollary: Is anyone testing the ESIS output of the current crop of XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model or the value of xml:space they should all produce identical ESIS (right?) If not, then one or more is wrong. And all applications should (IMO) be prepared to work with ESIS which I think is isomorphous with a WF XML document.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From ricko at allette.com.au Thu Dec 11 10:57:59 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
Message-ID: <199712111033.VAA09204@jawa.chilli.net.au>

Attached is a repost summary of the white-space characters available in XML from ISO 10646. Of course, it is still up to applications to implement them correctly.

At the moment, spaces and newlines are very overloaded which causes all sorts of problems. So it would solve many problems to use these characters. For example, if you want a hard return, use the hard return character   and if you need non-collapsing white-space, use 　

In this particular case, one thing to do is put an attribute at the top-level element xml:space="preserve" to prevent collapsing and stripping of spaces and tabs.

As far as CR/LF, I think the XML spec can only be interpreted to mean that CR and LF written as character references should be preserved. This is because 2.11

"To simplify the tasks of applications, wherever an external parsed entity or the literal entity value of an internal parsed entity contains either the literal two-character sequence "#xD#xA" or a standalone literal #xD, an XML processor must pass to the application the single character #xA. (This behavior can conveniently be produced by normalizing all line breaks to #xA on input, before parsing.)"

So normalization *should* apply only to direct characters, not references.

However, I don't think you can trust parsers to do this. So if you want to send facsimile documents with whitespace preserved, you might find you have to use a Unicode private-use-area character to substitute for CR. Your application at the other end has to replace that character again to reconstruct the document. For example, you could use

This is a case where you want to do something that is definitely contrary to the simplifying rules of XML, so don't be alarmed that you have to use markup (which you give a significance to) rather than being able to do it direct.

Rick Jelliffe

-------------- next part --------------
A non-text attachment was scrubbed...
Name: space (1).htm
Type: application/octet-stream
Size: 2841 bytes
Desc: space (1).htm (Internet Document (HTML))
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971211/7421f8e2/space1.obj

From ak117 at freenet.carleton.ca Thu Dec 11 11:35:21 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To:
References:

Message-ID: <199712111134.GAA00411@unready.microstar.com>

Chris Smith writes:
> There are two questions at hand, largely directed at those creating
> parsers. I'd like to know if the application requirements we are
> proposing ("what to do with the document") are going to be incredibly
> difficult to manage, given what the parsers are providing. I confess
> I'm just getting started here - I will get to investigating the
> various parsers. For now the questions may be useful anyway.
>
> The first criterion is that message authentication is applied to an
> element in the document. This is a start to precisely defining what is
> being checked. The second criterion is that the message authentication
> must be applied to the XML document as represented in UTF-16 encoding,
> with big-endian convention, AS IT IS WRITTEN. This is to prevent us
> having to specify a consistent *internal* representation. The XML spec
> itself helps define a consistent *external* representation, which we
> figure is easier to stick with than dealing with all the
> cross-platform issues. The question: can this readily be dealt with?
> Is it straight-forward to ask for MessageAuthentication over
> ..., with all the content included?

It would be possible to use a parser to do authentication by generating checksums based on a normalised version of each element, but not to do it based on the external representation.

Right now, parsers must report whitespace in mixed content and sort-of report it in element content (yech). There is no requirement to report whitespace within markup, however. As a result, parsers are very unlikely to report any difference between the following two examples (assuming that the "idrefs" attribute is declared as IDREFS in the DTD):

Example 1: This is a link.

Example 2: This is a link.

There are many other problems too, including comments, whitespace outside of the document element, etc., etc.
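The point about whitespace within markup can be illustrated with a small sketch. The markup of the two examples did not survive in the archive, so the documents below are hypothetical stand-ins: the element names p and link are invented, and only the idrefs attribute comes from the message. The sketch also assumes Python's standard xml.etree.ElementTree parser and SHA-256 as the checksum, neither of which is discussed in the thread.

```python
import hashlib
import xml.etree.ElementTree as ET

# Two hypothetical documents that differ only in whitespace inside the tags.
DOC_A = '<p>This is a <link idrefs="a b">link</link>.</p>'
DOC_B = '<p>This is a <link\n   idrefs="a b"  >link</link >.</p>'


def parsed_view(xml_text):
    """What the application sees after parsing: tag, attributes, text, tail."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib), el.text, el.tail) for el in root.iter()]


def raw_digest(xml_text):
    """Checksum over the external representation (UTF-16 big-endian)."""
    return hashlib.sha256(xml_text.encode("utf-16-be")).hexdigest()


same_parse = parsed_view(DOC_A) == parsed_view(DOC_B)
same_bytes = raw_digest(DOC_A) == raw_digest(DOC_B)

print(same_parse)
print(same_bytes)
```

The parser reports the same elements, attributes and character data for both documents, while a checksum over the bytes as written tells them apart. An authentication scheme defined over the external representation therefore cannot be checked from parser output alone.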
I'd recommend that you do your checksum validation on any files that you have transmitted _before_ you parse them; that way, you can use existing software (it doesn't have to be XML-aware).

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Thu Dec 11 11:42:29 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
References:

<3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
Message-ID: <199712111141.GAA00445@unready.microstar.com>

Peter Murray-Rust writes:
> As a corollary: Is anyone testing the ESIS output of the current crop of
> XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> or the value of xml:space they should all produce identical ESIS (right?)
> If not, then one or more is wrong. And all applications should (IMO) be
> prepared to work with ESIS which I think is isomorphous with a WF XML
> document.

There are quite a few more XML parsers out there, including at least one in TCL -- see

http://www.sil.org/sgml/XML.html#xmlSoftware

As for ESIS, there are some problems that we'd have to overcome first:

1) How should empty elements be represented? Right now, Ælfred generates a startElement event immediately followed by an endElement event.

2) How should the XML declaration be represented? Should it appear as a processing instruction, or should it be ignored?

3) How should space in element content be handled? According to the spec, a DTD-aware parser should handle whitespace in element content differently from whitespace in mixed content (Ælfred just ignores whitespace in element content right now).

4) DTD-aware and non-DTD-aware parsers will handle whitespace in attribute values differently. Non-DTD-aware parsers will treat all attributes as CDATA, but DTD-aware parsers will treat tokenised attributes specially, by stripping all leading and trailing whitespace, and normalising internal whitespace to single spaces.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From mecom-gmbh at mixx.de Thu Dec 11 13:38:38 1997
From: mecom-gmbh at mixx.de (james anderson too)
Date: Mon Jun 7 16:59:23 2004
Subject:
Message-ID: <348FEDF2.FD454815@mixx.de>

i think the "proposed recommendation" drafters agree with you. to wit (from http://www.w3.org/TR/PR-xml-971208):

[17] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[18] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

Richard Light wrote:

> Hi,
>
> I notice that the current draft has switched the case of the XML
> declaration and its arguments to lower case:
>
>
>
> Now that case is significant, this presumably matters. Is there a
> particular reason for this? Other PIs will have a PItarget where 'xml'
> sits, and this isn't constrained to be any particular case. Wouldn't it
> be kinder to make it '
> (The DTD declarations ( for compatibility with what SGML systems produce.)
>
> Richard.
>
> Richard Light
> SGML/XML and Museum Information Consultancy
> richard@light.demon.co.uk
From peter at ursus.demon.co.uk Thu Dec 11 13:39:31 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:23 2004
Subject: XML vs the Dreaded Whitespace
In-Reply-To: <199712111141.GAA00445@unready.microstar.com>
References: <3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>

<3.0.1.16.19971211103053.17efbd0a@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971211143739.37172be2@pop3.demon.co.uk>

At 06:41 11/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > As a corollary: Is anyone testing the ESIS output of the current crop of
> > XML parsers (4 Java + nsgmls, I think)? Regardless of the whitespace model
> > or the value of xml:space they should all produce identical ESIS (right?)
> > If not, then one or more is wrong. And all applications should (IMO) be
> > prepared to work with ESIS which I think is isomorphous with a WF XML
> > document.
>
>There are quite a few more XML parsers out there, including at least
>one in TCL -- see
>
> http://www.sil.org/sgml/XML.html#xmlSoftware

Apologies to anyone I missed. I am a great fan of tcl and wrote costwish in it to sit on top of Joe English's CoST...

>
>As for ESIS, there are some problems that we'd have to overcome first:

Are there? How does a WF document differ from the corresponding ESIS stream? IOW if I do the transformation:

  WF -> ESIS -> WF

shouldn't I be able to recover the original?

>
>1) How should empty elements be represented? Right now, Ælfred generates a
> startElement event immediately followed by an endElement event.

Yes - and JUMBO is happy with that. As far as JUMBO is concerned and are processed in the same way and I will need a very clear argument to convince me that it should do different.

>
>2) How should the XML declaration be represented? Should it appear as
> a processing instruction, or should it be ignored?

JUMBO regards it as a PI. I hang all PIs off the preceding ELEMENT (not PCDATA). In that way the tree can be processed with these intact. JUMBO understands namespace PIs, PIs and will also store the others. It's useful to store them in case one wants to compare trees.

BTW - although it is nowhere stated most people seem to create PIs as name-value pairs and JUMBO expects this.

>
>3) How should space in element content be handled? According to the
> spec, a DTD-aware parser should handle whitespace in element
> content differently from whitespace in mixed content (Ælfred just
> ignores whitespace in element content right now).

This is a critical area for the parser writers to agree on. I assume that for the DTD-aware stuff there has to be a validating parser (i.e. one that matches contentspec against element content). I am not sure what algorithms are being used - JUMBO wants a java one for its birthday, please - but I can imagine that with certain contentspecs they might get different answers.

>
>4) DTD-aware and non-DTD-aware parsers will handle whitespace in
> attribute values differently. Non-DTD-aware parsers will treat all
> attributes as CDATA, but DTD-aware parsers will treat tokenised
> attributes specially, by stripping all leading and trailing
> whitespace, and normalising internal whitespace to single spaces.

In this case presumably only the TYPE in the ATTLIST is needed.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Thu Dec 11 15:08:00 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:23 2004
Subject:
References: <348FEDF2.FD454815@mixx.de>
Message-ID: <199712111506.KAA00685@unready.microstar.com>

james anderson too writes:

> i think the "proposed recommendation" drafters agree with you. to wit (from
> http://www.w3.org/TR/PR-xml-971208):
>
> [17] PI ::= '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
> [18] PITarget ::= Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

No, not at all. At least as I read it (and I'm not in the WG or the SIG), you _must_ begin the XML declaration with lowercase "

The situation is complicated by the fact that W3C is working on and has not yet released its own version of Java XML Object Model. Since it will be difficult to have all existing Java XML parsers conform to a single object model, I think the best approach is for someone to write a new Java parser framework which provides a reasonable object model and acts as the Universal XML Parser (UXP?:-).

UXP should use some kind of simple registry scheme and a UI to allow users to plug in new UXP compatible parsers. Writing UXP adapters for each of the existing Java XML parsers should not be too hard. Once UXP is in place, new parsers will start to conform. When W3C XML API is out, all we need to do is write two adapters:

1) UXP to W3C adapter so programs using W3C XML API can use UXP parsers (i.e. JavaScript).
2) W3C to UXP adapter so programs using UXP can use any XML parsers providing W3C XML API.

BTW, I have taken a look at Xapi-J and W3C OM API and, frankly, I am not satisfied with either of them. Enumeration by index is problematic and callbacks are either not supported or primitive. Not that I can offer any better in the near future. Call me a stuck up critic, if you will.

Don

>Yes, please. This list (especially John Tigue) worked hard to come up with
>Xapi-J - everyone seemed to think it was a good way forward, but no parsers
>implement it. Instead we have an increasing (and rather difficult) variety
>of approaches (and especially terminology). For example, it's clear that
>AElfred and Lark use 'Entity' in different ways [I'm slightly confused by
>Lark's use of Entity].
>
>Parsers are NOT equivalent, and there are many reasons why an application
>may wish to use more than one.
> - different interfaces, giving different views of the document
> - different optimisations of speed, memory, etc.
> - different treatment of entities
> - different features
>
>It's very tedious to have to implement different interfaces for each
>(AElfred has about 30 methods - and they are all valuable). So:
> - Chris
> - David
> - James
> - John
> - Norbert
> - Tim
>any comments on a common interface :-)?
>
> P.
>
>Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
>

From M.H.Kay at eng.icl.co.uk Thu Dec 11 17:09:15 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:23 2004
Subject: MSXML 1.8 Viewer Applet problem
Message-ID: <01bd0657$6f259fa0$1e09e391@mhklaptop.bra01.icl.co.uk>

I'm using the XML Viewer applet in MSXML 1.8. Having trouble because there doesn't seem to be any way of closing the file after you've finished with it, so all subsequent attempts to edit the XML file after viewing it fail saying "file in use".

Mike Kay
From dima at paragraph.com Thu Dec 11 17:20:00 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:24 2004
Subject: Newbie Q: NXP attrribute validation
Message-ID: <2.2.32.19971211172105.0091a70c@dream.paragraph.com>

Please help a newbie DTD writer :)

I am trying to validate with NXP attribute ID in element Foo :

With the following DTD:

As a result I get "Attribute has not be declared : ID" error. What am I doing wrong ?

Thanks,
Dima

---
NXP output:

NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
" "
Error : Attribute has not be declared : ID
" "
" "
Error : Parsing stopped with exception : java.util.EmptyStackException
Parsing finished - Time : 490 msec.

---------------------------
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241

From Patrice.Bonhomme at loria.fr Thu Dec 11 18:02:08 1997
From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme)
Date: Mon Jun 7 16:59:24 2004
Subject: BUG : msxml 1.6
In-Reply-To: Your message of "Mon, 01 Dec 1997 09:33:28 PST."
<2F2DC5CE035DD1118C8E00805FFE354C099E93@red-msg-56.dns.microsoft.com>
Message-ID: <199712111801.TAA06638@chimay.loria.fr>

I have downloaded msxml 1.8 and tried to run it on my sample files and it seems that the EXTENTITYDCL has not been fixed. I still get a "stammering" inclusion of the external data! What's wrong?

Thanks,
Pat.

DOCUMENT
|---XMLDECL
|   +---CDATA " VERSION="1.0" "
|---WHITESPACE 0xa
|---DOCTYPE NAME="EXAMPLE"
|   |---WHITESPACE 0xa
|   |---ELEMENTDECL EXAMPLE (P)+
|   |---WHITESPACE 0xa
|   |---ELEMENTDECL P (#PCDATA|S)*
|   |---WHITESPACE 0xa
|   |---ELEMENTDECL S (#PCDATA)*
|   |---WHITESPACE 0xa
|   +---EXTENTITYDCL incs
|       |---ELEMENT S
|       |   +---PCDATA "a third."
|       +---PCDATA "a third. "    <--- HERE
|---WHITESPACE 0xa
|---ELEMENT EXAMPLE
|   |---WHITESPACE 0xa
|   |---ELEMENT P
|   |   |---ELEMENT S
|   |   |   +---PCDATA "A sentence."
|   |   |---ELEMENT S
|   |   |   +---PCDATA "An another."
|   |   +---ENTITYREF incs "a third.a third. "    <--- AND HERE
|   +---WHITESPACE 0xa
+---WHITESPACE 0xa

[] Chris Lovett said:
[]---------------------------------
] Thanks, I have a fix already, and will be posting it shortly.
]
] > -----Original Message-----
] > From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
] > Sent: Saturday, November 29, 1997 1:37 AM
] > To: Chris Lovett
] > Subject: BUG : msxml 1.6
] >
] >
] > Hi,
] >
] > I found a bug in msxml 1.6 relative to the External Entity checking.
] >
] > Main file (test-ent.xml):
] >
] >
] >
] >
] >
] >
] >
] > ]>
] >
] >

] > ~~a sentence.an another.~~ ] >

] >

&inc-s;

] >
] >
] > Auxiliary file (inc-s.xml):
] > ~~a third.~~
] >
] > And i ve got this message :
] >
] > % java msxml -i -d test-ext-ent.xml
] > Invalid element 'PCDATA' in content of 'P'. Expected [S]
] > Location: file:test-ext-ent.xml(14,5)
] > Context:

] >
] > The parser should make a difference between ENTITYREF and SYSTEM
] > ENTITYREF.
] >
] > Pat.
] > --
] > ==============================================================
] > bonhomme@loria.fr | Office : B.228
] > http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
] > --------------------------------------------------------------
] > * Projet Aquarelle : http://aqua.inria.fr
] > * Serveur Silfide : http://www.loria.fr/Projet/Silfide
] > ==============================================================
] >
[]---------------------------------

--
==============================================================
bonhomme@loria.fr | Office : B.228
http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
--------------------------------------------------------------
* Projet Aquarelle : http://aqua.inria.fr
* Serveur Silfide : http://www.loria.fr/Projet/Silfide
==============================================================

From peter at ursus.demon.co.uk Thu Dec 11 18:18:24 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
In-Reply-To: <000801bd0650$77334140$0100007f@localhost>
Message-ID: <3.0.1.16.19971211183623.166722a2@pop3.demon.co.uk>

At 08:18 11/12/97 -0800, Don Park wrote:
>The situation is complicated by the fact that W3C is working on and has not
>yet released its own version of Java XML Object Model. Since it will be

Is this the same as DOM? If so, is there any timescale.

Not being part of the DOM process I am now somewhat confused. Does this mean that there is a formal program to produce an API for XML parsers?
If so, what is the timescale? I'm sure there are some readers who are involved ;-)

I'm an impatient beast and I worry about waiting for things like this to happen if it's going to be a long time. During that time we'll have another 5-10 Java based parsers, all with different terminology. In another proposal I will try to address the terminology :-)

>difficult to have all existing Java XML parsers to conform to a single
>object model, I think the best approach is for someone to write a new Java
>parser framework which provides a reasonable object model and acts as the
>Universal XML Parser (UXP?:-).

Is this a short-term or long term solution? If long term, what is the difference/benefit between this and the OM?

>
>UXP should use some kind of simple registry scheme and a UI to allow users

Please [ignorance] what does a registry scheme entail?

>to plug in new UXP compatible parsers. Writing UXP adapters for each of
>existing Java XML parsers should not be too hard. Once UXP is in place, new
>parsers will start to conform. When W3C XML API is out, all we need to do
>is write two adapters:
>
>1) UXP to W3C adapter so programs using W3C XML API can use UXP parsers
>(i.e. JavaScript).
>2) W3C to UXP adapter so programs using UXP can use any XML parsers
>providing W3C XML API.
>
>BTW, I have taken a look at Xapi-J and W3C OM API and, frankly, I am not

Where is the reference for W3C OM API?

>satisfied with either of them. Enumeration by index is problematic and
>callbacks are either not supported or primitive. Not that I can offer any
>better in the near future . Call me a stuck up critic, if you will.
>

I take a very simple approach and find that the AElfred approach gives me almost everything I want. It allows me to extract the components of the document (start/end/content, PIs, entities) and it allows me to get almost everything from the DTD (except the contentspec). I don't think that *I* need anything more.
I just don't want - and don't intend to write 30 adapter functions for every new parser. If everyone had getContentSpec(String elementType) that is the level I am quite happy with :-)

P.

>Don
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From mike at datachannel.com Thu Dec 11 18:39:19 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
Message-ID: <01BD0620.5FC70300@NEMO>

I do believe that the Java XML Object Model referred to is the same as the W3C DOM. However, the DOM is programming language independent. I don't know the timeframe for final acceptance; however, XML parser writers are free to read up on the working draft and align their code with the defined functionality.

The W3C DOM page is here: http://www.w3.org/DOM/
The DOM Spec is here: http://www.w3.org/TR/WD-DOM/

"The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents."

Mike D
DataChannel

-----Original Message-----
From: Peter Murray-Rust [SMTP:peter@ursus.demon.co.uk]
Sent: Thursday, December 11, 1997 10:36 AM
To: xml-dev@ic.ac.uk
Subject: Re: General comments on parsers

At 08:18 11/12/97 -0800, Don Park wrote:
>The situation is complicated by the fact that W3C is working on and has not
>yet released its own version of Java XML Object Model. Since it will be

Is this the same as DOM?
If so, is there any timescale.

Not being part of the DOM process I am now somewhat confused. Does this mean that there is a formal program to produce an API for XML parsers? If so, what is the timescale? I'm sure there are some readers who are involved ;-)

I'm an impatient beast and I worry about waiting for things like this to happen if it's going to be a long time. During that time we'll have another 5-10 Java based parsers, all with different terminology. In another proposal I will try to address the terminology :-)

>difficult to have all existing Java XML parsers to conform to a single
>object model, I think the best approach is for someone to write a new Java
>parser framework which provides a reasonable object model and acts as the
>Universal XML Parser (UXP?:-).

Is this a short-term or long term solution? If long term, what is the difference/benefit between this and the OM?

>
>UXP should use some kind of simple registry scheme and a UI to allow

From donpark at quake.net Thu Dec 11 18:45:34 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers
Message-ID: <000c01bd0664$7a677a20$0100007f@localhost>

>At 08:18 11/12/97 -0800, Don Park wrote:
>>The situation is complicated by the fact that W3C is working on and has not
>>yet released its own version of Java XML Object Model. Since it will be
>
>Is this the same as DOM? If so, is there any timescale.
>
>Not being part of the DOM process I am now somewhat confused. Does this
>mean that there is a formal program to produce an API for XML parsers? If
>so, what is the timescale?
I'm sure there are some readers who are involved
>;-)

Sorry about the confusion. I am pretty careless with names and stuff. I was referring to DOM level-one XML which btw is out already in draft form (reality lag) at http://www.w3.org/TR/WD-DOM/level-one-xml-971209.html. They also have one for HTML so I should be able to get through another weekend with buying a book to read. So, we could probably implement the UXP based on XML DOM (gosh, I am improvising terms left and right).

>I'm an impatient beast and I worry about waiting for things like this to
>happen if it's going to be a long time. During that time we'll have another
>5-10 Java based parsers, all with different terminology. In another
>proposal I will try to address the terminology :-)

That was the shortest wait ever, eh?

>Is this a short-term or long term solution? If long term, what is the
>difference/benefit between this and the OM?

Long term solution. No difference now since we have a better outline of XML DOM to work with.

>Please [ignorance] what does a registry scheme entail?

I don't know how your JUMBO allows different parsers to be used but I was talking about a registry for storing current user preferences as far as which parser to use in your application. It could even involve some migrating DOM liaison classes for enhancing visual representation of XML documents.

Currently, I have this vexing problem of trying to figure out how to represent an XML document as a tree of objects where each object is something more than a tag. CDF has a Channel object which contains attributes which are represented as tags as well as contents of tags. Exposing those attributes as a tree node would be too distracting, especially since I have a perfectly nice object inspector to show the attributes in.

>Where is the reference for W3C OM API?

See above. Sorry again about the confucious glibbing (here I go again, making sense only to myself).
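The "registry scheme" described above, a place that records which parsers are available and which one the user currently prefers, can be sketched in a few lines. The following Python example is only an illustration under stated assumptions: `ParserRegistry`, `ElementTreeAdapter` and their methods are hypothetical names invented here, not part of UXP (which was never specified in this thread), DOM, or any parser mentioned on the list. The adapter wraps Python's standard `xml.etree.ElementTree` simply to have something runnable behind the common `parse()` method.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Optional


class ElementTreeAdapter:
    """Hypothetical adapter: hides one concrete parser behind parse()."""

    def parse(self, text: str) -> List[str]:
        # Return the element type names in document order.
        return [element.tag for element in ET.fromstring(text).iter()]


class ParserRegistry:
    """Hypothetical registry: maps names to parser factories and
    remembers the user's preferred parser."""

    def __init__(self) -> None:
        self._factories: Dict[str, Callable[[], object]] = {}
        self._preferred: Optional[str] = None

    def register(self, name: str, factory: Callable[[], object]) -> None:
        self._factories[name] = factory
        if self._preferred is None:
            # The first registered parser becomes the default preference.
            self._preferred = name

    def prefer(self, name: str) -> None:
        if name not in self._factories:
            raise KeyError(name)
        self._preferred = name

    def create(self, name: Optional[str] = None):
        key = name if name is not None else self._preferred
        if key is None:
            raise LookupError("no parser registered")
        return self._factories[key]()


if __name__ == "__main__":
    registry = ParserRegistry()
    registry.register("etree", ElementTreeAdapter)
    print(registry.create().parse("<a><b/><c/></a>"))
```

An application written against `parse()` never names a concrete parser, so supporting a new one means writing one adapter and one `register()` call rather than changing the application.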
>I take a very simple approach and find that the AElfred approach gives me
>almost everything I want. It allows me to extract the components of the
>document (start/end/content, PIs, entities) and it allows me to get almost
>everything from the DTD (except the contentspec). I don't think that *I*
>need anything more. I just don't want - and don't intend to write 30
>adapter functions for every new parser. If everyone had
>getContentSpec(String elementType) that is the level I am quite happy with :-)

Is this a different song? Hmm, I swear I heard something else before...;-)

Don

From digitome at iol.ie Thu Dec 11 19:12:44 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:24 2004
Subject: General comments on parsers (was [NEW] AElfred)
Message-ID: <199712111912.TAA25575@GPO.iol.ie>

> Chris
>Lovett suggested using the XML-Data Schemas instead of trying to access
>the DTD info directly. When one wants access to the DTD, what is the
>recommended method?
>

You can use the XML-Data approach but maintain the ability to work with the standard DTD syntax by using msxml to spit out the XML-Data encoding of the DTD info and then re-parse it.

Maybe this is what you meant though... If so, sorry. If not, hope this helps.

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing http://www.digitome.com
From dima at paragraph.com Thu Dec 11 21:57:02 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
Message-ID: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>

Hi,

Does anybody know any free *validating* XML parsers written in Java ?

With NXP I haven't managed to validate the following :

With FooBar.dtd file in the same directory :

I have the following output :

java NXP.Cl -v -f test/test.xml
NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
" "
Error : Attribute has not be declared : ID
" "
Parsing finished - Time : 1260 msec.

Any help is most welcome!

Dima

-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241

From ak117 at freenet.carleton.ca Thu Dec 11 22:03:53 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
In-Reply-To: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
References: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
Message-ID: <199712112202.RAA05850@unready.microstar.com>

Dmitri Kondratiev writes:

> Does anybody know any free *validating* XML parserers written in Java ?

There is a serious problem right now with the XML terminology. There are at least four Java-based XML parsers right now that will parse a DTD:

- Lark
- MSXML
- NXP (a little out of date)
- Ælfred

Of these, I think that only MSXML claims to be validating. Do you need full validation, or do you just need a DTD-driven parser that will pick up entity declarations, default attribute values, etc? We really need to invent some better terms, since validation and DTD-awareness are really separate concepts.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Thu Dec 11 22:23:33 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:25 2004
Subject: AElfred 1.0beta3 release
Message-ID: <199712112222.RAA06459@unready.microstar.com>

There is a new release of Microstar's Ælfred XML parser at

  http://www.microstar.com/XML/

The new version is still interface-compatible with the first two public betas, but it adds the ability to query for content models and enumerated attribute types (both returned as normalised strings, with whitespace removed and parameter entities resolved).
With the new query routines, Ælfred is now capable of producing a normalised version of an XML document's DTD; in fact, the distribution now includes a new demonstration class, DtdDemo.java, that does exactly that.

Enjoy!

David (on behalf of Microstar)

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Thu Dec 11 23:19:24 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: Newbie Q: NXP attribute validation
In-Reply-To: <2.2.32.19971211172105.0091a70c@dream.paragraph.com>
Message-ID: <3.0.1.16.19971211185135.2007d876@pop3.demon.co.uk>

At 20:21 11/12/97 +0300, Dmitri Kondratiev wrote:

Hi Dima,

>Please help a newbie DTD writer :) I am trying to validate with NXP
>attribute ID in element Foo :
>
>
>
>
>
>
>

This is not a well-formed document and the last line should (probably) be:

instead of

[...]

>
>As a result I get "Attribute has not be declared : ID" error. What am I
>doing wrong ?
>

One of the problems with XML parsers (rather like compilers) is that it can be quite difficult to produce error messages that tell you precisely what is wrong. So I can't tell you *why* you got this message, but most error messages are 'somewhere near' the error.

Sometimes it can be helpful to run more than one parser because they often give different clues.

P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From dima at paragraph.com Thu Dec 11 23:50:23 1997
From: dima at paragraph.com (Dmitri Kondratiev)
Date: Mon Jun 7 16:59:25 2004
Subject: Newbie Q: NXP attribute validation
Message-ID: <2.2.32.19971211234901.006d9474@dream.paragraph.com>

At 18:51 11.12.97, Peter Murray-Rust wrote:
...
>
>
>This is not a well-formed document and the last line should (probably) be:
>
> instead of
>
>[...]
>>
>>As a result I get "Attribute has not be declared : ID" error. What am I
>>doing wrong ?
>>
>One of the problems with XML parsers (rather like compilers) is that it can
>be quite difficult to produce error messages that tell you precisely what
>is wrong. So I can't tell you *why* you got this message, but most error
>messages are 'somewhere near' the error.
>
>Sometimes it can be helpful to run more than one parser because they often
>give different clues.
>

Peter,

Thanks for your help. With my stupid bug ( to ) corrected, I still have the same error ! The following NXP output shows that it uses the FooBar.dtd, as specified in test.xml file. What can be wrong then ?

--Dima

NXP output:

NXP - Norbert's XML Parser 0.97 - 05.08.1997
Fetch file : test/test.xml
Start parsing ...
Validate : true
Fetch file : test/FooBar.dtd
" "
Error : Attribute has not be declared : ID
" "
Parsing finished - Time : 1260 msec.
-----------------
Dmitri Kondratiev
dima@paragraph.com
102401.2457@compuserve.com
http://www.geocities.com/SiliconValley/Lakes/3767/
tel: 07-095-464-9241

From peter at ursus.demon.co.uk Fri Dec 12 00:11:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: General comments on parsers
In-Reply-To: <000c01bd0664$7a677a20$0100007f@localhost>
Message-ID: <3.0.1.16.19971212005949.190f27fa@pop3.demon.co.uk>

At 10:39 11/12/97 -0800, Don Park wrote:
>Sorry about the confusion. I am pretty careless with names and stuff. I
>was referring to DOM level-one XML which btw is out already in draft form
>(reality lag) at http://www.w3.org/TR/WD-DOM/level-one-xml-971209.html.
>They also have one for HTML so I should be able to get through another
>weekend with buying a book to read .

I have had a spook through it this evening... I appreciate that an API may come out of it.

[...]
>
>That was the shortest wait ever, eh?

???

>
>I don't know how your JUMBO allows different parsers to be used but I was

Very simple. I read the interfaces, try to understand what they are talking about, try to configure JUMBO so it reads them, see if I understand the results and take it from there. Unfortunately this has to be done for every parser :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Fri Dec 12 00:19:19 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
In-Reply-To: <199712112202.RAA05850@unready.microstar.com>
References: <2.2.32.19971211215544.006caa0c@dream.paragraph.com>
Message-ID: <3.0.1.16.19971212010654.2aefd570@pop3.demon.co.uk>

At 17:02 11/12/97 -0500, David Megginson wrote:
[...]
>Of these, I think that only MSXML claims to be validating. Do you
>need full validation, or do you just need a DTD-driven parser that
>will pick up entity declarations, default attribute values, etc? We
>really need to invent some better terms, since validation and
>DTD-awareness are really separate concepts.

Terminology is really critical here and I shall address it later. If everyone agrees on the terms half the problems will be solved. :-)

P

Wait for the XML-based hyperglossary of XML terminology (next week)

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From clovett at microsoft.com Fri Dec 12 04:32:56 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: BUG : msxml 1.6
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F41@red-msg-56.dns.microsoft.com>

You're right. I'll look into it.

> -----Original Message-----
> From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
> Sent: Thursday, December 11, 1997 10:01 AM
> To: Chris Lovett
> Cc: xml-dev Mailing List
> Subject: Re: BUG : msxml 1.6
>
> I have downloaded msxml 1.8 and tried to run it on my sample files and it
> seems that the EXTENTITYDCL has not been fixed. I have always a "stammering"
> inclusion of the external data !
>
> What's wrong ?
>
> Thanks,
>
> Pat.
>
> DOCUMENT
> |---XMLDECL
> |   +---CDATA " VERSION="1.0" "
> |---WHITESPACE 0xa
> |---DOCTYPE NAME="EXAMPLE"
> |   |---WHITESPACE 0xa
> |   |---ELEMENTDECL EXAMPLE (P)+
> |   |---WHITESPACE 0xa
> |   |---ELEMENTDECL P (#PCDATA|S)*
> |   |---WHITESPACE 0xa
> |   |---ELEMENTDECL S (#PCDATA)*
> |   |---WHITESPACE 0xa
> |   +---EXTENTITYDCL incs
> |       |---ELEMENT S
> |       |   +---PCDATA "a third."
> |       +---PCDATA "a third. "   <--- HERE
> |---WHITESPACE 0xa
> |---ELEMENT EXAMPLE
> |   |---WHITESPACE 0xa
> |   |---ELEMENT P
> |   |   |---ELEMENT S
> |   |   |   +---PCDATA "A sentence."
> |   |   |---ELEMENT S
> |   |   |   +---PCDATA "An another."
> |   |   +---ENTITYREF incs "a third.a third. "   <--- AND HERE
> |   +---WHITESPACE 0xa
> +---WHITESPACE 0xa
>
> [] Chris Lovett said:
> []---------------------------------
> ] Thanks, I have a fix already, and will be posting it shortly.
> ]
> ] > -----Original Message-----
> ] > From: Patrice Bonhomme [SMTP:Patrice.Bonhomme@loria.fr]
> ] > Sent: Saturday, November 29, 1997 1:37 AM
> ] > To: Chris Lovett
> ] > Subject: BUG : msxml 1.6
> ] >
> ] > Hi,
> ] >
> ] > I found a bug in msxml 1.6 relative to the External Entity checking.
> ] >
> ] > Main file (test-ent.xml):
> ] >
> ] > ]>
> ] > <S>a sentence.</S><S>an another.</S>
> ] > &inc-s;
> ] >
> ] > Auxiliary file (inc-s.xml):
> ] > <S>a third.</S>
> ] >
> ] > And i ve got this message :
> ] >
> ] > % java msxml -i -d test-ext-ent.xml
> ] > Invalid element 'PCDATA' in content of 'P'. Expected [S]
> ] > Location: file:test-ext-ent.xml(14,5)
> ] > Context:
> ] >
> ] > The parser should make a difference between ENTITYREF and SYSTEM
> ] > ENTITYREF.
> ] >
> ] > Pat.
> ] > --
> ] > ==============================================================
> ] > bonhomme@loria.fr | Office : B.228
> ] > http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
> ] > --------------------------------------------------------------
> ] > * Projet Aquarelle : http://aqua.inria.fr
> ] > * Serveur Silfide : http://www.loria.fr/Projet/Silfide
> ] > ==============================================================
> ] >
> []---------------------------------
>
> --
> ==============================================================
> bonhomme@loria.fr | Office : B.228
> http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37
> --------------------------------------------------------------
> * Projet Aquarelle : http://aqua.inria.fr
> * Serveur Silfide : http://www.loria.fr/Projet/Silfide
> ==============================================================

From clovett at microsoft.com Fri Dec 12 07:02:36 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: MSXML 1.8 Viewer Applet problem
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F4F@red-msg-56.dns.microsoft.com>

Wow, I don't get this at all. It should close the file immediately after it's finished parsing it. Try reinstalling, there was a bad xmlinst up there for a couple of days.

> -----Original Message-----
> From: Michael Kay [SMTP:M.H.Kay@eng.icl.co.uk]
> Sent: Thursday, December 11, 1997 9:09 AM
> To: xml-dev@ic.ac.uk
> Subject: MSXML 1.8 Viewer Applet problem
>
> I'm using the XML Viewer applet in MSXML 1.8
>
> Having trouble because there doesn't seem to be any way of closing the file
> after you've finished with it, so all subsequent attempts to edit the XML
> file after viewing it fail saying "file in use".
>
> Mike Kay

From clovett at microsoft.com Fri Dec 12 07:11:50 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:25 2004
Subject: *Validating* XML Parser written in Java ?
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099F50@red-msg-56.dns.microsoft.com>

The Microsoft XML Parser for Java is available for download from http://www.microsoft.com/standards/xml/xmlparse.htm and I've tested it on your example below and it works just fine. The exact terms of use are described in the License Agreement in http://www.microsoft.com/standards/xml/xmllic.htm, which I think you'll find to be very open ended.

> -----Original Message-----
> From: Dmitri Kondratiev [SMTP:dima@paragraph.com]
> Sent: Thursday, December 11, 1997 1:56 PM
> To: xml-dev@ic.ac.uk
> Subject: *Validating* XML Parser written in Java ?
>
> Hi,
>
> Does anybody know any free *validating* XML parserers written in Java ?
> With NXP I haven't managed to suceed to validate the following :
>
> > > > > >
>
> With FooBar.dtd file in the same directory :
>
> > > >
> ID ID #REQUIRED>
>
> I have the following output :
>
> java NXP.Cl -v -f test/test.xml
>
> NXP - Norbert's XML Parser 0.97 - 05.08.1997
>
> Fetch file : test/test.xml
> Start parsing ...
> Validate : true
> Fetch file : test/FooBar.dtd
>
> " "
>
> Error :
> Attribute has not be declared : ID
>
> " "
>
> Parsing finished - Time : 1260 msec.
>
> Any help is most welcome!
> Dima
>
> -----------------
> Dmitri Kondratiev
> dima@paragraph.com
> 102401.2457@compuserve.com
> http://www.geocities.com/SiliconValley/Lakes/3767/
> tel: 07-095-464-9241

From digitome at iol.ie Fri Dec 12 08:50:27 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:25 2004
Subject: Classification of XML Parsers
Message-ID: <199712120850.IAA18479@GPO.iol.ie>

The real truth behind XML's simplicity and ease of implementation is being badly let down by the haziness with which parsers are classified:-

Well Formed
Valid
Type Valid (In the DOM level 1 spec.)
Tag Valid (ditto)
DTD Aware (Aelfred)

Then there is a bevy of terminology to do with what the parsers do and do not provide to the application

- Comments
- Expansion of general entities
- Access to element type declarations
etc.

Given that it is on this list that most of the implementors hang out I think we could usefully attempt to put together a classification.

Also, from a quick reading of the DOM there does not seem to be a node type for unexpanded general entity. How come?

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing http://www.digitome.com

From mecom-gmbh at mixx.de Fri Dec 12 11:29:27 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:25 2004
Subject: external dtd subset content
Message-ID: <34912136.2C5BEBBA@mixx.de>

we're trying to understand the necessary form for the external dtd subset. in particular two questions have arisen.

1? since the external subset contains markup declarations only, it would appear that it establishes no constraint on the root element. is it legitimate to use the same dtd for various xml documents, each with a different root element?

2? among the example DTD's we've found, some begin with an form. others don't. isn't that form excluded from being a PI and thus from being a markupdecl?

From fussellm at alumni.caltech.edu Fri Dec 12 14:22:25 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:25 2004
Subject: Java DOM ObjectBuilder
Message-ID:

I have a first pass at an ObjectBuilder that generates objects based on the W3C Java DOM Interfaces[*]. So any XML-Parser with a BuilderClient [currently MS-XML and Aelfred] can generate DOM objects, including the Model information itself. It is also easy to modify both the objects and the construction process to be different from the DOM specific ones (e.g.
"Tag" specific objects instead of generic Elements). This applies to the DTD objects (Use a different Node, ElementDefinition, or any other interface/class) as well as the normal Element content. If enough people are interested I will try to make a specific release of this code and the minimum amount of MONDO that is needed to make it work (see below for size information), otherwise I will include it as an example in the next MONDO release. The rest of this mail just discusses the details a bit more. ------------------------------------------------------- The DOM ObjectBuilding process can generally be 1-pass (direct) from the parser, except for the DTD which parsers digest first and must be 'redescribed' to the builder. For Aelfred, it looks something like this: +---------------+ XmlParser->| XmlProcessor |-->DOMObBuilder->SpecificFactory->>DOMObject | BuilderClient |^ or BeanFactory v +---------------+ \<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<>' indicates the Factory actually creates an object (at least conceptually) and the '<<<' is a return line for that object to be used in the subsequent recipe. Generally the arrows to the right are the response to an ESIS type of event, but the ordering for building is sometimes a little different (attribute processing occurs inside an object's context, not before). You can think of Recipes as a more general ESIS event model with a feedback loop. Sending the DTD across is the one exception in terms of the ESIS analogy, because it is not a part of the event flow. It needs to be redescribed as soon as it is available. 
For Aelfred, the DTD is sent to the builder at the 'doctypeDecl' which looks like this at the moment [We are in the XmlProcessor/BuilderClient]:

    public void doctypeDecl (XmlParser p, String name, String pubid, String sysid) {
        this.startObject(DOM_DOCUMENT_TYPE_RECIPE);
        this.startParameter("externalSubset");
        // Redescribe every element type the parser has already digested.
        Enumeration elementNames = p.declaredElements();
        while (elementNames.hasMoreElements()) {
            String elementName = (String) elementNames.nextElement();
            buildObjectForElementDefNamed_in(elementName, p);
        }
        this.finishParameter();
        this.finishObject();
    }

Conceptually, the recipe for a DocumentType looks like:

-----------------
occurrence = tokens = ( ) > > ...
-----------------

or in an XML-Recipe form it would look like:

-----------------
...
-----------------

The DTD recipe and the normal Element content recipes are shipped to the ObjectBuilder which has the necessary factories to build objects from the recipe. For the DTD recipes it builds pre-known and very specific classes: "Document", "ElementDefinition", "ModelGroup", etc. For the Element content the ObjectBuilder currently builds a generic Element hierarchy.

The construction process for both the DTD and the Elements can be easily (and almost arbitrarily) changed. The two semi-constants are the DOM recipes which are encoded into the DOM-oriented BuilderClient and the source document itself. It is also easy to turn on and off the DTD generation in the BuilderClient, and the result of a document without a DTD is a DOM Document object with a null DocumentType.

SIZE and Other Info
===================

The total amount of MONDO-oriented DOM Building code is about 10K. This is divided into 6 factories for the enumerated types, 1 factory for Document, and 1 main builder. The rest of the DOM was done with a Bean Factory. The BuilderClient is another 9K for a stack-based version (Aelfred) and a bit less for an object-based version (MS-XML).
BuilderClients are pretty easy to write, about two hours or so for me, but I haven't gotten around to the other parsers yet.

MONDO itself is a bit large (~100K + requires ~100K general library) but I am trying to produce a version (mindo) that only includes what is needed for this type of task which may be 50K for mindo and 40-60K for the general library.

The DOM interfaces are about 10K and the skeleton classes are 16K. The classes only serve the purpose of construction and printing (i.e. dumping). More interesting classes would be quite a bit larger.

[*] Note that I modified the DOM interfaces to: (1) fix what I thought were bugs or deprecated behavior, (2) provide some extra services (e.g. Integer objects for the 'int's), (3) collapse specific types into more generic Map and List collections, and (4) add a naming convention (i.e. suffixing an interface which only has constants in it with 'Constants'). Changing things back into the original form should be easy (I have the originals from the spec also) and should have little significance to the rest of the process.

==========================================

For more information on MONDO see

    http://www.chimu.com/projects/mondo

Part of the design document is in HTML now and for this particular topic (XML->DOM Objects), you might want to look at Chapters 2&4 at:

    http://www.chimu.com/projects/mondo/design/part0002.html
    http://www.chimu.com/projects/mondo/design/part0004.html

--Mark
mark.fussell@chimu.com

From ak117 at freenet.carleton.ca Fri Dec 12 15:10:10 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
In-Reply-To: <199712120850.IAA18479@GPO.iol.ie>
References: <199712120850.IAA18479@GPO.iol.ie>
Message-ID: <199712121508.KAA00821@unready.microstar.com>

Sean Mc Grath writes:

> The real truth behind XML's simplicity and ease of implementation is being badly
> let down by the haziness with which parsers are classified:-
>
> Well Formed
> Valid
> Type Valid (In the DOM level 1 spec.)
> Tag Valid (ditto)
> DTD Aware (Aelfred)

I'd suggest that there are at least three logically separate realms here, all of which we've been overloading onto the same single set of terminology. Here's what I suggest:

Realm #1: Functionality

a) Scanning
   This type of parser simply skips the DOCTYPE declaration (using regular expressions) and parses the markup in the document instances. It is not required to handle any but the built-in entities, and as a result, does not include any external entities. For the purposes of whitespace handling, it assumes that all specified attributes are CDATA and that all elements have mixed content.

   Optionally, a scanning parser may attempt to extract some information from the DOCTYPE declaration, such as entity declarations and attribute default values.

b) DTD-driven
   This type of parser reads the DTD (both internal and external subsets) to obtain entity declarations, attribute declarations, and element-type declarations.
   It handles any entities declared in the DTD (internal or external), and provides default values when attributes are not specified. For the purposes of whitespace handling, it uses the declared type for each attribute, and distinguishes between element types with element content and elements with mixed content.

Realm #2: Validation

a) Non-validating
   This type of parser assumes that its input document is both well-formed and valid, and is not required to report any errors at all.

   Optionally, a non-validating parser may report some lexical or DTD-related errors, but it does not qualify as a well-formed or validating parser unless it reports _all_ relevant errors.

b) Well-formed
   This type of parser reports any lexical errors in an XML document (including well-formedness constraints in the spec), but is not required to report DTD-related errors (such as attribute-type mismatches, elements out of context, etc.). A well-formed parser must report an error for all 141 tests in James Clark's test suite.

   Optionally, a well-formed parser may report some DTD-related errors, but it does not qualify as a validating parser unless it reports _all_ DTD-related errors.

c) Validating
   A validating parser must report all of the errors reported by a well-formed parser, together with all DTD-related errors ("validity constraints" in the spec), such as elements in contexts not allowed by the current content model, attempts to change #FIXED attributes, failure to specify #REQUIRED attributes, unresolved IDREFS, and attribute-type mismatches. Validating parsers must provide DTD-driven functionality.

Realm #3: Interface

a) Event-based
   An event parser returns a series of XML document events, such as character data or the start or end of an element, usually through call-backs to user-defined handlers. Events are returned in the order that they occur in the XML source document.
b) Tree-based
   A tree-based parser builds an in-memory tree of an entire document, then provides some means for the user to navigate the tree. The user is not constrained to navigating the tree in the order that it was parsed. Tree-based parsers are often built on top of an event-based layer.

According to this classification, Ælfred is a DTD-driven, non-validating, event-based XML parser.

There are other realms, including the type of information delivered by a parser (simple ESIS-like production information, or full information for an XML editor, such as comments, ignored whitespace, etc.), but I think that we would do best to standardise a few basic terms first.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Fri Dec 12 15:14:09 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
In-Reply-To: <34912136.2C5BEBBA@mixx.de>
References: <34912136.2C5BEBBA@mixx.de>
Message-ID: <199712121513.KAA00842@unready.microstar.com>

james anderson writes:

> we're trying to understand the necessary form for the external dtd
> subset.
> in particular two questions have arisen.
>
> 1? since the external subset contains markup declarations only, it
> would appear that it establishes no constraint on the root element. is
> it legitimate to use the same dtd for various xml documents, each with a
> different root element?

Yes -- that's standard practice in the SGML world (you can use the same external DTD for an entire book or for just one chapter of it).

> 2? among the example DTD's we've found, some begin with an
> form. others don't. isn't that form excluded from being a PI and thus
> from being a markupdecl?

No, on two counts:

1) The grammatical production for markupdecl [30] explicitly includes processing instructions.

2) The that you see at the beginning of the external subset is not a processing instruction but a text declaration (which is similar but not identical to an XML declaration). For example, if my external subset were encoded in ISO Latin 1, I would be required to put the following declaration at the top:

   If it were in ASCII, however, I could just let the encoding default to UTF-8.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From tbray at textuality.com Fri Dec 12 15:37:28 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
Message-ID: <3.0.32.19971212073419.00967b00@pop.intergate.bc.ca>

At 12:34 PM 12/12/97 +0100, james anderson wrote:
>1? since the external subset contains markup declarations only, it
>would appear that it establishes no constraint on the root element. is
>it legitimate to use the same dtd for various xml documents, each with a
>different root element?

That's right.

>2? among the example DTD's we've found, some begin with an
>form. others don't.
>isn't that form excluded from being a PI and thus
>from being a markupdecl?

I think that should be OK since a DTD is an external parsed entity. But I've put your mail in the errata file to make sure it's clear enough. -Tim

From tbray at textuality.com Fri Dec 12 15:37:31 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
Message-ID: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>

At 10:08 AM 12/12/97 -0500, David Megginson wrote:
>Realm #1: Functionality
>
>a) Scanning
>   This type of parser simply skips the DOCTYPE declaration (using
>   regular expressions) and parses the markup in the document
>   instances.

This is not a conformant XML processor per the spec.

There are certain things a processor is required to do with the internal subset, including parse it and check it for syntax.

>b) DTD-driven

There is a whole range of behaviors. Parsers may, not must, read external markup declarations and external parsed entities.

>Realm #2: Validation
>
>a) Non-validating
>   This type of parser assumes that its input document is both
>   well-formed and valid, and is not required to report any errors at
>   all.

No such animal is envisioned in the standard. If it doesn't check for WF problems, it's not an XML processor.

I'll stop here. I suggest you go back and re-work your (potentially helpful) list based on a re-reading of the specification. -Tim

From mecom-gmbh at mixx.de Fri Dec 12 15:41:35 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:26 2004
Subject: external dtd subset content
References: <34912136.2C5BEBBA@mixx.de> <199712121513.KAA00842@unready.microstar.com>
Message-ID: <34915C46.21E8527A@mixx.de>

aha! i was missing the relation between external parsed entity and external subset.

to the drafters: a link from the discussion between [30] and [31] down to the discussion concerning parsed entities ([78]+) would help here.

thanks.

David Megginson wrote:
> james anderson writes:
>
> > 2? among the example DTD's we've found, some begin with an
> > form. others don't. isn't that form excluded from being a PI and thus
> > from being a markupdecl?
>
> No, on two counts:
>
> ...
> 2) The that you see at the beginning of the external subset
> is not a processing instruction but a text declaration (which is
> similar but not identical to an XML declaration). For example, if
> my external subset were encoded in ISO Latin 1, I would be required
> to put the following declaration at the top:
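The encoding point quoted above (an external subset in ISO Latin 1 needs an encoding declaration, while plain ASCII can fall back on the UTF-8 default) can be illustrated with a small sketch. It only checks whether a byte sequence decodes as UTF-8, using the standard `java.nio.charset` decoder; it is not a text-declaration parser, and the class name `EncodingDefaultSketch` and the sample declarations are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class EncodingDefaultSketch {

    /** Returns true if the bytes form a well-formed UTF-8 sequence. */
    static boolean decodesAsUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)       // fail instead of substituting
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Pure ASCII is also valid UTF-8, so no encoding declaration is needed.
        byte[] ascii = "<!ELEMENT Foo (#PCDATA)>".getBytes(StandardCharsets.US_ASCII);
        // The same kind of text with one accented character, stored as ISO Latin 1.
        byte[] latin1 = "<!-- caf\u00e9 --><!ELEMENT Foo (#PCDATA)>".getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("ASCII decodes as UTF-8:   " + decodesAsUtf8(ascii));
        System.out.println("Latin 1 decodes as UTF-8: " + decodesAsUtf8(latin1));
    }
}
```

The Latin 1 bytes fail because the single byte 0xE9 looks like the start of a multi-byte UTF-8 sequence that never completes, which is why a processor assuming the default encoding would have to reject such a file.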

From ak117 at freenet.carleton.ca Fri Dec 12 17:19:13 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:26 2004
Subject: Classification: XML Parser Features
In-Reply-To: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca>
Message-ID: <199712121717.MAA01762@unready.microstar.com>

Tim Bray writes:

> >a) Scanning
> >   This type of parser simply skips the DOCTYPE declaration (using
> >   regular expressions) and parses the markup in the document
> >   instances.
>
> This is not a conformant XML processor per the spec.
>
> There are certain things a processor is required to do with the internal
> subset, including parse it and check it for syntax.

Quite right; to my knowledge, however, there exist no XML processors that do so, except possibly for James's new one (I haven't tried it). In particular, few handle UTF-8 correctly. As I've mentioned in private e-mail, even the 1997-12-08 spec is not currently well-formed, since it uses ISO-8859-1 encoding without saying so in its encoding declaration, so any conforming processor would have to reject it.

More generally, this requirement makes no provision for the desperate Perl hacker who has played such a central role in XML discussions. Creating a truly well-formed parser is very, very difficult, because of the enormous number of constraints imposed both explicitly and implicitly by the grammar (I could probably write a full SGML parser with about the same level of effort, especially if I limited myself to a single, simple SGML declaration).
For example, both Ælfred and Lark fail to report the two errors in the following document:

This is a ]]> paragraph.

I could support complete well-formedness error reporting in Ælfred, but its size would bloat to about 35-40K (entity-boundary checking, in particular, would be messy), while I still want to get it down to under 20K so that Java applet writers can use it. I did have a version that passed the first 101 of James Clark's 141 tests, but it was already at about 30K, and I was aware of many other cases that he wasn't testing for.

> >b) DTD-driven
>
> There are a whole range of behaviors. Parsers may, not must, read
> external markup declarations and external parsed entities.

Yes, you control that using the standalone declaration. I am recommending that parsers that do not handle the full DTD (internal and external) be referred to as "scanning parsers", while parsers that handle everything be referred to as "DTD-driven parsers". If necessary, we could always add another degree in the middle.

> >Realm #2: Validation
> >
> >a) Non-validating
> > This type of parser assumes that its input document is both
> > well-formed and valid, and is not required to report any errors at
> > all.
>
> No such animal is envisioned in the standard. If it doesn't check for
> WF problems, it's not an XML processor.

I am aware of the constraints in the spec, but I believe that this is a serious strategic error. Ælfred is a non-conforming XML processor, as are Lark, MSXML, and all others that I have had a chance to try: Ælfred will produce correct output for valid and well-formed XML documents, but will not necessarily report errors for documents that are not valid/well-formed. If the XML spec does not make allowance for software tools like these, then it will have little to distinguish it from full SGML except for a bit of marketing hype.

> I'll stop here. I suggest you go back and re-work your
> (potentially helpful) list based on a re-reading of the
> specification.
-Tim Thank you very much for your comments. I am grateful for the work that you and the rest of the WG have done with the spec, and I hope that you find my comments constructive rather than confrontational. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 12 19:22:04 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:26 2004 Subject: LISTRIVIA In-Reply-To: <2F2DC5CE035DD1118C8E00805FFE354C099F41@red-msg-56.dns.micr osoft.com> Message-ID: <3.0.1.16.19971212200149.30077e3a@pop3.demon.co.uk> At 20:32 11/12/97 -0800, [several people] wrote: A very short message >but >included >a >great >deal >of >unnecessary >quoted >material Please try to cut down the volume of material you quote :-) As I said before I have to pay for this personally. Some people are charged by volume of e-mail. Any mailer that quotes is also able to delete material. Good quoting is not only courteous, but it makes what you write more valuable to read. Unlike many lists, one of the purposes of XML is to produce attractive documents :-) P. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 12 23:34:56 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:26 2004 Subject: Classification: XML Parser Features In-Reply-To: <199712121717.MAA01762@unready.microstar.com> References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca> <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca> Message-ID: <3.0.1.16.19971213000403.3177233e@pop3.demon.co.uk> At 12:17 12/12/97 -0500, David Megginson wrote: >Tim Bray writes: [.. extremely important discussion deleted ...] I also (unfortunately) have sympathy with David's view that it's harder to write a conforming parser than appears on first reading. I agree that there are few if any fully conforming parsers at present. > > I'll stop here. I suggest you go back and re-work your > > (potentially helpful) list based on a re-reading of the > > specification. -Tim > >Thank you very much for your comments. I am grateful for the work >that you and the rest of the WG have done with the spec, and I hope >that you find my comments constructive rather than confrontational. > I am sure this is not a confrontational issue. I think David has made an excellent first pass at defining what we need to do. WG and SIG discussions (which David has not seen) are confidential, but it's clear from the relatively recent introduction of 'standalone' that this issue has been thought about. I do not believe this problem is solved yet. I have always felt that until we get working prototypes we shall not uncover all the difficult semantic problems. It is exactly now that they will start to appear with a 'stable' spec and a crop of new software. 
If you think 'no need to write a new parser, it's all been done' that's probably optimistic. The problem is that the semantics are very hidden and depend on what your background is. You may use SGML as a marker and it would be *logical* to design an XML parser to do exactly what an SGML one does. However, XML deliberately introduces flexibility into the spec, and in so doing introduces fuzziness. If anyone thinks this isn't a fuzzy area, state precisely what you think of David's classification (amended if necessary). Only if most of the 'XML experts' agree, can we say it isn't fuzzy. There will be worse fuzziness introduced if it isn't clear to 'non-XML-experts' what to do. IMO there are still areas of difficulty and different authors will introduce different 'features' - often without realising it. I suspect that a useful way forward will be to attach commandline options to parsers. They are already potentially required for 'may' clauses. Perhaps we should identify the areas where there are two schools of thought (e.g 'assume document is WF'/'check for WF error') and add a switch. Then the newcomers will understand that there is an area they have to think about. These may also help to clarify the drafters' minds if necessary. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
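The switch idea in the message above could look roughly like this. It is only a minimal sketch in Python: the option names (`--check-wf`, `--assume-wf`, `--external-dtd`) and the program name are hypothetical and are not taken from any parser discussed on this list.

```python
import argparse


def build_arg_parser() -> argparse.ArgumentParser:
    """Command-line options for an imaginary XML parser front end."""
    ap = argparse.ArgumentParser(prog="xmlparse-sketch")

    # The two schools of thought are made explicit and mutually exclusive.
    wf = ap.add_mutually_exclusive_group()
    wf.add_argument("--check-wf", dest="check_wf", action="store_true",
                    default=True,
                    help="report well-formedness errors (default)")
    wf.add_argument("--assume-wf", dest="check_wf", action="store_false",
                    default=True,
                    help="trust the input and skip well-formedness checks")

    # A 'may' clause in the spec becomes a visible choice for the user.
    ap.add_argument("--external-dtd", choices=["read", "skip"],
                    default="skip",
                    help="whether to read external markup declarations")

    ap.add_argument("document", help="path of the XML document")
    return ap


if __name__ == "__main__":
    options = build_arg_parser().parse_args()
    print(options)
```

Making each disputed behaviour a named option at least tells a newcomer that a decision exists, whichever default the parser author prefers.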
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 12 23:35:53 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:26 2004 Subject: external dtd subset content In-Reply-To: <34912136.2C5BEBBA@mixx.de> Message-ID: <3.0.1.16.19971213001510.30077718@pop3.demon.co.uk> At 12:34 12/12/97 +0100, james anderson wrote: >we're trying to understand the necessary form for the external dtd >subset. >in particular two questions have arisen. > >1? since the external subset contains markup declarations only, it >would appear that it establishes no constraint on the root element. is >it legitimate to use the same dtd for various xml documents, each with a >different root element? Good point! I have never really understood why it's necessary to have consistency between the root element and the doctypedeclName. For example If I am authoring HTML 2.0 (assume there is an official XML DTD) and I write:

This is a para

that is presumably valid, but:

This is a para

is invalid. Is this what the WG intends? If so, what's the rationale? P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Fri Dec 12 23:59:22 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:59:26 2004 Subject: XML 1.0 Proposed Recommendation Message-ID: <199712122357.PAA21803@boethius.eng.sun.com> XML 1.0 is now a W3C Proposed Recommendation: http://www.w3.org/TR/PR-xml The announcement was formally made at the SGML/XML '97 Conference in Washington, D.C. on Monday, December 8, 1997. This is the same conference at which the first Working Draft of the XML specification was released in November, 1996. W3C member organizations now have about six weeks to vote on the PR. Organizations may vote yes; yes, with comments; no, unless specified deficiencies are corrected; or no, this Proposed Recommendation should be abandoned. During this voting period, the XML Working Group expects to resolve minor technical issues and communicate its results to the W3C Director. After this time, the Director will announce the disposition of the document; it may become a W3C Recommendation (possibly with minor changes), revert to Working Draft status, or may be dropped as a W3C work item. While the disposition of the Proposed Recommendation is entirely at the discretion of the Director, the XML Working Group considers its work on XML 1.0 to be complete and does not expect to be making substantive changes to the proposal as it now stands. 
There have been a number of requests for enhancement to the specification that will be considered for XML 1.1, but at this time the WG is strongly inclined to delay work on XML 1.1 until some experience has been gained with implementations of XML 1.0. In the meantime, the WG will continue its work on XLL, the part of the XML family of specifications that deals with linking and addressing. Jon Bosak Chairman, W3C XML Working Group ---------------------------------------------------------------------- Jon Bosak, Online Information Technology Architect, Sun Microsystems ---------------------------------------------------------------------- 901 San Antonio Road, MPK17-101 | Best is he that inuents, Palo Alto, California 94303 | the next he that followes ISO/IEC JTC1/WG4::NCITS V1::SGML Open | forth and eekes out a good Davenport Group::W3C XML WG and SIG | inuention. ---------------------------------------------------------------------- xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From dima at paragraph.com Sat Dec 13 00:18:46 1997 From: dima at paragraph.com (Dmitri Kondratiev) Date: Mon Jun 7 16:59:26 2004 Subject: ?SGML decl. for XML to run in NSGMLS? Message-ID: <2.2.32.19971213001737.006c0330@dream.paragraph.com> I am trying to validate my xml with James Clark's nsgmls. When I use xml.dcl that SP distribution has together with nsgmls, I get lots of the following error messages : SPAM\BIN\NSGMLS.EXE:spam\pubtext\xml.dcl:48:20:E: there is no unique character in the document character set corresponding to character number 12288 in the syntax reference character set What SGML declaration for XML should I use ? Any help is most welcome! 
Thanks, Dima ----------------- Dmitri Kondratiev dima@paragraph.com 102401.2457@compuserve.com http://www.geocities.com/SiliconValley/Lakes/3767/ tel: 07-095-464-9241 xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Sat Dec 13 00:24:48 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:27 2004 Subject: ?SGML decl. for XML to run in NSGMLS? Message-ID: <3.0.32.19971212162547.0099cb50@pop.intergate.bc.ca> At 03:17 AM 13/12/97 +0300, Dmitri Kondratiev wrote: >I am trying to validate my xml with James Clark's nsgmls. When I use xml.dcl >that SP distribution has together with nsgmls, I get lots of the following >error messages : First of all, you should use James' nsgmlsu (the u is for Unicode). Second, you might want to try the attached for an SGML declaration, I'm not sure James has finished polishing it, but it's what we use for the XML spec and it's close. -Tim -------------- next part -------------- " PIC "?>" SHORTREF SGMLREF NAMES SGMLREF QUANTITY SGMLREF -- Quantities are not restricted in XML -- ATTCNT 99999999 ATTSPLEN 99999999 -- BSEQLEN NOT USED -- -- DTAGLEN NOT USED -- -- DTEMPLEN NOT USED -- ENTLVL 99999999 GRPCNT 99999999 GRPGTCNT 99999999 GRPLVL 99999999 LITLEN 99999999 NAMELEN 99999999 -- NORMSEP NO NEED TO CHANGE IT -- PILEN 99999999 TAGLEN 99999999 TAGLVL 99999999 FEATURES MINIMIZE DATATAG NO OMITTAG NO RANK NO -- SHORTTAG is the only allowed feature. It is required. -- SHORTTAG YES -- SHORTTAG is needed for NET -- LINK SIMPLE NO IMPLICIT NO EXPLICIT NO OTHER CONCUR NO SUBDOC NO FORMAL NO APPINFO NONE -- ??? Do we want some APPINFO ??? 
-- > From tbray at textuality.com Sat Dec 13 00:53:03 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:27 2004 Subject: external dtd subset content Message-ID: <3.0.32.19971212165211.009a78c0@pop.intergate.bc.ca> At 12:34 PM 12/12/97 +0100, james anderson wrote: >2? among the example DTD's we've found, some begin with an >form. others don't. isn't that form excluded from being a PI and thus >from being a markupdecl? I just got around to checking the spec, and it's pretty clear. Section 4.3.2 makes it clear that an external PE can begin with an At 12:17 PM 12/12/97 -0500, David Megginson wrote: >Creating a truly well-formed parser is very, very difficult, because >of the enormous number of constraints imposed both explicitly and >implicitly by the grammar (I could probably write a full SGML parser >with about the same level of effort, especially if I limited myself to >a single, simple SGML declaration). To start with, "full SGML parser" is directly contradictory to "a single SGML declaration" - abstract syntax in fact being one of the things that makes a full parser hard to write. As to David's main point, that a WF parser is hard to write, I don't agree; most of the work can be done in the low-level lexer, the number of constraints that require ad-hoc code is pretty small. Two things are in fact hard, it seems: 1. handling multiple input encodings, and 2. making it run real fast while you're doing #1. These don't really bother me that much as we are in the infancy of learning what the right way is to build truly internationalized software; for example, I can parse the UTF16 Japanese version of the XML spec in a few seconds; then it takes the best part of a minute to load the .ttf for the Unicode font so you can look at anything; so we have a few problems in this area. Having said that, I am now in the middle of coding up validation for Lark, and there are a TREMENDOUS NUMBER of irritating little details about that. 
No rocket science at all, but the code is going to be substantially larger than the rest of Lark and it's all real code; more than half of Lark is compressed parser tables.

Mind you, the validator is in a separate package and can be bypassed, so Lark effectively need be no larger. But still; I wonder if validation is intrinsically hard or we could have found a better 80/20 point? -Tim

From peter at ursus.demon.co.uk Sat Dec 13 01:08:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:27 2004
Subject: LISTRIVIA and Re: ?SGML decl. for XML to run in NSGMLS?
In-Reply-To: <3.0.32.19971212162547.0099cb50@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971213015731.323f1db4@pop3.demon.co.uk>

I would be grateful if posters to xml-dev did not attach documents, since they do not appear on the hypermail and can also cause problems with the software. I suspect that there are a considerable number of people who read the XML-DEV list through the hypermail system rather than subscribing. I have therefore included the attachment in clear in this message.

FWIW the word 'Alphbet' is an unusual spelling, and since it occurs in an FPI is presumably significant.

P.

At 16:26 12/12/97 -0800, Tim Bray wrote:
[... human-readable text deleted ...]
> >Attachment Converted: "c:\eudora\attach\xml.dcl" " PIC "?>" SHORTREF SGMLREF NAMES SGMLREF QUANTITY SGMLREF -- Quantities are not restricted in XML -- ATTCNT 99999999 ATTSPLEN 99999999 -- BSEQLEN NOT USED -- -- DTAGLEN NOT USED -- -- DTEMPLEN NOT USED -- ENTLVL 99999999 GRPCNT 99999999 GRPGTCNT 99999999 GRPLVL 99999999 LITLEN 99999999 NAMELEN 99999999 -- NORMSEP NO NEED TO CHANGE IT -- PILEN 99999999 TAGLEN 99999999 TAGLVL 99999999 FEATURES MINIMIZE DATATAG NO OMITTAG NO RANK NO -- SHORTTAG is the only allowed feature. It is required. -- SHORTTAG YES -- SHORTTAG is needed for NET -- LINK SIMPLE NO IMPLICIT NO EXPLICIT NO OTHER CONCUR NO SUBDOC NO FORMAL NO APPINFO NONE -- ??? Do we want some APPINFO ??? -- > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jjc at jclark.com Sat Dec 13 06:03:07 1997 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 16:59:27 2004 Subject: Classification: XML Parser Features References: <3.0.32.19971212073841.009ac460@pop.intergate.bc.ca> <199712121717.MAA01762@unready.microstar.com> Message-ID: <3492CB04.6A99DD70@jclark.com> David Megginson wrote: > > Tim Bray writes: > > > >a) Scanning > > > This type of parser simply skips the DOCTYPE declaration (using > > > regular expressions) and parses the markup in the document > > > instances. > > > > This is not a conformant XML processor per the spec. 
> > > > There are certain things a processor is required to do with the internal > > subset, including parse it and check it for syntax. > > Quite right; to my knowledge, however, there exist no XML processors > that do so, except possibly for James's new one (I haven't tried it). > In particular, few handle UTF-8 correctly. As I've mentioned in > private e-mail, even the 1997-12-08 spec is not currently well-formed, > since it uses ISO-8859-1 encoding without saying so in its encoding > declaration, so any conforming processor would have to reject it. The spec says that not specifying the right encoding is merely an error (which means a processor is not required to detect it) rather than a fatal error. In general a processor can't detect whether the specified encoding is correct or not (consider ISO-8859-1 v ISO-8859-2). > More generally, this requirement makes no provision for the desperate > Perl hacker who has played such a central role in XML discussions. The desperate Perl hacker doesn't require his code to be blessed as a conforming XML processor. One reason for requiring conforming parsers to detect and report errors is to avoid the situation we see now with HTML where it has become extremely difficult to create a production quality HTML processor because users have come to expect an HTML processor to accept almost any random garbage they throw at it. Personally I would have preferred to see XML allow conforming processors to continue processing in the presence of errors, but I think the decision to require that errors be detected and reported was the right one. > Creating a truly well-formed parser is very, very difficult, because > of the enormous number of constraints imposed both explicitly and > implicitly by the grammar (I could probably write a full SGML parser > with about the same level of effort, especially if I limited myself to > a single, simple SGML declaration). I think that assessment is way off base. 
My xmlwf processor aims to catch all well-formedness errors. There are a couple of cases I know the current version doesn't catch and there are probably a few cases I've missed, but I think it is pretty close. I wouldn't say writing it was very, very difficult. However it's certainly not trivial, and does require considerable attention to detail. I think having a test suite should help here. Getting good performance also requires effort. There are a couple of things in this area I would like to see 1.1 change: - for well-formedness almost any character should be allowed as a name character; detailed checking of a character against the table of name characters should be a validity check; - whitespace in the prolog shouldn't be handled in the grammar, but should instead be regularised (still compatible with ISO 8879 of course) and handled at a lexical level. A fully conforming SGML parser (even one limited to a single SGML declaration) is substantially more difficult. For example, in order to enforce the RS/RE ignoring rules a parser has to determine whether an element is an inclusion or not, which in turn requires it to do content checking. > I did have a > version that passed the first 101 of James Clark's 141 tests, but it > was already at about 30K, and I was aware of many other cases that he > wasn't testing for. Additional test cases are welcome. (By the way, test 088.xml was overtaken by events and is now well-formed.) > > >b) DTD-driven > > > > There are a whole range of behaviors. Parsers may, not must, read > > external markup declarations and external parsed entities. > > Yes, you control that using the standalone declaration. I am > recommending that parsers that do not handle the full DTD (internal > and external) be referred to as "scanning parsers", while parsers that > handle everything be referred to as "DTD-driven parsers". If > necessary, we could always add another degree in the middle. 
The intent (at least as I understand it) was to enable the following two classes of parser:

- standalone parsers which can handle only the internal subset (and hence which are able to produce the correct parse only for documents which specify or could specify standalone="yes")

- full parsers which can parse the complete DTD.

James

From jjc at jclark.com Sat Dec 13 10:10:06 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:28 2004
Subject: Test cases and xmltok updated
Message-ID: <3493068A.57CB5B4A@jclark.com>

I've updated my collection of test cases at

ftp://ftp.jclark.com/pub/test/xmltest.zip

I changed one test case (088.xml) to reflect a change in the XML spec and added some more tests. There are now 164 test cases which all fail to be well formed according to the XML Proposed Recommendation.

I've also updated my XML tokenizer/well-formedness checker at

ftp://ftp.jclark.com/pub/test/xmltok.zip

I believe this is now up to date for the XML Proposed Recommendation. I know of one well-formedness violation it fails to detect: when the encoding is UTF-8 it fails to detect illegal characters whose encoding requires more than one byte (i.e. 0xFFFF, 0xFFFE, surrogates and characters >= 0x10000). If you find any others, please let me know.

James

xml-dev: A list for W3C XML Developers.
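The multi-byte check James describes as missing can be sketched in a few lines. This is an illustration in Python, not how xmltok works: it assumes the Char ranges of the XML 1.0 Recommendation as later published (the Proposed Recommendation under discussion may have drawn the line differently, for instance at 0x10000), and the function names are made up for the example.

```python
from typing import Optional, Tuple


def is_xml_char(cp: int) -> bool:
    """True if code point cp is a legal character.

    Assumption: ranges follow the Char production of the XML 1.0
    Recommendation, not necessarily the December 1997 draft.
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(data: bytes) -> Optional[Tuple[int, int]]:
    """Return (character index, code point) of the first illegal character.

    Returns None when every character in the UTF-8 input is legal.
    """
    # "surrogatepass" lets UTF-8-encoded surrogates through the decoder,
    # so they are reported below instead of raising a decode error.
    text = data.decode("utf-8", errors="surrogatepass")
    for index, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return index, ord(ch)
    return None
```

Bytes that are not UTF-8 at all still raise `UnicodeDecodeError`; the sketch only covers input that decodes but contains 0xFFFE, 0xFFFF, surrogates or other code points outside the assumed ranges.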
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From rsiera at steunpunt.be Sat Dec 13 13:42:34 1997 From: rsiera at steunpunt.be (Robrecht Siera) Date: Mon Jun 7 16:59:28 2004 Subject: XML software for Visual Basic Message-ID: <34937a0a.957564@mailhost.innet.be> It is getting more and more interesting to start programming applications using XML. Until now Java gets the most attention to do this programming in (for obvious reasons). But to have some programming routines for Visual Basic would be very welcome also. Because we would like to develop a data management and data exchange application where XML is used as file format. Is anybody capable of developing such parser routines or API usable in Visual Basic ? Groetjes, Robrecht Siera ------------------------------------------------ In Petto - Jeugddienst Informatie en Preventie In Petto - National Youth Service for Youth Information and Prevention Diksmuidelaan 50, 2600 Berchem, Belgium tel +32/3/366.15.20, +32/3/366.45.45 fax +32/3/366.11.58 email: inpetto@cybco.be www : http://www.cybco.be/inpetto ------------------------------------------------ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 13 14:22:46 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:28 2004 Subject: Classification: XML Parser Features In-Reply-To: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca> Message-ID: <3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk> At 17:08 12/12/97 -0800, Tim Bray wrote: >At 12:17 PM 12/12/97 -0500, David Megginson wrote: >>Creating a truly well-formed parser is very, very difficult, because ^^^^^^^^^^^^^^^^^^^^ I think I would rephrase this - like TimB - to read something like: "Creating a WF parser is a *lot* of work with a large number of small decisions where the author may not always get help from the spec." The author has to make (small) decisions which may appear intuitive to her but may be interpreted differently by others. These decisions may not matter in the vast majority of cases. There is/was a measure for XML that a 'mythical computer science graduate student' could hack up a parser in a couple of weeks. Armed with this promise I set about writing a recursive descent parser (which still exists in JUMBO and is the default). But I have stopped working on it because (a) others have written much better ones and (b) it's a lot more work than it looks. Not difficult, I suspect, (it *was* difficult with the early version of PEs) but lots of unrelated niggles. As an example I started writing an editor for WF XML, including editing elementTypes and attributes. I suddenly realised that I had to check for Name validity - as highlighted by James Clark. This requires validating characters against Appendix B of the spec. 
I applaud and support the WG's concentration on Internationalization (i18n) but when confronted with Appendix B at midnight, the heart sinks. The tendency is just to insert 'This document is not yet i18n-conformant' and get on with more exciting things (like why the program crashes).

In writing JUMBO I have come across a large number of these little things which I don't feel the spec resolves. I am very happy to leave the parser-related things to those people who do it better (than me). But SeanM/DavidM correctly raise the question of what a parser emits. I am still not sure what the distinction between a parser, a processor and an application is - I keep asking and have failed to get a reply. This is dangerous because
(a) 'processor' is used in the spec but 'parser' isn't
(b) it's quite clear from discussions on this list that:

- some people think processor and parser are synonyms.

----------------------             --------------------
|Parser aka Processor| ----------> |   Application    |
----------------------             --------------------

- some people think parser and processor are completely separate

--------          ------------             --------------------
|Parser| -------> | Processor| ----------> |   Application    |
--------          ------------             --------------------

- some people think that a processor is a unit which contains a parser but has additional integrated facilities.

---------------------------------
|           Processor           |
|          -----------          |             --------------------
|          | Parser  |          | ----------> |   Application    |
|          -----------          |             --------------------
---------------------------------

*** I suggest that the first time anyone uses the word 'parser' or 'processor' in this discussion they indicate what they think a processor is. Unless we have some ideas of each other's ontologies we shall have serious problems.

The problems with what a parser is, are tricky but nothing compared with the semantic difficulties of passing the output of 'a processor' to 'an application'.
The spec gives no help with this, except to highlight some areas of difficulty and - effectively - to say 'this is up to you'. I'd like it to be partly 'up to XML-DEV', which is why this discussion is *so* important. Please don't think that anyone raising problems here is simply unable to understand the spec or hasn't read it properly. Those involved in writing the spec have a combined weight of perhaps 500 years of working with SGML and other document processing tools. Many of the readers of this list are coming to these discussions with different backgrounds and do not pick up the 'implied' or 'given' semantics in the spec. I'm one, and I think that if someone genuinely can't *implement* the spec because of semantic uncertainties, there is a problem. [I am also clear, and have said so all along, that many problems will *only* come to light when people try to implement them.]. However, it's also important to realise that the spec is written with very great care, very great precision and many sentences need to be read very carefully and repeatedly. [In this alone I doubt that many MCSGS can effectively understand all the concepts in the spec in less than two weeks. And most DPHs and DumbXMLBrowserHackers (like me) will miss a lot of the subtlety, through cursory reading.] >>of the enormous number of constraints imposed both explicitly and >>implicitly by the grammar (I could probably write a full SGML parser >>with about the same level of effort, especially if I limited myself to >>a single, simple SGML declaration). I think the problems are different. SGML is complex, but precise. A year or two back someone estimated on comp.text.sgml that SGML defined something like 2^16 variants. I think that XML is one such variant, and one of the simplest. Writing a full SGML parser is very hard, with the result that very few complete standalone parsers were ever written. 
In one sense that was very valuable because people like me would just run their document through sgmls - if it crashed, the document was wrong. [I have no idea whether there are parsers which take a semantically different view of 8879 from sgmls. However, even sgmls did not implement all the hairy options in SGML, and many of these are not covered in many textbooks]. The XML process is very different. The syntax is trivial to write a parser for. But the freedom of WF documents presents difficult and unresolved problems of semantics. Therefore the time writing an XML parser is not in coding the BNF, but worrying about what to do with the code. In particular the question of 'validity' is fuzzy and crops up repeatedly. Where features are optional in an XML document (e.g. the DOCTYPE statement) does its *presence* (not its content) imply anything about how the software should behave. I don't find this easy, but it's a very different sort of difficulty from the difficulty of coding a validating algorithm for content in full SGML. [Tim's areas of difficulty] >1. handling multiple input encodings, and >2. making it run real fast while you're doing #1. > >These don't really bother me that much as we are in the infancy of >learning what the right way is to build truly internationalized >software; for example, I can parse the UTF16 Japanese version of the >XML spec in a few seconds; then it takes the best part of a minute >to load the .ttf for the Unicode font so you can look at anything; >so we have a few problems in this area. Because this is uncharted territory it's certain to throw up problems. > >Having said that, I am now in the middle of coding up validation for >Lark, and there are a TREMENDOUS NUMBER of irritating little ^^^^^^^^^^^^^^^^^ Yup, yup, yup. Each of this is 'small'. Let's assume that 95% of people agree with your interpretation for each one in precise implementation (e.g. implementation of Name), and let's assume that you have 20 such problems. 
0.95^20 is 0.35; so 35% of people will think that Lark is totally conforming and does exactly what they want. This is a possibly naughty way of addressing the problem, but it can only (IMO) be resolved by identifying those niggling problems and agreeing communally either the 'right' way, or adding a switch to the operation. Simply making personal decisions by each parser writer is a guarantee that parsers will behave differently. This is why JUMBO can use multiple parsers. DavidD suggested that it was because they had bugs. In a sense that's exactly right ('features' is probably more accurate). [It's also because no one has - yet - got a complete Java implementation of a 'parser'.] The thing that really frustrates me is that we lost the communal will to create an API for parsers. Why, why, why - can't we do this? I'm going to suggest a slightly revised approach. AElfred comes close to it. I'll write another msg, rather than make this too long. [...] > >Mind you, the validator is in a separate package and can be bypassed, so >Lark effectively need be no larger. But still; I wonder if validation >is intrinsically hard or we could have found a better 80/20 point? -Tim You're going to find out whether it's hard when you try to implement it :-). I have no idea whether it's *really hard*. I think I could do content validation in a week on a desert island. I would probably use a completely stupid approach. However I have received a gift of a validator (not in Java, but many thanks) and please keep them coming. We need more than one, precisely to see whether we all agree :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 13 14:25:43 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:28 2004 Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal In-Reply-To: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca> Message-ID: <3.0.1.16.19971213151941.3f37ed14@pop3.demon.co.uk> In case anyone has missed my postings over the last 10 months, I would like an API for XML parsers :-). JUMBO has been interfaced to 3 publicly available Java parsers (besides its Mus Michaelis one) and finds it sufficiently hard grunt work adding more because of the inconsistency of what's presented through the existing APIs. Note that all three parsers (Lark0.97, NXP97-09, AElfred1.0beta) provide EventDriven interfaces. I have not tried - and do not at present intend to try - to interface with someone else's Tree or Grove model. [Lark builds a tree if required - the others don't. NXP has classes for a CompleteGrove - I haven't used them.] *** Please understand that any apparent frustration below is NOT criticism of these three parsers and their authors - all of whom have made an extremely important contribution. Nor is the omission of MSXML, tcl-based parsers and JamesC's software anything than lack of time *** It's also clear that none of the three allow me to get at all the information in the document I want, though I think AElfred is almost there [I haven't looked at the latest version.] Let's assume I want the Name in the DOCTYPE [29] - the root elementType. 
In Lark097: public boolean doDoctype(Entity e, String rootType, String publicID, String systemID); OK - I can manage this, but I have no idea what the Entity class is in any of Lark's calls. "Those names Element, Attribute and Entity are obvious in their function.". This is just another example of my Dumbness, but it's a reality. I don't have time to explore precisely what it is - and I can't actually print it out. In NXP97-09-05 (I think) I can grep and find (XML.java): final public String doctypedecl(); Since the code is autogenerated by JACC I haven't the first idea what the contents of the String are (I would have to experiment). If it goes by the spec it's the whole String contents of all the subsets, I assume. In AElfred1.0beta: public abstract void doctypedecl(XmlParser parser, String name, String pubid, String sysid); This is fully documented in javadoc. [Note: javadoc is free, comes with the system, is relatively easy to use after you have fought the classpath and there is no good reason not to use it.] So three parsers, three quite different interfaces, three more midnight hacks for JUMBO. I haven't looked at MSXML but I would be amazed if there wasn't yetanotherinterface. All of this makes JUMBO very tired. There seem to be several reasons for this lethargy in producing an API - we've been at this since February. Since there is relatively little discussion I am guessing these reasons from "vibes". :-) - it's too early to do anything - the language spec has only been published this week. - it's all in the spec - if you can't work out what to do properly that's not our problem. - a proper grove plan takes care of this. Anything simpler is inadequate. - this will all be sorted out by the DOM, so let's do nothing until this happens. - parsers are unlikely to be interoperable anyway. - this is an area which should be left to the software houses - the W3C is primarily to develop markets for its members. 
- it's in our interests to have non-interoperability because we'll protect our markets that way.
- it's too difficult and I'm not paid to spend the time thinking about it.

So - as a first step - I make the following proposal and ask for constructive comments. I am quite prepared to be shown it's shallow and unworkable.

*Simple* Java interfaces are usually built by identifying the objects involved and using a consistent style for naming objects, methods, interfaces and related hooks. An example is Java Beans, where getXyz() and setXyz() have semantics which the Beans reflection mechanism can identify. The XML spec has very precise definitions of the components that are required in an interface. My proposal is simply that we should use these two approaches wherever possible in naming classes and methods, and that we should list the functions in the interface. That's all :-).

If I want the rootType of the document I refer to [29] and see that it is a Name. Therefore I could do all I want with code like:

    /** extract the string directly from the document [29] */
    public String Document.getDoctypedeclName()

OR:

    /** or have a class for Doctypedecl [29] */
    public Doctypedecl Document.getDoctypedecl();
    public String Doctypedecl.getName();

To get the contentspec and default attribute value for the Bar attribute name of the Element Foo: (note the differences in capitalisation of the string 'decl' in the spec);

    Enumeration elementdecls = Document.getElementdecls();   /*[29-30]*/
    while (elementdecls.hasMoreElements()) {
        Elementdecl elementdecl = (Elementdecl) elementdecls.nextElement();
        if (elementdecl.getName().equals("Foo")) {            /*[45]*/
            String contentspec = elementdecl.getContentspec();
        }
    }
    Enumeration attlistdecls = Document.getAttlistDecls();    /*[29, 30]*/
    while (attlistdecls.hasMoreElements()) {
        AttlistDecl attlistDecl = (AttlistDecl) attlistdecls.nextElement();
        if (attlistDecl.getName().equals("Foo")) {
            Vector attDefVector = attlistDecl.getAttdefs();   /*[52]*/
            for (int i = 0; i < attDefVector.size(); i++) {
                AttDef attDef = (AttDef) attDefVector.elementAt(i);
                if (attDef.getName().equals("Bar")) {         /*[53]*/
                    String value = attDef.getDefault();       /*[54]*/
                }
            }
        }
    }

If something is defined in the spec, it has a clear place where it is defined, and a clear term. Why not use this? It should only take a few hours to go through the 82 productions and decide which of them return anything useful (we are unlikely to require [26], for example :-) - many productions are irrelevant to the parsed, normalised document. The semantics are clear (at least as clear as the spec can provide), and can be precisely pinpointed.

We have to decide which components require classes and which are simply Strings. In some cases capitalisation is a problem. Java strongly urges initial caps so I would write:

    public Prolog getProlog()   /*[23]*/

(I am not sure whether there are name collisions separated only by case). In some cases the names clash with existing Java classes, so in [59] we might have to write:

    public jumbo.parser.Enumeration getEnumeration();

since there is a java.util.Enumeration. In some cases there are repeatable values [e.g. [58] ] where we might need:

    public String[] NotationType.getNames();

or we may choose to have Vector, etc. The use of many classes might make the parsers too large or slow, so maybe some other style might be useful.
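The lookup described above can be sketched as compilable Java. This is only a minimal illustration under stated assumptions: the class names follow the productions as numbered in this message, while the in-memory implementation, the `SpecNamedApiSketch` class and its `contentspecFor`, `defaultFor` and `sampleDocument` helpers are hypothetical and do not come from Lark, NXP, AElfred or any other parser. It also uses java.util.List where the message uses Enumeration and Vector.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical types named after the productions cited in the message.
// They are illustrative stand-ins, not classes from Lark, NXP or AElfred.

/** One attribute definition: production [53] in the message's numbering. */
final class AttDef {
    private final String name;
    private final String defaultValue; // null when no default is declared

    AttDef(String name, String defaultValue) {
        this.name = name;
        this.defaultValue = defaultValue;
    }

    String getName() { return name; }
    String getDefault() { return defaultValue; }
}

/** Attribute-list declaration for one element type: production [52]. */
final class AttlistDecl {
    private final String name;
    private final List<AttDef> attDefs = new ArrayList<>();

    AttlistDecl(String name) { this.name = name; }

    String getName() { return name; }
    List<AttDef> getAttDefs() { return attDefs; }
}

/** Element declaration: production [45]; the contentspec stays a String. */
final class Elementdecl {
    private final String name;
    private final String contentspec;

    Elementdecl(String name, String contentspec) {
        this.name = name;
        this.contentspec = contentspec;
    }

    String getName() { return name; }
    String getContentspec() { return contentspec; }
}

/** The declarations a parser chooses to expose for one document. */
final class Document {
    private final String doctypedeclName;
    private final List<Elementdecl> elementdecls = new ArrayList<>();
    private final List<AttlistDecl> attlistDecls = new ArrayList<>();

    Document(String doctypedeclName) { this.doctypedeclName = doctypedeclName; }

    String getDoctypedeclName() { return doctypedeclName; }
    List<Elementdecl> getElementdecls() { return elementdecls; }
    List<AttlistDecl> getAttlistDecls() { return attlistDecls; }
}

final class SpecNamedApiSketch {
    /** Returns the contentspec declared for elementType, or null if none. */
    static String contentspecFor(Document doc, String elementType) {
        for (Elementdecl decl : doc.getElementdecls()) {
            if (decl.getName().equals(elementType)) {
                return decl.getContentspec();
            }
        }
        return null;
    }

    /** Returns the default declared for elementType/attName, or null if none. */
    static String defaultFor(Document doc, String elementType, String attName) {
        for (AttlistDecl attlist : doc.getAttlistDecls()) {
            if (!attlist.getName().equals(elementType)) continue;
            for (AttDef def : attlist.getAttDefs()) {
                if (def.getName().equals(attName)) return def.getDefault();
            }
        }
        return null;
    }

    /** Builds the declarations a parser might report for a tiny DTD. */
    static Document sampleDocument() {
        Document doc = new Document("Foo");
        doc.getElementdecls().add(new Elementdecl("Foo", "(#PCDATA)"));
        AttlistDecl attlist = new AttlistDecl("Foo");
        attlist.getAttDefs().add(new AttDef("Bar", "baz"));
        doc.getAttlistDecls().add(attlist);
        return doc;
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println(doc.getDoctypedeclName());
        System.out.println(contentspecFor(doc, "Foo"));
        System.out.println(defaultFor(doc, "Foo", "Bar"));
    }
}
```

Returning null for a missing declaration is one possible choice; an interface agreed on the list might prefer an exception or an empty value, and that decision is left open here.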

This is simple, and is easy to implement. Dumb hackers like me can understand it by reading the spec - they don't need to know about groves, DOM or whatever. I expect that it's not comprehensive - there is no error model for example - but I can't see much that I need from a document that isn't in the spec. Anything else would be parser-specific flags, or perhaps retrieval of unnormalised input.
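To make the cost of the current situation concrete, here is a rough sketch of the kind of glue a consumer such as JUMBO has to write today. It assumes two callback shapes modelled on the Lark and AElfred signatures quoted earlier in this message; the interfaces `LarkStyleHandler`, `AElfredStyleHandler` and `DoctypeInfo`, and the `DoctypeAdapter` class, are hypothetical stand-ins written for this example and are not the parsers' real types (Object replaces their Entity and XmlParser classes).

```java
// Stand-in callback shapes modelled on the signatures quoted in the message.
// Object replaces the parsers' own Entity and XmlParser classes, which are
// not reproduced here; these are not the real Lark or AElfred interfaces.
interface LarkStyleHandler {
    boolean doDoctype(Object entity, String rootType, String publicID, String systemID);
}

interface AElfredStyleHandler {
    void doctypedecl(Object parser, String name, String pubid, String sysid);
}

/** The one question the consumer asks, named after the spec production. */
interface DoctypeInfo {
    String getDoctypedeclName();
}

/** Hypothetical adapter: accepts either callback, answers one getter. */
final class DoctypeAdapter implements LarkStyleHandler, AElfredStyleHandler, DoctypeInfo {
    private String doctypedeclName; // null until a callback has fired

    @Override
    public boolean doDoctype(Object entity, String rootType, String publicID, String systemID) {
        doctypedeclName = rootType;
        return true; // assumption: the meaning of the return value is not given in the message
    }

    @Override
    public void doctypedecl(Object parser, String name, String pubid, String sysid) {
        doctypedeclName = name;
    }

    @Override
    public String getDoctypedeclName() {
        return doctypedeclName;
    }
}

final class DoctypeAdapterDemo {
    public static void main(String[] args) {
        DoctypeAdapter viaLarkStyle = new DoctypeAdapter();
        viaLarkStyle.doDoctype(null, "Foo", null, "foo.dtd");

        DoctypeAdapter viaAElfredStyle = new DoctypeAdapter();
        viaAElfredStyle.doctypedecl(null, "Foo", null, "foo.dtd");

        // Both routes end at the same spec-named accessor.
        System.out.println(viaLarkStyle.getDoctypedeclName()
                .equals(viaAElfredStyle.getDoctypedeclName()));
    }
}
```

With an agreed interface the adapter would disappear: each parser would implement the spec-named accessor directly and the consumer would need no per-parser code.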

P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Dec 13 15:45:47 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:28 2004 Subject: Classification: XML Parser Features In-Reply-To: <3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk> References: <3.0.32.19971212170758.009ab470@pop.intergate.bc.ca> <3.0.1.16.19971213121125.3f37eb1e@pop3.demon.co.uk> Message-ID: <199712131544.KAA00367@unready.microstar.com> Peter Murray-Rust writes: > - some people think processor and parser are synonyms. > - some people think that a processor is a unit which contains > a parser but has additional integrated facilities. The problem is a misalignment in terminology. In SGML, an "SGML application" is a DTD together with other support information (such as documentation, conventions, etc.). And although the terms are not formally defined, SGML people often use 'parser' to describe the logical component that translates the external representation of a document into some sort of abstract internal format, and 'processor' (or 'processing software', or 'formatter', in some cases), to describe the logical component that acts on the information delivered by the parser. In XML, the spec confusingly defines 'processor' to fill the same logical role as 'parser' in normal SGML usage, and 'application' to fill the same logical role as 'processor' or 'processing software' in normal SGML usage. 
Of course, this confusion will exist only for people who are already used to SGML. I prefer 'parser', because it is at least unambiguous for both sides, even if slightly unfamiliar for XML-only people; if I use 'processor', I risk causing confusion for the sake of being strictly XML conformant. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From eliot at isogen.com Sat Dec 13 16:28:57 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:59:28 2004 Subject: XML software for Visual Basic Message-ID: <3.0.32.19971213102401.00dac840@swbell.net> At 01:42 PM 12/13/97 GMT, Robrecht Siera wrote: >It is getting more and more interesting to start programming >applications using XML. Until now Java gets the most attention to >do this programming in (for obvious reasons). > >But to have some programming routines for Visual Basic would be very >welcome also. Because we would like to develop a data management >and data exchange application where XML is used as file format. > >Is anybody capable of developing such parser routines or API usable >in Visual Basic ? Part of the Jade package (James' DSSSL Engine) is the groveoa.dll, an OLE Automation DLL that you can use easily with Visual Basic to operate on SGML documents. I don't know if James has enabled the XML parsing mode that he's putting into SP, but it probably wouldn't be too hard to hack it to do it. 
The grove that groveoa.dll creates reflects the SGML property set as defined in the DSSSL and HyTime standards, rather than the DOM design, although the two designs are close enough that code developed for one should be easily adapted to the other. I've created a little toy application, GroveView, that demonstrates using the groveoa.dll. You can find it at "http://www.isogen.com/demos/groveview.html". Source code is available upon request. Jade is available at "http://www.jclark.com". Cheers. Eliot --

W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com

From tbray at textuality.com Sat Dec 13 17:58:18 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:28 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213095945.0079eb00@pop.intergate.bc.ca>

At 03:19 PM 13/12/97, Peter Murray-Rust wrote:

I agree with Peter that we should just buckle down and get on with what used to be known as XAPI.

But my approach would be quite different. I think that the first step should be the end-user's API, the kind of thing that someone using a SMIL or RDF processor would need. Such a person really doesn't want to wrestle with entities and references and PIs and marked sections; all they want is elements and attributes and the basic doctype info; they want the processor to deal with entities and refs and quote marks and white space in markup and encodings and so on.

This would go a long way to address the whinings of the RDF & SMIL type people, who thought XML just meant elements and attributes. I think that from their point of view, it should be; all the other stuff in the syntax is strictly to support authoring and management convenience.

It should come in event-stream flavor and tree flavor.

Minimal event stream API:

1. Doctype, returns: root type, external subset system/public idents
2. Element start, returns: type, element name-value pairs, whether it's empty
3. Text
4. End Element, returns: type

Minimal tree API:

1. Document, with methods: root type, system ID, public ID, root element
2. Element, with methods: parent, children, attributeValueByName, allAttributes
3. Attribute, with methods: name, value
4. Text (presumably hiding lazy evaluation)

I acknowledge this is grossly insufficient for basing an editor on. You want that, use the DOM. Only a few choices have design implications:

1. How are children returned; possibilities would be to have Element and Text crammed into the same class with a method for asking which is which, or have separate Text and Element classes, then children returns an Object array or a Vector, and you can find out what kind of child each member is using the instanceof operator. I favor the latter, Lark does this.

2. Whether it's worthwhile putting children into, as opposed to a native array or Vector, a special ChildList class with enumerator and indexing so you can hide a lazy-evaluation behind it. I favor the latter, the DOM does this but Lark doesn't.

3. Whether the processor should be required to coalesce adjacent Text objects. Suppose you have foo bar &ref; baz, it's immensely less work if the processor can give this to the app as 4 Text chunks. I think most of the processors do this now.

If I formalized and published this, it would look a lot like part of Lark's interface, but I bet all the other parsers could implement it. Should I? -Tim

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Dec 13 18:46:16 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:28 2004 Subject: Error Reporting: XML vs ISO 8879 Message-ID: <199712131844.NAA00394@unready.microstar.com> This has been a fascinating discussion on what XML conformance means in an XML processor -- I think that it has helped people like me (who are not in the SIG or in the WG) to understand more of the WG's reasoning on the very strict rules for XML conformance. SGML PARALLELS -------------- I recognise James's concern that explicitly allowing non-error-reporting XML processors could cause non-conforming variants of XML to become common -- given the unfortunate history of HTML, I am not prepared to dismiss that concern lightly. It is surprising, however, that although some proponents (not James) claim XML as "a simplified form of SGML," XML is actually much more rigid than full SGML on this point. Let me quote from a (non-normative) note to the SGML standard, ISO 8879:1986, clause 15.4: NOTE -- A conforming SGML system need not have a validating SGML parser. Implementors can therefore decide whether to incur the overhead of validation in a given system. A user whose text editing system allowed the validation and correction of SGML documents, for example, would not require the validation process to be repeated when the documents are processed by a formatting system. In other words, if I have read the standard correctly (something that all of us fail to do at times), full SGML allows parsers that do not report errors, but XML does not. 
It is ironic that we can call PSGML a "conforming, non-validating" SGML editor, but that we must call it a "non-conforming" XML editor (even with my XML patches).

CODE SIZE AND THE INTERNET
--------------------------

This inflexibility on XML's part is especially surprising given that XML is designed for the Internet, where code size (whether for Java applets or ActiveX controls) is _much_ more critical than it is in a closed system.

Imagine a Java programmer who has just written a 100K applet, and is considering adding XML support as an extra feature. I am concerned that we could not convince that programmer to add even a 24K XML parser like Ælfred (especially after she's spent three weeks optimising for size); we certainly will not convince her to add 50K or 100K of class files for a full error-reporting XML parser, doubling the size of the applet. As it stands, however, her applet will be non-conforming unless it uses a conforming parser, so strictly speaking, the programmer will not be able to claim XML support if she uses a smaller XML parser like Ælfred.

Ideally, I'd like to get Ælfred to under 10K to help with acceptance in the Java community; practically, I'll be thrilled if I can get it down to under 20K. I cannot justify bloating it to 40K or 50K.

PRAGMATISM AND DEVIANT BEHAVIOUR
--------------------------------

The strongest argument, however, comes from pragmatism. A W3C recommendation has relatively little moral force compared even to an IETF RFC, much less an International Standard, so if conformance is too difficult, most people just won't bother conforming (look at some of the widely-ignored HTML drafts that have come out). It makes sense, then, for XML to try to channel and regulate deviant behaviour rather than simply looking away and denying its existence.
Instead of declaring every simple, non-error-reporting processor "non-conforming" (and thus, not regulating it at all), why not define a standard behaviour for those parsers as well, and create standard terms for labelling them? At least then, people will know what they're getting. GUARDING THE GRAIL ------------------ Like a former rebel who has just found a job, bought a house, or become a new parent, the XML WG now has something to protect, and they are naturally adapting precisely the conservatism that a vocal minority of XML supporters used to attack in the SGML establishment (and sometimes, as in the case of error-reporting, they have outdone the SGML community in their conservatism). This is a normal and expected development, but I expect that privately, at least, some of the original XML evangelists must be starting to look more sympathetically at what they used to consider unnecessary rigidity and purism in the SGML community. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Sat Dec 13 18:57:49 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:29 2004 Subject: OFE Message-ID: <199712131857.SAA31062@GPO.iol.ie> Does anyone happen to know if OFE-Open Financial Exchange (currently SGML) will be XML in the future? I cannot find anything in the OFE spec or websites about it yet it is linked to from a number of XML resource pages. 
Sean Mc Grath sean@digitome.com Digitome Electronic Publishing http://www.digitome.com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 13 19:18:11 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:29 2004 Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal In-Reply-To: <3.0.32.19971213095945.0079eb00@pop.intergate.bc.ca> Message-ID: <3.0.1.16.19971213193641.3af79528@pop3.demon.co.uk> At 09:59 13/12/97 -0800, Tim Bray wrote: >At 03:19 PM 13/12/97, Peter Murray-Rust wrote: > >I agree with Peter that we should just buckle down and get on with what used >to be known as XAPI. > >But my approach would be quite different. I think that the first step I'm missing something :-) Your approach below seems almost identical to what I was suggesting. >should be the end-user's API, the kind of thing that someone using a SMIL >or RDF processor would need. Such a person really doesn't want to wrestle >with entities and references and PIs and marked sections; all they want Agreed - and they wouldn't be in what I wanted as well. >is elements and attributes and the basic doctype info; they want the yup yup yup >processor to deal with entities and refs and quote marks and white space in >markup and encodings and so on. > >This would go a long way to address the whinings of the RDF & SMIL type >people, who thought XML just meant elements and attributes. I think that >from their point if view, it should be, all the other stuff in the syntax >is strictly to support authoring and management convenience. > >It should come in event-stream flavor and tree flavor. 
> >Minimal event stream API: > >1. Doctype, returns: root type, external subset system/public idents I would like the elements as well. If the parser doesn't do them, we just return null. But if it does... >2. Element start, returns: type, element name-value pairs, whether it's empty is "type" the elementType? This is the sort of terminological problem we have. >3. Text >4. End Element, returns: type > >Minimal tree API: > >1. Document, with methods: root type, system ID, public ID, root element >2. Element, with methods: parent, children, attributeValueByName, allAttributes >3. Attribute, with methods: name, value >4. Text (presumably hiding lazy evaluation) Sounds OK. > >I acknowledge this is grossly insufficient for basing an editor on. You want I don't want much for an editor. Just the attribute stuff and contentspec. I don't want PE's, comments, marked sections and so on. >that, use the DOM. Only a few choices have design implications: > >1. How are children returned; possibilities would be to have Element and > Text crammed into the same class with a method for asking which is which, > or have separate Text and Element classes, then children returns an Object > array or a Vector, and you can find out what kind of child each member > is using the instanceof operator. I favor the latter, Lark does this I'm easy - **as long as we all agree** > >2. Whether it's worthwhile putting children into, as opposed to a native > array or Vector, a special ChildList class with enumerator and indexing > so you can hide a lazy-evaluation behind it. I favor the latter, the which is 'the latter'? :-) > DOM does this but Lark doesn't. > >3. Whether the processor should be required to coalesce adjacent Text > objects. Suppose you have foo bar &ref; baz, > it's immensely less work if the processor can give this to the app > as 4 Text chunks. I think most of the processors do this now. I don't have a problem here... 
> >If I formalized and published this, it would look a lot like part of >Lark's interface, but I bet all the other parsers could implement it. >Should I? -Tim I bet they could. It is very important, however, that everyone agrees on the terminology. I have never seen this as a difficult problem. I think it would take a week to come up with a reasonable working draft. I hope that XML-DEVers will see the value of a simple interface and not - as has happened before - keep getting more and more complex. the three parsers we have are simple - it's a slightly depressing situation that we haven't got an interface for them to use. I suggest that Tim goes ahead, but I'll also produce my interface from the spec. After all, that will show what the *consumer* (i.e. JUMBO) would like. As always I shall be happy to junk anything I do if it helps us make progress :-) It might also be useful for us to set ourselves a deadline. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Sat Dec 13 20:30:28 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:29 2004 Subject: OFE Message-ID: <3.0.32.19971213122935.009a1950@pop.intergate.bc.ca> At 07:27 PM 13/12/97 +0000, Sean Mc Grath wrote: >Does anyone happen to know if OFE-Open Financial Exchange (currently SGML) >will be XML in the future? It's normally acronymed OFX I think. Good question. 
I believe there have public statements of intent to go XML, I'd think that Microsoft would be in the leadership position on this one. -Tim xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Sat Dec 13 22:05:36 1997 From: donpark at quake.net (Don Park) Date: Mon Jun 7 16:59:29 2004 Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal Message-ID: <005801bd0812$a1af84b0$0100007f@localhost> Tim and Peter, From: Tim Bray >It should come in event-stream flavor and tree flavor. > >Minimal event stream API: > >1. Doctype, returns: root type, external subset system/public idents >2. Element start, returns: type, element name-value pairs, whether it's empty >3. Text >4. End Element, returns: type > >Minimal tree API: > >1. Document, with methods: root type, system ID, public ID, root element >2. Element, with methods: parent, children, attributeValueByName, allAttributes >3. Attribute, with methods: name, value >4. Text (presumably hiding lazy evaluation) IMHO, it would be major mistake to combine XML parser client API and service provider API. I would much rather see something like Swing's TreeModel interface used as XML parser service provider API with opaque objects. public interface XmlTreeModel { public Object getRoot (); public Object getParent (Object child); ... } public interface XmlEventModel { public String getElementName (Object event); ... } public interface XmlEventProducer { public void addConsumer (XmlEventConsumer c); public void removeConsumer (XmlEventConsumer c); ... 
}

public interface XmlEventConsumer {
  public void elementStarted (XmlElementEvent evt);
  public void elementEnded (XmlElementEvent evt);
  ...
}

XmlEvent is part of the client API, which is mostly a convenience class framework:

public class XmlEvent extends EventObject {
  protected XmlEventModel model;
  protected Object object;
  ...
}

public class XmlElementEvent extends XmlEvent {
  public String getElementName () {
    return model.getElementName(object);
  }
  ...
}

>I acknowledge this is grossly insufficient for basing an editor on. You want
>that, use the DOM. Only a few choices have design implications:

I think editing should be supported with another layer of interfaces so that the basic interface can remain simpler.

public interface MutableXmlTreeModel {
  public Object newElement (String name, ...);
  public void addAttribute(Object elem, String name, String value);
  ...
}

XML parser service provider API is mostly just interfaces and deals with opaque objects returned by XML parser implementations. XML parser client API consists of DOM classes that use opaque objects to drive parser implementations (see XmlElementEvent above).

Don Park

From tbray at textuality.com Sat Dec 13 22:45:50 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213144717.009ba3f0@pop.intergate.bc.ca>

At 02:00 PM 13/12/97 -0800, Don Park wrote:
>IMHO, it would be major mistake to combine XML parser client API and service
>provider API.
>I would much rather see something like Swing's TreeModel
>interface used as XML parser service provider API with opaque objects.

Hmm, your proposal is coherent, but why is it better? It's certainly a bit more complex than what I proposed, and I'd need to see evidence that my proposal fails to meet the needs of the basic application programmer.

One of the things I did with Lark was hook it up to the Swing Tree Renderer/JTree package, got a nice little XML document tree-walker, even works with Unicode fonts; I only needed calls like the ones I outlined and it was no big deal. - Tim

From tbray at textuality.com Sat Dec 13 22:55:26 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: XML vs the Dreaded Whitespace
Message-ID: <3.0.32.19971213145704.0096f9d0@pop.intergate.bc.ca>

At 03:00 AM 11/12/97 -0500, Chris Smith wrote:
>Part of this work requires that these documents carry document
>authentication information. This, in turn, requires that some regions
>of an XML document must be transported *exactly*, and must be received
>and checked identically so that the message authentication actually
>works. The fact that we are considering the idea of including email
>as a transport mechanism doesn't help matters.

So your proposal is: (1) transcode into UTF-16 if necessary (2) digitally sign what you get after (1). I think this is a sensible way to go.

Obviously, there are anomalies; will not be the same as which is surprising, but trying to find solutions may well not be cost-effective.
You *might* want to consider losing the prologue and start checking just at the root element. You *might* want to consider normalizing namespace prefixes. You *might* want to normalize whitespace in markup. You *might*, etc etc etc etc; unless you are willing to commit to a full grove/property-set model a la SGML's extended facilities, you may well be better off signing the instance as it sits.

In particular, I think there are lots of things that would be easier and less trouble-prone to work around than line-breaking, which is well known to be highly error-prone. For example, in the line-break HERE->
how many space characters that you can't see follow the ">"?

There might be a useful halfway point as follows: run it through an XML processor and sign just the combination of element type, attribute name-value pairs, and textual content that the processor emits; this allows you to finesse a lot of quoting/white-space/line-end issues; also it allows authors to use tricks like default attributes and internal entities that don't "really" change the content.

On the other hand, I'd say that off the top, just digitally signing the UTF-i-fied characters as they sit is a reasonable way to go. -Tim
From ak117 at freenet.carleton.ca Sat Dec 13 23:00:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:29 2004
Subject: XML Architectural Forms
Message-ID: <199712132258.RAA00384@unready.microstar.com>

I don't remember seeing an announcement here (apologies if I'm mistaken), but Eliot Kimber and James Clark have announced on comp.text.sgml a proposed amendment to ISO 10744 that will make it possible to use Architectural Forms in XML. You can find the text of the amendment at the following URL:

http://www.ornl.gov/sgml/wg8/document/1957.htm

Here's Eliot's example of a simple, well-formed XML document that uses the base architecture "isobase":

This is very exciting, because if accepted, the amendment will make it possible to solve the XML namespace problem with an International Standard, instead of forcing the W3C to throw together a consortium standard.

Base architectures also provide a simple and elegant solution to multiple inheritance; for example, here's Eliot's example modified to implement _two_ base architectures:

The element corresponds to in the isobase namespace and to in the mslbase namespace at the same time.

Even more interesting is the ability to embed the architectural attributes in a DTD, so that they do not appear in the document instance at all. For example, you can create an external DTD like this:

Now, every XML document that uses this DTD will implement the two architectures automatically, with no additional markup required:

Authors won't even have to know that they're using architectural forms.

Congratulations are due to Eliot and James for taking the time to start this process.
David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Sat Dec 13 23:27:09 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <005801bd0812$a1af84b0$0100007f@localhost>
Message-ID: <3.0.1.16.19971214000336.3af777c6@pop3.demon.co.uk>

I am listing the main calls from Lark and AElfred that I find useful. As you can see there is a great similarity - I confess that I find the AElfred ones slightly easier to understand.

I suggest that Tim, David, Norbert if he's free, me and *anyone else who has written a java parser* decide on a synthesis of this lot. I think everyone has to be slightly flexible. If I were to suggest, I like the AElfred model for accessing the DOCTYPE stuff - it's simple and fairly close to the spec. I'd change the names where possible to be spec-compliant. I think Lark may have more precision on Entities.

There is nothing difficult here - we don't need anything more - we just need to do it. I don't see why we can't iterate on these and come up with something in a week. I will undertake to hack JUMBO so it uses the resultant interface by choice.

Let's get our act together!

P.
AElfred - document instance related stuff

attribute(XmlParser, String, String, boolean)
data(XmlParser, String)
doctypeDecl(XmlParser, String, String, String)
error(XmlParser, String, String, String, URL, int)
processingInstruction(XmlParser, String, String)
resolveEntity(XmlParser, String, String, URL)
startDocument(XmlParser, String, URL)
endDocument(XmlParser, int)
startElement(XmlParser, String)
endElement(XmlParser, String)
XmlParser()
XmlParser(String, URL)

------

Lark

public boolean doAttlist(Entity e, Object[] parts)
public boolean doDoctype(Entity e, String rootType, String publicID, String systemID)
public boolean doEntityReference(Entity e, String name)
public boolean doETag(Entity e, Element element)
public boolean doInternalEntity(Entity e, String name, char[] value)
public boolean doPI(Entity e, String PI)
public boolean doSTag(Entity e, Element element)
public boolean doSyntaxError(Entity e, String message, int c)
public boolean doSystemBinaryEntity(Entity e, String name, String extID, String notation)
public boolean doSystemTextEntity(Entity e, String name, String extID)
public boolean doText(Entity ent, Element el, char[] text, int length)
public boolean doWarning(Entity e, String message)
public Element element()

public class Attribute {
  public Attribute(String name, String value)
  public Attribute(String name, Text text)
  public String name()
  public void setName(String name)
  public String value()
  public void setValue(String value)
  public void setValue(Text text)
}

public class Element {
  public String type();
  public Attribute[] allAttributes()
  public void setAllAttributes(Attribute[] attributes)
  public Attribute attribute(String name)
  public void setAttribute(String name, String value)
  public Vector children()
  public Element parent()
}

class Text {
  public void addSegment(Segment segment)
  public Vector segments() { return mSegments; }
  public String string()
}

----------------------

AElfred - DTD related stuff

declaredAttributes(String)
declaredElements()
declaredEntities()
declaredNotations()
getAttributeDefaultValue(String, String)
getAttributeDefaultValueType(String, String)
getAttributeEnumeration(String, String)
getAttributeExpandedValue(String, String)
getAttributeType(String, String)
getElementContentModel(String)
getElementContentType(String)
getEntityNotationName(String)
getEntityPublicId(String)
getEntitySystemId(String)
getEntityType(String)
getEntityValue(String)
getNotationPublicId(String)
getNotationSystemId(String)
getProcessor()
getPublicId()
getSystemId()
run()
run(XmlProcessor)
setProcessor(XmlProcessor)
setPublicId(String)
setSystemId(URL)

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Sat Dec 13 23:32:03 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <005801bd0812$a1af84b0$0100007f@localhost>
Message-ID: <3.0.1.16.19971214001236.5b0f1eac@pop3.demon.co.uk>

At 14:00 13/12/97 -0800, Don Park wrote:
>Tim and Peter,
[...]
>
>IMHO, it would be major mistake to combine XML parser client API and service
>provider API. I would much rather see something like Swing's TreeModel
>interface used as XML parser service provider API with opaque objects.

I think it's clear that we are not going to see just one API. Your suggestion, the grove plan, Xapi-J are all viable ways forward.
The point is that Tim, DavidM, Norbert and I have all - independently - come up with fairly simple models for APIs which have a large degree of communality. They have the merit of being fairly simple for newcomers. None are required to be tree-structured.

>
>public interface XmlTreeModel {
>  public Object getRoot ();
>  public Object getParent (Object child);
>  ...
>}
>
>public interface XmlEventModel {
>  public String getElementName (Object event);
>  ...
>}
>
>public interface XmlEventProducer {
>  public void addConsumer (XmlEventConsumer c);
>  public void removeConsumer (XmlEventConsumer c);
>  ...
>}
>
>public interface XmlEventConsumer {
>  public void elementStarted (XmlElementEvent evt);
>  public void elementEnded (XmlElementEvent evt);
>  ...

I have looked at TreeModel in Swing and even implemented a simple JUMBO display on it. I have to confess that, being a Dumb Browser Hacker, I found it quite tough going. If the only interfaces to XML parsers are based on this level of abstraction a lot of people will find them hard.

We have been part way down this road before - look through XML-DEV discussions 6+ months ago. I think it's essential we home in on a moderately simple parser NOW - we know what we need to do - we simply need to agree on the precise components and the terminology.

[...]
>
>>I acknowledge this is grossly insufficient for basing an editor on. You want
>>that, use the DOM. Only a few choices have design implications:
>

All I want is to get the DOCTYPE stuff from the file. AElfred now provides exactly what I want - we just need to agree it.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From tbray at textuality.com Sun Dec 14 00:05:36 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:29 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>

At 12:03 AM 14/12/97, Peter Murray-Rust wrote:
>I am listing the main calls from Lark and AElfred that I find useful. As
>you can see there is a great similarity - I confess that I find the AElfred
>ones slightly easier to understand.

OK, let's get concrete. I think that the AElfred callbacks each having an XMLParser argument is a good idea. Also AElfred's names are better, the "Do*" prefix in Lark is silly. So on the event-stream stuff, I'd go with the AElfred model modulo the following changes:

> attribute(XmlParser, String, String, boolean)

It seems completely wrong to have an attribute event separate from start-element events. To start with, it suggests that the order of attributes is significant, which is incorrect. Secondly, since much element-specific processing depends on what attributes are there, it is less convenient for the application programmer. Third, if the processor (as it must) does defaulting, he's going to have to do some attribute list wrangling anyhow, so it can't really be extra work.

What's the boolean? I don't think the application author should have to deal with anything but the name and value of attributes.

Anyhow, I'd go with

startElement(XmlParser processor, String type, Attribute[] attributes);

and lose the attribute() method.

> data(XmlParser, String)

I feel that the 2nd argument should not be a String.
It is a recipe for disastrous inefficiency if the processor has to cook up a java.lang.String object for every little chunk of text. Lark uses two arguments, a char[] array and a character count; the app can make a String if it needs to. If you find this awkward, create a new data type called Text so that if you need a String you can make it with lazy-evaluation in Text.toString(), but if you don't need it you don't build it.

Also, it shouldn't be named "data" - it should be named characterData or charData or text or some such term that can be mapped directly to the spec.

> resolveEntity(XmlParser, String, String, URL)

I don't think entities have any place in the first cut of this interface. The processor exists to make these problems go away.

Generalities:

Lark has a thing where if any callback returns 'true', the parser drops out of its loop... which is awfully useful and easy I think. Lark will also re-enter, but this need not be a requirement.

Also, for application programmers, especially dealing with smallish objects, a tree interface is very natural. I've written both event-stream and tree apps using Lark, and the trees are a lot easier to use for anything even moderately complex. So the API should have Element, Attribute, and Text classes.

And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple API for XML? Maybe SAX-J for the Java bindings. -Tim
From donpark at quake.net Sun Dec 14 00:09:37 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <000f01bd0824$0c981420$0100007f@localhost>

>Hmm, your proposal is coherent, but why is it better? It's certainly a
>bit more complex than what I proposed, and I'd need to see evidence that
>my proposal fails to meet the needs of the basic application programmer.

From the parser writer's point of view, they would rather create their own object model than change the code to produce W3C DOM objects which will be incompatible with the version deployed in IE 4.0. My proposal allows parsers like MSXML to remain unchanged and still support W3C DOM. Furthermore it allows application programmers to access MSXML objects for features not supported by W3C DOM.

public class XmlObject {
  Object peer;
  public Object getPeer () { return peer; }
}

public class XmlDocument extends XmlObject {
  ...
}

XmlDocument obj;
Object peer = obj.getPeer();
if (peer instanceof com.ms.xml.om.Document) {
  com.ms.xml.om.Document elem = (com.ms.xml.om.Document)peer;
  elem.setOutputStyle(XMLOutputStream.PRETTY);
  ...
}

My proposal makes it easier for parser writers to support the standard API and it does not limit application programmers to the functionalities in the standard API. I have designed object-oriented software for fifteen years and I have learned from past mistakes that, while what I propose might seem more complex, it will meet the harsh reality of the marketplace better.
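As a rough, compilable illustration of the peer idea described above: a wrapper object exposes the parser's own object through getPeer(), and an application may downcast it to reach parser-specific features. This is only a sketch under stated assumptions. VendorDocument, setPrettyOutput, isPrettyOutput and PeerDemo are hypothetical names introduced here; VendorDocument merely stands in for a parser-specific class such as the com.ms.xml.om.Document mentioned in the message, and no real MSXML call is reproduced.

```java
// Hypothetical stand-in for a parser-specific document class (an assumption,
// not a real vendor API).
final class VendorDocument {
    private boolean pretty;

    void setPrettyOutput(boolean pretty) { this.pretty = pretty; }

    boolean isPrettyOutput() { return pretty; }
}

// Wrapper in the style of the XmlObject shown in the message: it holds an
// opaque peer supplied by whichever parser produced the document.
class XmlObject {
    private final Object peer;

    XmlObject(Object peer) { this.peer = peer; }

    Object getPeer() { return peer; }
}

final class XmlDocument extends XmlObject {
    XmlDocument(Object peer) { super(peer); }
}

final class PeerDemo {
    // Uses the vendor-specific feature only when the peer is of the expected
    // type; returns false (and does nothing) for any other parser's objects.
    static boolean enablePrettyOutput(XmlDocument doc) {
        Object peer = doc.getPeer();
        if (peer instanceof VendorDocument) {
            ((VendorDocument) peer).setPrettyOutput(true);
            return true;
        }
        return false;
    }
}
```

The instanceof check keeps the application portable: code written this way still runs against a parser whose peer objects are of some other type, it just skips the extra feature.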
>One of the things I did with Lark was hook it up to the Swing Tree Renderer/
>JTree package, got a nice little XML document tree-walker, even works with
>Unicode fonts; I only needed calls like the ones I outlined and it was
>no big deal. - Tim

The reason I mentioned Swing's TreeModel was to point out the way it allows any tree structure to be used as model for JTree. It is true that you can use JTree's default model but then you end up with two models: XML document tree and JTree's default model tree, which is resource intensive.

Don

From donpark at quake.net Sun Dec 14 00:42:32 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <002201bd0828$a85d4200$0100007f@localhost>

Peter,

>I think it's clear that we are not going to see just one API. Your
>suggestion, the grove plan, Xapi-J are all viable ways forward. The point
>is that Tim, DavidM, Norbert and I have all - independently - come up with
>fairly simple models for APIs which have a large degree of communality.
>They have the merit of being fairly simple for newcomers. None are required
>to be tree-structured.

First, I do not see the need for simple API. Having a simple API now will definitely help control proliferation of proprietary XML parser API but, in the long run, it will restrict application programmers to the set of functionalities supported by the simple API.

Second, the cat is already out of the bag. For example, MSXML is already in IE 4.0 and it is being used by JScript and Java applet programmers.
>I have looked at TreeModel in Swing and even implemented a simple JUMBO
>display on it. I have to confess that, being a Dumb Browser Hacker, I found
>it quite tough going. If the only interfaces to XML parsers are based on
>this level of abstraction a lot of people will find them hard.

My proposal was mainly for the parser writers and not the application writers. Application writers will not be using XmlTreeModel but DOM objects. My point was that interfaces like XmlTreeModel should be used to write the DOM framework so that the framework can support all existing and future XML parsers.

>WE have been part way down this road before - look through XML-DEV
>discussions 6+ months ago. I think it's essential we home in on a
>moderately simple parser NOW - we know what we need to do - we simply need
>to agree on the precise components and the terminology.

I was not here 6+ months ago and I do not believe that the fact that there have been previous discussions makes my proposal any less worthy. Frankly, I am disappointed by the fact that there was no immediate understanding of the advantages my proposal offers. It is partly my fault since I am pretty bad at explaining things. However, I am disturbed that, while there is a wealth of SGML and XML knowledge present in this mailing list, there seems to be a lack of object-oriented design knowledge. I do not say this insultingly but with concern. I apologize if anyone took my opinion negatively.

>All I want is to get the DOCTYPE stuff from the file. AElfred now provides
>exactly what I want - we just need to agree it.

All one wants is not necessarily what everyone wants and will want. Design of a standard API should be approached more carefully and with the future in mind.

I am sorry if my comments upset you in any way. It was not my intention.

Sincerely,

Don Park
From tbray at textuality.com Sun Dec 14 00:59:06 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213170037.009997e0@pop.intergate.bc.ca>

At 04:38 PM 13/12/97 -0800, Don Park wrote:
>First, I do not see the need for simple API.

That's where we part company. Generations of hypertext theorists saw no need for anything as simple as HTML/HTTP, then generations of SGML implementors saw no need for anything as simple as XML. I agree that in the general case, you need something quite a bit more sophisticated than what we're proposing; that's what the DOM is for.

We're getting a lot of static in the XML project from people who feel that XML is already too complicated and they want to see elements 'n' attributes and that's all they want to see. I happen to think they're right; when I'm writing XML apps, that's all I care about 99% of the time. So why not create a simple API that will give them what they want?

I should point out that what we're talking about could be implemented on top of the DOM in about 15 minutes. And on top of the MS IE4 machinery. And as for those who are currently tying themselves to Microsoft's proprietary interfaces, especially given that Microsoft is saying in public that they plan on full DOM compatibility (even if at the same time they are encouraging everyone to start using "Dynamic HTML" right now), they'll get what they deserve. -Tim
From ak117 at freenet.carleton.ca Sun Dec 14 02:03:00 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
References: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
Message-ID: <199712140201.VAA00351@unready.microstar.com>

Tim Bray writes:

> > attribute(XmlParser, String, String, boolean)
>
> It seems completely wrong to have an attribute event separate from
> start-element events.

I have worried about this myself. My design goal with Ælfred has been to limit myself to two class files: one for the parser itself, and one for the interface for the callbacks -- hence the separate event for attributes. This decision has forced some pretty severely hacked-up internal code accompanied by very careful documentation.

I could send a hashtable of attribute names and values with the startElement() callback, and let users look up types (etc.) with my query methods, but I would have to lose a bit on two counts:

1) Allocating a new hashtable for every start tag will slow down the parser a fair bit.

2) I'd have no way to show which attributes were specified and which were defaulted (see below).

> What's the boolean? I don't think the application author should
> to have to deal with anything but the name and value of attributes.

The boolean tells whether the attribute was specified or defaulted. I include this to allow people to do useful XML-to-XML transformations.

> > data(XmlParser, String)
>
> I feel that the 2nd argument should not be a String.
> It is a recipe
> for disastrous inefficiency if the processor has to cook up a
> java.lang.String object for every little chunk of text.

The overhead isn't that bad with Ælfred because I coalesce my data into the largest chunks possible before allocating the String. I think that returning a char[] array would be confusing for users, and would lead to many bugs in their code as they ignored our warnings not to rely on the value in the char[] array outlasting the callback.

> Lark uses two
> arguments, a char[] array and a character count; the app can
> make a String if it needs to. If you find this awkward, create
> a new data type called Text so that if you need a String you
> can make it with lazy-evaluation in Text.toString(), but if you
> don't need it you don't build it.

Again, I'm reluctant to create new classes beyond XmlParser and XmlProcessor.

> Also, it shouldn't be named "data" - it should be named
> characterData or charData or text or some such term that can
> be mapped directly to the spec.

Agreed. I will not change Ælfred now, but I think that this is a good idea.

> > resolveEntity(XmlParser, String, String, URL)
>
> I don't think entities have any place in the first cut of this
> interface. The processor exists to make these problems go away.

Normally, you should just return the URL argument; however, this callback gives users a chance to do public-identifier resolution, URL substitution, etc., and to return a different URL if desired. For example, if we had a DTD at

http://www.microstar.com/XML/msldoc.dtd

and you had a local copy, you could substitute a local URL on your own computer. Likewise, you could do a catalogue lookup on the public identifier

"-//microstar//DTD Microstar Sample Document//EN"

and choose a different system identifier than the default supplied in the document.

That said, I agree that this probably doesn't belong in the common event API.
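The substitution described above could be sketched roughly as follows: a small catalogue maps public identifiers to local copies, and the resolver falls back to the address the document supplied when there is no entry. This is an illustrative sketch, not Ælfred's code. CatalogResolver and register are hypothetical names, the parser argument of the real callback is left out, java.net.URI is used in place of URL only so the sketch has no checked exceptions, and the local file path is made up.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (an assumption for illustration): looks up public
// identifiers in a local catalogue before falling back to the default.
final class CatalogResolver {
    private final Map<String, URI> catalog = new HashMap<>();

    // Records that a local copy exists for the given public identifier.
    void register(String publicId, URI localCopy) {
        catalog.put(publicId, localCopy);
    }

    // Loosely follows the shape of resolveEntity(XmlParser, String, String, URL):
    // returns the catalogue entry when one exists, otherwise the address the
    // document supplied, which is the "just return the URL argument" case.
    URI resolveEntity(String publicId, String systemId, URI defaultLocation) {
        if (publicId != null) {
            URI local = catalog.get(publicId);
            if (local != null) {
                return local;
            }
        }
        return defaultLocation;
    }
}
```

A caller would register "-//microstar//DTD Microstar Sample Document//EN" against a local file once, after which every document declaring that public identifier resolves to the local copy instead of the remote DTD.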
> Generalities:
> Lark has a thing where if any callback returns 'true', the
> parser drops out of its loop... which is awfully useful and easy
> I think. Lark will also re-enter, but this need not be a requirement.

Awfully easy with a DFA-driven parser, but trickier with a recursive-descent parser like Ælfred. I'd probably have to throw an exception, and could not allow any kind of re-entry.

> Also, for application programmers, especially dealing with smallish
> objects, a tree interface is very natural. I've written both
> event-stream and tree apps using Lark, and the trees are a lot
> easier to use for anything even moderately complex. So the API
> should have Element, Attribute, and Text classes.

Perhaps -- I may have to give in and allow Ælfred to use more than one class file; or alternatively, these would be an optional extra, along with the SAX-J layer.

> And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple
> API for XML? Maybe SAX-J for the Java bindings. -Tim

How about RUSTY?

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From tbray at textuality.com Sun Dec 14 02:21:24 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>

At 09:01 PM 13/12/97 -0500, David Megginson wrote:
>I have worried about this myself.
>My design goal with Ælfred has been
>to limit myself to two class files: one for the parser itself, and one
>for the interface for the callbacks -- hence the separate event for
>attributes. This decision has forced some pretty severely hacked-up
>internal code accompanied by very careful documentation.

Hmm, isn't this what JAR and so on are for? Seems like an awfully severe design constraint. I certainly agree with "small" as a design goal, but it seems like limiting class file count carries a pretty high price. - Tim

From tbray at textuality.com Sun Dec 14 02:39:32 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971213184107.009a3d60@pop.intergate.bc.ca>

At 09:01 PM 13/12/97 -0500, David Megginson wrote:
> > What's the boolean? I don't think the application author should
> > to have to deal with anything but the name and value of attributes.
>The boolean tells whether the attribute was specified or defaulted. I
>include this to allow people to do useful XML-to-XML transformations.

No. Not of interest to people who just want to see elements and attributes. The whole point of using an XML processor is that it takes care of these details for the application programmer. Leave it out for now. If you want XML-to-XML you need a lot more, go use the DOM.

> > > data(XmlParser, String)
> > I feel that the 2nd argument should not be a String. It is a recipe
> > for disastrous inefficiency if the processor has to cook up a
> > java.lang.String object for every little chunk of text.
> >
>The overhead isn't that bad with Ælfred because I coalesce my data
>into the largest chunks possible before allocating the String. I
>think that returning a char[] array would be confusing for users

That's a fair point; the correct solution per design principles is to have a Text class that could give you a String if you asked it; since many applications will ignore the content of many elements, it seems vital not to have an interface that makes lazy evaluation impossible. So I think you have to go for either the char[] trick or another class.

> > Lark has a thing where if any callback returns 'true', the
> > parser drops out of its loop... which is awfully useful and easy
> > I think. Lark will also re-enter, but this need not be a requirement.
> >
>Awfully easy with a DFA-driven parser, but trickier with a
>recursive-descent parser like Ælfred.

But it seems completely unreasonable, if I call the parser mainline, not to have a way to get control back. I guess you could get the client callback to throw an exception... blecch. If exceptions are going to be thrown, it's better to hide all this stuff within the processor and not make each application do it. -Tim

From peter at ursus.demon.co.uk Sun Dec 14 07:34:39 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213160700.00970410@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971214083114.5b3f006e@pop3.demon.co.uk>

At 16:07 13/12/97 -0800, Tim Bray wrote:
>At 12:03 AM 14/12/97, Peter Murray-Rust wrote:
>>I am listing the main calls from Lark and AElfred that I find useful. As
>>you can see there is a great similarity - I confess that I find the AElfred
>>ones slightly easier to understand.
>
>OK, let's get concrete. I think that the AElfred callbacks each having
>an XMLParser argument is a good idea. Also AElfred's names are better,
>the "Do*" prefix in Lark is silly. So on the event-stream stuff, I'd
>go with the AElfred model modulo the following changes:

This seems eminently reasonable - if DavidM is listening I suggest we can get this sorted very quickly.

>
>> attribute(XmlParser, String, String, boolean)
>
>It seems completely wrong to have an attribute event separate from
>start-element events. To start with, it suggests that the order of
>attributes is significant, which is incorrect. Secondly, since much
>element-specific processing depends on what attributes are there, it is
>less convenient for the application programmer. Third, if the processor
>(as it must) does defaulting, he's going to have to do some attribute
>list wrangling anyhow, so it can't really be extra work.

I cut the documentation out to save space on the list.
boolean isSpecified (although this doesn't match with the documentation for the Parameters, David...) > >What's the boolean? I don't think the application author should >to have to deal with anything but the name and value of attributes. > >Anyhow, I'd go with > >startElement(XmlParser processor, String type, Attribute[] attributes); So would I. > >and lose the attribute() method. > >> data(XmlParser, String) > >I feel that the 2nd argument should not be a String. It is a recipe >for disastrous inefficiency if the processor has to cook up a >java.lang.String object for every little chunk of text. Lark uses two >arguments, a char[] array and a character count; the app can >make a String if it needs to. If you find this awkward, create >a new data type called Text so that if you need a String you >can make it with lazy-evaluation in Text.toString(), but if you >don't need it you don't build it. Seems reasonable. > >Also, it shouldn't be named "data" - it should be named >characterData or charData or text or some such term that can >be mapped directly to the spec. > >> resolveEntity(XmlParser, String, String, URL) > >I don't think entities have any place in the first cut of this >interface. The processor exists to make these problems go away. Lark has entities: public boolean doSystemTextEntity(Entity e, String name, String extID) and two others... > >Generalities: >Lark has a thing where if any callback returns 'true', the >parser drops out of its loop... which is awfully useful and easy >I think. Lark will also re-enter, but this need not be a requirement. > >Also, for application programmers, especially dealing with smallish >objects, a tree interface is very natural. I've written both >event-stream and tree apps using Lark, and the trees are a lot >easier to use for anything even moderately complex. So the API >should have Element, Attribute, and Text classes. I won't quarrel with this. I would be very happy for a tree interface, because JUMBO is based on trees. 
However I didn't want to subclass Lark's trees if we decided on a different one, because unlike an event stream, that could take a major rewrite of JUMBO. IFF we can standardise now, I'll be very happy.

>
>And it shouldn't (sorry Peter) be called YAXPAPI - how about SAX, Simple

Of course it shouldn't - I would second the use of Simple somewhere in it.

>API for XML? Maybe SAX-J for the Java bindings. -Tim
>

Sounds great. Let's make sure we get 100% of the way this time.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Sun Dec 14 09:37:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: XML-DEV (was Re: YAXPAPI)
In-Reply-To: <002201bd0828$a85d4200$0100007f@localhost>
Message-ID: <3.0.1.16.19971214102740.5b0ff64a@pop3.demon.co.uk>

In replying to Don I'm taking the opportunity to re-iterate and refine some ideas about the role of XML-DEV.

Thanks Don, I think I know how you feel and will try to address it. There is no suggestion that your ideas are not valuable. Since I try to develop this list as a collaborative communal arena, I'll outline my underlying ideas.

There is only one formal route for XML discussion - XML-WG, with about 10 members chosen from the W3C. They are supported by a larger virtual body of about 100 experts (XML-SIG).
The WG asks the SIG to consider proposals for XML and related things (XLL, XSL), listens very carefully to what the XML-SIG says, makes changes, and has regular votes to firm up on the spec. This culminated in last week's PR. One of the important ideas of XML was that it should be *simple*. Design goal 4 in the spec is: "It shall be simple to write programs which process XML documents". This was exemplified by the 'Mythical CompSci grad student' who could hack a non-validating XML parser in 2 weeks. [This person is still quite mythical :-)] There has also been assumption that the 'desperate Perl hacker' (DPH) is an important feature of the emerging XML scene. This person doesn't necessarily use XML tools to manage XML documents - if they wanted to change a tag they'd just use: s///g and most of the time this type of approach works. I was invited to be part of the XML-SIG although I am not an SGML expert and have never read 8879. My role has emerged as representing the DPH (or worse). I have been described as a 'bellwether' and a Dumb XML Browser Hacker in both of which I take pride as it legitimises my self-appointed role. This is, very simply, to represent the 99% of future XML users who know nothing about SGML, objects, DTDs, parameter entities, etc. BUT who (at least in my vision) want to be more than passive consumers of shrink-wrapped systems. I felt that HTML (actually HTTP) was an enormous liberating force because it allowed people to publish for the first time. The great success of HTML was that anyone could play - you could create HTML documents after a few hours' experimentation. It was easy - we have discovered that ease has its price - I feel it's an acceptable one. XML also has the capability to make publishing available for 'everyone', but only if it is made simple enough to be a self-replicating idea ('meme'). So I - as 'webhacker' - have consistently argued for simplicity in XML. 
At the other end of the spectrum are SGML experts who want XML to provide WWW support for any current SGML application. The WG has to find a practicable way forward, and we are accustomed to 'disappointment'. Personally I think XML is too complex and too difficult to understand - I have made my views known here :-) [I have argued for the removal of dual quoting, , NOTATION (which I *still* do not fully understand). I have argued that the WG should address whitespace more proactively. I have said that XLL is too abstract and needs further elaboration.] I know that the WG considers all suggestions and perhaps 1% of what I say has some effect on the final spec. I'll settle for that. It has been made very clear that the WG will not address implementation issues. They understand them, and make decisions based on them, but they do not want to constrain how people use XML. I applaud this, because XML will not be the vision of what its creators have now (in 1997) but the accumulated experimentation of the world over the next few years. What the WG has addressed is a language which is both robust and flexible - two extremely difficult things to bring together. I am sure that everyone involved in the XML process thinks "they have got bits wrong" but we are all prepared to work with what emerges. So - to XML-DEV. There is a clear vacuum between the spec and working applications of XML, and XML-DEV was offered as a way to fill it. It has no formal status - it's supported by the goodwill of Henry Rzepa and myself (both molecular scientists - Henry does theoretical calculations on molecules and my 'day job' is to help people learn how to design new drugs). We have a not very hidden agenda in wishing XML to prosper, but we feel we represent an average vertical XML community in the future. Personally I find SGML very hard. Perhaps this is because I don't use it every day and because I think in concrete terms (being an experimental scientist). 
Words like 'entity' do not bring immediate enlightenment. I do not fully understand XLL, I do not understand groves, I do not understand formal design of interfaces, I do not understand the DSSSL spec, I do not (at least yet) understand the DOM. But I represent 99% of future XML users. I do not feel I and others should be disenfranchised - that may be unrealistic and Quixotic, but at least I enjoy the windmills. In setting up XML-DEV I assumed that lots of people would be developing software (initially prototypes) for XML, and would need a discussion forum. I've been surprised how little software there has so far been. Not disappointed - I'm never disappointed in the virtual arena - what happens, happens. But personally I think the ratio of talk to action is too high - maybe that's my scientific background. I get a small amount of private mail that suggests that XML_DEV has a useful role, and that continuing to highlight the simple approach is valuable. There is also general support for a public collaborative forum. My ideal is to see communal activities arise out of XML-DEV - rather like the tcl, Linux, LaTeX, Perl and other efforts. I see the WWW as a biological system - lots of new species evolve and only a very few survive. Not always the apparently 'best'. We've had several goes at creating an API on this list. Take it as axiomatic that everyone has slightly different ideas - some are radically different. We catalysed the formation of Xapi-J (from John Tigue) - unfortunately no-one uses it because (I think) they are all waiting for the DOM. I am too impatient to wait for the DOM I am revising JUMBO and want to get out the next snapshot. Those of use who have written simple systems feel we have an urgent need to rationalise their interfaces. What we (or at least JUMBO) don't want is yet 6 more incompatible parsers. We believe that this is achievable in a short time. If so, it will give impetus to the communal approach. 
History will tell whether this is valuable :-) At 16:38 13/12/97 -0800, Don Park wrote: [...] >First, I do not see the need for simple API. Having a simple API now will ^^^^^^^^^^^^ I do. Remember, I'm Dumb :-) >definitely help control propliferation of proprietary XML parser API but, in >the long run, it will restrict application programmers to the set of >functionalities supported by the simple API. There was never any suggestion it would be the only API. Let's assume there are 3 APIs. - simple - Object based - grove based JUMBO uses the first. If someone says "I would really like JUMBO to sit on top of groves", I will appeal to the world for someone to have JumboGroves. [JUMBO is offered as a public communal project.] If no one comes to the party, too bad :-) > >Second, the cat is already out of the bag. For example, MSXML is already in >IE 4.0 and it is being used by JScript and Java applet programmers. I am publicly neutral about any software produced by commercial organisations. There have been some very good de facto standards in the past, a lot of adequate ones, and some awful ones. History will decide. My ideal - as stated above - is to provide an environment where the general mass of XML users have a chance to affect the design and implementation of XML systems. Maybe this is unrealistic? Please feel free to join in the software effort :-) > >>I have looked at TreeModel in Swing and even implemented a simple JUMBO >>display on it. I have to confess that, being a Dumb Browser Hacker, I found >>it quite tough going. If the only interfaces to XML parsers are based on >>this level of abstraction a lot of people will find them hard. > > >My proposal was mainly for the parser writers and not the application >writers. Application writers will not be using XmlTreeModel but DOM >objects. 
My point was that interfaces like XmlTreeModel should be used to *This* application writer uses NXP, Lark and AElfred because the DOM ain't ready and because he doesn't yet understand it :-) >write DOM framework so that the framework can support all existing and >future XML parsers. > >>WE have been part way down this road before - look through XML-DEV >>discussions 6+ months ago. I think it's essential we home in on a >>moderately simple parser NOW - we know what we need to do - we simply need >>to agree on the precise components and the terminology. > >I was not here 6+ months ago and I do not believe that just because there The list is archived on http://www.lists.ic.ac.uk/hypermail/xml-dev. I am not suggesting that it's all worth reading, but you might find the stuff about API useful. >has been previous discussions makes my proposal any less worthy. Frankly, I No one has doubted the worthiness of your proposal :-). If you can find people on XML-DEV who wish to take it up and implement it, I'd be *delighted*. Really. All that has happened is that three parser writers have decided to propose a particular way forward. >am disappointed by the fact that there was no immediate understanding of the >advantages my proposal offers. It is partly my fault since I am pretty bad No, Don. It's the inertia and the time pressures. For me, it would take me a week to understand. I don't understand the Consumers, etc. in the rest of java very well. I don't see where an EventConsumer is required in what I want to do. I understand the proposal strategically because it has the same look and feel of other things in Java. In a similar way I didn't understand John Tigue's API with ParserFactorys and so on - but those who did seemed to think they were a good way to do things. So - hope that someone less Dumb than me picks up on your idea :-) >at explaining things. 
However, I am disturbed that, while there is a wealth >of SGML and XML knowledge present in this mailing list, there seem to be a >lack of object-oriented design knowledge. I do not say this insultingly but We all have concerns. My concern is that there aren't enough people who are actively writing code and making it publicly available. My advice would be to go out and write something that you think does something useful and show people that it's a GoodThing. That's what I have done with JUMBO - very much the Dumb persons tool (you wouldn't like to look inside JUMBO - no Factories, no Consumers, etc.). If you or anyone would like to rewrite JUMBO properly I'd be *delighted* :-) >with concern. I appologize if anyone took my opinion negatively. One of the very positive aspects of XML/SGML is the incredible patience and politeness of people. There are no flamewars. If people get things formally wrong they are gently educated in a better way to do it. If their ideas are way off beam, they often won't get a response of any kind, but if they do it will be polite and helpful. > >>All I want is to get the DOCTYPE stuff from the file. AElfred now provides >>exactly what I want - we just need to agree it. > > >All one wants is not necessarily what everyone wants and will want. Design >of a standard API should be approached more carefully and with future in >mind. I don't disagree with this :-) You have your opportunity to convince people, right here. My own suggestion is that working software is a useful part of an argument. > >I am sorry if my comments upset you in anyway. It was not my intention. I don't get upset in virtual environments :-). [I did once :-), in a situation so bizarre it could have come straight out of a Shakespearean comedy. It's not polite to retell it.] Passion is important. People's ontologies are very dear to them. Flame wars arise from colliding ontologies. P. 
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From h.rzepa at ic.ac.uk Sun Dec 14 09:44:00 1997
From: h.rzepa at ic.ac.uk (Rzepa, Henry)
Date: Mon Jun 7 16:59:30 2004
Subject: XML-DEV list errors on weekends
Message-ID:

As the person receiving all the list errors (undelivered mail etc) I try my best to delete all the ones that seem permanent (a significant proportion of people who try to subscribe do so with mail addresses that subsequently bounce).

But increasingly, I notice that a large number of errors (undelivered mail) seem to occur only on weekends. I get perhaps 200-300 such errors each weekend, but fewer on weekdays.

Coming from a university background where we run 7 days a week, I am wondering whether in commerce, companies might implement policies where mail routers etc perhaps are taken down over weekends? Is there anyone out there who thinks there might be such a reason why weekends are problematic?

Henry Rzepa. +44 171 594 5774 (Office) +44 594 5804 (Fax)

From ak117 at freenet.carleton.ca Sun Dec 14 11:37:02 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
References: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
Message-ID: <199712141135.GAA00310@unready.microstar.com>

Tim Bray writes:
> At 09:01 PM 13/12/97 -0500, David Megginson wrote:
> >I have worried about this myself. My design goal with Ælfred has been
> >to limit myself to two class files: one for the parser itself, and one
> >for the interface for the callbacks -- hence the separate event for
> >attributes. This decision has forced some pretty severely hacked-up
> >internal code accompanied by very careful documentation.
>
> Hmm, isn't this what JAR and so on are for? Seems like an awfully
> severe design constraint. I certainly agree with "small" as a design
> goal, but it seems like limiting class file count carries a pretty
> high price. - Tim

It is a painfully high price, especially in terms of coding difficulty; if NS 3.*, NS 4.*, MSIE 3.*, MSIE 4.*, and HotJava all accepted the JAR files (or any other archive format), then I wouldn't worry. As it stands, however, that is not the case, and it is essential that Ælfred be easy to use in existing browsers as well as future ones. That is the same reason that I didn't use any JDK 1.1 features, despite the fact that I _like_ JDK 1.1.
I am willing to be convinced that an extra couple of class files won't make a difference to Java applet writers (with no special interest in XML), but I will need to hear that from them.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Sun Dec 14 12:18:20 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:30 2004
Subject: Corrected Examples: XML Architectural Forms
Message-ID: <199712141216.HAA00392@unready.microstar.com>

Here are corrected examples for XML architectural forms, using the proposed amendment (note also the corrected spelling) to ISO 10744:

Simple XML document with one base architecture:

Simple XML document with two base architectures:

DTD for simple XML document with two base architectures:

Simple XML document with two base architectures hidden in DTD:

(Note that I have added quotation marks, in line with XML's handling of attribute values).

The rest of my original message still applies. Thank you to Robin Cover for gently pointing out my first mistake, and for being too genteel to point out my second (my second-year Medieval English teacher told me that if I studied too much Medieval English, I'd never be able to spell again).

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From fussellm at alumni.caltech.edu Sun Dec 14 12:18:41 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI and DOM
Message-ID:

I am a bit confused by the recent statements about the "complexity of DOM" and the proposed simpler alternatives for an object model. The DOM model seems as simple and direct as all the proposed alternatives. I could see suggesting changes (I have myself) but it would seem these should be relative to the DOM as it is or have some significantly new features.

The core DOM 'content' information classes [with the read part of their interfaces[*1] ] are:

    public interface Node {
        public int /*NodeType*/ getNodeType();
        public Node getParentNode();
        public NodeList getChildren();
    }

    public interface Element extends Node {
        public String getTagName();
        public NodeList getAttributes();
    }

    public interface Attribute extends Node {
        public String getName();
        public NodeList getValue();
        public boolean isSpecified();
    }

    public interface Text extends Node {
        public String getData();
        public boolean isIgnorableWhitespace();
    }

I can't see how you could get much simpler in the number of classes and the concept for each class[*2]. So if we have the Grove model and the DOM, of what value is another similar, less-standard standard object model?

--------

I can see a different problem though: it may be that no model will be useful to standardize for the actual interfaces. Each application will want slightly different object models that have very small changes that are very significant to it. Two examples I have in the above are both from the same type of problem: restricted Typing.
In the above interfaces I would much rather have NodeList->List [the JDK 1.2 interface for a general indexed collection] because I have many more implementations and functionality to use for manipulating lists than I do for NodeList [I could wrapper and delegate all the functionality but that is much more effort and less maintainable for no real benefit]. Likewise I would rather have Attribute's value be an Object or a String than a NodeList. These minor changes make the DOM interfaces themselves impossible to use: I can have interfaces just like them but they will have to be my own version. I suspect this may always be the case. I have helped build many large and small information system models and none of them committed to using exactly somebody else's code for the DomainModel[*3]. Having control over the model of the information your application works with is crucial to both good design and good/maintainable implementations. This isn't to say you can't use someone else's designs: that works excellently (e.g. Design Patterns and Analysis Patterns). You can even start with someone else's code but you will almost certainly need to modify that model ever so slightly (or majorly) at some point. An approach that works better than defining an exact ObjectModel (i.e. exact Types) to implement is to think from outside the Model: to the client and supplier points of views. From the outside people only care about limited interfaces and protocols that a DomainModel must support to work with them. This is how Swing's TreeModel works (as long as you support the TreeModel interface you are worthy) and other 'M's in the MVC pattern. This is also how Java Beans work, but with a runtime signature-binding approach. In all these cases, the client/supplier requirements come first and you can decide if you want to work with them by suitably designing and implementing your DomainModel. 
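
[Editorial illustration, not part of the original message.] The "think from outside the Model" approach described above can be sketched in Java roughly as follows. This is only a minimal sketch under stated assumptions: DocNode and DocTreeModel are hypothetical names invented for the example, and the only real API used is the javax.swing.tree.TreeModel interface. The domain class stays entirely under the application's control, and a thin adapter satisfies the client's protocol.

```java
import java.util.ArrayList;
import java.util.List;
import javax.swing.event.TreeModelListener;
import javax.swing.tree.TreeModel;
import javax.swing.tree.TreePath;

// Hypothetical application-owned domain class; it knows nothing about Swing.
class DocNode {
    final String label;
    final List<DocNode> children = new ArrayList<>();

    DocNode(String label) { this.label = label; }

    // Returns this node so that calls can be chained while building a tree.
    DocNode add(DocNode child) { children.add(child); return this; }

    @Override public String toString() { return label; }
}

// Adapter: the only place where the client's protocol (TreeModel) is satisfied.
class DocTreeModel implements TreeModel {
    private final DocNode root;

    DocTreeModel(DocNode root) { this.root = root; }

    @Override public Object getRoot() { return root; }

    @Override public Object getChild(Object parent, int index) {
        return ((DocNode) parent).children.get(index);
    }

    @Override public int getChildCount(Object parent) {
        return ((DocNode) parent).children.size();
    }

    @Override public boolean isLeaf(Object node) {
        return ((DocNode) node).children.isEmpty();
    }

    @Override public int getIndexOfChild(Object parent, Object child) {
        return ((DocNode) parent).children.indexOf(child);
    }

    // This sketch is read-only, so edits and listeners are deliberately ignored.
    @Override public void valueForPathChanged(TreePath path, Object newValue) { }
    @Override public void addTreeModelListener(TreeModelListener l) { }
    @Override public void removeTreeModelListener(TreeModelListener l) { }
}

class TreeModelSketch {
    public static void main(String[] args) {
        DocNode root = new DocNode("doc")
                .add(new DocNode("p").add(new DocNode("em")))
                .add(new DocNode("note"));
        TreeModel model = new DocTreeModel(root);
        System.out.println(model.getChildCount(model.getRoot()) + " children under " + model.getRoot());
    }
}
```

A Swing client would consume this with new JTree(new DocTreeModel(root)); the domain class could later change its internals without the client noticing, as long as the adapter keeps honouring the interface.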
So I suspect all of the following are true:

(1) The DOM interfaces will be exactly suitable to some applications
(2) There are many applications for which the DOM interfaces (as exact code) will not be suitable
(3) The DOM model is a good design model and template for a good number of these applications
(4) It would be good to suggest possible modifications to the DOM to either make it better or as possible alternatives for people in situation (2)
(5) There are many good reasons to start defining the possible clients and services that (Document) DomainModels may want to use. [*4]
(6) There is no reason to have a similar model to the DOM and make it a semi-standard
(7) Frequently (2) will turn to using (3), (4), and (5) to make a suitable model, so these will be very valuable.

So it would seem good to focus on all of (1)-(5) in the above but not on (6) except as it helps to understand the others[*5].

--Mark
mark.fussell@chimu.com

[1] I made a couple minor stylistic/convention changes (e.g 'is' for booleans) to these interfaces.

[2] I coded a skeleton implementation (able to construct, inspect, and print objects) of the level-1 DOM model (i.e. including the DocumentType classes) in a part of an evening and offered to provide it as source in a previous email.

[3] Except, for a while, when the model can be extended without changing the source (a Smalltalk/ENVY feature).

[4] As an example of (5) in DOM, the DOM interfaces are generally Java Bean compatible. This is very useful in Java: the MONDO DOM ObjectBuilder had exactly one line of code to specify how to take a recipe for a (for example) ModelGroup and build a ModelGroup object:

    addBeanFactoryFor_toBuilder(ModelGroupClass.class,builder);

[5] In the above I am not referring to an event oriented API, but will respond to that in a different email.

From peter at ursus.demon.co.uk Sun Dec 14 15:29:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <199712141135.GAA00310@unready.microstar.com>
References: <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca> <3.0.32.19971213182146.0095b780@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971214152950.315f45c6@pop3.demon.co.uk>

At 06:35 14/12/97 -0500, David Megginson wrote:
>Tim Bray writes:
[...]
> >
> > Hmm, isn't this what JAR and so on are for? Seems like an awfully
> > severe design constraint. I certainly agree with "small" as a design
> > goal, but it seems like limiting class file count carries a pretty
> > high price. - Tim

The following assertions are based on ignorance and hearsay...

As I understand it, if Java wants a method in a class, it loads the whole class into the virtual machine. Therefore if you have a large complex class you have a constant large overhead in terms of (a) HTTP connections (b) JVM space. I have a number of very large classes (e.g. > 100 member functions, some quite crunchy) so I have been thinking of doing the exact reverse to DavidM - i.e. splitting up my classes into smaller bits. Thus my MOLNode implements Drawable routines, Linkable (XLL), Editable, Validatable at least. If I have a very simple application it will still download all these functions (am I right?) and also keep them in the JVM so long as there is a reference to an object of the class (am I still right?)
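
[Editorial illustration, not part of the original message.] The class-granularity question raised here can be probed with a small experiment. The sketch below is only an assumption-laden illustration: DtdSupport, InstanceSupport and LoadLog are hypothetical names, not classes from JUMBO or AElfred. What it demonstrates is class *initialization*, which the Java language specification ties to first active use; when a class file is actually fetched is left to the JVM implementation, although on-demand loading is the common behaviour.

```java
import java.util.ArrayList;
import java.util.List;

class LoadLog {
    // Records which helper classes have been initialized, in order.
    static final List<String> initialized = new ArrayList<>();
}

// Hypothetical helper, only needed for documents that carry a DTD.
class DtdSupport {
    static { LoadLog.initialized.add("DtdSupport"); }
    static String describe() { return "dtd"; }
}

// Hypothetical helper, needed for every document.
class InstanceSupport {
    static { LoadLog.initialized.add("InstanceSupport"); }
    static String describe() { return "instance"; }
}

class LazyLoadingDemo {
    // Touches DtdSupport only when the document actually has a DTD.
    static String process(boolean hasDtd) {
        String result = InstanceSupport.describe();
        if (hasDtd) {
            result += "+" + DtdSupport.describe();
        }
        return result;
    }

    public static void main(String[] args) {
        process(false);
        System.out.println(LoadLog.initialized); // [InstanceSupport]
        process(true);
        System.out.println(LoadLog.initialized); // [InstanceSupport, DtdSupport]
    }
}
```

The unit the JVM works with is the whole class, so a method cannot be brought in without the rest of its class, but a class that is never used need not be initialized at all. That is the property a split into smaller classes would rely on.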
So I am thinking of splitting these into smaller chunks, such as DrawableMethods, etc which don't need to be loaded if not used. Would the same apply to AElfred? Thus if you had two chunks - DTD.class and Instance.class (or whatever) and the document instance had no DTD, you'd never need to load the DTD class, right? Poor old JUMBO comes to 500 Kbytes at least if it's all there. That includes things like matrix.diagonalise(), ProteinSequence.Align() and Bivariate.display(Axes). I am assuming that (a) things will speed up (b) classes can be cached client-side (c) the excitement of finally getting the display will hold the reader in her seat long enough. I'm certainly assuming that JAR files will happen (or equivalent). IOW I'm not designing for speed, but functionality. P. > >It is a painfully high price, especially in terms of coding >difficulty; if NS 3.*, NS 4.*, MSIE 3.*, MSIE 4.*, and HotJava all >accepted the JAR files (or any other archive format), then I wouldn't >worry. As it stands, however, that is not the case, and it is >essential that ?lfred be easy to use in existing browsers as well as >future ones. That is the same reason that I didn't use any JDK 1.1 >features, despite the fact that I _like_ JDK 1.1. I would assume it's possible to re-route the client to a non-JAR applet if required. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
From donpark at quake.net Sun Dec 14 17:13:57 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:30 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <002201bd08b3$2bf38630$0100007f@localhost>

Peter,

>Poor old JUMBO comes to 500 Kbytes at least if it's all there. That
>includes things like matrix.diagonalise(), ProteinSequence.Align() and
>Bivariate.display(Axes). I am assuming that (a) things will speed up (b)
>classes can be cached client-side (c) the excitement of finally getting the
>display will hold the reader in her seat long enough. I'm certainly
>assuming that JAR files will happen (or equivalent). IOW I'm not designing
>for speed, but functionality.

The problem with relying on cached Java classes is that a typical browser user will flush the cache quite frequently (every day in my case, because one day of work leaves me with about 25 to 50 meg of useless web pages and images in my cache). I would prefer to leave the Java classes in the cache, but the current crop of browsers offers little control when it comes to cache content.

My advice is to solve the download problem from the user-perception angle. Users expect applets to download fast (1 to 5 minutes) because they are expecting to see the applet as part of a web page. Their focus is on the content and not the code. They do not realize emotionally that content must be rendered by applets and applets take time to download. On the other hand, when they are asked to manually download something and install it, they display more patience because they know they are downloading software and not content.
They are already familiar with the timescale of getting and installing new software, so a wait of 10 minutes to 1 hour is not going to tick them off. One added bonus is that, since you can install into the browser's classpath, you get higher security clearance.

If you really need to go the download-on-demand applet route, you can divide up your classes into two parts. The first part is a small set of classes with the following objectives:

1. Put something up to grab the user's attention. Amuse him with something or render a non-editable view.
2. Prefetch resources such as XML files and the second part.

The second part is the full set of classes. The point is that something like your XML browser applet will usually display some XML files which are not fetched until all the classes are downloaded unless they are prefetched using a scheme like the above.

Hope this helps,

Don Park

From simeons at allaire.com Sun Dec 14 21:55:22 1997
From: simeons at allaire.com (Simeon Simeonov)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <01bd08db$a2559660$4a15b5cd@sim.allaire.com>

I come to this discussion late (4:30pm EST on Sunday :) so my set of assorted notes is addressed at no one in particular.

I like the acronym SAX. It's short and sweet.

In principle I agree with the idea that an API simpler than what DOM exposes will be useful. This is especially true in the short run--until fully DOM compliant implementations with a variety of language bindings become readily available.

I absolutely agree with the need for both event-driven and tree-based interfaces.
My product, the Cold Fusion Application Server, needs both. And it really only needs to know about text, elements, and attributes. All else is currently of no interest to the tens of thousands of web application developers that use CFAS.

A note of caution. I hope that in your mind SAX is not the same as SAX-J. Some of the API proposals I have seen have a very strong Java flavor. For example, I see the need for an API that does not require runtime type information. The equivalent of instanceof in C++ is the dynamic_cast() operator. It requires the enabling of RTTI which imposes an immediate and quite noticeable size and performance penalty. IMHO, runtime type information is necessary only when the object model of a system is undergoing continuous change. I don't see this being the case with SAX.

I cannot invest the time in writing an XML parser in C++ right now, but I'd be more than happy to contribute to this discussion to make sure that SAX is a C++-friendly API.

Regards,

Simeon Simeonov
Allaire

From peter at ursus.demon.co.uk Sun Dec 14 22:57:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:31 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
In-Reply-To: <01bd08db$a2559660$4a15b5cd@sim.allaire.com>
Message-ID: <3.0.1.16.19971214234149.51f7a68e@pop3.demon.co.uk>

At 17:00 14/12/97 -0500, Simeon Simeonov wrote:

Thanks very much Simeon,

>I come to this discussion late (4:30pm EST on Sunday :) so my set of
>assorted notes is addressed at no one in particular.
>
>I like the acronym SAX. It's short and sweet.

So do I.

>
[...]
>
>A note of caution.
>I hope that in your mind SAX is not the same as SAX-J.
>Some of the API proposals I have seen have a very strong Java flavor. For

I agree with your point - personally I have no idea how to write a language independent API, but for this one I suspect it's fairly straightforward because of the relative simplicity.

>example, I see the need for an API that does not require runtime type
>information. The equivalent of instanceof in C++ is the dynamic_cast()
>operator. It requires the enabling of RTTI which imposes an immediate and
>quite noticeable size and performance penalty. IMHO, runtime type
>information is necessary only when the object model of a system is
>undergoing continuous change. I don't see this being the case with SAX.

This seems to make sense. I think the main area where this might be used is in children, where a child could be either an Element or PCDATA, and you found out which by asking it. I assume it can be managed with strong typing as well.

>
>I cannot invest the time in writing an XML parser in C++ right now, but I'd
>be more than happy to contribute to this discussion to make sure that SAX is
>a C++-friendly API.

I think that's a very useful offer :-) I have been thinking as we go how we manage other languages like tcl and Perl (I know tcl, but not Perl). I assume some parts of the interface can almost be translated algorithmically, but others may be tricky. [Even I am not going to ask for a FORTRAN interface :-)]

P.

>
>Regards,
>
>Simeon Simeonov
>Allaire
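[Editorial sketch] The exchange above asks for an event-driven interface in which a consumer never has to ask a node what type it is. One way to picture that is an interface with one callback per kind of content, so that elements and character data arrive through different methods. The sketch below uses the SAX binding that Python's standard library later shipped (xml.sax); the OutlineHandler class and the outline() helper are names invented here for illustration and are not part of any proposal in this thread.

```python
import xml.sax


class OutlineHandler(xml.sax.ContentHandler):
    """Records parse events; each kind of content has its own callback."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        # Attributes arrive as a mapping of plain strings.
        self.events.append(("start", name, dict(attrs.items())))

    def characters(self, content):
        # A parser may deliver text in several chunks; skip whitespace-only ones.
        if content.strip():
            self.events.append(("text", content))

    def endElement(self, name):
        self.events.append(("end", name))


def outline(document: bytes):
    """Parse a document and return the list of recorded events."""
    handler = OutlineHandler()
    xml.sax.parseString(document, handler)
    return handler.events


if __name__ == "__main__":
    for event in outline(b'<p class="x">Hi<b>there</b></p>'):
        print(event)
```

No isinstance-style test appears anywhere in the handler: the parser decides which method to call, which is the property a C++ binding without RTTI would rely on.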
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From eliot at isogen.com Sun Dec 14 22:58:07 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:59:31 2004
Subject: XML Architectural Forms
Message-ID: <3.0.32.19971214165523.006a86d0@swbell.net>

At 05:58 PM 12/13/97 -0500, David Megginson wrote:
>I don't remember seeing an announcement here (apologies if I'm
>mistaken), but Eliot Kimber and James Clark have announced on
>comp.text.sgml a proposed amendment to ISO 10744 that will make it
>possible to use Architectural Forms in XML. You can find the text of
>the amendment at the following URL:

Dave,

Thanks for the announce. Unfortunately, my original post contained an error, which was inadvertently carried forward into your post. The examples should read: And And finally, I apologize for any confusion my original error has caused.

Cheers,

Eliot
--

W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com

This is a para

> > is invalid. Think about what the code above means in *HTML*:

This is a para

Now I suspect you understand why the docTypeDeclName exists and in XML must always be the same as the type of the explicitly tagged root element. Since XML has no minimization, it is redundant of course. WebSGML allows you to use the keyword #IMPLIED (but XML does not) to remove that redundancy.

Paul Prescod

From M.H.Kay at eng.icl.co.uk Tue Dec 16 17:00:28 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData
Message-ID: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>

-----Original Message-----
From: Paul Prescod
To: xml-dev@ic.ac.uk
Date: 16 December 1997 16:30
Subject: Re: CharData

>Chris Maden wrote:
>> You are correct; ']]>' is forbidden in element content, as it should
>> be.
>
>The reason for this is not just SGML compatibility, it is robustness. A
>floating MDC is almost certainly an error in the document.

I don't think you're being particularly user-friendly here. The most likely reason for a floating "]]>" is that the software-writer was lazy and forgot to escape it.

If we assume that most XML will be software-generated, then it appears the only purpose of CDATA is to allow the software-writer to copy in a chunk of text without bothering to convert the <'s and &'s to &lt; and &amp;. But since he still has to check for any "]]>" in the text, and has no clear course of action if he finds one, it's not at all clear that it achieves this aim.

As one who is currently writing software to generate XML, I have no intention of deliberately generating CDATA, and the need to avoid doing so by mistake is a complication I could do without.
In practice I will just get round it by escaping all my >'s as well as my <'s.

Mike Kay, ICL

From crism at ora.com Tue Dec 16 17:21:26 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData
In-Reply-To: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk> (M.H.Kay@eng.icl.co.uk)
Message-ID: <199712161725.MAA20422@geode.ora.com>

[Paul Prescod]
> Chris Maden wrote:
> > You are correct; ']]>' is forbidden in element content, as it
> > should be.
>
> The reason for this is not just SGML compatibility, it is
> robustness. A floating MDC is almost certainly an error in the
> document.

That's true; but I find it easier to argue from standards than from philosophy, especially in cases like this where others will disagree:

[Michael Kay]
> I don't think you're being particularly user-friendly here. The most
> likely reason for a floating "]]>" is that the software-writer was
> lazy and forgot to escape it.

Then the writer *must* be warned. See _The SGML FAQ Book_, question 2.9; the potential messy ramifications of stray marked section end delimiters are many, and the potential damage quite high.

-Chris
--
http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>
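[Editorial sketch] Michael Kay's workaround of escaping every '>' along with '<' and '&' means a generator never has to look for "]]>" at all, because the sequence can no longer appear literally in its output. A minimal illustration, assuming Python's standard xml.sax.saxutils.escape (which rewrites exactly those three characters); the text_element() helper is a name invented here and does not validate the tag name.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def text_element(tag: str, text: str) -> str:
    """Serialise one element whose content is plain text.

    escape() replaces '&', '<' and '>' with entity references, so a
    literal "]]>" in the input is written out as "]]&gt;".
    """
    return "<{0}>{1}</{0}>".format(tag, escape(text))


if __name__ == "__main__":
    source = "if (a[b[0]]>c && d<e)"
    serialised = text_element("code", source)
    print(serialised)
    # Parsing the result gives back the original text unchanged.
    print(ET.fromstring(serialised).text == source)
```

The cost is a slightly longer document; the benefit is that the writer needs no special case for marked-section delimiters.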
From liamquin at interlog.com Tue Dec 16 17:43:31 1997
From: liamquin at interlog.com (Liam Quin)
Date: Mon Jun 7 16:59:33 2004
Subject: CharData and escaping ]]>
In-Reply-To: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:

Michael Kay at ICL wrote:
> If we assume that most XML will be software-generated, then it
> appears the only purpose of CDATA is to allow the software-writer
> to copy in a chunk of text without bothering to convert the <'s and
> &'s to &lt; and &amp;. But since he still has to check for any "]]>"
> in the text, and has no clear course of action if he finds one,
> it's not at all clear that it achieves this aim.

I'd say firstly that if you are writing software that works a character at a time, it is generally easier to avoid CDATA marked sections and to escape every < and & directly. If you use a marked section, you need up to 3 characters of lookahead, and you need to make sure that all of the following sequences pass through unscathed:

    ]]]]]]]]]]] ]> a]b]]c]]]d

Secondly, the simplest way to escape ]]> is to insert a Unicode zero-width non-printing non-combining space between the ] and the >. This might be a pain for some applications, though.

> In practice I will just get round it by escaping all my >'s
> as well as my <'s.

That's what I would do too.

Lee

--
Liam Quin -- the barefoot typographer -- Toronto
lq-text: freely available Unix text retrieval
IRC: Learn about XML/SGML/XSL/XLL/DSSSL on irc.dragonnet.org in #xml
email address: l i a m q u i n, at host: i n t e r l o g dot c o m
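[Editorial sketch] For a writer that does want CDATA sections and works a character at a time, one commonly used workaround (not proposed in this thread) is to close the section after "]]" and reopen it before the '>', so the delimiter never appears inside a section. A writer can do this by remembering how many ']' characters it has just emitted instead of looking ahead. The cdata_chunks() generator below is a hypothetical helper written for this note; the test strings are Liam Quin's samples, read here as three separate strings.

```python
import xml.etree.ElementTree as ET


def cdata_chunks(chars):
    """Yield pieces of one or more CDATA sections that together carry `chars`.

    State: the number of consecutive ']' just written, capped at 2.
    When a '>' would complete "]]>", the current section is closed
    after the two brackets and a new one is opened for the '>'.
    """
    yield "<![CDATA["
    brackets = 0
    for ch in chars:
        if ch == ">" and brackets == 2:
            yield "]]><![CDATA["
        yield ch
        brackets = min(brackets + 1, 2) if ch == "]" else 0
    yield "]]>"


if __name__ == "__main__":
    for sample in ["]]]]]]]]]]]", "]>", "a]b]]c]]]d", "a]]>b"]:
        wrapped = "".join(cdata_chunks(sample))
        parsed = ET.fromstring("<p>" + wrapped + "</p>").text
        print(repr(sample), "->", wrapped, "| round-trips:", parsed == sample)
```

Adjacent CDATA sections are delivered to the application as ordinary character data, so a reader sees the original text with nothing inserted, unlike the zero-width-space approach, which changes the data.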
From ser at javalab.uoregon.edu Tue Dec 16 18:01:58 1997
From: ser at javalab.uoregon.edu (Sean Russell)
Date: Mon Jun 7 16:59:33 2004
Subject: Any XSL tool!
References: <3496458B.4C47@bd748.pku.edu.cn>
Message-ID: <3496C3A7.7438332A@javalab.uoregon.edu>

Chang Ming wrote:
> I think XSL is not off-topic in this list.
>
> I would like to know if there is any work done on XSL ,something like a
> interpreter.
> The only known tool seems the converter from XSL to DSSSL.

Which converter are you talking about? Have you looked at docproc?

http://javalab.uoregon.edu/ser/software/docproc_2/docs/index.xml

I've been having a nightmarish time with the Java Web Server, for some reason, which doesn't want to stay running for more than 24 hours at a time. If the above link is down when you try it, please try back later. I have to go in and restart the server every once in a while.

--- SER

From mecom-gmbh at mixx.de Tue Dec 16 19:02:53 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:33 2004
Subject: Message Length vs Processing Speed
References: <01BD0190.2C9A9AD0@gren-exch-1.kpscal.org>
Message-ID: <3496D174.A62E26A0@mixx.de>

i'm not sure whether it falls within the list's scope to pose questions of the sort "what's the rationale for this ?", but i hope so.
i'm not sure where else would be more appropriate and as someone implementing a parser, when i discover stipulations which are non-intuitive i'm at least curious about the rationale for some of the stipulated "conforming parser" behaviour and welcome the opportunity to at least ask why things are the way they are.

today's question concerns dtd compactness

Dolin,Robert H wrote:
> Greetings XML-DEV list,
>
> We've been working on an SGML (?XML) syntax for HL7 messages,...

i've read through the related hl7sgm3 document and discovered one concern which we share. among other things the document discusses whether attribute definitions should be repeated as necessary or should be attached to an intermediate "type" element. where sgml permitted something like

[53x] AttlistDecl ::= ' [53y] Nameopt ::= Name (S '|' S Name)*

xml allows only

[53] AttlistDecl ::= '

which forces one, as noted below (i trust the excerpt is, for discussion purposes, permitted.) to introduce extraneous elements. when i consider the relative effort of getting a parser to accept a name list and coding applications to treat the interposed elements as transparent, i don't understand why this sgml feature was not carried over?

OPTION 1 OPTION 2

COMMENTS
- Example DTDs are currently using Option 1.

ISSUES
- Option 1:
  - Able to express more Required Value constraints in DTD.
  - Easier to parse?
- Option 2:
  - Define HL7 V2.3 data types just once, for all message DTDs. May be easier to maintain DTDs as data type definitions change.
  - Receiving application can determine the data type of previously unknown data elements.
From mecom-gmbh at mixx.de Tue Dec 16 19:24:54 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se>
Message-ID: <3496D6AB.E55D721@mixx.de>

greetings,

perhaps it's time for a new role to complement the mcsgs, namely the npw - or niggling parser writer - not rebelling, just niggling. i admit to that fault.

my problem is, whenever i come to a point in the proposed recommendation at which a parser is required to report an error and "must not continue normal processing" even though the result which the stream would denote would be sufficiently unambiguous if allowed, then i feel compelled to ask, "why does one have to exclude this"? which does not mean "in which production does the standard exclude or prescribe it", but rather why does the standard exclude or prescribe it. what is the useful purpose? particularly when excluding it makes the parser more complex and the document encoding more exacting.

more than likely, when i've followed discussions of similar questions, the design goal #3 gets hoisted like a commandment: "XML shall be compatible with SGML". as a npw i tend to adhere more to #'s 1,4, 6, and 9: it should be easy to generate, easy to program, and easy to read. SGML processors are already pretty complex, so an argument to increase the complexity of XML strictly in order to keep SGML processors simpler is difficult to accept on logical terms. (i know i'm being naive here, and i'm ignoring the past, but i would wager that the future is going to bear me out...)
the simplest thing would have been a document form which distinguished inline definitions, external references (ie XLL built-in), content, and (maybe) a declaration (autorecognition of encoding being the criteria on the latter). it is true, that that is all there, but the standard requires at least twice as many syntactic forms as are necessary. so despite having read mr murray-rust's note on background to the list itself (re: XML-DEV (was Re: YAXPAPI)) which gave me some sense of the effort which has gone into the proposed recommendation, the distance between the simple form of the denoted data and the complexity of the syntactic form often leads me to ask "why?"

one such example concerns the external subset, xml declaration, doctype declaration, and text declaration. in particular, the productions

[24] XMLDecl ::= ''
[29] doctypedecl ::= ''
[78] TextDecl ::= ''
[80] ExtPE ::= TextDecl? extSubset

i observe that, while one can well label the XMLDecl and TextDecl productions differently, lexically speaking they are not disjoint, and practically speaking there is no difference between their situation and that concerning the presence of a doctype form at a location analogous to that of the textdecl. yet one is "standard" and the other is "nonsense". not to a niggling parser writer. from the stream content, the permitted case (almost) appears (by analogy to the remarks below) as one xml document within another.

the other thing which is disconcerting is that the standard goes to great length to, on one hand, specify that the presence of an xml document may be introduced by a form with the (not)PI keyword 'xml' (all lower case only) but on the other hand engenders lexical ambiguity where it does not introduce a distinct keyword for the distinctly different purpose and context of specifying the encoding of the external dtd subset. why?

Per-Ake Ling wrote:
> > From jjc@jclark.com Mon Dec 15 11:59:21 1997 ...
> > It is a requirement that the external subset *not* begin with a document
> > type declaration.
> >
> If it were permitted, it would mean that there is a doctype declaration
> within a doctype declaration, which is clearly nonsense. It is a common
> misunderstanding that DTD means "document type declaration" instead of
> "document type definition".
>
> Per-Åke
> --

(as an aside, i didn't - and still don't - see that as, in itself, a sufficient explanation, since the case would comprise two instances of a "document type declaration": one in the xml document and the other in the prolog of the external portion of the "document type definition", which was referred to from the first, but is not contained in the first, and which serves to constrain the root element if so desired.)

another example is the MDC (']]>') exclusion in CharData which means that one needs a state machine to scan character data. why?

another example is that of [24], in itself, where the npw believes his point (in a previous posting) was misunderstood, and can only repeat the question why is a PI-close specified to be '?>' and not '>', which would be easier, or ('?>' | '>'), which would be robuster and observes (wrt to 'XML' itself) that the standard, cf #6 with irony, engenders an encoding where of the four obvious humanly legible encodings (that is, neglecting 'xMl' et.al.: ('' | '>')) only one is legitimized. why? if the precision of an encoding depends so much on uniqueness, then why does one start out with such a level of lexical complexity in the first place, only to then exclude much of it as 'malformed'? all you need is <, >, ', & and / (if you allow element recursion) - and even the distinction between < and > is more for the eye than anything else.

Ingo Macherius wrote:
> ... > > how about > > > > This is wrong, too. "xml" must be lower-case. > > > i've yet to understand why, but isn't that the way it needs to be? > > Why ? Productions [24] and [25] in section 2.8 ! > > [24]
XMLDecl ::= '' > [25] VersionInfo ::= S 'version' Eq > ('"VersionNum"'|"'VersionNum'") > > So the minimal correct PI is: > > ++im > --

From ak117 at freenet.carleton.ca Tue Dec 16 19:48:29 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <3496D6AB.E55D721@mixx.de>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> <3496D6AB.E55D721@mixx.de>
Message-ID: <199712161946.OAA05109@unready.microstar.com>

james anderson writes:
> my problem is, whenever i come to a point in the proposed
> recommendation at which a parser is required to report an error and
> "must not continue normal processing" even though the result which
> the stream would denote would be sufficiently unambiguous if
> allowed, then i feel compelled to ask, "why does one have to
> exclude this"?
[...]
> more than likely, when i've followed discussions of similar
> questions, the design goal #3 gets hoisted like a commandment: "XML
> shall be compatible with SGML".

No, it's not SGML's fault, at least not this time. Conforming SGML parsers are allowed to continue processing if they want to, and are even allowed not to report errors at all (as long as they don't claim to be "validating parsers"). XML has gone way beyond any SGML requirements with this one.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From ricko at allette.com.au Tue Dec 16 20:05:37 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:34 2004
Subject: Draconian error handling (was Re: XML syntax )
Message-ID: <199712162006.HAA07141@jawa.chilli.net.au>

From: james anderson
>my problem is, whenever i come to a point in the proposed recommendation at
>which a parser is required to report an error and "must not continue normal
>processing" even though the result which the stream would denote would be
>sufficiently unambiguous if allowed, then i feel compelled to ask, "why does one
>have to exclude this"?

The requirement for "Draconian error handling" actually came from the HTML side not the SGML people. The reason was to ensure data integrity: if a document was compromised it should be clearly marked as such when passed to the application. Under no circumstances should something that is not well-formed be passed to an application as if it were. This is because XML is intended for more than just typed-text applications.

It was thought that allowing all sorts of transparent error-recovery mechanisms would just reintroduce tag minimization in through the back door. Then people would start to rely on it, or at least write their XML to suit the error-recovery of particular parsers, and we would be back in HTML-land, where the effective grammar is too loose to be reliable.

Rick Jelliffe
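[Editorial sketch] The rule Rick Jelliffe describes is visible in how later XML libraries behave: a document that is not well-formed produces an error rather than a partly repaired tree. The sketch below assumes Python's standard xml.etree.ElementTree, whose fromstring() raises ParseError on malformed input; parse_strict() is a wrapper name invented here to make the "all or nothing" outcome explicit.

```python
import xml.etree.ElementTree as ET


def parse_strict(document: str):
    """Return (root, None) for a well-formed document, else (None, message).

    No partial tree is ever handed to the caller: the application sees
    either the whole document or an error report.
    """
    try:
        return ET.fromstring(document), None
    except ET.ParseError as err:
        return None, str(err)


if __name__ == "__main__":
    # Well-formed: the application receives the tree.
    print(parse_strict("<p>A sentence.</p>"))
    # Missing </b>: an HTML-style browser might guess; an XML parser may not.
    print(parse_strict("<p><b>A sentence.</p>"))
```

An application that wants lenient recovery has to build it on top, outside the parser, which is exactly the separation the thread is arguing about.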
From crism at ora.com Tue Dec 16 20:26:02 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:34 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: <3496D6AB.E55D721@mixx.de> (message from james anderson on Tue, 16 Dec 1997 20:31:30 +0100)
Message-ID: <199712162030.PAA25069@geode.ora.com>

[James Anderson]
> my problem is, whenever i come to a point in the proposed
> recommendation at which a parser is required to report an error and
> "must not continue normal processing" even though the result which
> the stream would denote would be sufficiently unambiguous if
> allowed, then i feel compelled to ask, "why does one have to exclude
> this"? which does not mean "in which production does the standard
> exclude or prescribe it", but rather why does the standard exclude
> or prescribe it. what is the useful purpose? particularly when
> excluding it makes the parser more complex and the document encoding
> more exacting.

I am not particularly fond of this rule. However, I can explain its justification. The WG made this decision at the request of both Microsoft and Netscape. In the HTML arena, both companies spend a fair amount of their time reverse engineering the other's error-recovery behavior, since Web page authors "validate" by seeing if it looks OK in their browser of choice. By requiring parsers to fail on non-conformant documents, there is no chance that a user can think erroneous data is acceptable in a conforming browser; if a browser accepts the data, its opponent can level the charge that it is non-conforming.
> more than likely, when i've followed discussions of similar
> questions, the design goal #3 gets hoisted like a commandment: "XML
> shall be compatible with SGML". as a npw i tend to adhere more to
> #'s 1,4, 6, and 9: it should be easy to generate, easy to program,
> and easy to read. SGML processors are already pretty complex, so an
> argument to increase the complexity of XML strictly in order to keep
> SGML processors simpler is difficult to accept on logical terms. (i
> know i'm being naive here, and i'm ignoring the past, but i would
> wager that the future is going to bear me out...)

Rule 3 is critical for two reasons: (a) technologically, it allows easier application of existing SGML technology to the new problem space, and (b) politically, it encourages XML's adoption in rigorously standards-based arenas, like the Military-Industrial Complex.

> the simplest thing would have been a document form which
> distinguished inline definitions, external references (ie XLL
> built-in), content, and (maybe) a declaration (autorecognition of
> encoding being the criteria on the latter). it is true, that that is
> all there, but the standard requires at least twice as many
> syntactic forms as are necessary. so despite having read mr
> murray-rust's note on background to the list itself (re: XML-DEV
> (was Re: YAXPAPI)) which gave me some sense of the effort which has
> gone into the proposed recommendation, the distance between the
> simple form of the denoted data and the complexity of the syntactic
> form often leads me to ask "why?"

Many people have had discussions of the form "a markup language might ...", in which a clean, new theoretical language is designed. These discussions are useful and interesting, but completely outside of the scope of XML, whose charter was to enable the transfer of SGML over the Web. If you want to design such a language, and are successful in encouraging its adoption, many current SGMLheads would be very grateful.
We use SGML because it is the best existing tool, not because it is the best possible. > (as an aside, i didn't - and still don't - see that as, in itself, a > sufficient explanation, since the case would comprise two instances > of a "document type declaration": one in the xml document and the > other in the prolog of the external portion of the "document type > definition", which was referred to from the first, but is not > contained in the first, and which serves to constrain the root > element if so desired.) And indeed, some older SGML software produces documents like this. This is a purely backwards-compatibility issue, from one point of view; disambiguation rules could easily be developed, but then that language would not be SGML. See the XML charter. > another example is the MDC (']]>') exclusion in CharData which means > that one needs a state machine to scan character data. why? This is because floating msc/mdc combos can get you later in a big way. See _The SGML FAQ Book_, and trust us on this. I'd recommend avoiding marked sections in the document instance altogether, but if you don't, *ALWAYS* escape any occurrence of ']]>' in data. > another example is that of [24], in itself, where the npw believes > his point (in a previous posting) was misunderstood, and can only > repeat the question why is a PI-close specified to be '?>' > and not '>', which would be easier, or ('?>' | '>'), which would be > robuster and observes (wrt to 'XML' itself) that the standard, cf #6 > with irony, engenders an encoding where of the four obvious humanly > legible encodings (that is, neglecting 'xMl' et.al.: (' '' | '>')) only one is legitimized. why? if the > precision of an encoding depends so much on uniqueness, then why > does one start out with such a level of lexical complexity in the > first place, only to then exclude much of it as 'malformed'? 
all you > need is <, >, ', & and / (if you allow element recursion) - and even > the distinction between < and > is more for the eye than anything > else. The pic *was* '>' in SGML. It was explicitly changed to '?>' for two reasons. One, there is no standardized way of escaping characters in a PI, so with pic='>' there's no way to put a greater-than in a processing instruction. '2)>' is illegal. Yes, you can use application conventions, but are authors going to buy ''? So, since '?>' is much less likely to occur *within* PIs, it makes a safer delimiter. Secondly, the symmetry is appealing, especially for new authors. Have you never seen used as a comment on Web pages? The syntax is more intuitive. Take the time to search the SGML WG archives (), which go through July of this year and are open to the public, and the XML SIG archives (address unknown). Searching them will lead to answers to many of these questions. See also the XML FAQ at . -Chris -- http://www.oreilly.com/people/staff/crism/ +1.617.499.7487 90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 16 21:40:23 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:34 2004 Subject: Any XSL tool! In-Reply-To: <3496458B.4C47@bd748.pku.edu.cn> Message-ID: <3.0.1.16.19971216223321.0fe7201a@pop3.demon.co.uk> At 17:10 16/12/97 +0800, Chang Ming wrote: Many thanks Chang Ming. >I think XSL is not off-topic in this list. It is absolutely appropriate. However: - XSL is at a very early stage. 
It's likely to undergo extensive changes
- XSL is being discussed in the W3C process at present. Unfortunately for you the discussion cannot be made public except by the WG.
- There is a discussion group for DSSSL (forget URL - at Mulberry? - someone will post this I'm sure). So that *may* be useful as well.

>
>I would like to know if there is any work done on XSL ,something like a
>interpreter.
>The only known tool seems the converter from XSL to DSSSL.

This is the primary (and for many people the only) motivation for XSL (i.e. the precise and flexible rendering of XML documents in 2D format).

This is a very good question. I cannot answer for the WG, of course. All I can say is that my applications are not always textual and that I would love to have transformation facilities in an XSL-like language. So, always referring to the public spec of course, I would argue for the inclusion of additional ELEMENTs that could provide this. I'll probably experiment in JUMBO - (JUMBO doesn't do much formatting as elephants can't do joined up writing.)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 16 21:47:57 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:34 2004 Subject: XML syntax (was Re: external subset syntax) In-Reply-To: <3496D6AB.E55D721@mixx.de> References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> Message-ID: <3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk> At 20:31 16/12/97 +0100, james anderson wrote: [... lots of "why?" appeals about XML ...] james - and others. i have enormous sympathy with your position. i have a somewhat unique role being an SGML-near-illiterate and yet being part of the SIG (was WG) process. i can't divulge any of the last 4 months material - it's confidential; the earlier stuff is archived. however i think it's allowable to say that enormous care has gone into this process. for example the case-sensitivity involved a huge amount of discussion with expert knowledge of many non-anglophone countries. similarly the DTD-stuff has had a huge amount of discussion. my own naive questioning about whitespace generated a large amount of material. what i have come to accept from a year on the SIG (was WG) is the precision of the process and the need for discipline. i - as do many SIG members - raise things they don't feel happy about, but when they are decided agree to try to make them work. my own personal concerns are littered publicly on XML-DEV :-). like you i find the different syntaxes very tedious because JUMBO has to read and parse both. of course i really enjoy writing parsers especially past midnight, and the best bit is tracking down the bugs, but others are different. so i sigh, and hack it. 
fwiw i translate all the non-XML syntax into XML internally because XML is superb to work with. (if anyone hasn't discovered that yet, it's because they don't have a full xml system.) xml is incredible. i can do things with JUMBO in a few hours that would have taken months before. it is very tough to have to ask you to take this on trust - i understand. at least i have had my say - or shout - and accept that i *have* shouted where necessary. *everything* has been listened to - not a sparrow chirps without the WG taking it on board (or some other poetic phrase - i probably misquote). it's important to realise that xml is part of a historical process. it was by no means certain that by 1997q4 we should have xml hyped throughout the world. it wouldn't have happened without a *huge* effort from the sgml community and we have them to thank. if, as a result, we have sgml-compatibility in xml that is an acceptable price for me. what i *hope* is that as a community we make the job of writing parsers as easy as possible. to do this we need APIs, communal libraries, test data, etc. so james(a) should be able to borrow a DTD-parser *off the shelf* in which case it's no big deal. p. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From norbert at datachannel.com Tue Dec 16 21:48:46 1997 From: norbert at datachannel.com (Norbert Mikula) Date: Mon Jun 7 16:59:34 2004 Subject: Any XSL tool! 
References: <3.0.1.16.19971216223321.0fe7201a@pop3.demon.co.uk> Message-ID: <3496F6FD.CE9FA249@datachannel.com> Peter Murray-Rust wrote: > - There is a discussion group for DSSSL (forget URL - at Mulberry? - > someone will post this I'm sure). So that *may* be useful as well. http://www.mulberrytech.com/dsssl/dssslist it is. -- Norbert H. Mikula Sr. Online Information Architect Norbert@DataChannel.com DataChannel, 155 108th Avenue NE Ste 400, Bellevue, WA 98004 Phone: 425.462.1999 Fax: 425.637.1192 http://www.datachannel.com -------------- next part -------------- A non-text attachment was scrubbed... Name: vcard.vcf Type: text/x-vcard Size: 428 bytes Desc: Card for Norbert Mikula Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971216/8e99bb4f/vcard.vcf From kvale at phy.ucsf.EDU Tue Dec 16 21:56:41 1997 From: kvale at phy.ucsf.EDU (Mark Kvale) Date: Mon Jun 7 16:59:34 2004 Subject: Two typos in and a suggestion for the XML Proposal Message-ID: <199712162156.NAA09886@phy.ucsf.EDU> In updating my parser to the XML Proposal of 8 December, I find that there seems to be two typos in the EBNF production rules: 1) The encoding declaration [81] EncodingDecl ::= S 'encoding' Eq '"' EncName '"' | "'" EncName "'" should have parentheses around the quoted names: [81'] EncodingDecl ::= S 'encoding' Eq ('"'EncName '"' | "'" EncName "'") 2) The version info production [25] VersionInfo ::= S 'version' Eq ('"VersionNum"' | "'VersionNum'") Here VersionNum is a nonterminal, not a literal string, and I think what was meant was [25'] VersionInfo ::= S 'version' Eq ('"' VersionNum '"' | "'" VersionNum "'") I also have one suggestion for improvement of the proposal. The notation type production is [58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' Name)* S? ')' It allows for space before the the alternation '|' but not after. It would be more symmetric to have [58'] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' as in the enumeration production. 
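[Editorial sketch, not part of the original message.] The asymmetry in production [58] can be checked mechanically by transcribing both forms into regular expressions. This is only an illustration: the S and Name patterns below are simplified ASCII stand-ins (an assumption; the draft's real productions admit far more characters), and the class name NotationTypeDemo is hypothetical.

```java
import java.util.regex.Pattern;

class NotationTypeDemo {
    // Simplified stand-ins for the draft's nonterminals (ASCII only, assumed).
    static final String S = "[ \\t\\r\\n]+";      // S
    static final String OPT_S = "[ \\t\\r\\n]*";  // S?
    static final String NAME = "[A-Za-z_:][A-Za-z0-9._:-]*";

    // [58]  'NOTATION' S '(' S? Name (S? '|' Name)* S? ')'
    static final Pattern AS_DRAFTED = Pattern.compile(
        "NOTATION" + S + "\\(" + OPT_S + NAME
        + "(?:" + OPT_S + "\\|" + NAME + ")*" + OPT_S + "\\)");

    // [58'] 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'
    static final Pattern AS_PROPOSED = Pattern.compile(
        "NOTATION" + S + "\\(" + OPT_S + NAME
        + "(?:" + OPT_S + "\\|" + OPT_S + NAME + ")*" + OPT_S + "\\)");

    static boolean drafted(String text) {
        return AS_DRAFTED.matcher(text).matches();
    }

    static boolean proposed(String text) {
        return AS_PROPOSED.matcher(text).matches();
    }

    public static void main(String[] args) {
        String spaced = "NOTATION (gif | jpeg)";
        // Space after '|' is rejected by [58] but accepted by [58'].
        System.out.println(drafted(spaced) + " " + proposed(spaced));
    }
}
```

Under these assumptions, "NOTATION (gif |jpeg)" satisfies both forms, while the more natural "NOTATION (gif | jpeg)" satisfies only the proposed [58'].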
Comments? -Mark xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 16 21:58:43 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:34 2004 Subject: external subset syntax In-Reply-To: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> Message-ID: <3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk> At 17:30 15/12/97 +0100, Per-Ake Ling wrote: [...] >Not only that, it is an underexploited feature in SGML that this is the >case. The only indication of real use of this feature in SGML comes from >Eliot Kimber, but I believe that it would be even more valuable in XML. > I agree. I have only just discovered a week ago that something like:
Hello world!
could be an allowable use of SGML. If I had realised this earlier I could have saved weeks of work in my CML DTDs. I have to say that the SGML community is *not* good at marketing the language - I don't *think* it deliberately keeps it opaque. It has proved extremely difficult to get hold of good newbie information on (say) architectural forms, HyTime, etc. Pleased to see some postings on XML-DEV about it but I don't appreciate things normally till I see a piece of software doing useful work :-) [No criticism to James Clark and those who have implemented everything, but there aren't many household applications yet.] P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 16 22:03:05 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:34 2004 Subject: General comments on parsers In-Reply-To: <199712151602.LAA13427@geode.ora.com> References: <3.0.1.16.19971211024212.2d87bafc@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971216221636.5477ed8a@pop3.demon.co.uk> At 11:02 15/12/97 -0500, Chris Maden wrote: >[Peter Murray-Rust] > >Not the Chris you were looking for, but the DOM is standardizing >access to XML DTDs, according to Lauren Wood's presentation at >SGML/XML '97. > Chris - and Lauren - this is excellent news. As always, I shall defer/convert to the official way of doing things when it comes. Any formally published timescale for this (or any summary of the SGML/XML 97?) 
On that last point - some of us weren't able to get to the mtg - any feedback on this list would be very much appreciated.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Tue Dec 16 22:40:19 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: LISTRIVIA (was Re: XML application)
In-Reply-To: <34965B31.4F97F345@mixx.de>
References: <8625656E.005D4668.00@Corpnotes.JCI.Com> <199712151804.TAA03231@sinfonix.rz.tu-clausthal.de>
Message-ID: <3.0.1.16.19971216231631.0fe7f528@pop3.demon.co.uk>

At 11:44 16/12/97 +0100, [... someone ...] wrote:
[... stuff clipped to avoid identification ...]

AND an unnecessary mail attachment which appeared to duplicate the posting and for which I have to pay personally.

>
>Attachment Converted: "c:\eudora\attach\ReXMLapp.htm"

PLEASE can you avoid mail attachments. I have received private mail in support of this view and I shall be very boring in pursuing this. It's not difficult to avoid, and for most people it's a waste of time and money.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Tue Dec 16 22:55:39 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:34 2004
Subject: Common event-based parser API
In-Reply-To: <199712161212.HAA00516@unready.microstar.com>
Message-ID: <3.0.1.16.19971216231808.54779b84@pop3.demon.co.uk>

At 07:12 16/12/97 -0500, David Megginson wrote:
>Tim and I have taken some of the gritty details of our discussion
>offline, and we have not yet managed to agree on how to return

Wonderful! I wish you both well and the strength to persevere till it's finally caught and bottled.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Wed Dec 17 01:16:04 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: external subset syntax
In-Reply-To: <3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk>
References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> <3.0.1.16.19971216222503.21cfcd26@pop3.demon.co.uk>
Message-ID: <199712170113.UAA00333@unready.microstar.com>

Peter Murray-Rust writes:
> At 17:30 15/12/97 +0100, Per-Ake Ling wrote:
> >Not only that, it is an underexploited feature in SGML that this is the
> >case. The only indication of real use of this feature in SGML comes from
> >Eliot Kimber, but I believe that it would be even more valuable in XML.
> >
> I agree. I have only just discovered a week ago that something like:
>
>
>
Hello world!
> > > could be an allowable use of SGML. If I had realised this earlier I could > have saved weeks of work in my CML DTDs. Actually, this is by no means an underexploited technique in SGML; on the contrary, it's standard practice in larger projects. Some industry-standard DTDs like DocBook even repeat inclusion exceptions on many different element types (book, chapter, section, glossary, etc) so that any one of them can be used as the document element with identical results. Of course, each application (in the SGML sense) has its own rules. For example, Microstar is valid SGML, but it is not correct HTML. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Wed Dec 17 01:18:46 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:34 2004 Subject: XML syntax (was Re: external subset syntax) References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> <3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk> Message-ID: <34971A9E.8BA8ED5F@technologist.com> Peter Murray-Rust wrote: > > my own personal concerns are littered publicly on XML-DEV :-). like you i > find the different syntaxes very tedious because JUMBO has to read and > parse both. of course i really enjoy writing parsers especially past > midnight, and the best bit is tracking down the bugs, but others are > different. so i sigh, and hack it. fwiw i translate all the non-XML syntax > into XML internally because XML is superb to work with. I'm not sure what you mean. 
Do you really take (e.g.) an ELEMENT declaration and map it to a textual string ? Or do you mean that internally you represent it using the same data structure that you use to represent XML elements?

If the latter, then you have just re-discovered the concept of a grove, and have also discovered why you can standardize processing software and data models without necessarily standardizing notation.

Paul Prescod

From papresco at technologist.com Wed Dec 17 01:19:11 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:34 2004
Subject: CharData
References: <01bd0a43$f276cb00$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <34971C6A.CAE42A7C@technologist.com>

Michael Kay wrote:
> I don't think you're being particularly user-friendly here. The most
> likely reason for a floating "]]>" is that the software-writer was
> lazy and forgot to escape it.
>
> If we assume that most XML will be software-generated,

Sure, if we make that assumption then we can make lots of "simplifications" to XML to make it harder to type and easier to generate. Then it can be as popular to end users as TeX, PDF or PostScript instead of as popular as HTML. Personally, I am not willing to make that assumption and I'm glad that the ERB did not. SGML would be just another forgotten technology if it had made that assumption. Once we reject that assumption, the restriction on MDC is reasonable.

Paul Prescod

From ak117 at freenet.carleton.ca Wed Dec 17 02:41:37 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <199712170238.VAA00829@unready.microstar.com>

After careful thought, I am fairly certain that I would be willing to accept the following simple event-driven API for Ælfred. It meets most of Tim Bray's concerns (I'd love to hear from Norbert Mikula and from Chris Lovett), and it requires me to use only one extra class file (the new XmlProcessor interface -- see below):

XmlApplication.java:
====================8<====================8<====================
import java.net.URL;
import java.util.Dictionary;

public interface XmlApplication {

  public void startDocument (XmlProcessor processor, String pubid, URL sysid);

  public void endDocument (XmlProcessor processor);

  public void startProlog (XmlProcessor processor);

  public void endProlog (XmlProcessor processor);

  public void startElement (XmlProcessor processor, String elname,
                            Dictionary attributes);

  public void endElement (XmlProcessor processor, String elname);

  public void characters (XmlProcessor processor, char ch[], int start, int length);

  public void processingInstruction (XmlProcessor processor, String target, String data);

  public void error (XmlProcessor processor, String message, URL url, int line);

}

// end of XmlApplication.java
====================8<====================8<====================

The processor itself could implement the following interface (very Thread-oriented and Bean-like):

XmlProcessor.java:
====================8<====================8<====================
import java.lang.Runnable;
import java.net.URL;

public interface XmlProcessor extends Runnable {

  public void setPublicId (String publicId);
  public String getPublicId ();

  public void setSystemId (URL systemId);
  public URL getSystemId ();

  public void setUserData (Object data);
  public Object getUserData ();

  public void addApplication (XmlApplication application);
  public void removeApplication (XmlApplication application);

  public void run();

}

// end of XmlProcessor.java
====================8<====================8<====================

I would lose Ælfred's resolveEntity() callback, the isSpecified boolean for attributes and the simple String argument for character data. Tim would lose the ability to return a boolean to stop the parse (the user would have to throw an exception), and would have to rename more of his callbacks.

On the positive side, this interface would let you hang more than one application off the same parse, which could be very interesting. The userData property also gives users a chance to pass extra information to the processor easily, if they wish.

This new XmlProcessor interface (actually a parser, but I'm using the XML spec's terminology here) does not preclude additional functionality -- I'll keep all of Ælfred's DTD-query methods -- but neither does it standardise that functionality.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From antony at n-space.com.au Wed Dec 17 03:14:53 1997
From: antony at n-space.com.au (Antony Blakey)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <34974325.6314AD85@n-space.com.au>

Skipped content of type multipart/mixed
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/x-pkcs7-signature
Size: 4250 bytes
Desc: S/MIME Cryptographic Signature
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971217/9ab0ef32/smime.bin

From donpark at quake.net Wed Dec 17 04:48:14 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:34 2004
Subject: Simple XML Event-Based API for Java
Message-ID: <000b01bd0aa6$753cce10$0100007f@localhost>

David,

Looks good in general. I have only a few comments and a couple of questions.

I would rename XmlApplication and XmlProcessor to XmlConsumer and XmlProducer. It is just a matter of current Java API tradition. Additionally, I would write a helper class XmlFilter. The Producer/Filter/Consumer arrangement is a well-known design pattern and it would be confusing to rename it.

I would rename startProlog, endProlog, and processingInstruction to something more friendly. Most beginner XML programmers wouldn't know what a PI is, nor would they care. I would group all "abnormal" tags (with the exception of comments) as special elements and have a separate pair of start/end for them. I would add a separate method for comment text.

Renaming characters() to content() might make it clearer to programmers what the method does.
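[Editorial sketch, not part of the original message.] The Producer/Filter/Consumer arrangement mentioned above can be illustrated in a few lines. This is only a sketch under stated assumptions: the XmlApplication interface here is a cut-down stand-in with three callbacks and no XmlProcessor argument, not the interface as proposed, and XmlFilter, WhitespaceFilter, EventLog and FilterDemo are hypothetical names. Events are fed by hand in place of a real producer.

```java
import java.util.ArrayList;
import java.util.List;

// Cut-down stand-in for the proposed XmlApplication (assumption: three
// callbacks only, and no XmlProcessor argument).
interface XmlApplication {
    void startElement(String elname);
    void endElement(String elname);
    void characters(char[] ch, int start, int length);
}

// Hypothetical XmlFilter helper: a consumer that forwards every event
// unchanged to the next consumer in the chain.
class XmlFilter implements XmlApplication {
    private final XmlApplication next;
    XmlFilter(XmlApplication next) { this.next = next; }
    public void startElement(String elname) { next.startElement(elname); }
    public void endElement(String elname) { next.endElement(elname); }
    public void characters(char[] ch, int start, int length) {
        next.characters(ch, start, length);
    }
}

// Example filter: drops character data that is whitespace only.
class WhitespaceFilter extends XmlFilter {
    WhitespaceFilter(XmlApplication next) { super(next); }
    @Override
    public void characters(char[] ch, int start, int length) {
        if (!new String(ch, start, length).trim().isEmpty()) {
            super.characters(ch, start, length);
        }
    }
}

// Consumer at the end of the chain: records each event as a string.
class EventLog implements XmlApplication {
    final List<String> events = new ArrayList<>();
    public void startElement(String elname) { events.add("<" + elname + ">"); }
    public void endElement(String elname) { events.add("</" + elname + ">"); }
    public void characters(char[] ch, int start, int length) {
        events.add(new String(ch, start, length));
    }
}

class FilterDemo {
    static List<String> run() {
        EventLog log = new EventLog();
        XmlApplication chain = new WhitespaceFilter(log);
        // Hand-fed events stand in for a real producer (parser).
        chain.startElement("p");
        char[] blank = "  \n".toCharArray();
        chain.characters(blank, 0, blank.length);
        char[] text = "Hello".toCharArray();
        chain.characters(text, 0, text.length);
        chain.endElement("p");
        return log.events;
    }
}
```

Because a filter is itself a consumer, filters can be stacked without the producer knowing how many consumers sit behind it.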
I would also rename xetPublicId and xetSystemId to xetPublicID and xetSystemID. I usually change acronyms when they are used as a prefix (XML to Xml) but not when they are used to postfix a name. It tends to look more legible.

Would entities be resolved by XmlProcessor er, XmlProducers?

Don

From jjc at jclark.com Wed Dec 17 05:20:07 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:34 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <34975FEE.1A5406F4@jclark.com>

David Megginson wrote:
> After careful thought, I am fairly certain that I would be willing to
> accept the following simple event-driven API for Ælfred.

I don't see the point of the XmlProcessor first argument. What's wrong with having the implementation of XmlApplication store the XmlProcessor in a member variable? (This is what SP typically does.)

> public void
> startDocument (XmlProcessor processor, String pubid, URL sysid);

What do the pubid and sysid arguments represent? The document entity?

> public void
> startProlog (XmlProcessor processor);
>
> public void
> endProlog (XmlProcessor processor);

Why do you need startProlog() and endProlog()?

> public void
> startElement (XmlProcessor processor, String elname,
> Dictionary attributes);
>
> public void
> endElement (XmlProcessor processor, String elname);
>
> public void
> characters (XmlProcessor processor, char ch[], int start, int length);
>
> public void
> processingInstruction (XmlProcessor processor, String target, String data);

The one major omission I see here is the absence of information about the location (URL, byte offset, line number etc) of the events. It would be very nice to be able to implement validation as just an XmlApplication (that wraps around another XmlApp). In other words, to run without validation you would use:

  processor.run(new MyXmlApplication());

and to run with validation you would use

  processor.run(new ValidateXmlApplication(new MyXmlApplication()));

In order to make this work the application needs to be able to get information about the location of start/end tags and of data. This is also useful for all kinds of application-specific validation. This could be done by having the app ask the processor for the location of the last event in some non-standardized way, but that's kind of kludgy. On the other hand, maybe this is just too fancy for a "simple" API.

> public void
> error (XmlProcessor processor, String message, URL url, int line);

I don't think having simply "String message" is going to internationalize well. It's also desirable to know exactly what character number/column number the error occurred at. Also XML distinguishes fatal errors (which the parser must not continue processing after) from other errors. On the whole I would be inclined to handle fatal errors as an exception, and not try to deal with non-fatal errors at all in this simple interface.

> On the positive side, this interface would let you hang more than one
> application off the same parse, which could be very interesting.

I don't think this is a good idea.
It adds complexity and it's likely to impose a performance cost, but it doesn't buy you anything, because you can achieve that functionality with a MultipleXmlApplication class that implements the XmlApplication interface, and provides addApplication and removeApplication methods, and then forwards each event to the applications that have been added to it. > The > userData property also gives users a chance to pass extra information > to the processor easily, if they wish. Surely there are cleaner ways to do this sort of thing. James xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Wed Dec 17 07:09:48 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:34 2004 Subject: XML syntax (was Re: external subset syntax) In-Reply-To: <34971A9E.8BA8ED5F@technologist.com> References: <199712151630.RAA27288@uabs19c27.eua.ericsson.se> <3.0.1.16.19971216220351.0fe77372@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971217080357.18b7149e@pop3.demon.co.uk> At 19:19 16/12/97 -0500, Paul Prescod wrote: >Peter Murray-Rust wrote: >> >> my own personal concerns are littered publicly on XML-DEV :-). like you i >> find the different syntaxes very tedious because JUMBO has to read and >> parse both. of course i really enjoy writing parsers especially past >> midnight, and the best bit is tracking down the bugs, but others are >> different. so i sigh, and hack it. fwiw i translate all the non-XML syntax >> into XML internally because XML is superb to work with. > >I'm not sure what you mean. Do you really take (e.g.) an ELEMENT >declaration and map it to a textual string ? Or do you mean Just once - i.e. 
which I use in the "DTD" for the DTD. But this is a unique case.

>that internally you represent it using the same data structure that you
>use to represent XML elements.

Yes! Yes!!

>
>If the latter, then you have just re-discovered the concept of a grove,
>and have also discovered why you can standardize processing software and
>data models without necessarily standardizing notation.

Wow! This is a glorious day! I have been told I am using (very simple) groves *and* (very simple) architectural forms without realising! "Good Heavens! For more than [two years] I have been speaking [grove] without knowing it". I am clearly on a lifetime voyage to re-invent HyTime in my own fashion :-)

Many thanks for this enlightenment. All I have to do is work out how to implement it sufficiently generically in JUMBO.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Dec 17 07:10:34 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: LISTRIVIA (was Re: RFC: Simple XML Event-Based API for Java)
In-Reply-To: <34974325.6314AD85@n-space.com.au>
References: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk>

At 13:42 17/12/97 +1030, [... a first-time poster on XML-DEV...] wrote:

[... some useful stuff...]
and then spoiled it by attaching 5 Kbytes (sic) of non-ASCII files that I, the majordomo software, the hypermail system and the rest of the world don't want and sometimes get high blood pressure about.

I expect some of you think: what a boring old person I am to keep on about this. After all, what's 5 Kbytes?

I had a colleague in Bratislava who some years ago was charged ONE US DOLLAR (yes, real grey green greasy money) for ONE KILOBYTE by his ISP. The price may have changed, but I expect it still costs more than he can afford.

I was privileged to hear about scientific computing recently in the recently independent ex-USSR states. Some of these countries have a SINGLE 64KB LINE FOR THE WHOLE COUNTRY. I imagine that in most African countries it's even worse.

Thoughtless attachments and quoting are a serious disadvantage to people who are really struggling. The XML community has taken great pains to try to make the language accessible to every country in the world. Let's not send them junk content.

>Attachment Converted: "c:\eudora\attach\vcard39.vcf"
>
>Attachment Converted: "c:\eudora\attach\smime1.p7s"
>

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From peter at ursus.demon.co.uk Wed Dec 17 07:11:52 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <199712170238.VAA00829@unready.microstar.com>
Message-ID: <3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>

At 21:38 16/12/97 -0500, David Megginson wrote:
[... a really simple and understandable interface ...]

If it helps the deliberations of the closeted experts, this looks exactly the sort of level of interface I would like and can work with. I assume that somewhere will be all the calls to the "DTD" stuff.

Keep at it!

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Dec 17 07:29:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: LISTRIVIA (was Re: Any XSL tool!)
Message-ID: <3.0.1.16.19971217082735.2dc7e69e@pop3.demon.co.uk>

The following (private) e-mail has just arrived and confirms exactly what I have just posted about attachments. The quoted mail comes from someone in a country *much* poorer than the one I live in.
>
>Somebody mailed me a attachment with extension :-vcf.
>How do i open it?
>regards

Dear [name omitted for privacy]

A VCF is a "vcard", usually with personal details of the sender (such as address, e-mail, title, etc.). I think it's in ASCII. I can't help you on how to read it, since it depends on the mailer that you have. If this is not a recent mailer, you may not be able to access it at all. You will probably be able to save it to disk.

My Eudora mailer (on Windows 3.1) automatically saves these to a directory C:\eudora\attach. It then gives them memorable names like vcard39.vcf. They can be opened as an ASCII file. I have about 100K of accumulated attachments, many from XML-DEV.

In my opinion it is unnecessary to attach any files, including *.vcf, to postings to XML-DEV and I have asked the posters if they would take the trouble not to. I'm sure they will take note of this.

Best wishes with XML :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From papresco at technologist.com Wed Dec 17 08:17:33 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
References: <000b01bd0aa6$753cce10$0100007f@localhost>
Message-ID: <349780FD.D468CAEB@technologist.com>

Don Park wrote:
>
> I would rename XmlApplication and XmlProcessor to XmlConsumer and
> XmlProducer.

I would interpret those as classes that create and consume XML (text strings).
Perhaps they should be called EventProducer and EventConsumer or XMLEventProducer and XMLEventConsumer. The former would depend on the package mechanism to avoid clashes with other kinds of Event systems.

Paul Prescod

From donpark at quake.net Wed Dec 17 09:06:48 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
Message-ID: <003a01bd0aca$9ddf4a90$0100007f@localhost>

>I would interpret those as classes that create and consume XML (text
>strings). Perhaps they should be called EventProducer and EventConsumer
>or XMLEventProducer and XMLEventConsumer. The former would depend on the
>package mechanism to avoid clashes with other kinds of Event systems.

XMLEventBlahBlah implies something that has to do with XMLEvent objects, which do not exist. The fact that the API being worked on is said to be event-based does not imply that the central products of the API are events. It could just as well have been described as a callback-based XML parser. Furthermore, I do not see how XmlConsumer and XmlProducer imply that they work with XML text strings. Those names imply only that they are interfaces for classes that consume and produce XML data.

As far as reducing dependency on the package mechanism goes, there is a point of balance where class names are unique enough without requiring package specification in most situations. I do not see how XmlEventBlah is significantly better than XmlBlah. If there is any confusion, it is cleared up by import statements or prefixing package names.
org.w3c.xml.XmlConsumer is not very long and is needed only in the instantiation call.

BTW, some attention should be paid to JavaBeans method signatures if you are planning on having a simple XML event-based parser packaged as beans.

My comments are just comments, pure and simple. Any effort on the XML parser is a movement in the right direction no matter whether I have a bone to pick with its design. I sure do appreciate the effort you guys are putting in.

Sincerely,

Don

From donpark at quake.net Wed Dec 17 09:28:27 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <003f01bd0acd$a3266da0$0100007f@localhost>

>I don't see the point of the XmlProcessor first argument. What's wrong
>with having the implementation of XmlApplication store the XmlProcessor
>in the member variable? (This is what SP typically does.)

XmlApplication can not store the XmlProcessor in the member variable because it is an interface. I am very happy to see that XmlProcessor and XmlApplication are interfaces rather than classes. Of course, it would help to have some sort of Factory or Manager.

>The one major omission I see here is the absence of information about the
>location (URL, byte offset, line number, etc.) of the events. It would be
>very nice to be able to implement validation as just an
>XmlApplication (that wraps around another XmlApplication). In other words, to run
>without validation you would use:

This is exactly why I proposed XmlFilter. XmlValidator derived from XmlFilter can be used to add validation at runtime.
Each class and interface should have a clearly intended role. Stringing XmlApplications along like some kind of Unix app is not something I would like to see people do. I would rather see folks developing XmlFilters to be intentionally used as converters or by-product producers.

>> On the positive side, this interface would let you hang more than one
>> application off the same parse, which could be very interesting.
>
>I don't think this is a good idea. It adds complexity and it's likely
>to impose a performance cost, but it doesn't buy you anything, because
>you can achieve that functionality with a MultipleXmlApplication class
>that implements the XmlApplication interface, and provides
>addApplication and removeApplication methods, and then forwards each
>event to the applications that have been added to it.

Support of multiple event listeners is the norm in the Java world. As they say, "When in Texas, wear cowboy boots". I have no concern about performance cost since Java loops are not very expensive compared to method invocations and object instantiations.

If we were really concerned about performance, I would recommend giving up the use of String. A pool of markers/cursors into a string buffer will improve performance by a factor.

>> userData property also gives users a chance to pass extra information
>> to the processor easily, if they wish.

>Surely there are cleaner ways to do this sort of thing.

I do not think so. Just as every Mac developer loved having RefCon to hang things onto, I like userData. Could I have get/setStudData methods?;-)

Don
From donpark at quake.net Wed Dec 17 09:28:38 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <004001bd0acd$a405aa10$0100007f@localhost>

>If it helps the deliberations of the closeted experts, this looks exactly
>the sort of level of interface I would like and can work with. I assume
>that somewhere will be all the calls to the "DTD" stuff.

Perhaps we should have a DtdConsumer interface and add/removeDtdConsumer methods in XmlProcessor? I would advise keeping it empty for now as a placeholder and keep moving.

> Keep at it!

David is a workaholic. You can't pull him off it. I am a spectaholic ;-p

Don

From anders at rcs.urz.tu-dresden.de Wed Dec 17 10:19:07 1997
From: anders at rcs.urz.tu-dresden.de (Andrea Anders)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
Message-ID:

I am an amateur in xml and hope anyone can help me.

I am trying to transform a SGML-DTD into XML (I use the MSXML parser). My questions are:

1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I express this in XML?

my sgml-dtd: ...

2) I tried to bypass it with named groups, but it failed. Named groups are not allowed either.

Are there any ideas? Thanks.
____________________________________________________________
Andrea Anders
-------------
eMail: anders@rcs.urz.tu-dresden.de
WWW: http://rcswww.urz.tu-dresden.de/~anders

From tms at ansa.co.uk Wed Dec 17 10:49:28 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: Antony Blakey's message of "Wed, 17 Dec 1997 13:42:37 +1030"
References: <199712170238.VAA00829@unready.microstar.com> <34974325.6314AD85@n-space.com.au>
Message-ID:

Antony> Antony Blakey

> In article <34974325.6314AD85@n-space.com.au>, Antony wrote:

Antony> [1 ]
Antony>
Antony> [1.1 ]
Antony> David Megginson wrote:

>> I would lose Ælfred's resolveEntity() callback

Antony> One of the major pains we have had using the available XML
Antony> tools is the lack of a resolveEntity() callback. Originally
Antony> we wanted to use PUBLIC identifiers and resolve them using a
Antony> catalog, but now we use SYSTEM urls and have a dedicated http
Antony> host to resolve resources. Unfortunately we need to ship tools
Antony> to customers who may not be able to resolve the URL. It is
Antony> not feasible to change the SYSTEM identifiers. What we need
Antony> to do is change the URL on the fly (ie redirect through a
Antony> proxy or a lookup), or actually provide the input stream from
Antony> within the program ie. the entity is stored as a string, or
Antony> accessed through ClassLoader.getResourceAsStream().
This is
Antony> also necessary if you want to store resources in a versioned
Antony> object base and have the version number implicit in the
Antony> processing, rather than explicitly mentioned in the URL
Antony> (although we have in fact done exactly this :)

ISTM that there's no difficulty in bolting on an XmlEntityResolver interface to the design, and a method in XmlProcessor to register it (just one small interface, David!). An XmlApplication could implement the resolver interface, so it doesn't necessarily imply a proliferation of classes.

However, I say it should be kept as simple as possible (but no simpler) to start with, and goodies like the resolver can be added once there are some implementations. Perhaps we'll want a "Level 2" API that extends the interfaces in the Level 1 API?

[8-line sig snipped]

Antony> [1.2 Card for Antony Blakey ]
Antony>
Antony> [2 S/MIME Cryptographic Signature ]

I agree with PMR's comments on this lot (why can't people just include an URL to their personal information, like my X-Author-Info header?)

From jjc at jclark.com Wed Dec 17 10:51:48 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <003f01bd0acd$a3266da0$0100007f@localhost>
Message-ID: <3497ADBF.27C4F668@jclark.com>

Don Park wrote:

> >I don't see the point of the XmlProcessor first argument. What's wrong
> >with having the implementation of XmlApplication store the XmlProcessor
> >in the member variable? (This is what SP typically does.)
>
> XmlApplication can not store the XmlProcessor in the member variable because
> it is an interface. I am very happy to see that XmlProcessor and
> XmlApplication are interfaces rather than classes.

I didn't suggest XmlApplication should store XmlProcessor in a member variable. I suggested that implementations of XmlApplication could (if they needed to make callbacks to XmlProcessor) store XmlProcessor in a member variable.

> >I don't think this is a good idea. It adds complexity and it's likely
> >to impose a performance cost, but it doesn't buy you anything, because
> >you can achieve that functionality with a MultipleXmlApplication class
> >that implements the XmlApplication interface, and provides
> >addApplication and removeApplication methods, and then forwards each
> >event to the applications that have been added to it.
>
> Support of multiple event listeners is the norm in the Java world. As they
> say "When in Texas, wear cowboy boots".

I don't think it's appropriate to carry over patterns from GUI events and apply them to XML events just because we happen to use the word "event" to describe them both. I believe performance is important for XML processing, and an interface shouldn't impose an unnecessary performance cost.

The real merit of this interface is that it's simple; unless there's a really compelling need for a feature, I think it should be left out.

> If we were really concerned about performance, I
> would recommend giving up the use of String.

It's (rightly in my view) done that already for character data. It's not a problem for element type names, because an implementation can maintain a hash table of names and thus only allocate a String for each distinct element type.

> >> userData property also gives users a chance to pass extra information
> >> to the processor easily, if they wish.
>
> >Surely there are cleaner ways to do this sort of thing.
>
> I do not think so.
Just as every Mac developer loved having RefCon to hang
> things onto, I like userData.

Could you explain a typical case where you need this? Are there any standard Java classes that do this? It feels very wrong to me; it's the sort of thing I would try hard to avoid in my own programming, but maybe this is my strongly-typed C++ prejudices showing through. To me it seems like a feature that one can easily manage without.

James

From peter at ursus.demon.co.uk Wed Dec 17 11:09:35 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
In-Reply-To:
Message-ID: <3.0.1.16.19971217120821.30af6000@pop3.demon.co.uk>

At 10:20 17/12/97 +0100, Andrea Anders wrote:
>I am an amateur in xml and hope anyone can help me.

You are very welcome, Andrea, and this is exactly the sort of question that needs addressing. I can't help you myself, but I know that it has been addressed before - it would be nice if someone has posted guidelines. [I'm not sure whether there is a general approach - my suspicion is that you can end up with quite a complex XML-DTD sometimes.]

Best of luck.

P.

>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From arjan.loeffen at let.ruu.nl Wed Dec 17 11:19:35 1997
From: arjan.loeffen at let.ruu.nl (Arjan Loeffen)
Date: Mon Jun 7 16:59:35 2004
Subject: inclusions/exclusions/named groups
References:
Message-ID: <3497B4CA.79710B80@let.ruu.nl>

Andrea Anders wrote:

> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?

Inclusions and exclusions cannot be expressed by model group constructs (except for a very few cases). Whereas model groups describe and therefore affect an element's content, and therefore are a DTD-based concept, exceptions describe and affect the complete element subtree, and therefore are a document-instance-based concept.

The best you can do is to merge inclusions into the model groups of all elements they 'intend to affect' (typically by defining parameter entities), which would extend over all elements occurring in the model of the element you intended the inclusion to work on (and elements in the model of those elements, etc.).

To alter the model group for exclusions requires you to re-think the complete set of parameter entities used in the original DTD; you have to make certain that the element you want excluded does not occur in any model after entities are resolved.

Not supporting exceptions is the toll we pay for allowing standard parser generators to be used to build XML systems.

Arjan.
From ak117 at freenet.carleton.ca Wed Dec 17 11:54:17 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <34975FEE.1A5406F4@jclark.com>
References: <199712170238.VAA00829@unready.microstar.com> <34975FEE.1A5406F4@jclark.com>
Message-ID: <199712171151.GAA00342@unready.microstar.com>

James Clark writes:

> I don't see the point of the XmlProcessor first argument. What's wrong
> with having the implementation of XmlApplication store the XmlProcessor
> in the member variable? (This is what SP typically does.)

The advantage is that the same XmlApplication object can work with more than one XmlProcessor at the same time (though it is not required to be able to do so).

> > public void
> > startDocument (XmlProcessor processor, String pubid, URL sysid);
>
> What do the pubid and sysid arguments represent? The document entity?

Yes. I suppose that they are redundant, given XmlProcessor.getPublicId() and XmlProcessor.getSystemId(), so they could go if the XmlProcessor argument stayed.

> > public void
> > startProlog (XmlProcessor processor);
> >
> > public void
> > endProlog (XmlProcessor processor);
>
> Why do you need startProlog() and endProlog()?

Convenience only: users could infer the end of the prolog from the start of the document element. The end of the prolog (or at least, of the document type declaration) is important for Ælfred, because that is the first point when Ælfred's DTD query routines will return useful results.
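A minimal sketch of the first point in this message (one XmlApplication object serving more than one XmlProcessor), offered as an illustration rather than as anyone's proposed API. Only `XmlProcessor`, `XmlApplication`, `getSystemId()`, `startElement` and `endElement` come from the thread; the cut-down interfaces, `DepthTracker`, `FakeProcessor` and `SharedApplicationDemo` are hypothetical, and the processor here does no parsing at all: the demo fires events by hand.

```java
import java.util.Dictionary;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

// Hypothetical stand-in for the proposed processor; only its identity matters here.
interface XmlProcessor {
    String getSystemId();
}

// Two of the callbacks quoted in the thread, with the processor as first argument.
interface XmlApplication {
    void startElement(XmlProcessor processor, String elname, Dictionary<String, String> attributes);
    void endElement(XmlProcessor processor, String elname);
}

// One application object shared by several processors: all state is keyed by processor.
class DepthTracker implements XmlApplication {
    private final Map<XmlProcessor, Integer> depth = new HashMap<>();
    private final Map<XmlProcessor, Integer> maxDepth = new HashMap<>();

    public synchronized void startElement(XmlProcessor processor, String elname,
                                          Dictionary<String, String> attributes) {
        int d = depth.getOrDefault(processor, 0) + 1;
        depth.put(processor, d);
        if (d > maxDepth.getOrDefault(processor, 0)) {
            maxDepth.put(processor, d);
        }
    }

    public synchronized void endElement(XmlProcessor processor, String elname) {
        depth.put(processor, depth.getOrDefault(processor, 0) - 1);
    }

    public synchronized int maxDepth(XmlProcessor processor) {
        return maxDepth.getOrDefault(processor, 0);
    }
}

// Hypothetical processor that does no parsing; the demo fires events by hand.
class FakeProcessor implements XmlProcessor {
    private final String systemId;

    FakeProcessor(String systemId) {
        this.systemId = systemId;
    }

    public String getSystemId() {
        return systemId;
    }
}

class SharedApplicationDemo {
    public static void main(String[] args) {
        DepthTracker app = new DepthTracker();
        FakeProcessor a = new FakeProcessor("a.xml");
        FakeProcessor b = new FakeProcessor("b.xml");
        Dictionary<String, String> none = new Hashtable<>();

        // Interleave events from two "parses" against the same application object.
        app.startElement(a, "doc", none);
        app.startElement(b, "doc", none);
        app.startElement(a, "p", none);
        app.endElement(a, "p");
        app.endElement(b, "doc");
        app.endElement(a, "doc");

        System.out.println(a.getSystemId() + " max depth " + app.maxDepth(a)); // a.xml max depth 2
        System.out.println(b.getSystemId() + " max depth " + app.maxDepth(b)); // b.xml max depth 1
    }
}
```

Without the processor argument, an implementation would hold a single processor in a member variable, as James suggests; both styles work, and the argument simply makes sharing one object across parses explicit.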
> The one major omission I see here is the absence of information about the
> location (URL, byte offset, line number, etc.) of the events. It would be
> very nice to be able to implement validation as just an
> XmlApplication (that wraps around another XmlApplication). In other words, to run
> without validation you would use:
>
> processor.run(new MyXmlApplication());
>
> and to run with validation you would use
>
> processor.run(new ValidateXmlApplication(new MyXmlApplication()));
>
> In order to make this work the application needs to be able to get
> information about the location of start/end tags and of data. This is
> also useful for all kinds of application-specific validation.
>
> This could be done by having the app ask the processor for the location
> of the last event in some non-standardized way, but that's kind of
> kludgy. On the other hand, maybe this is just too fancy for a
> "simple" API.

I think that it probably is too fancy.

> > public void
> > error (XmlProcessor processor, String message, URL url, int line);
>
> I don't think having simply "String message" is going to
> internationalize well. It's also desirable to know exactly what
> character number/column number the error occurred at. Also, XML
> distinguishes fatal errors (after which the parser must not continue
> processing) from other errors. On the whole I would be inclined
> to handle fatal errors as an exception, and not try to deal with
> non-fatal errors at all in this simple interface.
>
> > On the positive side, this interface would let you hang more than one
> > application off the same parse, which could be very interesting.
>
> I don't think this is a good idea.
It adds complexity and it's likely
> to impose a performance cost, but it doesn't buy you anything, because
> you can achieve that functionality with a MultipleXmlApplication class
> that implements the XmlApplication interface, and provides
> addApplication and removeApplication methods, and then forwards each
> event to the applications that have been added to it.

A wise suggestion.

> > The
> > userData property also gives users a chance to pass extra information
> > to the processor easily, if they wish.
>
> Surely there are cleaner ways to do this sort of thing.

Perhaps -- it would be most useful, again, when an XmlApplication was being used with more than one XmlProcessor.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Wed Dec 17 11:56:43 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>
References: <199712170238.VAA00829@unready.microstar.com> <3.0.1.16.19971217073359.52afe65e@pop3.demon.co.uk>
Message-ID: <199712171154.GAA00352@unready.microstar.com>

Peter Murray-Rust writes:

> At 21:38 16/12/97 -0500, David Megginson wrote:
> [... a really simple and understandable interface ...]
>
> If it helps the deliberations of the closeted experts, this looks exactly
> the sort of level of interface I would like and can work with. I assume
> that somewhere will be all the calls to the "DTD" stuff.
Yes, but we are not looking at standardising these right now. They will still be available in Ælfred, but outside of the interface.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Wed Dec 17 12:40:26 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3497ADBF.27C4F668@jclark.com>
References: <003f01bd0acd$a3266da0$0100007f@localhost>
Message-ID: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>

At 17:47 17/12/97 +0700, James Clark wrote:
>
>The real merit of this interface is that it's simple; unless there's a
>really compelling need for a feature, I think it should be left out.

Yes. Let's please get this bus into the air. If it needs tweaking or junking later, it's not the end of the world :-). I couldn't bear it if we go down the same road as we have done 2-3 times before, drawing out the process and finally running out of steam.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From mecom-gmbh at mixx.de Wed Dec 17 13:06:03 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:35 2004
Subject: EntityDef v/s PEDef
Message-ID: <3497CF51.3FE0F8F9@mixx.de>

greetings,

today's question from out of the blue:

do i follow PR-XML-19971208 correctly, that the only difference between a general entity definition and a parameter entity definition (syntactically modulo the '%') is that the general entity definition permits a notation?

[70] EntityDecl ::= GEDecl | PEDecl
[71] GEDecl ::= '<!ENTITY' S Name S EntityDef S? '>'
[72] PEDecl ::= '<!ENTITY' S '%' S Name S PEDef S? '>'
[73] EntityDef ::= EntityValue | ExternalDef
[74] PEDef ::= EntityValue | ExternalID
[75] ExternalDef ::= ExternalID NDataDecl?
[76] ExternalID ::= 'SYSTEM' S SystemLiteral | 'PUBLIC' S PubidLiteral S SystemLiteral
[77] NDataDecl ::= S 'NDATA' S Name

(nb. spurious (?) '|' removed from [72])

what is the significance of ExternalDef? i found it referenced nowhere else in the document. wouldn't

[73'] EntityDef ::= EntityValue | ExternalID NDataDecl?
[74'] PEDef ::= EntityValue | ExternalID
[75x]

make the similarity clearer?

From papresco at technologist.com Wed Dec 17 14:38:56 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: Simple XML Event-Based API for Java
References: <003a01bd0aca$9ddf4a90$0100007f@localhost>
Message-ID: <3497E3DD.C3980FB4@technologist.com>

Don Park wrote:
>
> XMLEventBlahBlah implies something that has to do with XMLEvent objects
> which does not exist.  The fact that the API being worked is said to be
> event-based does not imply that central product of the API are events.

I think that the concept of events is implicit in the interfaces that
are being defined and may well be explicit in the documentation for
them. The only reason that we don't call them startElementEvent,
endElementEvent, endPrologEvent etc. is that it would be redundant.

ON THE OTHER HAND -- should we actually be using Event Objects as SP
does? The nice thing about event objects is that they can be subclassed
to add more information. An example would be James' request for line
number information. That means that an XAPI "level 2" producer could
easily produce data for a "level 1" consumer without a problem. They can
also be "lazy" in the sense that they don't have to construct (e.g.) a
dictionary object for attributes unless the start-element ASKS for
attributes. Is Java object construction too slow for us to use real
objects?

> It
> could have just as well been described as callback-based XML parser.
> Furthermore, I do not see how XmlConsumer and XmlProducer imply that they
> work with XML text string.  Those names imply only that they are interfaces
> for classes that consume and produce XML data.

I'm not religious on this issue, but the only definition of "XML Data" I
know of is PR-xml-971208.

"This specification describes the required behavior of an XML processor
in terms of how it must read XML data and the information it must
provide to the application."

In other words, XML Data is angle-bracketed text that conforms to
PR-xml-971208.

 Paul Prescod

From papresco at technologist.com Wed Dec 17 14:39:13 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <199712170238.VAA00829@unready.microstar.com> <34975FEE.1A5406F4@jclark.com> <199712171151.GAA00342@unready.microstar.com>
Message-ID: <3497DE8B.7B6F9C28@technologist.com>

David Megginson wrote:
> The advantage is that the same XmlApplication object can work with
> more than one XmlProcessor at the same time (though it is not required
> to be able to do so).

If you use the name "Application" then it makes sense to require a
single application to support multiple processors. Jade is an example of
an application that supports multiple processors. If we use the word
***Consumer, then it makes sense that there should be a single consumer
per Producer.

>  > This could be done by having the app ask the processor for the location
>  > of the last event in some non-standardized way, but that's kind of
>  > kludgy.  On the other hand, maybe this is just too fancy for a
>  > "simple" API.
>
> I think that it probably is too fancy.

Maybe, but it also seems very important. A processor that can't tell you
where your errors are is very frustrating.
Perhaps there should immediately be a "level 2" that supports this.

 Paul Prescod

From ak117 at freenet.carleton.ca Wed Dec 17 14:44:44 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
References: <003f01bd0acd$a3266da0$0100007f@localhost> <3497ADBF.27C4F668@jclark.com> <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
Message-ID: <199712171442.JAA00570@unready.microstar.com>

Peter Murray-Rust writes:

 > Yes. Let's please get this bus into the air. If it needs tweaking
 > or junking later, it's not the end of the world :-). I couldn't
 > bear it if we go down the same road as we have done 2-3 times
 > before, drawing out the process and finally running out of steam.

Any project should have measurable failure criteria.  Here are my
suggestions.

The Simple XML Event-Based API initiative will have failed if either
of the following is true:

1) By Monday 12 January 1998, at least three Java parser writers have
   not agreed to support a specific set of common interfaces.

2) By Monday 12 January 1998, at least three Java applet or
   application authors have not agreed to use the same set of common
   interfaces that the parser writers have agreed to support.

In other words, we need at least one other parser writer on board
besides Tim and me (a duopoly is almost as bad as a monopoly), and at
least two other applet/application writers besides Peter.
If we don't have that agreement, and a working beta interface, by
12 January, I won't want to spend any more of my time on this issue
(I have other projects that I'd like to pursue).


DOM
---
Another interesting question is the DOM.  I have not taken the time
yet to see if this interface provides enough information to construct
the most basic DOM nodes -- if it does (or at least, can), then we
could have a single DOM module maintained separately (using the common
event interface) instead of requiring each parser writer to create a
separate one.  A separate DOM module with its own maintainer would be
much more likely to stay up to date and robust.


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Wed Dec 17 14:51:42 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:35 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3497DE8B.7B6F9C28@technologist.com>
References: <199712170238.VAA00829@unready.microstar.com> <34975FEE.1A5406F4@jclark.com> <199712171151.GAA00342@unready.microstar.com> <3497DE8B.7B6F9C28@technologist.com>
Message-ID: <199712171449.JAA00603@unready.microstar.com>

Paul Prescod writes:

 > > The advantage is that the same XmlApplication object can work with
 > > more than one XmlProcessor at the same time (though it is not required
 > > to be able to do so).
 >
 > If you use the name "Application" then it makes sense to require a
 > single application to support multiple processors.
 > Jade is an example of
 > an application that supports multiple processors. If we use the word
 > ***Consumer, then it makes sense that there should be a single consumer
 > per Producer.

I'm using the XML terminology, where "processor" actually means
"parser" (ick).

 > >  > This could be done by having the app ask the processor for the location
 > >  > of the last event in some non-standardized way, but that's kind of
 > >  > kludgy.  On the other hand, maybe this is just too fancy for a
 > >  > "simple" API.
 > >
 > > I think that it probably is too fancy.
 >
 > Maybe, but it also seems very important. A processor that can't tell you
 > where your errors are is very frustrating. Perhaps there should
 > immediately be a "level 2" that supports this.

These are two separate things.  Adding a "col" argument to the error()
callback is not so tricky, but providing the exact location of every
start and end tag or data chunk is too complicated.


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Wed Dec 17 15:22:12 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:35 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712171442.JAA00570@unready.microstar.com>
References: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk> <003f01bd0acd$a3266da0$0100007f@localhost> <3497ADBF.27C4F668@jclark.com> <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971217161409.5657bb1e@pop3.demon.co.uk>

At 09:42 17/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
[...]
>
>Any project should have measurable failure criteria.  Here are my
>suggestions.
>
>The Simple XML Event-Based API initiative will have failed if either
>of the following is true:
>
>1) By Monday 12 January 1998, at least three Java parser writers have
>   not agreed to support a specific set of common interfaces.
>
>2) By Monday 12 January 1998, at least three Java applet or
>   application authors have not agreed to use the same set of common
>   interfaces that the parser writers have agreed to support.

Yes - I think this is very appropriate. I will commit at this stage to do
what I can for JUMBO. Given that the API will look fairly like what I'm
used to from David and Tim, that seems fine (the Xapi-J was a level above
me). So barring the possibility that there are bits I may not *understand*,
it shouldn't be too horrendous. I would be *very grateful* for a working
harness like Driver.java (Lark) or the equivalent in Ælfred. It's then
trivial to make sure I've got it right.

So - one more parser writer, and two more applications.
The applications needn't be browsers - they could be transformers, search
engines, whatever. And they needn't exercise the whole API (just as JUMBO
won't). It simply has to show that the approach is understandable by at
least three humans not connected with the other three humans. [Actually
robots can volunteer if they want, as well].

	P.

>
>In other words, we need at least one other parser writer on board
>besides Tim and me (a duopoly is almost as bad as a monopoly), and at
>least two other applet/application writers besides Peter.  If we don't
>have that agreement, and a working beta interface, by 12 January, I
>won't want to spend any more of my time on this issue (I have other
>projects that I'd like to pursue).
>
>
>DOM
>---
>Another interesting question is the DOM.  I have not taken the time
>yet to see if this interface provides enough information to construct
>the most basic DOM nodes -- if it does (or at least, can), then we
>could have a single DOM module maintained separately (using the common
>event interface) instead of requiring each parser writer to create a
>separate one.  A separate DOM module with its own maintainer would be
>much more likely to stay up to date and robust.
>
>
>All the best,
>
>
>David
>
>--
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From d94-dwi at nada.kth.se Wed Dec 17 15:25:38 1997
From: d94-dwi at nada.kth.se (Douglas Wikström)
Date: Mon Jun 7 16:59:35 2004
Subject: unsubscribe
In-Reply-To: <3.0.1.16.19971217161409.5657bb1e@pop3.demon.co.uk>
Message-ID:

unsubscribe

From tbray at textuality.com Wed Dec 17 15:52:03 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: YAXPAPI (Yet Another XML Parser API)- an XDEV proposal
Message-ID: <3.0.32.19971217075000.00aa2634@pop.intergate.bc.ca>

At 02:22 PM 17/12/97 GMT, Gavin Nicol wrote:
>XAPI-J, or whatever this becomes, should be sufficient to build a DOM
>representation.

No no no.  You are missing the point - this is the SIMPLE interface
for RDF-heads and SMIL-folks and all the other people who think that
XML should just be elements and attributes and have none of that
SGML apparatus.  From the end-user programmer's point of view, it
should be.

If you turn your assertion around, then it's correct: you should be
able to build SAX on top of the DOM. -Tim

From richard at cogsci.ed.ac.uk Wed Dec 17 16:36:11 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:59:36 2004
Subject: XML syntax (was Re: external subset syntax)
In-Reply-To: David Megginson's message of Tue, 16 Dec 1997 14:46:22 -0500
Message-ID: <199712171635.QAA21002@stevenson.cogsci.ed.ac.uk>

> No, it's not SGML's fault, at least not this time.  Conforming SGML
> parsers are allowed to continue processing if they want to, and are
> even allowed not to report errors at all (as long as they don't claim
> to be "validating parsers").  XML has gone way beyond any SGML
> requirements with this one.

Always remember that your software doesn't have to be a conforming XML
processor unless you want it to be.  There are several applications
where you certainly *don't* want to be a conforming processor, such as
an XML editor.

-- Richard

From papresco at technologist.com Wed Dec 17 16:41:29 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
References: <003f01bd0acd$a3266da0$0100007f@localhost> <3497ADBF.27C4F668@jclark.com> <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk> <199712171442.JAA00570@unready.microstar.com>
Message-ID: <3497F877.977948B9@technologist.com>

David Megginson wrote:
> 1) By Monday 12 January 1998, at least three Java parser writers have
>    not agreed to support a specific set of common interfaces.

What about a Python parser writer? We are, after all, on the brink of
the 21st century. It would be really nice to stop the cycle of
"crowning" one language the be-all and end-all of programming languages.

Could we specify the interfaces in terms of IDL instead of Java (or
perhaps agree to make an IDL version soon after the Java one)? The only
extra work I see is that we must explicitly define the interfaces for
URL and Dictionary so that other languages can implement Java-compatible
versions.

 Paul Prescod

From fussellm at alumni.caltech.edu Wed Dec 17 18:14:25 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712171442.JAA00570@unready.microstar.com>
Message-ID:

On Wed, 17 Dec 1997, David Megginson wrote:
> In other words, we need at least one other parser writer on board
> besides Tim and me (a duopoly is almost as bad as a monopoly), and at
> least two other applet/application writers besides Peter.

On the application side (or meta-application), I will commit to having
MONDO and mindo on the API within a couple days of when you release it.
As a semi-application, this includes a DOM builder that I will be
releasing early tomorrow.  Is this acceptable as an application?

> DOM
> ---
> Another interesting question is the DOM.  I have not taken the time
> yet to see if this interface provides enough information to construct
> the most basic DOM nodes -- if it does (or at least, can), then we
> could have a single DOM module maintained separately (using the common
> event interface) instead of requiring each parser writer to create a
> separate one.  A separate DOM module with its own maintainer would be
> much more likely to stay up to date and robust.

Well, I have an architecture, APIs and code that handle building arbitrary
object models, which includes both the type and content of a DOM Document.
This loosely couples the XML events to the DOM object construction, so the
DOM model can be maintained independently of the parser.  You could also
have multiple DOM implementation models if you want (and I suspect people
will).

When I move the mindo release up I will let people know so they can look
at it and try it out.  (This mindo release is much, much smaller than the
MONDO-J release although it is based on the same concepts and code base).

--Mark
mark.fussell@chimu.com

From tbray at textuality.com Wed Dec 17 18:32:35 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>

At 09:38 PM 16/12/97 -0500, David Megginson wrote:
>After careful thought, I am fairly certain that I would be willing to
>accept the following simple event-driven API for Ælfred.

I'd be willing to commit to signing up to do this for Lark, given
the following changes:

>  public void
>    startDocument (XmlProcessor processor, String pubid, URL sysid);

Question: what if there's no

>  public void
>    startProlog (XmlProcessor processor);
>  public void
>    endProlog (XmlProcessor processor);

Lose these; they have no place in this API.  You want this kind of stuff,
use Lark or AElfred or whatever.

>  public void
>    processingInstruction (XmlProcessor processor, String target, String data);

Lose this.

>  public void
>    error (XmlProcessor processor, String message, URL url, int line);
>}

Have to add the entity ID as an argument.  No point giving the line
number if you don't know what it's in.

>The processor itself could implement the following interface (very
>Thread-oriented and Bean-like):

And one last thing: if you use URL, then you have to do a new URL()
which does (I think) at least some syntax checking... is this appropriate?
Why not just pass it as a string? -Tim
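
[Editorial illustration, not part of the original thread.]  For readers
following the argument, the trimmed callback interface Tim describes above
might look roughly like the sketch below.  Only the names XmlProcessor,
XmlApplication and error() come from the thread; startElement, endElement,
charData, the String system identifier, RecordingApplication and EventDemo
are assumptions made for illustration, and this is not the interface that
was eventually agreed.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical handle for the parser; the thread only names this interface. */
interface XmlProcessor {
}

/** Hypothetical callbacks, without the prolog and PI events Tim asks to drop. */
interface XmlApplication {
    void startElement(XmlProcessor processor, String name);
    void endElement(XmlProcessor processor, String name);
    void charData(XmlProcessor processor, String text);
    // Assumption: the entity is identified by a plain String system identifier,
    // following Tim's question; David's draft passes a java.net.URL instead.
    void error(XmlProcessor processor, String message, String systemId, int line);
}

/** Example consumer that simply records every event it receives. */
class RecordingApplication implements XmlApplication {
    final List<String> log = new ArrayList<>();

    public void startElement(XmlProcessor p, String name) { log.add("start:" + name); }
    public void endElement(XmlProcessor p, String name) { log.add("end:" + name); }
    public void charData(XmlProcessor p, String text) { log.add("text:" + text); }
    public void error(XmlProcessor p, String message, String systemId, int line) {
        log.add("error:" + systemId + ":" + line + ":" + message);
    }
}

class EventDemo {
    /** Fires, by hand, the events a parser might report for <p>Hi</p>. */
    static void drive(XmlApplication app) {
        XmlProcessor none = new XmlProcessor() { };
        app.startElement(none, "p");
        app.charData(none, "Hi");
        app.endElement(none, "p");
        app.error(none, "example error", "file:test.xml", 12);
    }

    public static void main(String[] args) {
        RecordingApplication app = new RecordingApplication();
        drive(app);
        for (String entry : app.log) {
            System.out.println(entry);
        }
    }
}
```

The point of the sketch is only that an application implements a handful of
callbacks and never sees the parser's internals; a real parser would call
drive()'s methods itself while reading the document.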

From tbray at textuality.com Wed Dec 17 18:35:09 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>

At 09:38 PM 16/12/97 -0500, David Megginson wrote:
>After careful thought, I am fairly certain that I would be willing to
>accept the following simple event-driven API for Ælfred.

I'd be willing to commit to signing up to do this for Lark, given
the following changes:

Oops; and I forgot the IMPORTANT one: I don't see any point in doing
this if there isn't also an ultra-simple tree interface supporting
only Element, Attribute, and Text classes.  Because this is what most
people will use, especially given that a high proportion of XML
transmissions will be small flattish documents; why should everyone
have to build their own tree? -Tim
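
[Editorial illustration, not part of the original thread.]  As a rough idea
of how small the "ultra-simple tree" Tim asks for could be, here is a sketch
that builds such a tree from start/end/text events using a stack of open
elements.  All class and method names (SimpleNode, SimpleElement,
SimpleText, TreeBuilder) are hypothetical, and attributes are held in a
plain Map rather than a separate Attribute class, which is a simplification
of what Tim lists.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical base type: a node is either an element or a run of text. */
abstract class SimpleNode {
}

class SimpleText extends SimpleNode {
    final String data;
    SimpleText(String data) { this.data = data; }
}

class SimpleElement extends SimpleNode {
    final String name;
    // Assumption: attributes kept as name/value pairs in document order.
    final Map<String, String> attributes = new LinkedHashMap<>();
    final List<SimpleNode> children = new ArrayList<>();
    SimpleElement(String name) { this.name = name; }
}

/** Builds the tree from events, tracking currently open elements on a stack. */
class TreeBuilder {
    private final Deque<SimpleElement> open = new ArrayDeque<>();
    private SimpleElement root;

    void startElement(String name, Map<String, String> attrs) {
        SimpleElement element = new SimpleElement(name);
        element.attributes.putAll(attrs);
        if (open.isEmpty()) {
            root = element;                      // first element is the root
        } else {
            open.peek().children.add(element);   // attach to innermost open element
        }
        open.push(element);
    }

    void charData(String text) {
        if (!open.isEmpty()) {                   // ignore text outside the root
            open.peek().children.add(new SimpleText(text));
        }
    }

    void endElement(String name) {
        open.pop();                              // assumes well-formed, balanced events
    }

    SimpleElement getRoot() { return root; }
}

class TreeDemo {
    public static void main(String[] args) {
        TreeBuilder builder = new TreeBuilder();
        builder.startElement("order", Collections.singletonMap("id", "42"));
        builder.startElement("item", Collections.<String, String>emptyMap());
        builder.charData("widget");
        builder.endElement("item");
        builder.endElement("order");
        System.out.println(builder.getRoot().name);
    }
}
```

A builder like this sits entirely on top of the event callbacks, which is
why the thread goes on to debate whether one shared tree module (or the DOM)
could serve every parser.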

From ak117 at freenet.carleton.ca Wed Dec 17 19:05:45 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
Message-ID: <199712171903.OAA04014@unready.microstar.com>

Tim Bray writes:

 > I'd be willing to commit to signing up to do this for Lark, given
 > the following changes:
 >
 > >  public void
 > >    startDocument (XmlProcessor processor, String pubid, URL sysid);
 >
 > Question: what if there's no well throw in the root doctype.

Agreed.  We can take it out, since the same information is available
using getPublicId() and getSystemId() in the XmlProcessor interface.

 > >  public void
 > >    startProlog (XmlProcessor processor);
 > >  public void
 > >    endProlog (XmlProcessor processor);
 >
 > Lose these; they have no place in this API.  You want this kind of stuff,
 > use Lark or AElfred or whatever.

Agreed.

 > >  public void
 > >    processingInstruction (XmlProcessor processor, String target, String data);

I disagree -- processing instructions are an essential part of a
document (especially for architectural forms).

 > >  public void
 > >    error (XmlProcessor processor, String message, URL url, int line);
 > >}
 >
 > Have to add the entity ID as an argument.  No point giving the line
 > number if you don't know what it's in.

The URL argument will show you where it is.

 > And one last thing: if you use URL, then you have to do a new URL()
 > which does (I think) at least some syntax checking... is this appropriate?
 > Why not just pass it as a string?
 > -Tim

For starting Ælfred, I found using a string awkward, since I needed a
base URL to resolve relative URLs (like file names).  Since XML
mandates URIs anyway, and Java supports them pretty transparently, I
thought that it made sense to use them directly instead of using a lot
of Url.toString() and new URL(String) calls (it will also allow the
use of '==' with system identifiers).


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Wed Dec 17 19:09:03 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
References: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
Message-ID: <199712171905.OAA04027@unready.microstar.com>

Tim Bray writes:

 > At 09:38 PM 16/12/97 -0500, David Megginson wrote:
 > >After careful thought, I am fairly certain that I would be willing to
 > >accept the following simple event-driven API for Ælfred.
 >
 > I'd be willing to commit to signing up to do this for Lark, given
 > the following changes:
 >
 > Oops; and I forgot the IMPORTANT one: I don't see any point in doing
 > this if there isn't also an ultra-simple tree interface supporting
 > only Element, Attribute, and Text classes.  Because this is what most
 > people will use, especially given that a high proportion of XML
 > transmissions will be small flattish documents; why should everyone
 > have to build their own tree.
 > -Tim

I see no reason not to use the DOM for this.  The Node, Document,
Element, AttributeList, Attribute, and Text classes look easy enough
to use, and people can simply ignore what they do not need.


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Wed Dec 17 19:56:14 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
In-Reply-To: <3497F877.977948B9@technologist.com>
References: <003f01bd0acd$a3266da0$0100007f@localhost> <3497ADBF.27C4F668@jclark.com> <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk> <199712171442.JAA00570@unready.microstar.com>
Message-ID: <3.0.1.16.19971217192211.406f8f60@pop3.demon.co.uk>

At 11:06 17/12/97 -0500, Paul Prescod wrote:
>David Megginson wrote:
>> 1) By Monday 12 January 1998, at least three Java parser writers have
>>    not agreed to support a specific set of common interfaces.
>
>What about a Python parser writer? We are, after all, on the brink of
>the 21st century. It would be really nice to stop the cycle of
>"crowning" one language the be-all and end-all of programming languages.
>
>Could we specify the interfaces in terms of IDL instead of Java (or
>perhaps agree to make an IDL version soon after the Java one)? The only
>extra work I see is that we must explicitly define the interfaces for
>URL and Dictionary so that other languages can implement Java-compatible
>versions.

Please can I very gently suggest that we stick precisely to what David has
suggested.
It has the merit that we all understand it. [Strange as it may seem,
I have never seen any Python or IDL, so it would make my job a lot harder.]

The interface has to be simple enough for people like me to understand and
to tell my friends what it's about. I would prefer to limit the Consumers,
Factories and the rest to as few as possible.

One of the main goals is to show that we can actually accomplish something
communally. That in itself will be a big achievement, because after that it
should get simpler. We choose Java because it's one of the main languages
of the WWW, it's free and the majority of the programs reported here are in
Java.

26 days and counting. In some countries some of the people will be on
holiday for some of the time. There are three more bodies to recruit. We
need people to hack code.

	P.

>
> Paul Prescod
>

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Dec 17 20:25:23 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To:
References: <199712171442.JAA00570@unready.microstar.com>
Message-ID: <3.0.1.16.19971217211309.468fb3b0@pop3.demon.co.uk>

At 10:12 17/12/97 -0800, Mark L. Fussell wrote:
[... offer of MONDO on top of API...]
>
>On the application side (or meta-application), I will commit to having
>MONDO and mindo on the API within a couple days of when you release it.
>As a semi-application, this includes a DOM builder that I will be
>releasing early tomorrow.  Is this acceptable as an application?

sounds great to me :-)

I'll leave the others to comment on the DOM stuff.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Wed Dec 17 20:28:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <3.0.32.19971217103526.00b4151c@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971217211949.574fed64@pop3.demon.co.uk>

At 10:35 17/12/97 -0800, Tim Bray wrote:
>
>Oops; and I forgot the IMPORTANT one: I don't see any point in doing
>this if there isn't also an ultra-simple tree interface supporting
>only Element, Attribute, and Text classes.  Because this is what most
>people will use, especially given that a high proportion of XML
>transmissions will be small flattish documents; why should everyone
>have to build their own tree. -Tim

Yes - this is really important, because it fixes the terminology. We also
know whether we have a Vector of children or some other model.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From papresco at technologist.com Wed Dec 17 20:35:45 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: IDL?
References: <003f01bd0acd$a3266da0$0100007f@localhost> <3497ADBF.27C4F668@jclark.com> <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk> <199712171442.JAA00570@unready.microstar.com> <3.0.1.16.19971217192211.406f8f60@pop3.demon.co.uk>
Message-ID: <3498363A.5143B885@technologist.com>

Peter Murray-Rust wrote:
>
> The interface has to be simple enough for people like me to understand and
> to tell my friends what it's about. I would prefer to limit the Consumers,
> Factories and the rest to as few as possible.

An IDL interface implies no extra complication in the Java interface. It merely describes the Java interface in terms that are more universal than Java itself -- it is like a DTD for interfaces. So far nobody has proposed anything that would make an IDL description impossible. All I ask is that:

a) nobody do so later (e.g. require runtime lookup of Java class objects or do something similarly brain-dead) and
b) implementations in other languages be considered "successes" in terms of the success/failure of this project.

I don't think that either of these constraints endangers the success of the Java-specific part of the project or makes the Java-specific part more difficult.

Paul Prescod
From peter at ursus.demon.co.uk Wed Dec 17 20:44:42 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <199712171903.OAA04014@unready.microstar.com>
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca>
Message-ID: <3.0.1.16.19971217211714.468f7f82@pop3.demon.co.uk>

At 14:03 17/12/97 -0500, David Megginson wrote:
[...]
> > > public void
> > > processingInstruction (XmlProcessor processor, String target, String data);
>
>I disagree -- processing instructions are an essential part of a
>document (especially for architectural forms).
>

I'd tend to agree on keeping PIs in as well. Both Lark and Ælfred do them at present. They are used in namespaces (which JUMBO is able to do something with) and there are also other local uses.

It's going great :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From neil at bradley.co.uk Wed Dec 17 21:22:22 1997
From: neil at bradley.co.uk (Neil Bradley)
Date: Mon Jun 7 16:59:36 2004
Subject: inclusions/exclusions/named groups
Message-ID: <199712172122.VAA29298@andromeda.ndirect.co.uk>

> I am an amateur in xml and hope anyone can help me.
>
> I try to transform a SGML-DTD into XML (I use MSXML-parser).
> My questions are:
>
> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?
>
> my sgml-dtd:
>
> ...
>

First, let's simplify your SGML DTD, which has too many brackets in it:

...

Putting the PCDATA in the right place for XML, and removing the minimization tokens, we get:

...

So you want f, g and h to be accessible in a, b, c, d and e, and also in l, m and n, but only f and g in i, j and k. Of course, you may not want any of these directly in LE, c and/or e, though inclusions automatically allow this. Only you can decide. Assuming that you do want them...

...

In one sense you are lucky in this example, because you do not have the same element having different content depending on its context. Suppose the following:

Here, the para element may have an xref, but only if it appears inside a section element. To do this in XML requires the definition of a new element, perhaps called sect_para.

Neil.

-----------------------------------------------
Neil Bradley - Author of The Concise SGML Companion.
neil@bradley.co.uk
www.bradley.co.uk
From simeons at allaire.com Wed Dec 17 21:32:53 1997
From: simeons at allaire.com (Simeon Simeonov)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <01bd0b33$fd605b30$4a15b5cd@sim.allaire.com>

A few comments related to a few posts:

I. Multiplicity of Application - Processor relationship

The "one app, multiple processors" argument is not convincing in my opinion: (a) I don't think this use of the simple API would be common, and (b) it is trivial to implement a solution that does this outside the API. I feel the same about the "one processor, multiple applications" argument.

If we make the multiplicity of the relationship between XmlApplication and XmlProcessor 1 to 1, we can eliminate the XmlProcessor arguments to XmlApplication methods AND the get/set methods for user data in XmlProcessor. Additionally, we won't need both addApplication() and removeApplication().

I see the removal of at least three methods in XmlProcessor and the removal of XmlProcessor as an argument to XmlApplication methods as a substantial gain for the simple API. Further, I'll get immense personal satisfaction from seeing the handling of arbitrary user data removed from XmlProcessor.

II. Positional information

I'm somewhat surprised that parser writers claim it is difficult to extract information about the positions of elements in an XML document. Can someone explain why this is the case? In my work with markup languages I've always represented the position of elements with a pair of (offset in data stream, line number, column number) triplets.
Providing this information will certainly result in slightly lower performance, but the functionality it enables for editing, good error reporting and validation is significant.

III. Exceptions

I am uncertain about the implications of exceptions leaving either the XmlProcessor or the XmlApplication objects. In particular, I am wondering what would happen if the XmlProcessor and XmlApplication are used as beans. I know that in the COM/CORBA world this is very undesirable. In general, I think it leads to a more complicated programming mechanism.

Someone mentioned that stopping the parse is difficult with top-down parsers. While this is true in principle, I think there are some very simple mechanisms for stopping a top-down parse. I'd be happy to discuss these with whoever is interested.

IV. IDL

I did try a number of times to bring up the issue of a language independent API with little success. I do see the benefit of something being done with Java right now, so I'll just wait for the Java API to stabilize before looking at ways to express it in IDL.

Regards,

Simeon Simeonov
Allaire

From mrc at allette.com.au Wed Dec 17 21:46:39 1997
From: mrc at allette.com.au (Marcus Carr)
Date: Mon Jun 7 16:59:36 2004
Subject: inclusions/exclusions/named groups
References:
Message-ID: <3498481C.18CD56D3@allette.com.au>

Andrea Anders wrote:

> I am an amateur in xml...

As are we all...

> I try to transform a SGML-DTD into XML (I use MSXML-parser). My questions are:
>
> 1) Neither SGML-inclusions nor -exclusions are allowed in XML!? How can I
> express this in XML?
Inclusions and (to a lesser extent) exclusions have never really been a great idea in SGML because of the potential for them to behave incorrectly when parsing from somewhere other than the top level of the DTD. Depending on how widely they've been used and how big your data set is, I'd be inclined to process all of your documents and generate a report of the ancestor elements of the inclusions. This will give you some perspective about how they've been used - you can then make informed decisions about their handling and requirements. Exclusions can be overcome by remodelling the content models, but this could be a substantial amount of work if your DTD is large and/or complex.

That's the way I wouldn't do it. I would maintain the data as SGML and call it XML as required. Does it need to be valid, or can it just be well formed? Be careful about white-space around the inclusions and exclusions if you use this approach - no matter how you slice it, they're bad news.

--
Regards

Marcus Carr email: mrc@allette.com.au
_______________________________________________________________
Allette Systems (Australia) email: info@allette.com.au
Level 10, 91 York Street www: http://www.allette.com.au
Sydney 2000 NSW Australia phone: +61 2 9262 4777
fax: +61 2 9262 4774
_______________________________________________________________
From donpark at quake.net Wed Dec 17 22:42:20 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <00d101bd0b3c$88f2aab0$0100007f@localhost>

>I didn't suggest XmlApplication should store XmlProcessor in a
>member variable. I suggested that implementations of XmlApplication
>could (if they needed to make callbacks to XmlProcessor) store
>XmlProcessor in a member variable.

OOPS. Point taken.

>I don't think it's appropriate to carry over patterns from GUI events
>and apply them to XML events just because we happen to use the word
>"event" to describe them both. I believe performance is important for
>XML processing, and an interface shouldn't impose an unnecessary
>performance cost.
>
>The real merit of this interface is that it's simple; unless there's a
>really compelling need for a feature, I think it should be left out.

While David suggested that the add/removeApplication methods allow implementation of XmlProcessors which support multiple XmlApplications, it is completely up to the implementations whether to support multiple XmlApplications or only one at a time. As the JavaBeans spec suggests, a TooManyListenersException should be thrown when a second XmlApplication is added to an XmlProcessor that, for the sake of performance and simplicity, supports only one.

>> I do not think so. Just as every Mac developer loved having RefCon to hang
>> things onto, I like userData.
>
>Could you explain a typical case where you need this?
>
>Are there any standard Java classes that do this?

userData is a cheap way to associate extra info with the XmlProcessor.
For example, I can store the source URL in the userData. There are other ways to have XmlProcessors provide the URL info (e.g. the JavaBeans Activation Framework has URLDataSource for this) but they are fairly expensive and would unnecessarily taint the API with URL related stuff. It should be possible to use XmlProcessor with a File, and building a URL out of a File is not reliable on all platforms.

Don

From donpark at quake.net Wed Dec 17 22:42:24 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
Message-ID: <00d301bd0b3c$8a818810$0100007f@localhost>

>In other words, we need at least one other parser writer on board
>besides Tim and me (a duopoly is almost as bad as a monopoly), and at
>least two other applet/application writers besides Peter. If we don't
>have that agreement, and a working beta interface, by 12 January, I
>won't want to spend any more of my time on this issue (I have other
>projects that I'd like to pursue).

If Chris does not object or respond, I can step up and provide the implementation for MSXML by 12 January. There is nothing in the license that prohibits me from implementing the simple API over MSXML.

As far as James' concern over having a simple DOM, I think one of us can implement an XmlApplication that produces W3C DOM objects so programmers can just deal with DOM. Any takers?

Don
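Editorial sketch: the idea of an application object that turns parser events into a small tree could look roughly like the following. Everything here is an assumption made for illustration: the callback names (startElement, endElement, charData) and the classes XmlEvents, SimpleElement and TreeBuilder are hypothetical and are not the API being drafted on the list, and the result is a bare element/text tree rather than W3C DOM objects.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical event callbacks; the method names are assumptions,
// not the interface under discussion in this thread.
interface XmlEvents {
    void startElement(String name);
    void endElement(String name);
    void charData(String text);
}

// Minimal tree node: an element name plus its children in document order.
final class SimpleElement {
    final String name;
    final List<Object> children = new ArrayList<>(); // SimpleElement or String

    SimpleElement(String name) {
        this.name = name;
    }
}

// Builds a SimpleElement tree from a stream of events.
final class TreeBuilder implements XmlEvents {
    private final Deque<SimpleElement> open = new ArrayDeque<>(); // currently open elements
    private SimpleElement root;

    @Override
    public void startElement(String name) {
        SimpleElement element = new SimpleElement(name);
        if (open.isEmpty()) {
            root = element;                      // first element seen is the root
        } else {
            open.peek().children.add(element);   // attach to the innermost open element
        }
        open.push(element);
    }

    @Override
    public void endElement(String name) {
        SimpleElement closed = open.pop();
        if (!closed.name.equals(name)) {
            throw new IllegalStateException("mismatched end tag: " + name);
        }
    }

    @Override
    public void charData(String text) {
        if (!open.isEmpty()) {
            open.peek().children.add(text);      // text outside the root is ignored
        }
    }

    SimpleElement root() {
        return root;
    }

    public static void main(String[] args) {
        TreeBuilder builder = new TreeBuilder();
        builder.startElement("doc");
        builder.charData("hello");
        builder.endElement("doc");
        System.out.println(builder.root().name + " has "
                + builder.root().children.size() + " child(ren)");
    }
}
```

A parser would call the three callbacks as it reads the document, and the caller would pick up the finished tree from root() afterwards. Keeping the open elements on a stack means the builder itself needs no reference back to the parser.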
From peter at ursus.demon.co.uk Thu Dec 18 00:05:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <00d301bd0b3c$8a818810$0100007f@localhost>
Message-ID: <3.0.1.16.19971218005503.0f5f8ba8@pop3.demon.co.uk>

At 14:36 17/12/97 -0800, Don Park wrote:
[...]
>If Chris does not object or respond, I can step up and provide the
>implementation for MSXML by 12 January. There is nothing in the license
>that prohibits me from implementing the simple API over MSXML.
>

Great - I would really love to have that. I assume that it is fairly stable now (1.8?) and that the various queries on this list have been resolved...

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From reast at esri.com Thu Dec 18 02:54:08 1997
From: reast at esri.com (Russell East)
Date: Mon Jun 7 16:59:36 2004
Subject: a DTD as a JAR file resource [was Re: RFC: Simple XML Event-Based API for Java]
Message-ID: <34988F5C.16273230@esri.com>

Antony Blakey wrote:
> .... What we need to do is ... provide the
> input stream from within the program ie.
the entity is stored as a string, or accessed
> through ClassLoader.getResourceAsStream()...

Yes! I would like to be able to store one or more DTDs as resources within a JAR file. Within a I'd like to be able to refer to that DTD, rather than referring to some server-side DTD. But I don't think we can do this now, because we can't specify a URL for a JAR resource - well, we can't do it in a platform independent manner anyway, because JavaSoft states, at http://java.sun.com/products/jdk/1.1/docs/guide/misc/resources.html :

"The method getResource() returns a URL for the resource. The URL (and its representation) is implementation-specific and may vary depending on the implementation details (it may also change between JDK1.1 and JDK1.1.1). Its protocol is (usually) specific to the ClassLoader loading the resource. If the resource does not exist, a null will be returned."

It's hard to test this: firstly, Netscape doesn't yet seem to support ClassLoader.getResource(), and IE4 doesn't seem to support JARs as containers for resources.

For instance, I have a sample applet which is placed into a JAR along with a resource named test.dtd. Within JDK 1.1.4 appletviewer, getResource() returns the URL of this resource as:

    appletresource:/file:/D:/Ims/z//+/test.dtd

or:

    appletresource://gumnut/http://gumnut/ims/z//+/test.dtd

depending on whether I access the HTML through my webserver or not.

It would be good to be able to specify one of these URLs in SYSTEM, and have it work in all cases - not just appletviewer.

Do the XML parser developers have any suggestions on how to achieve this?

Does it make sense to have a special API for the parser through which you can not only specify an xml document, but also a separate dtd ?
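Editorial sketch: one way to hand a parser a DTD that the program supplies itself, instead of fetching it by URL, is an entity-resolver hook. The example below uses the JAXP and SAX classes that ship with current JDKs (DocumentBuilder.setEntityResolver, org.xml.sax.InputSource); none of this existed when the message was written, so it illustrates the idea and is not what any 1997 parser offered. The class name BundledDtdDemo, the DTD text and the sample document are made up for the example, and the DTD is kept in a String so that the sketch runs without a JAR.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

final class BundledDtdDemo {
    // Hypothetical DTD text; an applet could load the same bytes from its JAR.
    static final String DTD =
        "<!ELEMENT doc (#PCDATA)>\n" +
        "<!ENTITY greet \"hello\">\n";

    // Sample document that refers to the DTD by a relative system identifier.
    static final String XML =
        "<!DOCTYPE doc SYSTEM \"test.dtd\">\n" +
        "<doc>&greet;</doc>";

    // Validates xml against dtdText and returns the text of the root element.
    static String parseWithBundledDtd(String xml, String dtdText) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(true);
            DocumentBuilder builder = factory.newDocumentBuilder();

            // Answer requests for test.dtd from memory. A JAR-based variant could
            // return new InputSource(BundledDtdDemo.class.getResourceAsStream("/test.dtd")).
            builder.setEntityResolver((publicId, systemId) ->
                (systemId != null && systemId.endsWith("test.dtd"))
                    ? new InputSource(new StringReader(dtdText))
                    : null); // null means: fall back to the parser's default lookup

            // Turn validity errors into exceptions instead of console warnings.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXParseException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXParseException { throw e; }
            });

            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseWithBundledDtd(XML, DTD));
    }
}
```

Because the resolver matches on the tail of the system identifier, the document can keep an ordinary relative SYSTEM reference and no implementation-specific resource URL ever has to appear in it.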
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Russell East mailto:reast@esri.com
_|_| Programmer phn: +1 (909) 793 2853
_|_| ESRI, 380 New York St fax: +1 (909) 307 3067
Redlands CA 92373-8100 http://maps.esri.com/
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

From donpark at quake.net Thu Dec 18 03:22:49 1997
From: donpark at quake.net (Don Park)
Date: Mon Jun 7 16:59:36 2004
Subject: An interesting news: JDK 1.2 Beta is now public available
Message-ID: <000b01bd0b63$b8af3580$0100007f@localhost>

Since JavaSoft is notoriously late updating its web pages, I thought some of you might be interested to know that JDK 1.2 Public Beta is finally out at:

http://developer.javasoft.com/developer/earlyAccess/jdk12/

Please do not reply to this message cause I don't want to receive another LISTRIVIA from Peter :-p

Don "JStud" Park
Master Consultant
donpark@quake.net
Come visit my XML Example Catalog at http://www.quake.net/~donpark/xmlcat.html
From tyler at infinet.com Thu Dec 18 08:42:48 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <00d101bd0b3c$88f2aab0$0100007f@localhost>
Message-ID: <341CEE96.DFD97141@infinet.com>

Don Park wrote:

> >I didn't suggest XmlApplication should store XmlProcessor in a
> >member variable. I suggested that implementations of XmlApplication
> >could (if they needed to make callbacks to XmlProcessor) store
> >XmlProcessor in a member variable.
>
> OOPS. Point taken.
>
> >I don't think it's appropriate to carry over patterns from GUI events
> >and apply them to XML events just because we happen to use the word
> >"event" to describe them both. I believe performance is important for
> >XML processing, and an interface shouldn't impose an unnecessary
> >performance cost.
> >
> >The real merit of this interface is that it's simple; unless there's a
> >really compelling need for a feature, I think it should be left out.
>
> While David suggested that add/removeApplication methods allow
> implementation of XmlProcessors which support multiple XmlApplications, it
> is completely up to the implementations to support multiple XmlApplication
> or only one at a time. As JavaBeans spec suggests,
> TooManyListenersException should be thrown if XmlProcessor supports only one
> XmlApplication for performance and simplicity sake.
>
> >> I do not think so. Just as every Mac developer loved having RefCon to hang
> >> thing onto, I like userData.
> >
> >Could you explain a typical case where you need this?
> >
> >Are there any standard Java classes that do this?
>
> userData is a cheap way to associate extra info with the XmlProcessor. For
> example, I can store the source URL in the userData. There are other ways
> to have XmlProcessors provide the URL info (i.e. Java Activation Frame has
> URLDataSource for this) but they are fairly expensive and would
> unnecessarily taint the API with URL related stuff. It should be possible
> to use XmlProcessor with a File and building URL out of File is not reliable
> in all platforms.
>
> Don
>

I am not sure if this is at all relevant to this discussion, but I got some info via email from the JDC newsletter that gives an interesting tip on how to efficiently build tree structures without sucking up too much RAM. I figure that, for building XML parsers, the most efficient way of storing the parsed data would be some help to the XML parser writers. Anyway, here is the tip.

PERFORMANCE -- using Object to represent disparate types.

This tip is a little tricky, but it recently came up in an actual application, and illustrates how Java language features are used to efficiently represent a large data structure.

The application is one where a very large tree structure, consuming millions of bytes, is built up. Some of the nodes in the tree reference child nodes (non-terminals), while others are leaf nodes (terminals) and have no children, but contain String information. The application involves parsing a large Java program and representing it internally via a tree.

One simple approach to this problem is to define a Node class such as the following:

    public class Node {
        private int type;
        private Node child[];
        private String info;
    }

If the node is a leaf node, then info is used. Otherwise, child refers to the children of the node, and child.length to the number of children.

This approach works pretty well, but uses a lot of memory. Only one of child and info is used at any one time, meaning that the other field is wasted.
Child is an array, with attendant overhead, for example, in storing the dimensions of the array for subscript checking. For certain large inputs, the parser program runs out of memory.

The first refinement of this approach is to collapse child and info:

    public class Node {
        private int type;
        private Object info;
    }

In this scheme, info can refer to either a String, for a leaf node, or to a child node array. Object is the root of the Java class hierarchy, so that for example, the following:

    class A {}

implicitly means:

    class A extends Object {}

An instance of a subclass of Object, such as String, can be assigned to an Object reference. An array of Nodes can likewise be assigned to an Object. The instanceof operator can be used to determine the actual type of an Object reference.

In the parser application, using Object to represent both data types is not good enough because it still takes up too much memory. So a further change has been implemented. After doing some research, it was found that the child array consisted of a single Node element about 95 percent of the time. So it's possible to represent one-child cases directly using an Object reference to the child node, rather than a reference to a one-long array of child nodes.

This representation is complicated, and it's useful to define a method for encapsulating the abstraction as in the following example:

    public class Node {
        private int type;
        private Object info;

        // constructors, other methods here ...

        // gets the i-th child reference
        public Node getChild(int i) {
            if (info instanceof String)
                return null;
            else if (info instanceof Node)
                // a single child is stored directly; only index 0 is valid
                return (i == 0) ? (Node)info : null;
            else
                return ((Node[])info)[i];
        }
    }

getChild returns the i-th child, or null for leaf nodes. If there is exactly one child, then info is of type Node, referencing that child. If there is more than one child, info is of type Node[], and a cast to Node[] is done, followed by a retrieval and return of the child reference.
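Editorial sketch: the tip above shows only getChild, so here is a self-contained version with constructors and a childCount method added. The class name CompactNode and the extra methods are assumptions made for illustration, not part of the newsletter text. Unlike the original fragment, which indexes the array directly, this version returns null for any out-of-range index.

```java
import java.util.Objects;

// Compact tree node: the single field "info" holds a String (leaf),
// one CompactNode (exactly one child), or a CompactNode[] (any other child count).
final class CompactNode {
    private final int type;
    private final Object info;

    // Leaf node carrying text.
    CompactNode(int type, String text) {
        this.type = type;
        this.info = Objects.requireNonNull(text);
    }

    // Interior node; a single child is stored directly, without an array.
    CompactNode(int type, CompactNode... children) {
        this.type = type;
        this.info = (children.length == 1) ? children[0] : children.clone();
    }

    int type() {
        return type;
    }

    // Text of a leaf node, or null for interior nodes.
    String text() {
        return (info instanceof String) ? (String) info : null;
    }

    int childCount() {
        if (info instanceof String) return 0;
        if (info instanceof CompactNode) return 1;
        return ((CompactNode[]) info).length;
    }

    // Returns the i-th child, or null for leaves and out-of-range indexes.
    CompactNode getChild(int i) {
        if (info instanceof String) return null;
        if (info instanceof CompactNode) return (i == 0) ? (CompactNode) info : null;
        CompactNode[] kids = (CompactNode[]) info;
        return (i >= 0 && i < kids.length) ? kids[i] : null;
    }

    public static void main(String[] args) {
        CompactNode leaf = new CompactNode(1, "x");
        CompactNode parent = new CompactNode(2, leaf);
        System.out.println(parent.childCount() + " child, text = " + parent.getChild(0).text());
    }
}
```

Callers see only childCount and getChild, so the choice between the direct-reference and array forms stays an internal detail that could be changed later without touching client code.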
In the parser application, this change is enough to tip the scales, so that the application no longer runs out of memory.

The internal representation in this example is tricky, but it can be hidden via methods such as getChild. In general, it's wise to avoid tricky coding, but useful to know how to do it when the need arises. The example also illustrates the utility of using one Object reference to represent several different data types. In C/C++ similar techniques would use void* pointers or unions.

From papresco at technologist.com Thu Dec 18 09:17:08 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com>
Message-ID: <3498E043.5F764F28@technologist.com>

David Megginson wrote:
>
> > And one last thing: if you use URL, then you have to do a new URL()
> > which does (I think) at least some syntax checking... is this appropriate?
> > Why not just pass it as a string? -Tim
>
> For starting Ælfred, I found using a string awkward, since I needed a
> base URL to resolve relative URLs (like file names).

XML attributes will probably have relative URLs in them and the XML Application will have to know how to resolve them. Tim is right that attributes are syntactically checked when they are created and can throw an exception if there is a mistake. I would rather leave that up to the application writer.
> Since XML
> mandates URIs anyway, and Java supports them pretty transparently,

XML mandates URIs, but Java supports URLs. I don't think that all Java environments will allow new URL types to be installed. But if we are just passing around strings then the application can recognize URNs and Do The Right Thing.

> I
> thought that it made sense to use them directly instead of using a lot
> of Url.toString() and new URL(String) calls

I think that all we are doing is shifting the "new URL(String)" calls from the processor to the application.

Paul Prescod

From peter at ursus.demon.co.uk Thu Dec 18 09:41:30 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:36 2004
Subject: a DTD as a JAR file resource [was Re: RFC: Simple XML Event-Based API for Java]
In-Reply-To: <34988F5C.16273230@esri.com>
Message-ID: <3.0.1.16.19971218103417.2ddf1544@pop3.demon.co.uk>

At 18:50 17/12/97 -0800, Russell East wrote:
>
>Yes! I would like to be able to store one or more DTDs as
>resources within a JAR file. Within a
>I'd like to be able to refer to that DTD, rather than, refering
>to some server-side DTD. But, I don't think we can do this now, because,

I think this is a tremendously important subject, Russell - thanks. One of the exciting aspects of SGML/XML over the WWW is that it makes it possible to distribute a whole environment. Like you I would want to be able to "cache" some or all of these resources "client-side". One obvious reason is slow lines, another is that people are often not connected to the WWW.
For example JUMBO - when used for molecular, statistical and other non-core XML operations - can be over 500Kb in classes.

>
>It would be good to be able to specify one of these URLs in SYSTEM,
>and have it work in all cases - not just appletviewer.

Personally I have enormous trouble with URLs under Java. There are the following orthogonal problems:

- file: versus http:
- different syntaxes for files ('/' versus '\')
- different compilers (jvc vs javac)
- different JVMs (appletviewer, java, jview, NS (+versions), MS (+versions), hotjava).
- different platforms (UNIces, Mac, Windows).

Altogether there are at least 20 actual variants.

For example, I contributed a JUMBO snapshot for Henry's latest CDROM on chemical publishing [1]. Henry already has to test his CDROM for operation with HTML and JavaScript (sorry ECMAScript). The CDROM has to run anywhere and for people who have no knowledge of:

    HTML
    JavaScript
    Who made the machine that they are viewing the CDROM on.

Adding:

    Java
    XML

is yet another dimension.

The ability to publish packaged systems under Java/XML is tremendously exciting. I've done this in a limited way earlier this year and it seemed to work. Henry's CDROM is going out with an issue of a paper Journal from the Royal Soc of Chemistry but I don't expect a lot of feedback about JUMBO - I suspect that most people won't get that far through the distribution (the main rationale is *content* - organometallic chemistry.)

A bizarre problem has just arisen. Please help me :-). The JUMBO snapshot is arranged to run under a browser as well as a standalone interpreter. So I have packaged it as this directory structure (not horizontal as hypermail won't render it :-(

    demos
        mol.xml
        mol.html
        jumbo
            sgml
                SGMLTree.class
            cml
                MOL.class
        etc.
This runs OK with:

    java jumbo.sgml.SGMLTree mol.xml

or

    java jumbo.sgml.SGMLTree file:/C:/mydir/demos/mol.xml

or (I think)

    java jumbo.sgml.SGMLTree file:mol.xml

and even

    java jumbo.sgml.SGMLTree mol.xml PARSER=AElfred

mol.html contains:

When mol.html was loaded this used to work fine, launching JUMBO and reading the file. Henry tells me that it still works for him under Netscape 4.04. BUT on my own PC with NS4.02 it now throws a SecurityException when it comes to read

    file:/C:/cdrom/demos/mol.xml

saying it isn't allowed to read a local file. So it seems to be a PMR-environment-specific problem. Help would be really appreciated. Are there any browser switches, config files etc. that I might have corrupted? Or is everyone benefitting from a laxer implementation of Applet Security?

[...]
>
>Do the XML parser developers have any suggestions on how to achieve this?

I don't think it's just for parser developers - anyone can play.

>
>Does it make sense to have a special API for the parser through which
>you can not only specify an xml document, but also a separate dtd ?

I think this is part of the namespace activity. JUMBO implements namespaces experimentally (all namespace stuff is experimental!) and it involves a lot of subsidiary files (JUMBO has one for most ELEMENTs, schema files and much more). JUMBO can also use 3 parsers and will - by Jan 12 ;-) - be able to use 5. As we've seen, these parsers provide additional features so that it makes sense to distribute them (authors permitting of course) with the JUMBO distribution. It's also possible - as you suggest - that different DTDs (or, I suspect, namespaces) might be distributed as well. For example, it could make sense to have a variety of support files for HTML4.0/XML. The reader could then choose between these at browse time.

This requires something with the functionality of a JAR file.
I take the concern that we shouldn't become Java-only, but I think the *experience* with JAR files for early XML adopters will be essential.

So - not for Jan 12 - some communal activity here on distribution, manifests, installation, etc would be extraordinarily helpful to the success of XML. If we can reliably distribute our XML applications without worrying about what's at the other end it would be marvellous. It's a very different sort of task from writing a parser :-)

P.

[1] Some people may not know who Henry Rzepa is. Henry is the world's leading exponent of the use of the Internet and related technologies for chemical information and publishing. [He also does mainstream research in computational chemistry.] He has run 3 major electronic conferences on chemistry (content-driven) and published these with the Royal Society of Chemistry through an E-lib project, CLIC. This project, including Cambridge, Leeds and IC is committed to the use of SGML/XML as a publishing tool. This explains some of his and my enthusiasm for seeing XML succeed. Our primary concern is to see the link between author and reader as direct as possible without information loss or corruption.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Dec 18 10:06:26 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:36 2004 Subject: RFC: Simple XML Event-Based API for Java In-Reply-To: <3498E043.5F764F28@technologist.com> References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com> Message-ID: <3.0.1.16.19971218110059.08df9fdc@pop3.demon.co.uk> At 03:35 18/12/97 -0500, Paul Prescod wrote: >David Megginson wrote: >> [...] Thanks Paul, >> I >> thought that it made sense to use them directly instead of using a lot >> of Url.toString() and new URL(String) calls > >I think that all we are doing is shifting the "new URL(String)" calls >from the processor to the application. I think this is right - the "application" is going to have to do a lot of additional testing for semantic validity. XLL is full of this problem. So I think it will be very valuable to have *generic* modules that can be used for this sort of thing. I see some of these as coming in a post-parser (i.e. post-processor) and pre-application area. For example, it's reasonable that an application shouldn't get passed: My XML file This is a WF element, but contains a number of semantic errors (at least if the application wishes to validate it against the XLL spec :-). java.net.URL would catch one of them :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Dec 18 10:16:36 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:36 2004 Subject: RFC: Simple XML Event-Based API for Java In-Reply-To: <341CEE96.DFD97141@infinet.com> References: <00d101bd0b3c$88f2aab0$0100007f@localhost> Message-ID: <3.0.1.16.19971218105102.09271c36@pop3.demon.co.uk> At 04:15 15/09/97 -0400, Tyler Baker wrote: [...] > >I am not sure if this is at all relevant to this discussion, but I got some info Well *I* found it extremely valuable :-). This is exactly the sort of thing that novices will find a variety of ways of tackling. If your suggestions gets support from those who know more than me, it may be worth considering for the API. FWIW I think that the presentation of Trees in the API is the area where guidance is most valuable. If affects a lot of the downstream part of the application. Moreover, if people return Objects from a Tree, their nature has to be very carefully agreed. An Element or a PI is much more obvious by comparison. [...] >In the parser application, using Object to represent both data types is not >good enough because it still takes up too much memory. So a further change >has been implemented. After doing some research, it was found that the >child array consisted of a single Node element about 95 percent of the Is this figure just for one application, or is it likely to have a Ziff-like distribution (i.e. "most" XML applications will have only a single non-terminal child at "most" of the nodes). >time. 
>So it's possible to represent one-child cases directly using an
>Object reference to the child node, rather than a reference to a one-long
>array of child nodes.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From M.H.Kay at eng.icl.co.uk Thu Dec 18 11:36:15 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:36 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>

I like the way this discussion is going. I don't want to be on anyone's critical path, but I'll be trying out these interfaces (as an "application writer") if I can find the time.

I've written a very simple application using AElfred: a converter from an XML-based encoding of genealogical data back to the "standard" GEDCOM encoding. (The converter the other way was in Visual Basic, I will probably rewrite it in Java now I'm getting the hang of it.) It is beautifully concise, just 17 lines of code apart from the boilerplate which was copied straight from one of the AElfred sample apps, and will be even simpler with the proposed revisions to the interface.

To do anything more interesting with the data (i.e. anything that is not a single-pass operation) I need a tree representation. Yes, I don't want to build my own. The DOM seems to be the right solution for this.
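
A minimal sketch of the step from single-pass events to a tree, offered only as an illustration. It assumes a hypothetical three-method event interface: SimpleXmlEvents, TreeNode and TreeBuilder are made-up names, not part of AElfred, the DOM, or the API under discussion on this list. The builder listens to parser events and assembles a tree that can then be walked as many times as needed.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical event interface for illustration; NOT the AElfred API
// and NOT the interface being drafted in this thread.
interface SimpleXmlEvents {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Hypothetical tree node: an element name, its text, and its child elements.
class TreeNode {
    final String name;
    final StringBuilder text = new StringBuilder();
    final List<TreeNode> children = new ArrayList<>();

    TreeNode(String name) { this.name = name; }
}

// Builds a TreeNode tree from the event stream using a stack of open elements.
class TreeBuilder implements SimpleXmlEvents {
    private final Deque<TreeNode> open = new ArrayDeque<>();
    private TreeNode root;

    public void startElement(String name) {
        TreeNode node = new TreeNode(name);
        if (open.isEmpty()) {
            root = node;                     // first element seen becomes the root
        } else {
            open.peek().children.add(node);  // attach to the innermost open element
        }
        open.push(node);
    }

    public void characters(String text) {
        if (!open.isEmpty()) {
            open.peek().text.append(text);   // text belongs to the innermost open element
        }
    }

    public void endElement(String name) {
        open.pop();                          // assumes well-formed, balanced events
    }

    TreeNode getRoot() { return root; }
}
```

Any parser that can report these three events could drive the same builder, which is the appeal of separating the event interface from the tree.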
The idea of having a choice of parsers with the same event interface, and a choice of tree-builders that build the same DOM interface using any of the parsers, is very appealing.

(What I haven't really worked out yet, and would appreciate advice on, is how to turn the XML objects into a set of genealogical objects, with methods like getFather(), getMother(), getSpouses(). Do I need to build a separate tree with the data organised differently, or should I write methods/functions that operate on the nodes in the XML tree? I guess the chemists must have similar problems.)

The other thing I need, which has not really been fully addressed, is access to the DTD. (Not for this application, which I am doing just as a learning exercise, but for my real job.) I think we need some kind of extension to the DOM to provide this.

Regards and thanks for all the good work,

Mike Kay, ICL

From ak117 at freenet.carleton.ca Thu Dec 18 12:34:16 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:36 2004
Subject: Goals: XML Event Interface
Message-ID: <199712181232.HAA00429@unready.microstar.com>

I think that the time has come to deal with a question that we have postponed so far: the goal of a simple XML event-driven interface.

Right now, there are two completely different ideas:

1. The interface will provide standardised low-level, pre-DOM
   functionality for parsers to implement, for programmers who do not
   want to incur the overhead of using the DOM; perhaps a DOM tree
   could be built using only these interfaces.

2. The interface will provide standardised high-level, post-DOM
   functionality for parsers to implement, for programmers who do not
   want to take the time to learn the XML concepts in the DOM; perhaps
   the events could be generated from a DOM tree.

These two are actually quite incompatible: the first is an attempt to create a less abstract user model, while the second is an attempt to create a more abstract user model. It's only a (happy) co-incidence that we have managed a broad agreement so far.


LOW-LEVEL INTERFACE
-------------------

If we decided on (1), then I would consider making the interface the core interface for Ælfred, and I would probably want to expand it slightly to include enough functionality to build a basic level-1 DOM tree, by adding some or all of the following information:

- an event for the doctype declaration
- an isSpecified flag for attributes
- ignorable whitespace (Ælfred should return this anyway)
- comments (yech -- _WHY_ is that in the DOM???)

This interface could use only JDK 1.0.2 features, since I have no intention of making Ælfred incompatible with existing browsers.


HIGH-LEVEL INTERFACE
--------------------

If we decided on (2), then I would simply produce an optional add-on for Ælfred, outside of its core interfaces (and probably in a separate package). I would probably make a pass-through class implementing (the new) XmlProcessor instead of having Ælfred implement it directly, so that the core Ælfred could still consist of only two class files.

In this case, the simple interface would be slightly less efficient, and would include only very minimal functionality (as Tim suggests); for anything more, you would have to use each parser's native interface. You could not build a DOM tree using this interface.

The question would remain open whether the simple interface could use JDK 1.1 or JDK 1.2 features.


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From fussellm at alumni.caltech.edu Thu Dec 18 14:14:53 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:36 2004
Subject: Example DOM ObjectBuilder
Message-ID:

I released a version of mindo-j and an example DOM ObjectBuilder to:

    http://www.chimu.com/projects/mondo/release/

mindo-j is a minimal subset of MONDO suitable for accomplishing some particular tasks. The version above is focused on supporting DOM document building, but it can easily expand into much more functionality and has a more general perspective than might be expected.

The example includes a version of the DOM interfaces and a skeleton implementation. This is very preliminary for the DOM code, but I am about to fly off for the holidays so I thought it would be good to release it before then.

The current release is based on Aelfred but it was slightly modified to support InputStreams and so is included under a different package name. I will migrate mindo/MONDO to support the standard Java XML API when it is finalized.

--Mark
mark.fussell@chimu.com

ChiMu Corporation      Architectures for Information
info@chimu.com         Object-Oriented Information Systems
www.chimu.com          Architecture, Frameworks, and Mentoring

From peter at ursus.demon.co.uk Thu Dec 18 14:29:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
In-Reply-To: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID: <3.0.1.16.19971218145207.344fc87a@pop3.demon.co.uk>

At 11:35 18/12/97 -0000, Michael Kay wrote:
>I like the way this discussion is going. I don't want to be on anyone's
>critical path, but I'll be trying out these interfaces (as an "application
>writer") if I can find the time.

This is really great Michael. My vision of (at least one role of) the interface is precisely what you describe. An intelligent, but ignorant Java/XML application programmer.

    A Random Walk in Science
    I don't know what to assume...
    You may assume infinite ignorance and unlimited intelligence

>I've written a very simple application using AElfred: a converter from an
>XML-based encoding of genealogical data back to the "standard"
>GEDCOM encoding.

This counts :-)

>(The converter the other way was in Visual Basic,
>I will probably rewrite it in Java now I'm getting the hang of it.) It is
>beautifully concise, just 17 lines of code apart from the boilerplate
>which was copied straight from one of the AElfred sample apps,
>and will be even simpler with the proposed revisions to the
>interface.

This is exactly what we are after. The idea that we can develop an application in a few lines is one of the beauties of XML. After all, we are likely to get a lot more converts if they can write their app in half a page. The boilerplate lends itself to GUI tools (e.g.
presenting the programmer with a dozen boxes to fill in for doEntity, etc.)

>
>To do anything more interesting with the data (i.e. anything that is not
>a single-pass operation) I need a tree representation. Yes, I don't
>want to build my own. The DOM seems to be the right solution for
>this. The idea of having a choice of parsers with the same event
>interface, and a choice of tree-builders that build the same DOM
>interface using any of the parsers, is very appealing.

Absolutely. JUMBO is essentially a tree-based tool and I expect it either to implement the DOM or simply to hand over large chunks of current code to better written stuff for tree management.

As you've probably seen, the Java SwingSet has a Tree tool, which comes with an example. Most of the time goes into finding one's way around the documentation. I would have liked to use it for JUMBO and hacked a simple example, but I need quite a lot of functionality for each displayed node and I haven't yet found out how to do that (basically I need a miniPanel for each node).

>
>(What I haven't really worked out yet, and would appreciate advice
>on, is how to turn the XML objects into a set of genealogical
>objects, with methods like getFather(), getMother(), getSpouses(). Do
>I need to build a separate tree with the data organised differently,
>or should I write methods/functions that operate on the nodes in
>the XML tree? I guess the chemists must have similar problems.)

I nearly replied to your earlier posting, but was too busy. Any pure tree is extremely easy to represent in XML. So, if you simply want to trace an ancestor tree (i.e. two parents, 4 grandparents, etc.) this is trivial. If some happen to be identical you can use entities or hyperlinks to normalise the data. E.g. your father's father's father could be your mother's mother's father in most countries (cousin marriage). I can display an animal taxonomy using nothing more than XML and standard JUMBO.
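
A minimal sketch of the ancestor-tree case just described, under stated assumptions. Person is a hypothetical class, and ancestorsAt and distinctAncestorsAt are illustrative names; only getFather() and getMother() echo the method names in Michael's question. It shows the pure tree view, where a shared great-grandfather appears once per path, next to a normalised view where he is counted once.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain class for illustration; not taken from GEDCOM, JUMBO or the DOM.
class Person {
    final String name;
    final Person father; // null when unknown
    final Person mother; // null when unknown

    Person(String name, Person father, Person mother) {
        this.name = name;
        this.father = father;
        this.mother = mother;
    }

    Person getFather() { return father; }
    Person getMother() { return mother; }

    // Ancestors exactly `generations` steps up, one entry per path (the pure tree view).
    List<Person> ancestorsAt(int generations) {
        List<Person> out = new ArrayList<>();
        collect(this, generations, out);
        return out;
    }

    private static void collect(Person p, int depth, List<Person> out) {
        if (p == null) return;               // unknown parent: nothing to add
        if (depth == 0) { out.add(p); return; }
        collect(p.father, depth - 1, out);
        collect(p.mother, depth - 1, out);
    }

    // The same ancestors with shared individuals counted once (the normalised view).
    Set<Person> distinctAncestorsAt(int generations) {
        return new LinkedHashSet<>(ancestorsAt(generations));
    }
}
```

When the two views differ in size, the data is no longer a pure tree, which is the point at which entities or links are needed in the XML encoding.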
The difficulty comes when the graph has cycles. I am not an expert genealogist, but most 'family trees' seem to me to be Directed Acyclic Graphs (DAGs) where the arcs are isParentOf() and are directional. DAGs are common in areas like multiple inheritance graphs (C++), multiple ontological views, etc. I would hope that some standard ways of representing DAGs might come out of XML and that there would be standard viewing tools. Note that the use of ID/IDREF may introduce additional complexity. Personally I am not clear on the value of IDREF over XLL - it's not trivial to support in a browser and I doubt that JUMBO will do it.

If we include marriage or other descriptions of human liaison we have a different type of link. This results in a complex structure, which I would use XLL to represent. I'd value views on this, because we shall be encountering XLL on this list from time to time :-)

One approach is to regard all nodes as disjoint, and to create every relationship in a separate database. A fictitious family might look like:

    Elizabeth
    Philip
    Charles
    Diana
    Camilla

To represent this structure prettily, and to navigate it usefully, is almost certainly application-dependent. There are some nice graph layout tools but they cannot render every application in a meaningful manner. (Some might even be molecules :-)

>
>The other thing I need, which has not really been fully addressed,
>is access to the DTD. (Not for this application, which I am doing
>just as a learning exercise, but for my real job.) I think we need
>some kind of extension to the DOM to provide this.

I await other comments, but my expectation is that the DOM will actively deal with this. Experts, please?

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Thu Dec 18 14:33:51 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: Example DOM ObjectBuilder
In-Reply-To:
Message-ID: <3.0.1.16.19971218152409.08df867e@pop3.demon.co.uk>

At 06:14 18/12/97 -0800, Mark L. Fussell wrote:
[...]
>The current release is based on Aelfred but it was slightly modified to
>support InputStreams and so is included under a different package name.
>I will migrate mindo/MONDO to support the standard Java XML API when it
>is finalized.

Great. This is getting very close to David's 3+3 :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From fussellm at alumni.caltech.edu Thu Dec 18 14:41:10 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM Model RFC: Simple XML Event-Based API for Java
In-Reply-To: <01bd0ba9$145a9060$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:

Michael Kay wrote:
> (What I haven't really worked out yet, and would appreciate advice
> on, is how to turn the XML objects into a set of genealogical
> objects, with methods like getFather(), getMother(), getSpouses().
> Do I need to build a separate tree with the data organised differently,
> or should I write methods/functions that operate on the nodes in
> the XML tree? I guess the chemists must have similar problems.)

I would strongly suggest first designing the genealogical object model from the GEDCOM definitions (and other sources) without considering XML or DOM at all. You need to first get a good model of the information you want to represent in a computer (usually called a DomainModel) before considering technological/application constraints on it. After you have the model you can consider how that information could be best constructed from an XML/GEDCOM encoding.

The GEDCOM spec has a very specific model behind it, so you can decide whether to use that model, a subset of it, or some improvement to it. There is a lot of stuff in there so it may take a while to get a good DomainModel out of it and then implement that model in Java. After that, the XML should be very easy.

Last time I checked (maybe a year or two ago), nobody had a publicly available GEDCOM object model or implementation in Java, but maybe that has changed. I spent several days starting the process of building a model but got called off to other tasks [not sure where my notes are].

If you have not already, you may want to look at Martin Fowler's Analysis Patterns book or any of the three Amigos' books (Booch, Rumbaugh, Jacobson). Full references for these books are at:

    http://www.chimu.com/projects/mondo/links.html

--Mark
mark.fussell@chimu.com

From peter at ursus.demon.co.uk Thu Dec 18 14:51:48 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
In-Reply-To: <199712181232.HAA00429@unready.microstar.com>
Message-ID: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>

At 07:32 18/12/97 -0500, David Megginson wrote:
>I think that the time has come to deal with a question that we have
>postponed so far: the goal of a simple XML event-driven interface.

Good thinking. One of the really great aspects of XML was/is the 10 goals.

>Right now, there are two completely different ideas:
>
>1. The interface will provide standardised low-level, pre-DOM
>   functionality for parsers to implement, for programmers who do not
>   want to incur the overhead of using the DOM; perhaps a DOM tree
>   could be built using only these interfaces.

Yes. This is needed. It will be needed after the DOM is finalised. (It might then be built on top of the DOM - I don't know). It is needed now (== Jan 12).

>
>2. The interface will provide standardised high-level, post-DOM
>   functionality for parsers to implement, for programmers who do not
>   want to take the time to learn the XML concepts in the DOM; perhaps
>   the events could be generated from a DOM tree.

I understand and agree with the concept. I am not qualified to comment on whether it is needed or is different from the API to the DOM.

>
>These two are actually quite incompatible: the first is an attempt to
>create a less abstract user model, while the second is an attempt to
>create a more abstract user model.
>It's only a (happy) co-incidence
>that we have managed a broad agreement so far.

Yup. In my limited vision it is *possible* that (1) might be a subset of (2), but not necessarily.

>
>
>LOW-LEVEL INTERFACE
>-------------------
>
>If we decided on (1), then I would consider making the interface the
>core interface for Ælfred, and I would probably want to expand it
>slightly to include enough functionality to build a basic level-1 DOM
>tree, by adding some or all of the following information:
>
>- an event for the doctype declaration

Essential IMO

>- an isSpecified flag for attributes

Not quite clear what this is. I assume it is NOT the value of the Default in the ATTLIST (i.e. "#IMPLIED"). BUT this concept is required in some XLL applications.

Is it the question of the return value of a non-existent attribute? IOW what does

return for

    String s = element.getAttval("BAR");   // answer: "baz"
    String s = element.getAttval("BLORT"); // answer "six spaces"
    String s = element.getAttval("XYZZY"); // answer ""
    String s = element.getAttval("PLUGH"); // could be "", or null
    String s = element.getAttval("Y2");    // could be "", or null

This is an area where I think we MUST spell out in graphic detail what is returned. If nothing else, this is a prime reason for this API. I have got this hopelessly muddled throughout JUMBO simply because there was no API. I didn't want to hardcode in anything until the semantics of all this was clear. At present JUMBO does not distinguish between a null String and "". If this is going to be important (and I suspect it might) we need to know NOW. It will be almost impossible to reprogram an application that gets it "wrong".

Note for newcomers. If I add the declaration:

and wave it over the document, the value of BLORT changes to "six spaces". This is always good for a laugh at XML parties, and you can probably make money out of carefully placed bets.

>- ignorable whitespace (Ælfred should return this anyway)
>- comments (yech -- _WHY_ is that in the DOM???)
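
One way to pin down the getAttval question raised above, as a sketch only. SketchElement, declareDefault and specify are hypothetical names; getAttval and isSpecified echo names used in this thread, but the behaviour shown is one possible convention and not an agreed API. It assumes that null means "no value written and no default declared", and that "" is kept as a real, empty value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical element class for illustration only; not JUMBO's, Ælfred's, or the DOM's API.
class SketchElement {
    private final Map<String, String> specified = new HashMap<>(); // written in the start tag
    private final Map<String, String> defaults = new HashMap<>();  // default declared in the ATTLIST

    void declareDefault(String name, String value) { defaults.put(name, value); }

    void specify(String name, String value) { specified.put(name, value); }

    // Returns the specified value if there is one, else the declared default,
    // else null. Under this assumed convention null is never confused with "".
    String getAttval(String name) {
        if (specified.containsKey(name)) {
            return specified.get(name);
        }
        return defaults.get(name); // null when no default was declared
    }

    // True only when the value was written in the document instance itself.
    boolean isSpecified(String name) {
        return specified.containsKey(name);
    }
}
```

Whatever convention is finally chosen, an application written against it can tell an empty value from a missing one, which JUMBO currently cannot.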
>
>This interface could use only JDK 1.0.2 features, since I have no
>intention of making Ælfred incompatible with existing browsers.

Agreed.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Thu Dec 18 15:01:15 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:37 2004
Subject: Goals: XML Event Interface
In-Reply-To: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
References: <199712181232.HAA00429@unready.microstar.com> <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID: <199712181458.JAA00315@unready.microstar.com>

Peter Murray-Rust writes:

 > >- an isSpecified flag for attributes
 >
 > Not quite clear what this is. I assume it is NOT the value of the Default
 > in the ATTLIST (i.e. "#IMPLIED").

DTD:

Document instance:

  ...

  The attribute "bar" has the value "hack", and is not specified
  (i.e., it is a defaulted value).

  ...

  The attribute "bar" has the value "hack", and is specified.

  ...

  The attribute "bar" has the value "hello", and is specified.


All the best,


David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From tbray at textuality.com Thu Dec 18 15:28:25 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
Message-ID: <3.0.32.19971218072536.00acf1c0@pop.intergate.bc.ca>

At 04:15 AM 15/09/97 -0400, Tyler Baker wrote:
>I am not sure if this is at all relevant to this discussion, but I got some info
>via email from the JDC newsletter that gives an interesting tip on how to
>efficiently build tree structures without sucking up too much RAM.

Lark does this now; amazing how Java, which "doesn't have pointers because they're error-prone", does have something that smells just like (void *)... in fact, one of the problems that bedevilled programmers for a generation is that lots of useful C programs were written on VAXes, where a pointer to everything was always the same size, then that wasn't true any more on 16-bit DOS boxes; looks like from that point of view, Java is back to the good old days of the VAX. -T.

From peter at ursus.demon.co.uk Thu Dec 18 15:39:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: A bit of fun
Message-ID: <3.0.1.16.19971218163007.0a3f967e@pop3.demon.co.uk>

Since some of us may have time to relax, and since the following is *very* close to what we are doing with the API, I have forwarded something I have just received. [There was no useful metadata with the message.] If anyone feels like translating the spirit into SGML/XML that could be appropriate at this time of year in some countries.

[... header clipped ...]

-----------------------------------------------------------
Task is to write a program that prints "Hello World" on the screen...make sure you see the last few attempts (Dilbert).

High School/Jr.High
===================

  10 PRINT "HELLO WORLD"
  20 END

First year in College
=====================

  program Hello(input, output)
    begin
      writeln('Hello World')
    end.

Senior year in College
======================

  (defun hello
    (print
      (cons 'Hello (list 'World))))

New professional
================

  #include <stdio.h>
  void main(void)
  {
    char *message[] = {"Hello ", "World"};
    int i;

    for(i = 0; i < 2; ++i)
      printf("%s", message[i]);
    printf("\n");
  }

Seasoned professional
=====================

  #include <iostream.h>
  #include <string.h>

  class string
  {
  private:
    int size;
    char *ptr;

  public:
    string() : size(0), ptr(new char('\0')) {}

    string(const string &s) : size(s.size)
    {
      ptr = new char[size + 1];
      strcpy(ptr, s.ptr);
    }

    ~string()
    {
      delete [] ptr;
    }

    friend ostream &operator <<(ostream &, const string &);
    string &operator=(const char *);
  };

  ostream &operator<<(ostream &stream, const string &s)
  {
    return(stream << s.ptr);
  }

  string &string::operator=(const char *chrs)
  {
    if (this != &chrs)
    {
      delete [] ptr;
      size = strlen(chrs);
      ptr = new char[size + 1];
      strcpy(ptr, chrs);
    }
    return(*this);
  }

  int main()
  {
    string str;

    str = "Hello World";
    cout << str << endl;

    return(0);
  }

Master Programmer :-))
======================

  [
  uuid(2573F8F4-CFEE-101A-9A9F-00AA00342820)
  ]
  library LHello
  {
      // bring in the master library
      importlib("actimp.tlb");
      importlib("actexp.tlb");

      // bring in my interfaces
      #include "pshlo.idl"

      [
      uuid(2573F8F5-CFEE-101A-9A9F-00AA00342820)
      ]
      cotype THello
      {
        interface IHello;
        interface IPersistFile;
      };
  };

  [
  exe,
  uuid(2573F890-CFEE-101A-9A9F-00AA00342820)
  ]
  module CHelloLib
  {
      // some code related header files
      importheader();
      importheader( );
      importheader();
      importheader("pshlo.h");
      importheader("shlo.hxx");
      importheader("mycls.hxx");

      // needed typelibs
      importlib("actimp.tlb");
      importlib("actexp.tlb");
      importlib("thlo.tlb");

      [
      uuid(2573F891-CFEE-101A-9A9F-00AA00342820),
      aggregatable
      ]
      coclass CHello
      {
        cotype THello;
      };
  };

  #include "ipfix.hxx"

  extern HANDLE hEvent;

  class CHello : public CHelloBase
  {
  public:
      IPFIX(CLSID_CHello);

      CHello(IUnknown *pUnk);
      ~CHello();

      HRESULT __stdcall PrintSz(LPWSTR pwszString);

  private:
      static int cObjRef;
  };

  #include
  #include
  #include
  #include
  #include "thlo.h"
  #include "pshlo.h"
  #include "shlo.hxx"
  #include "mycls.hxx"

  int CHello::cObjRef = 0;

  CHello::CHello(IUnknown *pUnk) : CHelloBase(pUnk)
  {
      cObjRef++;
      return;
  }

  HRESULT __stdcall CHello::PrintSz(LPWSTR pwszString)
  {
      printf("%ws\n", pwszString);
      return(ResultFromScode(S_OK));
  }

  CHello::~CHello(void)
  {
      // when the object count goes to zero, stop the server
      cObjRef--;
      if( cObjRef == 0 )
          PulseEvent(hEvent);

      return;
  }

  #include
  #include
  #include "pshlo.h"
  #include "shlo.hxx"
  #include "mycls.hxx"

  HANDLE hEvent;

  int _cdecl main(
  int argc,
  char * argv[]
  ) {
      ULONG ulRef;
      DWORD dwRegistration;
      CHelloCF *pCF = new CHelloCF();

      hEvent = CreateEvent(NULL, FALSE, FALSE, NULL);

      // Initialize the OLE libraries
      CoInitializeEx(NULL, COINIT_MULTITHREADED);

      CoRegisterClassObject(CLSID_CHello, pCF, CLSCTX_LOCAL_SERVER,
          REGCLS_MULTIPLEUSE, &dwRegistration);

      // wait on an event to stop
      WaitForSingleObject(hEvent, INFINITE);

      // revoke and release the class object
      CoRevokeClassObject(dwRegistration);
      ulRef = pCF->Release();

      // Tell OLE we are going away.
      CoUninitialize();

      return(0);
  }

  extern CLSID CLSID_CHello;
  extern UUID LIBID_CHelloLib;

  CLSID CLSID_CHello = { /* 2573F891-CFEE-101A-9A9F-00AA00342820 */
      0x2573F891,
      0xCFEE,
      0x101A,
      { 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 }
  };

  UUID LIBID_CHelloLib = { /* 2573F890-CFEE-101A-9A9F-00AA00342820 */
      0x2573F890,
      0xCFEE,
      0x101A,
      { 0x9A, 0x9F, 0x00, 0xAA, 0x00, 0x34, 0x28, 0x20 }
  };

  #include
  #include
  #include
  #include
  #include
  #include "pshlo.h"
  #include "shlo.hxx"
  #include "clsid.h"

  int _cdecl main(
  int argc,
  char * argv[]
  ) {
      HRESULT hRslt;
      IHello *pHello;
      ULONG ulCnt;
      IMoniker * pmk;
      WCHAR wcsT[_MAX_PATH];
      WCHAR wcsPath[2 * _MAX_PATH];

      // get object path
      wcsPath[0] = '\0';
      wcsT[0] = '\0';
      if( argc > 1) {
          mbstowcs(wcsPath, argv[1], strlen(argv[1]) + 1);
          wcsupr(wcsPath);
      }
      else {
          fprintf(stderr, "Object path must be specified\n");
          return(1);
      }

      // get print string
      if(argc > 2)
          mbstowcs(wcsT, argv[2], strlen(argv[2]) + 1);
      else
          wcscpy(wcsT, L"Hello World");

      printf("Linking to object %ws\n", wcsPath);
      printf("Text String %ws\n", wcsT);

      // Initialize the OLE libraries
      hRslt = CoInitializeEx(NULL, COINIT_MULTITHREADED);

      if(SUCCEEDED(hRslt)) {

          hRslt = CreateFileMoniker(wcsPath, &pmk);
          if(SUCCEEDED(hRslt))
              hRslt = BindMoniker(pmk, 0, IID_IHello, (void **)&pHello);

          if(SUCCEEDED(hRslt)) {

              // print a string out
              pHello->PrintSz(wcsT);

              Sleep(2000);
              ulCnt = pHello->Release();
          }
          else
              printf("Failure to connect, status: %lx", hRslt);

          // Tell OLE we are going away.
          CoUninitialize();
      }

      return(0);
  }

Apprentice Hacker
=================

  #!/usr/local/bin/perl
  $msg="Hello, world.\n";
  if ($#ARGV >= 0) {
    while(defined($arg=shift(@ARGV))) {
      $outfilename = $arg;
      open(FILE, ">" . $outfilename) || die "Can't write $arg: $!\n";
      print (FILE $msg);
      close(FILE) || die "Can't close $arg: $!\n";
    }
  } else {
    print ($msg);
  }
  1;

Experienced Hacker
==================

  #include <stdio.h>
  #define S "Hello, World\n"
  main(){exit(printf(S) == strlen(S) ? 0 : 1);}

Seasoned Hacker
===============

  % cc -o a.out ~/src/misc/hw/hw.c
  % a.out

Guru Hacker
===========

  % cat
  Hello, world.
  ^^D

New Manager
===========

  10 PRINT "HELLO WORLD"
  20 END

Middle Manager
==============

  mail -s "Hello, world." bob@b12
  Bob, could you please write me a program that prints "Hello, world."?
  I need it by tomorrow.
  ^^D

Senior Manager
==============

  % zmail jim
  I need a "Hello, world." program by this afternoon.

Chief Executive
===============

  % letter
  letter: Command not found.
  % mail
  To: ^^X ^^F ^^C
  % help mail
  help: Command not found.
  % damn!
  !: Event unrecognized
  % logout

-------------- next part --------------
begin: vcard
fn: Tim Preston
n: Preston;Tim
org: MDIS
adr;dom: ;;Boundary Way;Hemel Hempstead;Herts;HP2 7HU;
email;internet: tpreston@uk.mdis.com
title: Principal Consultant
tel;work: +44 1442 272084
tel;fax: +44 1442 272777
x-mozilla-cpt: ;0
x-mozilla-html: FALSE
version: 2.1
end: vcard
-------------- next part --------------

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From tms at ansa.co.uk Thu Dec 18 16:33:47 1997
From: tms at ansa.co.uk (Toby Speight)
Date: Mon Jun 7 16:59:37 2004
Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML ...)
In-Reply-To: Peter Murray-Rust's message of "Thu, 18 Dec 1997 15:21:22"
References: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>
Message-ID:

Peter> Peter Murray-Rust

> In article <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>,
> Peter wrote:

Peter> Is it the question of the return value of a non-existent
Peter> attribute.
IOW what does Peter> Peter> Peter> Peter> Peter> Peter> return for Peter> String s = element.getAttval("PLUGH");// could be "", or null David has answered the original question (what is isSpecified() for in the Java simple API?), but I thought I'd mention that DSSSL's attribute-string function returns #f for PLUGH; the Java equivalent of this is of course, null. I think this is the Right Thing to do; it's sometimes important to tell the difference between and . The first case is often used to mean a known, empty value; the second to mean "not known" or "not applicable". Concrete example: I'm a rock climber, and I keep a record of all my climbing in XML format. Climbs are defined as climbs.dtd> climbs.dtd> grade CDATA "" climbs.dtd> stars CDATA #IMPLIED climbs.dtd> style (l|2|al|s|tr|mt) #IMPLIED climbs.dtd> with CDATA #IMPLIED climbs.dtd> > Note the "stars" attribute, which is used for a climb's star rating (an indication of quality). An instance looks like climbs.xml> with="&p-hkm;">Difficult Crack Here, the lack of stars is explicit - it's not a high-quality climb. Whereas climbs.xml> style="l">King's Chimney is a climb in a part of Britain where the star system isn't used, and so I omitted the attribute - even though it probably deserves a star or two. I would not want these two values confused! -- xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Thu Dec 18 20:00:48 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:37 2004 Subject: Goals: XML Event Interface References: <199712181232.HAA00429@unready.microstar.com> <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> <199712181458.JAA00315@unready.microstar.com> Message-ID: <3499639F.CDAF9A3A@technologist.com> David Megginson wrote: > > ... > The attribute "bar" has the value "hack", and is not specified > (i.e., it is a defaulted value). > > ... > The attribute "bar" has the value "hack", and is specified. I don't think nsgmls (for example) makes this distinction and I don't remember ever wishing it did. When do you need to know this? As an author, I certainly don't think that my software is going to work differently if I use the default or specify it. I would be quite disconcerted if it did. Paul Prescod xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Thu Dec 18 20:01:05 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:37 2004 Subject: Goals: XML Event Interface In-Reply-To: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> References: <199712181232.HAA00429@unready.microstar.com> <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> Message-ID: <199712181958.OAA00474@unready.microstar.com> Perhaps I should clarify my question: Should a common XML event-based API supply enough information to build a DOM representation of a document? All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Dec 18 21:02:45 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:37 2004 Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML ...) 
In-Reply-To: References: <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971218190703.0a3f9548@pop3.demon.co.uk> At 16:32 18/12/97 +0000, Toby Speight wrote: >Peter> Peter Murray-Rust > >> In article <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk>, >> Peter wrote: > >Peter> Is it the question of the return value of a non-existent >Peter> attribute. IOW what does >Peter> >Peter> >Peter> >Peter> >Peter> >Peter> return for >Peter> String s = element.getAttval("PLUGH");// could be "", or null > >David has answered the original question (what is isSpecified() for in the >Java simple API?), but I thought I'd mention that DSSSL's attribute-string >function returns #f for PLUGH; the Java equivalent of this is of course, >null. I think this is the Right Thing to do; it's sometimes important to >tell the difference between and . I agree that it is the Right Thing to do. If everyone else agrees it is the Right Thing to do I will be very happy. If 10% agrees and the other 90% don't know what we are on about, we need to make sure they can't Go Wrong :-) > >The first case is often used to mean a known, empty value; the second >to mean "not known" or "not applicable". > >Concrete example: I'm a rock climber, and I keep a record of all my >climbing in XML format. Climbs are defined as How exciting - I used to be (not a very good one). [...] > >climbs.xml> climbs.xml> with="&p-hkm;">Difficult Crack > >Here, the lack of stars is explicit - it's not a high-quality climb. >Whereas > >climbs.xml> climbs.xml> style="l">King's Chimney > >is a climb in a part of Britain where the star system isn't used, and >so I omitted the attribute - even though it probably deserves a star >or two. Not according to MacInnes' star system; he gives it zero stars :-). Seriously, if we adopt this system then we should make every effort to promote it. One difficult area is in editors - how do you signal the difference between "" and null when entering a value in a box? 
You either have to get them to input NULL (yukk) or add another button for "IMPLIED" (Ugh). I'd like some other expert opinion on this. It's a tricky area if we get it wrong. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Thu Dec 18 21:05:45 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:37 2004 Subject: Goals: XML Event Interface Message-ID: <3.0.32.19971218130511.00a98cc4@pop.intergate.bc.ca> At 02:58 PM 18/12/97 -0500, David Megginson wrote: > Should a common XML event-based API supply enough information to > build a DOM representation of a document? Maybe, maybe not, depending what you mean by "common". For the simple interface we're trying to build, this cannot be remotely a goal. The only goal should be to give application authors access to the elements, attributes and character data of a document in the most transparent possible way. -Tim xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Thu Dec 18 21:54:57 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:37 2004 Subject: LISTRIVIA: A bit of fun In-Reply-To: <199712181939.LAA29002@mehitabel.eng.sun.com> Message-ID: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk> At 11:39 18/12/97 -0800, Murray Altheim wrote: >PLEASE can you avoid having any fun. I have received private mail in >support of this view and I shall be very boring in pursuing this. It's not >difficult to avoid, and for most people it's a waste of time and money. My apologies to anyone who was offended, or whose manager was offended. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jharmon at telecnnct.com Thu Dec 18 23:21:29 1997 From: jharmon at telecnnct.com (Jim Harmon) Date: Mon Jun 7 16:59:37 2004 Subject: LISTRIVIA: A bit of fun References: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk> Message-ID: <3499A963.5656AEC7@telecnnct.com> Peter Murray-Rust wrote: > > At 11:39 18/12/97 -0800, Murray Altheim wrote: > >PLEASE can you avoid having any fun. 
I have received private mail in > >support of this view and I shall be very boring in pursuing this. It's not > >difficult to avoid, and for most people it's a waste of time and money. > > My apologies to anyone who was offended, or whose manager was offended. I've been lurking on this list for months now. Peter's post is the first one I've actually copied to friends. In sight of the holiday(s), I think it's appropriate to lighten up a little. Thankyou, Peter for the very entertaining post in a very staid topic forum. (And thank you, everyone else, for having this forum. I learn from you all, every time I scan a message.) > P. > > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic > net connection > VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary > http://www.venus.co.uk/vhg > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) -- Jim Harmon The Telephone Connection jim@telecnnct.com Rockville, Maryland xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Fri Dec 19 01:09:41 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:37 2004 Subject: Jade and isSpecified In-Reply-To: <349990ED.64CBCE41@technologist.com> References: <199712181232.HAA00429@unready.microstar.com> <3.0.1.16.19971218152122.29672990@pop3.demon.co.uk> <199712181458.JAA00315@unready.microstar.com> <3499639F.CDAF9A3A@technologist.com> <199712182004.PAA00496@unready.microstar.com> <349990ED.64CBCE41@technologist.com> Message-ID: <199712190106.UAA00332@unready.microstar.com> Paul Prescod writes: > Could you help me find it? I can see that "implied" boolean > characteristic on "attributes", but it only seems to mean really > implied, not defaulted. Sorry, my mistake -- it's Omnimark, not Jade, that tells you whether an attribute was specified. With groves, I think that you'd need the basesds1 module to get that information. That said, the information _is_ available in SP itself, using Boolean Attribute::specified() in SP's native interface (see include/Attribute.h). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 19 08:37:57 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:37 2004 Subject: LISTRIVIA: A bit of fun In-Reply-To: <3499A963.5656AEC7@telecnnct.com> References: <3.0.1.16.19971218223117.344fc9ca@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971219093411.09d7b57e@pop3.demon.co.uk> At 17:53 18/12/97 -0500, Jim Harmon wrote: >Peter Murray-Rust wrote: >> >> At 11:39 18/12/97 -0800, Murray Altheim wrote: >> >PLEASE can you avoid having any fun. I have received private mail in >> >support of this view and I shall be very boring in pursuing this. It's not >> >difficult to avoid, and for most people it's a waste of time and money. >> >> My apologies to anyone who was offended, or whose manager was offended. > >I've been lurking on this list for months now. > >Peter's post is the first one I've actually copied to friends. > >In sight of the holiday(s), I think it's appropriate to lighten up a >little. > >Thankyou, Peter for the very entertaining post in a very staid topic >forum. > >(And thank you, everyone else, for having this forum. I learn from you >all, every time I scan a message.) Lets' put people out of their misery! And not let it escalate :-) Murray's post was intended in the same spirit and contained enough allusions to be interpreted that way. However, I wasn't *absolutely* sure and couldn't afford to post a humorous reply if it *were* genuine. So my reply was deadpan and covered all eventualities. [It is remarkable how easy it is to get entangled in farcelike situations in the virtual world. 
One of mine, which I dare not repeat, arose from a 1:10000 chance and let to an almost Shakespearean comedy.] Seriously, many SGML documents *do* look very similar to the middle of the posting. With catalogs, SGML declarations, entity sets, DTDs, parameter entities, etc. it is possible to obfuscate SGML documents pretty well. XML self-denies itself the first two, but the spec itself gets off to a good start with the "tricky" entity replacement. XLL adds the ability of Xpointers and SHOW="EMBED" to do some interesting transclusion. And there is always Unicode :-) So some limited examples could be educational.... P. > >> P. >> >> Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic >> net connection >> VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary >> http://www.venus.co.uk/vhg >> >> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >> To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >> (un)subscribe xml-dev >> To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >> subscribe xml-dev-digest >> List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > >-- > Jim Harmon The Telephone Connection >jim@telecnnct.com Rockville, Maryland > Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 19 10:08:58 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:37 2004 Subject: XML as a programming tool Message-ID: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk> This message is probably trivial for those with a lsp+ gene, but it may open new horizons for those like me. It has come as a revelation to me that XML *with its assorted toolkit* is a powerful programming aid for many applications. Most (non-textual) applications of XML will come with a Tree tool including editing, display, searching (a la TEI Xpointer), and transformation. These facilities are extremely useful in program development and maintenance. Since JUMBO implements all of these I have started to use these *in creating JUMBO itself*, and potentially as library routines for other non-Jumboid applications. For example, I have been revising the menu structure in JUMBO under java.awt. It's easy to make the mistake of hardcoding this, so it needs a flexible data structure. Moreover the menus may easily be changed at runtime (e.g. a new DTD or namespace may be loaded). Java menus (presumably like many other systems) are tree-structured with a number of different terminals (e.g. addSeparator();). I have therefore created the data structure as an XML document, which is built into a tree at startup. This is very easily extensible, both in structure (e.g. adding new MenuItems or Menus) or adding properties to individual parts (e.g. ** Because of the Xpointer I don't have to remember the structure of the tree!! **. 
I can just search for DESCENDANT(ALL,MENUITEM,TITLE,Print)ANCESTOR(1,FILE), for example, to get all instances of the "Print" command in the menu (and, say, SGMLNodeSet.addSGMLAttribute("ENABLED", "T")).

An amusing byproduct is that the menu itself is available as a tree, and so can be navigated or edited. It's trivial to attach HELP to the nodes of this tree. So it's a really efficient re-use of tools.

As I may have mentioned before, I am converting all *external* files to XML so that a JUMBO application can rely on namespace schemas, mimetypes/helpers, Classloaders, DTDs, Help, semantic validation, etc. all being manageable through XML technology. The benefits of this (at least for me) are enormous!

Obviously all of this is in Java for JUMBO, but I assume that people will convert or develop tools for other languages and environments such as C and UNIX. I hope that we may see

    man xmltree

or

    man teisearch

on UNIX systems in the near future and that people will be able to use the treetools that these provide, with obvious extensions to other environments.

I am sure that the original proposers of XML saw and knew all of this, but it must be very clear that XML has much more to offer than 2D paper technology :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
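The menu-as-document idea in Peter's message can be sketched in present-day Java roughly as follows. This is only an illustration under stated assumptions: it uses the JDK's own DOM parser rather than JUMBO, and the element and attribute names (MENUBAR, MENU, MENUITEM, SEPARATOR, TITLE, ENABLED) are hypothetical stand-ins for whatever vocabulary JUMBO really used. A plain tree walk takes the place of the XPointer search.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MenuTree {
    // Hypothetical menu description; the vocabulary is invented for this sketch.
    static final String MENU_XML =
        "<MENUBAR>"
      + "<MENU TITLE='File'><MENUITEM TITLE='Open'/><SEPARATOR/><MENUITEM TITLE='Print'/></MENU>"
      + "<MENU TITLE='Edit'><MENUITEM TITLE='Copy'/></MENU>"
      + "</MENUBAR>";

    // Build the menu tree from its XML description.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("menu description is not well-formed", e);
        }
    }

    // Find every MENUITEM with the given TITLE, wherever it sits in the tree.
    static List<Element> findItems(Document doc, String title) {
        List<Element> hits = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("MENUITEM");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            if (title.equals(item.getAttribute("TITLE"))) {
                hits.add(item);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Document doc = parse(MENU_XML);
        for (Element item : findItems(doc, "Print")) {
            item.setAttribute("ENABLED", "T");            // add a property to the node
            Element menu = (Element) item.getParentNode(); // the enclosing MENU
            System.out.println(menu.getAttribute("TITLE") + " > " + item.getAttribute("TITLE"));
        }
    }
}
```

The point of the design is the one Peter makes: the caller never needs to know where in the tree the "Print" item lives, so menus can be restructured without touching the lookup code.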
From tyler at infinet.com Fri Dec 19 10:41:05 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:37 2004
Subject: XML as a programming tool
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
Message-ID: <341E5BC5.1830BD4F@infinet.com>

Peter Murray-Rust wrote:

> This message is probably trivial for those with a lsp+ gene, but it may
> open new horizons for those like me.
>
> It has come as a revelation to me that XML *with its assorted toolkit* is a
> powerful programming aid for many applications. Most (non-textual)
> applications of XML will come with a Tree tool including editing, display,
> searching (a la TEI Xpointer), and transformation. These facilities are
> extremely useful in program development and maintenance. Since JUMBO
> implements all of these I have started to use these *in creating JUMBO
> itself*, and potentially as library routines for other non-Jumboid
> applications.
>

In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site at http://java.sun.com

It now has unsynchronized collection classes, which should significantly improve the performance of any parser that uses them, since most parsers only use one thread anyway. Hashtable and Vector have everything synchronized, which slows things down a lot.

I just thought this might be of use to anyone developing XML parsers.

Tyler
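As a rough illustration of Tyler's point, a single-threaded parser can keep its open-element stack in an ArrayList, one of the unsynchronized collection classes added in JDK 1.2, instead of a Vector. The class below is a hypothetical sketch written in current Java syntax (generics did not exist in JDK 1.2), and it makes no claim about how large the speed difference is in practice.

```java
import java.util.ArrayList;
import java.util.List;

final class OpenElementStack {
    // ArrayList is the unsynchronized counterpart of Vector: no lock is
    // taken on each call, which is safe as long as one thread owns the stack.
    private final List<String> open = new ArrayList<>();

    void startElement(String name) {
        open.add(name);
    }

    void endElement(String name) {
        int top = open.size() - 1;
        if (top < 0 || !open.get(top).equals(name)) {
            throw new IllegalStateException("mismatched end tag: " + name);
        }
        open.remove(top); // remove(int index): pops the innermost element
    }

    int depth() {
        return open.size();
    }

    public static void main(String[] args) {
        OpenElementStack stack = new OpenElementStack();
        stack.startElement("doc");
        stack.startElement("p");
        stack.endElement("p");
        System.out.println("depth after </p>: " + stack.depth());
    }
}
```

If the structure ever has to be shared between threads, the caller can wrap it or add its own locking, which is the trade-off the newer collection classes leave to the application.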
From M.H.Kay at eng.icl.co.uk Fri Dec 19 11:50:46 1997
From: M.H.Kay at eng.icl.co.uk (Michael Kay)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM model in XML
Message-ID: <01bd0c74$3f1f56c0$1e09e391@mhklaptop.bra01.icl.co.uk>

Mark L. Fussell:
>I would strongly suggest first designing the genealogical object model
>from the GEDCOM definitions (and other sources) without considering XML
>or DOM at all.

Thanks, yes. I agree absolutely. Fortunately my background is in data modelling so I'm happy with this side of things.

My design problem is whether to implement the genealogical objects as pointers to XML DOM objects or as copies/conversions of data extracted from DOM objects. Of course the choice can be hidden behind the interface.

Peter Murray-Rust:
>Any pure tree is extremely easy to represent in XML. So, if you simply want
>to trace an ancestor tree (i.e. two parents, 4 grand parents, etc.) this is
>trivial....
>The difficulty comes when the graph has cycles. I am not an expert
>genealogist, but most 'family trees' seem to me to be Directed Acyclic
>Graphs (DAGs) where the arcs are isParentOf(); and is directional. DAGs
>are common ...
>

Unfortunately the "family tree" is not isomorphic with the XML tree. There is no hierarchic relationship between a husband and wife. It isn't even a DAG (because I can record relationships like "A is-the-godfather-of B" and "B is-the-executor-of A").

>Note that the use of ID/IDREF may introduce additional complexity.
>Personally I am not clear on the value of IDREF over XLL - it's not trivial
>to support in a browser and I doubt that JUMBO will do it.
>

I am currently using ID/IDREF to represent these relationships, because it maps directly to the current GEDCOM standard. I still feel uncomfortable that this is unrelated to the XLL linking model.

I do recognise that displaying information in a genealogically-useful way is going to require application logic, and won't be achieved by general purpose XML tools; though it would certainly be nice if the general tools made it easy to follow ID/IDREF relationships.

Mike Kay

From peter at ursus.demon.co.uk Fri Dec 19 12:10:40 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:37 2004
Subject: JDK 1.2 (was Re: XML as a programming tool)
In-Reply-To: <341E5BC5.1830BD4F@infinet.com>
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971219124142.0c4fe670@pop3.demon.co.uk>

At 06:13 16/09/97 -0400, Tyler Baker wrote:

Thanks very much Tyler - this was news to me.

>
>In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site
>at http://java.sun.com
>

JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the beta reasonably stable?

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From jjc at jclark.com Fri Dec 19 13:01:45 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:37 2004
Subject: RFC: Simple XML Event-Based API for Java
References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com> <3498E043.5F764F28@technologist.com>
Message-ID: <349A6636.7D7E28AF@jclark.com>

Paul Prescod wrote:

> XML attributes will probably have relative URLs in them and the XML
> Application will have to know how to resolve them.

This reminds me of another reason why you need positional information even in a simple interface. Suppose you have a document doc.xml that references an external parsed entity chapters/3.xml and suppose chapters/3.xml contains some element with an attribute that is a relative URL "4.xml". I would claim that the appropriate URL to use as the base for resolving that relative URL is the URL of the resource that contains the URL, so that relative to the document URL, that relative URL should be interpreted as chapters/4.xml rather than 4.xml. But unless the parser passes through positional information, there's no way an application can do this. I think apps are going to need at least:

    startExternalEntity(URL url)
    endExternalEntity()

James
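One way an application could use the two callbacks James proposes is to keep a stack of base URLs and resolve each relative reference against whichever entity is currently open. The sketch below is an assumption-laden illustration, not part of any proposed API: the class name EntityBaseTracker and its resolve method are invented here, java.net.URI stands in for the URL type in James's signature, and the example.org addresses are placeholders.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;

final class EntityBaseTracker {
    // Innermost open entity is on top; the document entity sits at the bottom.
    private final Deque<URI> bases = new ArrayDeque<>();

    EntityBaseTracker(URI documentUri) {
        bases.push(documentUri);
    }

    // Called when the parser starts reading an external parsed entity.
    void startExternalEntity(URI entityUri) {
        bases.push(entityUri);
    }

    // Called when that entity ends; the document entity itself is never popped.
    void endExternalEntity() {
        if (bases.size() > 1) {
            bases.pop();
        }
    }

    // Resolve a relative URL against the entity that physically contains it.
    URI resolve(String relative) {
        return bases.peek().resolve(relative);
    }

    public static void main(String[] args) {
        EntityBaseTracker tracker =
            new EntityBaseTracker(URI.create("http://example.org/book/doc.xml"));
        tracker.startExternalEntity(URI.create("http://example.org/book/chapters/3.xml"));
        System.out.println("inside entity:  " + tracker.resolve("4.xml"));
        tracker.endExternalEntity();
        System.out.println("in document:    " + tracker.resolve("4.xml"));
    }
}
```

Inside chapters/3.xml the reference "4.xml" resolves under chapters/, and once the entity has ended the same string resolves next to doc.xml, which is exactly the distinction the message says cannot be made without positional information.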
From tyler at infinet.com Fri Dec 19 13:34:20 1997
From: tyler at infinet.com (Tyler Baker)
Date: Mon Jun 7 16:59:37 2004
Subject: JDK 1.2 (was Re: XML as a programming tool)
References: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk> <3.0.1.16.19971219124142.0c4fe670@pop3.demon.co.uk>
Message-ID: <341E8472.4A859008@infinet.com>

Peter Murray-Rust wrote:

> At 06:13 16/09/97 -0400, Tyler Baker wrote:
>
> Thanks very much Tyler - this was news to me.
>
> >
> >In case everyone does not already know, JDK 1.2 beta 2 is out on SUN's web site
> >at http://java.sun.com
> >
>
> JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other
> than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the
> beta reasonably stable?
>

Well yah, most of it is stable. It has no new language features other than weak references (which really are not a language feature like inner classes), but a lot of new APIs including Swing.

Tyler

From fussellm at alumni.caltech.edu Fri Dec 19 14:20:47 1997
From: fussellm at alumni.caltech.edu (Mark L. Fussell)
Date: Mon Jun 7 16:59:37 2004
Subject: GEDCOM model in XML
In-Reply-To: <01bd0c74$3f1f56c0$1e09e391@mhklaptop.bra01.icl.co.uk>
Message-ID:

On Fri, 19 Dec 1997, Michael Kay wrote:

> Mark L. Fussell:
> >I would strongly suggest first designing the genealogical object model
> >from the GEDCOM definitions (and other sources) without considering XML
> >or DOM at all.
>
> Thanks, yes. I agree absolutely. Fortunately my background is in data
> modelling so I'm happy with this side of things.
>
> My design problem is whether to implement the genealogical objects as
> pointers to XML DOM objects or as copies/conversions of data extracted from
> DOM objects. Of course the choice can be hidden behind the interface.

There is another choice: build your DomainObjects directly from the XML Event stream. This is what MONDO/mindo supports doing and could also be done in several other ways. With that change in focus you then have (at least) three choices:

(1) Provide the DOM interfaces onto existing Domain classes. This would work if your Domain Model is easily represented as a simple containment hierarchy and you only have one such view.

(2) Generate a DOM specific view when it is asked for and link the generated objects to the original domain objects. This allows multiple DOM perspectives on the same DomainModel and enables some transformation between the classes (collapsing of associations into simple attributes).

(3) Provide one or more DOM Adapters onto the Domain classes, which provide similar functionality as (2) but do not maintain a separate "cache" of DOM specific state. This is basically the same approach as Tim Howard's DomainAdapter except using document terminology instead of general GUI terms.

You can also combine these approaches in various ways. Effectively (3) is the most general since it simply says: you can functionally transform the Domain into a DOM model. (2) caches that result [and allows intermediate transitions]. (1) says the transform is trivial: 1-1. So these are just gradations in function and state transforms.

--Mark
mark.fussell@chimu.com
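Mark's option (3), an adapter that presents domain objects as document nodes without caching any node state, might look something like the sketch below. Everything in it is hypothetical: Person is a toy domain class, DocNode is a three-method stand-in for a real DOM interface, and the PERSON and NAME names are invented for the illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Toy domain class: knows nothing about documents or trees of nodes.
final class Person {
    final String name;
    final List<Person> children = new ArrayList<>();

    Person(String name) {
        this.name = name;
    }
}

// Minimal stand-in for a document-node interface (not the real DOM).
interface DocNode {
    String tagName();
    String attribute(String name);
    List<DocNode> childNodes();
}

// Adapter in the sense of option (3): every answer is computed from the
// domain object on demand, so there is no separate node state to keep in sync.
final class PersonNode implements DocNode {
    private final Person person;

    PersonNode(Person person) {
        this.person = person;
    }

    public String tagName() {
        return "PERSON";
    }

    public String attribute(String name) {
        return "NAME".equals(name) ? person.name : null;
    }

    public List<DocNode> childNodes() {
        List<DocNode> nodes = new ArrayList<>();
        for (Person child : person.children) {
            nodes.add(new PersonNode(child));
        }
        return nodes;
    }
}

final class AdapterDemo {
    public static void main(String[] args) {
        Person parent = new Person("Ann");
        DocNode view = new PersonNode(parent);
        parent.children.add(new Person("Bob")); // changed after the view was made
        System.out.println(view.tagName() + " has " + view.childNodes().size() + " child node(s)");
    }
}
```

Option (2) would differ only in that childNodes() stored its result and linked each generated node back to its Person, trading memory and invalidation logic for fewer allocations.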
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From markb at iosphere.net Fri Dec 19 14:34:20 1997 From: markb at iosphere.net (Mark Baker) Date: Mon Jun 7 16:59:37 2004 Subject: XML as a programming tool In-Reply-To: <3.0.1.16.19971219110605.54ef09d6@pop3.demon.co.uk> Message-ID: On Fri, 19 Dec 1997, Peter Murray-Rust wrote: > It has come as a revelation to me that XML *with its assorted toolkit* is a > powerful programming aid for many applications. Yes! It's part of a shift away from Turing completeness and towards declarative programming. Curiously enough, it's been approached from two different angles by two different camps. The Web/Hypertext camp has, to my knowledge, had this vision for ages. But only recently has the distributed object camp been leaning in this direction. There's a project at PARC called "Aspect Oriented Programming", that's attempting to evolve component software to widen the scope of interface declarations (even beyond contracts). Basically, the many "aspects" of a typical program are separated out into a minimal Turing complete core, plus lots of declarative documents specifying such information as concurrency, data flow, compositional structure, etc..). All of this is run through a "weaver" to produce your end product. http://www.parc.xerox.com/spl/projects/aop/ You might also be interested in a paper that Adam Rifkin and Rohit Khare have submitted to WWW7; http://www.cs.caltech.edu/~adam/papers/www/origin-of-species.html Since this is a little off-topic, I'd recommend that any followups be taken off-list. Then again, Peter did start it ... 8-) MB -- Mark Baker, Ottawa Ontario CANADA. 
Java, CORBA, XML, Beans http://www.iosphere.net/~markb distobj@acm.org ICQ:5100069 Will distribute business objects for food.
From fussellm at alumni.caltech.edu Fri Dec 19 14:52:02 1997 From: fussellm at alumni.caltech.edu (Mark L. Fussell) Date: Mon Jun 7 16:59:37 2004 Subject: Unspecified #IMPLIED attributes in Java (was: Goals: XML ...) In-Reply-To: Message-ID: On 18 Dec 1997, Toby Speight wrote: > David has answered the original question (what is isSpecified() for in the > Java simple API?), but I thought I'd mention that DSSSL's attribute-string > function returns #f for PLUGH; the Java equivalent of this is of course, > null. I think this is the Right Thing to do; it's sometimes important to > tell the difference between and . I certainly agree that it is useful to tell the difference between these two cases, but it does bring up the issue that Peter raised: do all users understand the issue? Also, null can only be used for 'notSpecified' if null is not an acceptable value. Frequently it is, so it is better to have a separate 'notSpecified' marker or attribute. > The first case is often used to mean a known, empty value; the second > to mean "not known" or "not applicable". Standardizing on a particular interpretation is unfortunately much more difficult. Relational databases have generally failed at this (SQL is broken because of it) and Codd now uses multiple "marks" in his view of the Relational model. 
The problem is that there are many possible and useful interpretations of "missing information": (1) Uninitialized (2) Inapplicable (3) NotYetKnown (4) NotEntered (5) FunctionallyUncomputable (6) OutOfDomainBounds and so on... See C.J. Date's writings for good descriptions of the above. It is always [yes, I believe always] better to be explicit about what is known (which can include explicitly what is not known) than it is to rely on a meaning for something that is "missing". So: <... stars="0" > <... noStarRating="true" > are all better than to just leave 'stars' off and imply an application meaning. But it can be convenient to not be so "wordy". In which case the application will have to be very explicit and consistent about what 'notSpecified' means (and, for XML, how that relates to #IMPLIED when there is a DTD). For MONDO, this can be very consistent because 'notSpecified' and #IMPLIED are both treated exactly equivalent to the parameter not existing. But other applications may have difficulty with this. But, in general, defaults seem to be easily understood and anything else is on the brink of infinite possibilities. --Mark mark.fussell@chimu.com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From smith at interlog.com Fri Dec 19 15:05:24 1997 From: smith at interlog.com (Chris Smith) Date: Mon Jun 7 16:59:37 2004 Subject: XML as a programming tool In-Reply-To: Message-ID: On Fri, 19 Dec 1997, Peter Murray-Rust wrote: > It has come as a revelation to me that XML *with its assorted toolkit* is a > powerful programming aid for many applications. See http://www.cam.org/~pierlou/prototype/ for an app, "Prototype". 
I haven't actually tried this yet, but it appears to be exactly this type of thing. --------------------------------------------------------------------------- Chris Smith xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Fri Dec 19 15:31:24 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:37 2004 Subject: RFC: Simple XML Event-Based API for Java In-Reply-To: <349A6636.7D7E28AF@jclark.com> References: <3.0.32.19971217103304.00b3fbe8@pop.intergate.bc.ca> <199712171903.OAA04014@unready.microstar.com> <3498E043.5F764F28@technologist.com> Message-ID: <3.0.1.16.19971219152149.54bf9f52@pop3.demon.co.uk> At 19:19 19/12/97 +0700, James Clark wrote: > >Suppose you have a document doc.xml that references an external parsed >entity chapters/3.xml and suppose chapters/3.xml contains some element >with an attribute that is a relative URL "4.xml". I would claim that >the appropriate URL to use as the base for resolving that relative URL >is the URL of the resource that contains the URL, so that relative to >the document URL, that relative URL should be interpreted as >chapters/4.xml rather than 4.xml. But unless the parser passes through >positional information, there's no way an application can do this. I would strongly support this interpretation. It's the natural one from HTML browsers and it is what I have implemented in XLL in JUMBO. I have found that the best way forward for me is that every WF fragment possesses a URL, since it may further reference other fragments. This works OK for me as far as I have got, but I am not a URL specialist. 
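[Editorial sketch] The base-URL rule discussed above can be illustrated with java.net.URI from the Java standard library. This is only a sketch under stated assumptions, not code from JUMBO or from any parser mentioned on the list: the host example.org and the class name BaseUrlDemo are invented for illustration, while the paths doc.xml, chapters/3.xml and 4.xml come from James Clark's example.

```java
import java.net.URI;

// Hypothetical demo class; not part of JUMBO or of any parser discussed here.
class BaseUrlDemo {
    // Resolve a reference against the URL of the entity that contains it.
    static URI resolve(URI containingEntity, String reference) {
        return containingEntity.resolve(reference);
    }

    public static void main(String[] args) {
        URI doc = URI.create("http://example.org/doc.xml"); // assumed host
        URI chapter = resolve(doc, "chapters/3.xml");        // the external parsed entity

        // Base taken from the entity that actually contains the attribute.
        System.out.println(resolve(chapter, "4.xml"));
        // Base taken from the document entity instead: a different resource.
        System.out.println(resolve(doc, "4.xml"));
    }
}
```

The two printed URLs differ, which is why an application needs to be told which entity an attribute came from before it can resolve a relative URL in the way described above.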
I don't know what happens when we get XML which is formed 'in vacuo' - e.g. as part of a serialized object, typed in on the command line, etc. :-) P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tms at ansa.co.uk Fri Dec 19 15:32:54 1997 From: tms at ansa.co.uk (Toby Speight) Date: Mon Jun 7 16:59:38 2004 Subject: Unspecified #IMPLIED attributes in Java In-Reply-To: "Mark L. Fussell"'s message of "Fri, 19 Dec 1997 06:51:24 -0800 (PST)" References: Message-ID: Mark> Mark L. Fussell > In article , Mark > wrote: Mark> On 18 Dec 1997, Toby Speight wrote: >> ... DSSSL's attribute-string function returns #f for [unspecified >> #IMPLIED attributes]; the Java equivalent of this is of course, null. >> I think this is the Right Thing to do; it's sometimes important to >> tell the difference between and . Mark> I certainly agree that it is useful to tell the difference Mark> between these two cases, but it does bring up the issue that Mark> Peter said: do all users understand the issue? That's up to the application program. I have no problem with programs that treat the two examples the same *provided their documentation says that's what they are doing* (though I'd be more likely to declare the default value to be the empty string in the DTD). In DSSSL, this behaviour would be (let ((val (attribute-string "bargh"))) (if val val "")) Mark> Also, null can only be used for 'notSpecified' if null is not an Mark> acceptable value. 
Frequently it is, so it is better to have a Mark> seperate 'notSpecified' marker or attribute. Are we talking about the same thing here? If the parser returns a string for each attribute value, then the Java null reference is distinct from any acceptable (i.e. writable in the XML document) value. You've confused me with your suggestion that null may be an acceptable value; would you care to clarify? >> The first case is often used to mean a known, empty value; the second >> to mean "not known" or "not applicable". Mark> Standardizing on a particular interpretation is unfortunately Mark> much more difficult. ... The problem is that there are many Mark> possible and useful interpretations of "missing information": Mark> ... I realise this; I was merely attempting to describe what #IMPLIED is used for in practice, with specific application[*] conventions - that's why I used the word "often" ;-). [*] using the word "application" in its SGML sense - argh! Mark> But it can be convenient to not be so "wordy". In which case the Mark> application will have to be very explicit and consistent about what Mark> 'notSpecified' means (and, for XML, how that relates to #IMPLIED when Mark> there is a DTD). Agreed. Mark> For MONDO, this can be very consistent because 'notSpecified' and Mark> #IMPLIED are both treated exactly equivalent to the parameter not Mark> existing. But other applications may have difficulty with this. I've been looking at it the other way around - to me, it seemed "obvious" to return #IMPLIED as null, and then to think about whether the no-DTD case is equivalent. [I think that that bias springs from the fact that I haven't written any DTD-less applications and I generally use traditional SGML tools (SP, Jade, psgml-mode, etc.).] 
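[Editorial sketch] The distinction argued over above, between an attribute given as an empty string and one not specified at all, can be shown in a few lines of Java. Everything here is hypothetical and for illustration only: the class AttributeDemo and its methods are not part of any parser API mentioned in this thread. The second accessor is a plain Java rendering of the DSSSL expression quoted earlier, (if val val "").

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the attributes written on one element.
class AttributeDemo {
    private final Map<String, String> specified = new HashMap<>();

    // Record an attribute exactly as it was written in the document.
    void put(String name, String value) {
        specified.put(name, value);
    }

    // Returns null when the attribute was not specified,
    // mirroring DSSSL's attribute-string returning #f.
    String attributeString(String name) {
        return specified.get(name);
    }

    // Treats "not specified" the same as the empty string.
    String attributeStringOrEmpty(String name) {
        String val = attributeString(name);
        return val != null ? val : "";
    }

    public static void main(String[] args) {
        AttributeDemo element = new AttributeDemo();
        element.put("bargh", ""); // specified, but empty
        System.out.println(element.attributeString("bargh") == null); // specified: not null
        System.out.println(element.attributeString("plugh") == null); // omitted: null
    }
}
```

Because attribute values written in a document are always strings, the null reference stays distinct from any writable value in this sketch; an application that wants several kinds of "missing" would need an explicit marker instead.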
FWIW, I concur that DTD-less processing ought to be equivalent to specifying all attributes as #IMPLIED, but for the parser API, there is a difference: I think that the parser should return null in the valid-processing case, but in the well-formed DTD-less case, it cannot know that the attribute has been omitted, and so will return neither the name nor the value of the attribute. If I write a {grove, tree} builder, it would be useful to know whether a DTD was used, so that it can report an error to an application trying to access an attribute that was not declared (this may be the symptom of a typo, perhaps). If a DTD was not used for the parse, then the access should return null (as if the attribute were declared #IMPLIED). -- xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From SimonStL at classic.msn.com Fri Dec 19 16:17:52 1997 From: SimonStL at classic.msn.com (Simon St.Laurent) Date: Mon Jun 7 16:59:38 2004 Subject: XML as a programming tool Message-ID: One of the examples in my book was of state-driven programming. It's not exactly like writing programs - the XML document specifies states and triggers for those states. I'd foolishly thought about using it for a remote control airplane, but the prospect of crashes (more than the computer) was not so pleasant. Instead, it controls light switches, which are a lot safer most of the time. It might also be an interesting tool for model railroads - feed a controller an XML schedule, let the controller run the train. (Just don't let any real railroad hear about this.) 
I guess it's programming like 'programming' a VCR - someone else has written the program, I just feed it the data that controls its behavior. Still, even that limited prospect was exciting, and capable of some pretty complex stuff. Simon St.Laurent Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From cskerr at geocities.com Fri Dec 19 16:32:58 1997 From: cskerr at geocities.com (Charles Kerr) Date: Mon Jun 7 16:59:38 2004 Subject: JDK 1.2 (was Re: XML as a programming tool) Message-ID: <001801bd0c9b$da184610$375c0f81@plato> The APIs seem to be mostly stable -- if I were you I'd try jumping to 1.2. However, every time I try to use MSXML 1.8 with the JDK 1.2 beta, I get an Exception... >> JUMBO is 1.02 and I was planning to go to 1.1.4. Is there any reason (other >> than brain overload) why I shouldn't now jump straight to 1.2? i.e. is the >> beta reasonably stable? > >Well yah most of it is stable. It has no new language features other than weak >references (which really are not a language feature like inner classes), but a >lot new API's including Swing. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Fri Dec 19 17:00:00 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:38 2004 Subject: Unspecified #IMPLIED attributes in Java References: Message-ID: <349A9A89.D145C298@technologist.com> Toby Speight wrote: > > If I write a {grove, tree} builder, it would be useful to know whether > a DTD was used, so that it can report an error to an application trying > to access an attribute that was not declared (this may be the symptom > of a typo, perhaps). If a DTD was not used for the parse, then the > access should return null (as if the attribute were declared #IMPLIED). The DSSSL model is that trying to access a random attribute merely returns #f. Although this could allow a typo to pass, it has the benefit of making stylesheets a little more robust to DTD variations. For instance, the same stylesheet can work with various versions of HTML. Paul Prescod xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Fri Dec 19 18:40:27 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:38 2004 Subject: XML as a programming tool Message-ID: <199712191840.SAA30600@mail.iol.ie> The concept of a DTD has a resonance with data driven programming such as JSP Jackson Structured Programming and JSD - Jackson System Design. 
I have on occasion used DTDs to document time-ordered interfaces to objects. It can be a very powerful technique! Take a really simple object interface - an object with open, close, read, write methods. These have a time ordering which is not captured in this: int open(); int close(); int read(); int write(); Compare this:- Sean Mc Grath sean at digitome dot com
From ak117 at freenet.carleton.ca Fri Dec 19 18:58:12 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:38 2004 Subject: AElfred 1.0beta4 release Message-ID: <199712191855.NAA03585@unready.microstar.com> There is a new version of Microstar's Ælfred XML parser available, incorporating some of the suggestions that have come up in recent discussions on this list. You can try out the new version online or download it using the following URL: http://www.microstar.com/XML/ Ælfred 1.0beta4 contains some major changes to the interface:
1. New callbacks
void startExternalEntity (XmlParser p, URL systemId)
void endExternalEntity (XmlParser p, URL systemId)
void charData (XmlParser p, char ch[], int length)
void ignorableWhitespace (XmlParser p, char ch[], int length)
2. Removed callbacks
void data (XmlParser p, String data)
3. Modified callbacks
void startDocument (XmlParser p)
void attribute (XmlParser p, String aname, String value, boolean isSpecified)
Apologies in advance to those of you who have already integrated Ælfred into your tools -- I hope that the changes won't cost you more than 15 minutes or so of modification and testing.
The addition of ignorable whitespace is required by the XML spec (though Ælfred is non-conforming for error-reporting, I want the information that it provides to be correct). All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/
From ak117 at freenet.carleton.ca Fri Dec 19 19:59:47 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:38 2004 Subject: Plug and Play XML Message-ID: <199712191957.OAA04476@unready.microstar.com> I recently had a request about sample texts to use with Ælfred (Microstar's XML parser). With Ælfred, or any other URL-enabled XML parser, you should be able to parse an XML document directly from the Internet. For example, when you download aelfred-1.0beta4.zip (from http://www.microstar.com/XML/), you should be able to just unzip it and point it at a URL, with no other setup. With the JDK, you change to the directory where you unzipped Ælfred and type java EventDemo With Microsoft's Java VM, you can type jview EventDemo (Of course, you can run the command from any directory once Ælfred is on your classpath). Here are two URLs that you can use to start playing: http://www.microstar.com/XML/donne.xml http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml Type them in, and watch the events roll down your screen -- no manual downloading required. 
I'd love to see the URLs for more online XML documents that we can all try out (the XML specification at www.w3.org does not currently work, because of character-encoding errors in the XML document). I might put up Beowulf in UTF-8, just to keep the other parser writers busy... All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Jon.Bosak at eng.Sun.COM Fri Dec 19 22:26:04 1997 From: Jon.Bosak at eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:59:38 2004 Subject: LISTRIVIA (was Re: RFC: Simple XML Event-Based API for Java) In-Reply-To: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk> (message from Peter Murray-Rust on Wed, 17 Dec 1997 08:08:14) Message-ID: <199712192224.OAA01708@boethius.eng.sun.com> I don't ordinarily send mail just to say "me too," but I want to publicly support Peter in his campaign against unnecessary quoting and attachments in mail to public lists. For people like me who archive their mail and have to get a lot of it over a phone line, such things are enormously annoying *despite* the fact that my company is picking up the expense. I can't imagine how frustrating it must be for people who are paying by the kilobyte. Jon xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From donpark at quake.net Fri Dec 19 22:47:33 1997 From: donpark at quake.net (Don Park) Date: Mon Jun 7 16:59:38 2004 Subject: LISTRIVIA (a proposal) Message-ID: <000a01bd0ccf$98a9f280$0100007f@localhost> Here is my "me too" and a proposal. Lets shorten the xml-dev signature from: xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) to: xml-dev: XML Developer mailing list. For info: http://ic.ac.uk/xmldev/info.html. Don xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 20 00:47:51 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:38 2004 Subject: LISTRIVIA (a proposal) In-Reply-To: <000a01bd0ccf$98a9f280$0100007f@localhost> Message-ID: <3.0.1.16.19971220002313.2d0f2b72@pop3.demon.co.uk> At 14:43 19/12/97 -0800, Don Park wrote: >Here is my "me too" and a proposal. Lets shorten the xml-dev signature I'll let Henry reply to this. 
He *did* ask me a few days ago about shortening it, and I suggested not - but maybe we should reconsider. I suspect Henry does not have resources on the list server itself, so might have to put it at www.ch.ic.ac.uk. We have been very lucky in the lack of 'Unsubscribes' on this list and perhaps there isn't a need for such a long .sig. But it's a difficult business. If you make it too difficult (after all everyone forgets the syntax of the list administration) then this simply means that Henry (not me ) gets all the "please help me, I want to get off" mails which none of the rest of us see. P. xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sat Dec 20 00:51:11 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:38 2004 Subject: LISTRIVIA (Duplicate postings) In-Reply-To: <199712192224.OAA01708@boethius.eng.sun.com> References: <3.0.1.16.19971217080814.49c74e72@pop3.demon.co.uk> Message-ID: <3.0.1.16.19971220014355.37af7b70@pop3.demon.co.uk> At 14:24 19/12/97 -0800, Jon Bosak wrote: [... in support of unnecessary bytecount on the list...] It's something that comes partially out of the SGML culture. When I first started posting to comp.text.sgml, I was quickly shown by Erik Naggum - gently but very firmly - the appropriate way to use quoting. For those who remember his time on c.t.s., I think Erik is one of the most precise people I have "met" on the Internet. There is another matter of style, which I was going to raise at an appropriate time, but which Jon's contribution has catalysed me to mention. On XML-SIG there is a very strict policy against duplicate postings. 
Penalties (which of course are confidential) are Draconian. I'll explain the problem... A duplicate posting occurs when someone (B) replies to the list and simultaneously to the poster (A). If you do the arithmetic you will see that the original poster (A) gets two copies of the message, one from the list (L) and one from (B). Not quite identical because the headers are different, so they *look* like different messages. It gets quite disappointing for (A) to find that it's the same old letter again. Again, if you do the sums you will see that (A) gets about twice as many bytes as they really want. If you think deeply about the psychology, you'll see that it often has a similar effect on (A) as unnecessary quoting has. Now, if you don't *post* to the list, you won't be aware of this. BUT, if you do, then you'll find that sometimes you get two copies with the same content. You'll also start to recognise the people who fall into category (B). Why do they do it? Not because (B) wants to upset (A), IMO. It works something like this: When (B) gets a message posted by (A) to the list, (B) will see two fields in the header, something like this: To: xml-dev@ic.ac.uk From: A [This is not very attractive markup, and will look much nicer when mailers represent it as: A xml-dev@ic.ac.uk but a surprisingly large number of people can, in fact, interpret the first syntax without error. It is normally taken to mean that A sent a message to xml-dev@ic.ac.uk, and that xml-dev@ic.ac.uk sent it on to all the participants. Now, it starts to get a bit complicated. Let's assume that B is a member of XML-DEV, and wants to reply so that everyone can see what they (B) have written. Most mailers have a "Reply" option, often on a menu, or by pressing the "R" key. If you simply Reply to the message, it will go to (A), because most mailers look in the "From" field and assume that you want to send to the address represented by the content of the "From:" fields. 
So the mailer would generate a reply something like: To: A From: B and the message would go to (A), the original poster. Rats! This isn't what B wanted. Of course they (B) want (A) to read the message, but they also want everyone else on XML-DEV to read it. One way to do it would be to type the words "xml-dev@ic.ac.uk" into the "To:" field, like this: To: xml-dev@ic.ac.uk From: B and, perhaps surprisingly, this actually works - i.e. it sends a message from B to the XML-DEV list. So, what's the problem? Well, typing "xml-dev@ic.ac.uk" is 16 characters and it's very tedious to type this and check that it's right. So there's a clever way round this. Many mailers have a "Reply to All" function. This looks at everyone mentioned in the mail header and sends them all a copy of the mail. So when (B) replies in this fashion, their outgoing mail header looks something like this: To: xml-dev@ic.ac.uk, A From: B So everyone on XML-DEV and A gets a copy. This is just what B wants. Everyone's happy. Unfortunately not. There's a very subtle point which lots of people quite naturally miss. A gets sent a message. And everyone on XML-DEV gets a message. But wait! A is a member of XML-DEV. The majordomo at ic.ac.uk isn't clever enough to know that B has sent their own personal copy of the mail to A. So, if you do the arithmetic, you'll see that A gets TWO copies of the message. And, if you think very carefully, you'll see that they aren't quite the same. One has a header saying that it has come from XML-DEV, and the other that it has come from B. But the content of the two messages is the same. What can be done about it? Well, those of you who have followed so far will see that deleting the string "A" from the To: field will solve the problem. But this is often quite long - it might be something like: "Peter Murray-Rust" which is now *45* characters - a lot of deleting. And easy to miss one out. 
But there's a clever trick, which perhaps not everyone knows (and probably works on most mailers). It needs practice, but most people learn in time. A. click the cursor just in front of the string you want to remove. You may see a vertical bar, or block character. B. Without taking your finger off the mouse, move it slowly to the right. The background to the letters will go green! [It might be blue on some machines, but don't worry.] When you've got to the end of the string (the one you want to delete) take your finger *off* the mouse. The background will still be green! C. Now - before you do anything else, find the "Delete" key. It's usually got "Delete" written on it. Sometimes it says "Del", or sometimes "DEL". Press it firmly, just once. The green string will disappear, *and* all the letters in it. D. *Now* you can press the "Send" button. If you work it out, your To: field will be simply: To: xml-dev@ic.ac.uk just as if you'd typed it in, but so much less effort. I realise this has been a long tutorial, and we've not even been able to cover the Cc: field, or what to do without a mouse. But if you can master this, you'll probably be able to manage the Cc: field [It stands for "Copy" and when you "Reply to all" you'll reply to people in that field as well. If it happens to be (A) you can use the same technique to delete the characters.] So, let's see if we can get the duplicate postings down to zero :-) Then I won't even have to mention things that might otherwise happen... P. xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Sat Dec 20 01:41:22 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:38 2004 Subject: XML as a programming tool References: <199712191840.SAA30600@mail.iol.ie> Message-ID: <349ACE06.BFB3D358@technologist.com> Sean Mc Grath wrote: > > The concept of a DTD has a resonance with data driven programming such as > JSP Jackson Structured Programming and JSD - Jackson System Design. > > I have on occasion used DTDs to document time ordered interfaces to objects. It > can be a very powerful technique! We discuss this in a paper we gave at SGML/XML 97. We call this a "protocol." "Software Component Interface Description in SGML" "Additional architectural constraints may be provided which currently are not enforced by any programming language." "Examples include protocols and design patterns. Protocols are permissible sequences of method invocation and attribute access, possibly with additional temporal constraints. Design patterns are specifications of a set of roles in a pattern and identification of the mapping of specific classes and methods in the current definitions onto these roles." http://www.cgl.uwaterloo.ca/meta/sgml97/mmccool/index.html I do see an interesting correlation between the ideas in that paper and the aspect programming paper someone posted earlier. ON THE OTHER HAND, protocols should be rare in good software design. You can usually define an interface so that it doesn't require much explicit time ordering. For instance you can open file objects automatically when they are created and close them automatically when they are destroyed. 
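[Editorial sketch] The idea of hiding the open/close ordering inside the object's lifetime can be approximated in current Java with AutoCloseable and try-with-resources, a language feature that postdates this thread. The class ManagedLog below is hypothetical and records its own calls in a StringBuilder purely so that the ordering is visible; Java has no destructors, so the scope of the try block stands in for "destroyed".

```java
// Hypothetical resource: opened in the constructor, closed automatically.
class ManagedLog implements AutoCloseable {
    private final StringBuilder sink;
    private boolean open;

    // Construction opens the resource, so callers never call open() themselves.
    ManagedLog(StringBuilder sink) {
        this.sink = sink;
        this.open = true;
        sink.append("open;");
    }

    void write(String text) {
        if (!open) {
            throw new IllegalStateException("log already closed");
        }
        sink.append("write:").append(text).append(';');
    }

    // Invoked automatically at the end of a try-with-resources block.
    @Override
    public void close() {
        if (open) {
            open = false;
            sink.append("close;");
        }
    }

    // Runs one open/write/close cycle and returns the recorded call order.
    static String demo() {
        StringBuilder trace = new StringBuilder();
        try (ManagedLog log = new ManagedLog(trace)) {
            log.write("hello");
        }
        return trace.toString();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With this shape the only ordering a caller can get wrong is writing after the block has ended, which the class rejects with an exception.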
-- Paul Prescod -- http://itrc.uwaterloo.ca/~papresco

Art is always at peril in universities, where there are so many people, young and old, who love art less than argument, and dote upon a text that provides the nutritious pemmican on which scholars love to chew. -- Robertson Davies in "The Cunning Man"

From peter at ursus.demon.co.uk Sat Dec 20 09:38:16 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
In-Reply-To: <349ACE06.BFB3D358@technologist.com>
References: <199712191840.SAA30600@mail.iol.ie>
Message-ID: <3.0.1.16.19971220103508.2b3f1292@pop3.demon.co.uk>

At 14:41 19/12/97 -0500, Paul Prescod wrote:
>Sean Mc Grath wrote:
>>
>> The concept of a DTD has a resonance with data driven programming such as
>> JSP Jackson Structured Programming and JSD - Jackson System Design.
>>
>> I have on occasion used DTDs to document time ordered interfaces to objects. It
>> can be a very powerful technique!
>
>We discuss this in a paper we gave at SGML/XML 97. We call this a
>"protocol."
[...]
>"Software Component Interface Description in SGML"
[...]
>http://www.cgl.uwaterloo.ca/meta/sgml97/mmccool/index.html

These look very interesting. AIUI Paul's tool is for generating code and documentation for software projects, essentially by attaching semantics to an SGML document. In a sense the document is acting as a series of instructions. There would seem to be extensions to recipes in general, so that XML could be used to perform tasks - this is the vision I have for chemistry, for example (though it could also work for cakes).
In a sense that is what I am doing in my simple case with Java menus. has the implied semantics of "insert a call to addSeparator() at this point". requests calls to a hierarchy of new Menu and new Menuitem calls.

This is another reason, for example, the BEHAVIOR attribute in XLL seems important. You could use it to do lots of things, "directed" by a core XML script. I have already suggested we would benefit from some agreed semantics, so that we can write the code that carries them out. For example, BEHAVIOR="display" would call the display() routine (this is what JUMBO does at present), but BEHAVIOR="doit" could call the doit() routine.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From digitome at iol.ie Sat Dec 20 10:46:54 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID: <199712201046.KAA31284@GPO.iol.ie>

[Paul Prescod]
>
>ON THE OTHER HAND, protocols should be rare in good software design. You
>can usually define an interface so that it doesn't require much explicit
>time ordering. For instance you can open file objects automatically when
>they are created and close them automatically when they are destroyed.
>--

Oh, I'd have to disagree with you there! In many problem domains object abstractions are used to represent things that have a life history: bank accounts, customers, space flights etc.
Recognising and leveraging the natural time ordering of the events that occur to these objects can be both powerful and natural. Grady Booch et al. have written about a variety of ways to do it: time lines, flow diagrams etc. IMHO SGML/XML can gainfully be applied in this field. I made a stab at it at SGML '96 in Boston when I gave a paper that compared SGML DTDs with the ideas in the JSP and JSD software development methodologies.

I would argue that *not* utilising the natural time ordering of events inherent in many systems is one of the things that can make event driven programming a real dog. How many times have you seen this:-

    if (event == OPEN) {
        if (ALREADY_OPENED==TRUE)
            barf();
        else {
            ALREADY_OPENED=TRUE
            do something useful.
        }
    }

In SGML/XML, a very analogous sort of stuff results from loose content models:

    start_foo {
        InFoo = TRUE
    }
    start_a {
        if (InFoo == TRUE) .....
    }

An interface that allows events to occur in any old order leads to the introduction of state variables that control what events are valid and when. The state space gets very large very quickly. For N boolean state variables a program can be in 2**N possible states!

SGML/XML is a great way to reduce a state space because SGML/XML DTDs can be usefully thought of as devices for imposing a time ordering on events. Take something like a simple bank account model:

...

This is both a concise piece of documentation about the goings on of these BankAccounts and a starting point for the implementation code. As events occur they are "parsed" prior to the real processing code, thus checking the desired time ordering and obviating the need for state variables to do it in the processing code.

In Jackson, simple structure editors are used to create life histories which quite frankly are within a syntactic asses roar of DTDs. A Jackson structure editor is a bit like a DTD editor except that processing code can be attached to all the nodes in the DTD tree structure.
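The idea of "parsing" events against a life history before the processing code sees them can be sketched in a few lines. This is only an illustration: the bank account model in the original post did not survive archiving, so the event names (open, deposit, withdraw, close) and the content model below are assumptions, and a regular expression stands in for a DTD content model.

```python
import re

# Hypothetical life history for a bank account, written the way a DTD
# content model would express it: (open, (deposit | withdraw)*, close).
# The event names are assumptions made for this sketch.
LIFE_HISTORY = re.compile(r"open( (deposit|withdraw))* close")


def is_valid_history(events):
    """Return True if the events occur in an order the life history allows."""
    # Joining with spaces turns the event list into a "document" that the
    # content model can be matched against as a whole.
    return LIFE_HISTORY.fullmatch(" ".join(events)) is not None


print(is_valid_history(["open", "deposit", "withdraw", "close"]))  # True
print(is_valid_history(["deposit", "open", "close"]))              # False
```

No ALREADY_OPENED style flag appears anywhere: the ordering constraint lives in one declarative pattern rather than in state variables scattered through the handlers.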
Case in point: I used to write real-time financial trading systems for the PC in 80286 assembler(!). We used a Jackson Editor to model the whole system and auto-generate the procedural aspects of the code from our life histories/data models. Before I left we had rewritten the whole thing for Sun Workstations in ANSI C. The point? The life histories and data models did not change; only the implementation language did. Substitute "implementation language" for "formatting codes" in the above and it sure sounds like SGML.

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From peter at ursus.demon.co.uk Sat Dec 20 11:30:18 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: LISTRIVIA (Was Re: XML as a programming tool)
In-Reply-To: <199712201046.KAA31284@GPO.iol.ie>
Message-ID: <3.0.1.16.19971220122449.0a5712ec@pop3.demon.co.uk>

A lot of members on this list are new to XML and SGML, and will hope to "learn as they read". (This can be quite hard as they may have misconceptions through experience of "broken" HTML.)

I think it will be useful if all sample code is well-formed XML (rather than SGML) unless explicitly specified. (e.g. if you include SGML rather than XML, write

At 11:16 20/12/97 +0000, Sean Mc Grath wrote:
[... lots of very exciting stuff ...]
>a simple bank account model:
>
>
>
>
>
>

This isn't WF XML for several reasons. A correct version might read:

>...
[...]
>
>within a syntactic asses roar of DTDs.
A Jackson structure editor is a bit

We all make syntactic asses of ourselves and I have done so on numerous occasions, especially on XML-SIG. People have been very patient - "they know what I mean". But here the readers *don't* know what you mean. So we can all try to be well-formed asses :-). Therefore:

(a) try to be very careful about XML examples and related matters. People will say "this is written by an expert so it must be right - I'll cut and paste it..."

(b) tactfully and gently correct any errors that *do* get through. It won't be taken badly - we all make errors. For example I've corrected the "element" to "ELEMENT" as this is now required by the PR. [At one stage it wasn't, and it's often easy to work with outdated versions.] If it's unclear, you might ask "why isn't this "X"?" - the answer may be revealing.

I have *never* seen any flames on these lists when people are corrected for genuine mistakes.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 12:13:19 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <199712191957.OAA04476@unready.microstar.com>
Message-ID: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>

David Megginson writes:
>
>Here are two URLs that you can use to start playing:
>
> http://www.microstar.com/XML/donne.xml
> http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

As a co-editor of an (upcoming) RFC for text/xml and application/xml, I think that I should point out the correct procedure for encoding determination. (I have not checked these two Web sites, and Ælfred.)

For those XML documents transmitted by the HTTP protocol, XML parsers should use the charset parameter of the media type text/xml (BTW, the default of this parameter is 8859-1). XML parsers should ignore the encoding declaration within XML documents transmitted by HTTP. More about this, see the XML PR and the HTTP/1.1

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
From peter at ursus.demon.co.uk Sat Dec 20 12:56:42 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712191957.OAA04476@unready.microstar.com>
Message-ID: <3.0.1.16.19971220134605.2227cfa4@pop3.demon.co.uk>

At 21:10 20/12/97 +0900, MURATA Makoto wrote:
>David Megginson writes:
>>
>>Here are two URLs that you can use to start playing:
>>
>> http://www.microstar.com/XML/donne.xml
>> http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml

There are a large number of non-textual XML files under:

http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/cml12/cml/

most of them served from APPLETs, but you can get the *.xml from the HTML source.

>
>As a co-editor of an (upcoming) RFC for text/xml and application/xml,
>I think that I should point out the correct procedure for encoding determination. (I have not checked these two Web sites, and
>Ælfred.)
>
>For those XML documents transmitted by the HTTP protocol, XML parsers
>should use the charset parameter of the media type text/xml (BTW,
>the default of this parameter is 8859-1). XML parsers should ignore
>the encoding declaration within XML documents transmitted by HTTP.
>More about this, see the XML PR and the HTTP/1.1

Thanks for this reminder. For Chemical Markup Language Henry and I had originally devised our own MIME type (not official): chemical/x-cml. But, with the likely introduction of other namespaces (e.g.
RDF:*, MathML) in CML documents, it is clear that there is no need for diversity, since the namespaces themselves will have means of identifying the XML application. So CML documents will be "text/xml", unless we should use "application/xml" instead.

What will differentiate a text/xml document from an application/xml one?

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 13:41:15 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <3.0.1.16.19971220134605.2227cfa4@pop3.demon.co.uk>
Message-ID: <9712201340.AA02986@lute.apsdc.ksp.fujixerox.co.jp>

Peter Murray-Rust writes:
>
>There are a large number of non-textual XML files under:
>
>http://ala.vsms.nottingham.ac.uk/vsms/java/jumbo/cml12/cml/
>
>most of them served from APPLETs, but you can get the *.xml from the HTML
>source.

I am pleasantly surprised to see a lot more information than before. I should send this URL to one of my friends (a Ph.D. in chemistry)!

Peter Murray-Rust writes:
>
>What will differentiate a text/xml document from an application/xml one?

text/* is used for text, and application/* is for binary data. Thus, text/xml is appropriate for XML documents. (The reason that application/xml is introduced is only for transmitting XML documents in UTF-16 or UCS-2 via e-mail.)

text/* has the charset parameter, which specifies the encoding method.
text/* (implicitly) allows code conversion by proxy servers.

application/* does not have the charset parameter (if not explicitly defined for subtypes).

application/* (again, implicitly) disallows code conversion by proxy servers.

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

From ricko at allette.com.au Sat Dec 20 14:21:58 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
Message-ID: <199712201423.BAA15123@jawa.chilli.net.au>

> From: Peter Murray-Rust
> What will differentiate a text/xml document from an application/xml one?

When is each appropriate? I think the idea is to use text/xml in the normal case, and application/xml as a fallback.

I think I first suggested it, but it certainly was not my preferred option: I would prefer everything to be application/xml, because I do not like the idea of dumb HTTP/MIME systems fiddling and transcoding data, which they may do for text/xml. Application/xml is a binary transmission; no bits are molested en route.

The trouble with text/xml is that XML positively encourages the use of all ISO 10646 characters, for example all the symbol and publishing characters. If the data is "transcoded" en route from a large character set encoding (e.g. Unicode or an East Asian one) to a small encoding (e.g. 8859-n) then a dumb transcoder will not translate a non-encoding-repertoire character into its numeric character reference, but probably swallow it, or put out something strange.
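The difference between a transcoder that swallows characters and one that falls back to numeric character references can be shown with Python's standard codec error handlers. This is a small sketch, not a model of any real proxy; the function names and the sample string are made up for the illustration.

```python
def transcode_lossy(text, charset):
    """Mimic a dumb transcoder: unencodable characters become '?'."""
    return text.encode(charset, errors="replace")


def transcode_with_char_refs(text, charset):
    """Replace unencodable characters with numeric character references."""
    return text.encode(charset, errors="xmlcharrefreplace")


# "café" followed by two kanji; only the first four letters fit in ISO 8859-1.
sample = "caf\u00e9 \u65e5\u672c"

print(transcode_lossy(sample, "iso-8859-1"))           # b'caf\xe9 ??'
print(transcode_with_char_refs(sample, "iso-8859-1"))  # b'caf\xe9 &#26085;&#26412;'
print(transcode_with_char_refs(sample, "ascii"))       # b'caf&#233; &#26085;&#26412;'
```

The last call corresponds to keeping everything above 127 as a numeric character reference. Note that this is only safe for character data and attribute values; a reference inside an element or attribute name would not be valid XML.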
In practice this means that all characters above 127 should be encoded using numeric character references rather than directly by XML document generators. Smart intermediate XML systems should also attempt to replace characters in data and attributes with numeric character references. When you are devising your own PI notations and comment conventions, you should also duplicate numeric character references.

The unpleasant implication in all this is for native language markup. If your XML data will be sent to users who use other scripts, do not use characters in XML names that are not available in their regional character sets. Numeric character references do not apply, currently, to names. (I hope this will eventually be changed in SGML and XML, but I think the facts and the affected users will eventually speak for themselves in due time.)

This is why you should be conservative in your choice of name characters. The < 127 characters are OK. The 128-255 range of characters in 8859-1 and ISO 10646 are probably pretty safe too. This problem even has effects within nations, if the nation has a few different repertoires in common use: in particular in Japan Unix systems using EUC have available several thousand more kanji than older PC (i.e. shift-JIS) and Macintosh systems: it is probably prudent for Japanese users to only use those characters available in shift-JIS for naming.

None of these considerations were new for the XML discussion: what was new was that XML works with a particular operating model that says that documents must cope with HTTP/MIME systems but also must provide enough information to create the MIME headers in the first place.

The restriction that numeric character references cannot be used in markup, just in data and attribute values, comes from the old character model of SGML.
In this model, it made no sense to allow numeric character references in names, and indeed would be considered bad, because it created markup that could not be read in a simple editor.

XML is probably one of the most thoroughly internationalized software systems around: in particular, this internationalization has been in place and under discussion from the very beginning, and not "tacked on". Internationalization (I18n) is one area of XML that must cause difficulties for parser writers to get right. But the benefit is that once they have it right, it makes life much simpler and richer for users. Which is not to say that XML i18n is perfect, but it is certainly near state-of-the-art, given the need to fit in with HTTP/MIME and operating systems.

I certainly hope that XML will not remain "state-of-the-art" for long, and that advances in various technologies--in particular, for operating system vendors to agree on a charset/encoding labelling schema that they all implement in their OS (or the adoption of MIME as a file format, e.g. .MIM)-- will overtake it.

Rick Jelliffe

From murata at apsdc.ksp.fujixerox.co.jp Sat Dec 20 15:15:04 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <199712201423.BAA15123@jawa.chilli.net.au>
Message-ID: <9712201514.AA02987@lute.apsdc.ksp.fujixerox.co.jp>

Rick Jelliffe writes:
>When is each appropriate? I think the idea is to use text/xml in the
>normal case, and application/xml as a fallback

I believe that this is the idea of the XML WG and also the idea of W3C.
However, it is still not clearly presented in the XML PR.

Rick Jelliffe writes:
>I think I first suggested it, but it certainly was not my preferred
>option: I would prefer everything to be application/xml, because I do
>not like the idea of dumb HTTP/MIME systems fiddling and transcoding data,
>which they may do for text/xml. Application/xml is a binary transmission;
>no bits are molested en route.

You might want to try this once again in the XML SIG. If everybody agrees on this, I am more than happy to agree. But I do not want to have both text/xml and application/xml for HTTP, as this is likely to confuse people. Is it possible to persuade people *not* to use text/xml?

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

From peter at ursus.demon.co.uk Sat Dec 20 16:20:50 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: Plug and Play XML
In-Reply-To: <9712201514.AA02987@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712201423.BAA15123@jawa.chilli.net.au>
Message-ID: <3.0.1.16.19971220171515.0a5796a0@pop3.demon.co.uk>

At 00:14 21/12/97 +0900, MURATA Makoto wrote:
>Rick Jelliffe writes:
>>When is each appropriate? I think the idea is to use text/xml in the
>>normal case, and application/xml as a fallback
>
>I believe that this is the idea of the XML WG and also the idea of W3C.
>However, it is still not clearly presented in the XML PR.
> >Rick Jelliffe writes: >>I think I first suggested it, but it certainly was not my preferred >>option: I would prefer everything to be application/xml, because I do >>not like the idea of dumb HTTP/MIME systems fiddling and transcoding data, >>which they may do for text/xml. Application/xml is a binary transmission; >>no bits are molested en route. > >You might want to try this once again in the XML SIG. If everybody agrees >on this, I am more than happy to agree. But I do not want to have >both text/xml and application/xml for HTTP, as this is likely to confuse >people. Is it possible to persuade people *not* to use text/xml? There are two conflicting messages here, and I think it's critical that this is addressed *quickly* :-). Otherwise a large number of servers will have been set up where people guess the type (probably as text/xml), and the chance of uniformity will have been missed. Personally I am neutral, although given the effort that has gone into i18n, the thought of anything tweaking the bits en route sounds horrid. The application has enough to do without mending documents that have been tweaked for humans to read. I would hope that there is only one MIME type for XML as it will be impossible for most people to work out the difference. Two will simply confuse people and they (the types) will simply serve as synonyms. From what Rick says, application seems more logical, but I imagine there are lots of text/sgml documents out there already and people will go by analogy. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
From ak117 at freenet.carleton.ca Sat Dec 20 18:32:28 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:38 2004
Subject: text/xml vs. application/xml
In-Reply-To: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
References: <199712191957.OAA04476@unready.microstar.com> <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <199712201829.NAA00608@unready.microstar.com>

MURATA Makoto writes:
> > http://www.microstar.com/XML/donne.xml
> > http://home.sprynet.com/sprynet/dmeggins/texts/darkness/darkness.xml
> As a co-editor of an (upcoming) RFC for text/xml and
> application/xml, I think that I should point out the correct
> procedure for encoding determination. (I have not checked these
> two Web sites, and Ælfred.)

Thank you very much for the information. Currently, both of these web servers return "application/octet-stream" as the MIME type for *.xml and *.dtd files: in this case, is it correct for an XML parser to fall back on other character-encoding detection techniques, as Ælfred does?

> For those XML documents transmitted by the HTTP protocol, XML parsers
> should use the charset parameter of the media type text/xml (BTW,
> the default of this parameter is 8859-1). XML parsers should ignore
> the encoding declaration within XML documents transmitted by HTTP.
> More about this, see the XML PR and the HTTP/1.1

I have two important queries:

1) Are you certain that ignoring the encoding declaration is conforming behaviour?
It seems to me that it would make more sense to report an error if the charset parameter and the encoding declaration differ (especially since the PR requires any document without a BOM or encoding declaration to be in UTF-8).

2) Why pick a default encoding that conforming XML parsers are not required to support? Ælfred does accept encoding="ISO-8859-1", but some other parsers do not. It seems to me that either the RFC or the PR needs to be amended.

I can also anticipate a different problem: few private people (as opposed to companies or organisations) have any control at all over what their HTTP servers send out. Imagine an exchange student at a big American university, who wants to publish a UTF-8 or UCS-2 Arabic XML text in her personal web space. She will have a very hard time even finding out who is in charge of the university's HTTP server (if she knows what an HTTP server is), and she will probably have graduated before the university's administration has gotten around to approving letting the web-master look into reporting the correct encoding for her document.

In the end, it looks like application/xml is a _much_ better choice than text/xml -- with Ælfred, I have found that I can do a very good job autodetecting character encoding, and I imagine that other parser writers will find the same.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/
From jeremy at allaire.com Sat Dec 20 20:32:08 1997
From: jeremy at allaire.com (Jeremy Allaire)
Date: Mon Jun 7 16:59:38 2004
Subject: XML as a programming tool
Message-ID: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp>

>I guess it's programming like 'programming' a VCR - someone else has written
>the program, I just feed it the data that controls its behavior. Still, even
>that limited prospect was exciting, and capable of some pretty complex stuff.

I've put a fair amount of thinking into the problem (opportunity) of XML and Web devices. For experimental purposes, I began work with wrapping an X.10 device automation interface (X.10 is a late 70s standard for very simple device automation over AC wiring) with a tag wrapper. The proof of concept actually worked. You can check out the custom tag which enables this at the following site; search for "X10":

http://www.allaire.com/TagGallery/

X.10 already has a concept of loading device activity profiles (essentially schedules, in fact very close to CDF in terms of the kind of data required). A next-generation "over the wire protocol" -- CEBus -- promises to enable much richer forms of device automation and profiling, and I would surmise that XML will be a pretty important enabler. I'm betting on it.

Jeremy Allaire
From peter at ursus.demon.co.uk Sun Dec 21 00:53:13 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:38 2004
Subject: text/xml vs. application/xml
In-Reply-To: <199712201829.NAA00608@unready.microstar.com>
References: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp> <199712191957.OAA04476@unready.microstar.com> <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp>
Message-ID: <3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>

At 13:29 20/12/97 -0500, David Megginson wrote:
[...]
>I can also anticipate a different problem: few private people (as
>opposed to companies or organisations) have any control at all over
>what their HTTP servers send out.

I am extremely sympathetic to this. XML will revolutionise the 'publishing process' by providing direct author2reader communications (and much else). It seems to me essential that authors are allowed to say what their documents are, and XML gives them this opportunity, whilst - as David says - with MIME they do not have complete freedom. [I have suffered the same problem - people mailing me and asking 'can I change the MIME type of my files?'; answer 'sorry'.]

BTW I have now hacked AElfred beta 4 under JUMBO, and it seems to work fine. I can extract all the DTD information I want and render it as a tree, as well as the conventional data. If - as David suggests - the current AElfred API is close to the planned convergence, then fine. It *did* take me longer than 15 mins - but I wasn't at my brightest :-).

I'd still like to see the #IMPLIED problem clearly agreed.
AElfred AIUI outputs null as the value for a non-existent attribute whether the attribute is declared or not.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From rwaldin at pacbell.net Sun Dec 21 03:22:53 1997
From: rwaldin at pacbell.net (Ray Waldin)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp>
Message-ID: <349C8DF2.EC73820F@pacbell.net>

Hi everyone,

I've used XML twice now and in both cases I've ended up with nothing to mark up except more tags and whitespace :). These languages are used to communicate relationships between external resources, not marked up text. The intent was to describe these relationships in a flexible but well defined format and XML offered a simple (and soon to be standard!) way of doing this. So far, so good.

I've seen other examples of this type of "pure tag language" and noticed that some of them seem to force content into tags for no reason. My question is, given the "nothing to markup" scenario, which is more appropriate, when is each more appropriate, and why: 1234 or

In other words, when should data be contained by elements? Or conversely, when should data be an attribute of an element instead of contained by that element? I prefer the latter method, given an attribute's ability to store CDATA without CDATA section delimiters.
OSD and CDF use the former method for: Solitaire and I'm not sure why as: could serve the same purpose and is more inline with the rest of the language. Any general guidelines? Thanks! -Ray xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From liamquin at interlog.com Sun Dec 21 09:09:47 1997 From: liamquin at interlog.com (Liam Quin) Date: Mon Jun 7 16:59:38 2004 Subject: element content vs. element attribute In-Reply-To: <349C8DF2.EC73820F@pacbell.net> Message-ID: <Pine.BSI.3.95.971221021753.1377D-100000@shell1.interlog.com> On Sat, 20 Dec 1997, Ray Waldin asked: > when should data be contained by elements? Or conversely, when should > data be an attribute of an element instead of contained by that element? There are a number of issues that may help here, depending on how the information is going to be used... Some pragmatics first: * it's often easiest for people writing ad-hoc parsers if you only use elements; there's only one syntax to handle * if you will ever need to have more complex structured values with markup in them, they will need to be in element content, because XML (like SGML) has a restriction that you can't put element markup inside attributes * if you want the information to be displayed in XML or HTML or SGML browsers most or all of the time, use content, as the style sheets are generally less flexible with attributes. 
* it's relatively easy to strip out all attribute values and make a pared-down instance, if that's useful
* attributes are good for things like interpretations of a text by someone transcribing it, not part of actual content

A philosophical view:

* attributes may be used for annotating the element tree; in other words, you could use them to store element properties. For example,

  <boiler MinTemperature="7" MaxTemperature="320"> steam water gunk </boiler>

Unfortunately, a practical example would add units and tolerance to the temperature, and then you need to use elements or a non-XML sub-structure:

  <boiler MinTemperature="7 {units K} {tolerance {plus 3} {minus 2}"....>

This is generally unsatisfactory because it's not using XML; so

  <boiler>
    <MinTemperature>
      <unit ref="SIUnits#Kelvin" abbr="K">Kelvin</unit>
      <value plus="3" minus="2">7</value>
    </MinTemperature>
  </boiler>

Clearly you could take those items I have left as attributes and turn them into elements, and in fact any element E with attribute list A and content model C can be converted into an element E' with content model E.atts(A), E.content(C), e.g.

  <!ELEMENT Boiler-prime (Boiler-prime-attributes, Boiler-prime-content) >
  <!ELEMENT Boiler-prime-attributes ( MinTemperature-prime, MaxTemperature-prime ) >
  <!ELEMENT Boiler-prime-content (steam-prime, water-prime, gunk-prime) >

It is therefore possible to think of attributes as syntactic sugar for a very restricted kind of content model. Unfortunately, this is not quite correct, because XML attributes support a set of constraints on their content which is entirely different to that supported for elements. If you only ever use CDATA, ID and name group attributes, retain ID attributes as attributes, and convert name group token lists to corresponding empty elements, the conversion still applies.

In theory, then, attributes are a useful but limited shorthand in most cases, but essential for IDREF and other cases that are not supported in element content.
In practice, they can be used to make an instance more readable, or to reduce file size, or to distinguish between different sorts of information. Hope this helps. Lee (tired at 4 am!) -- Liam Quin -- the barefoot typographer -- Toronto lq-text: freely available Unix text retrieval IRC: Learn about XML/SGML/XSL/XLL/DSSSL on irc.dragonnet.org in #xml email address: l i a m q u i n, at host: i n t e r l o g dot c o m xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sun Dec 21 10:08:21 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:38 2004 Subject: element content vs. element attribute In-Reply-To: <Pine.BSI.3.95.971221021753.1377D-100000@shell1.interlog.co m> References: <349C8DF2.EC73820F@pacbell.net> Message-ID: <3.0.1.16.19971221110404.0baf108e@pop3.demon.co.uk> At 04:09 21/12/97 -0500, Liam Quin wrote: >On Sat, 20 Dec 1997, Ray Waldin asked: [... a very common and important question of style ...] >> when should data be contained by elements? Or conversely, when should >> data be an attribute of an element instead of contained by that element? > >There are a number of issues that may help here, depending on how the >information is going to be used... > >Some pragmatics first: > [...] I have run into exactly this problem with Technical Markup Language. I wanted to design it with as few ELEMENTs as possible and have evolved this to <XVAR TITLE="BolierTemperature" UNITS="Celsius" FUZZY="Range" ...>120-125</XVAR> where there are a number of attributes that qualify the value. 
For reasons Liam has outlined, and some others (see below) I have come to the conclusion that ELEMENTs are easier to work with than attributes. So, to Liam's criteria I'll add: * X*L tools formally require more support to be given to ELEMENTs than attributes. For example, if I have a unit of length (metre), but don't know whether it occurs as kilometre or centimetre [1], I can search in content with standard XML syntax: DESCENDANT(ALL,UNITS)STRING(1,"metre",0) whereas I have no way of searching in attribute values unless I write my own software. * When you have to write significant amounts of code to process an attribute it may be work reworking it as an ELEMENT. JUMBO includes a lot of code for automatic conversion between UNITS and so it makes sense to make this an ELEMENT, because much of that processing can then be done automatically. Put another way, at present JUMBO has to know which ELEMENTs might have UNITS attributes and call special code. If UNITS is contained, the processing is requested just like any other ELEMENT. > Unfortuantely, a practical example would add units and tolerance to > the temperature, and then you need to use elements or a non-XML sub- > structure: Yes - and I ran into trouble here. So the example I gave is horrid, and I am reworking this using <MINVALUE> and things like this. I would much rather have a proliferation of ELEMENTs than attributes. [Part of my worry about multiplying ELEMENTs was that the content models can get very complex. Since much of my XML will not be validatable, that's less of a problem now.] > <boiler MinTemperature="7 {units K} {tolerance {plus 3} {minus 2}"....> > This is generally unsatisfactory because it's not using XML; so Liam is an illicit helium distiller, I see. :-) P. [1] Some countries use the variant "meter" so you will have to do two searches. 
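A minimal sketch of the search problem Peter describes, written in Python rather than in the X*L addressing syntax quoted above. It assumes a made-up sample document (the readings element and its values are hypothetical) and uses only the standard xml.etree.ElementTree module. Searching element content and searching attribute values end up as two separate code paths, and both spellings from footnote [1] have to be tried.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the unit appears once as an attribute, once as content.
SAMPLE = """
<readings>
  <XVAR TITLE="PipeLength" UNITS="kilometre">12</XVAR>
  <XVAR TITLE="Gap"><UNITS>centimetre</UNITS><VALUE>3</VALUE></XVAR>
  <XVAR TITLE="BoilerTemperature" UNITS="Celsius">120-125</XVAR>
</readings>
""".strip()


def units_in_content(root, needles):
    """Text of every UNITS element whose content contains one of the needles."""
    return [el.text for el in root.iter("UNITS")
            if el.text and any(n in el.text for n in needles)]


def units_in_attributes(root, needles):
    """Every UNITS attribute value (on any element) containing one of the needles."""
    hits = []
    for el in root.iter():
        value = el.get("UNITS")
        if value and any(n in value for n in needles):
            hits.append(value)
    return hits


root = ET.fromstring(SAMPLE)
needles = ("metre", "meter")  # both spellings, as footnote [1] warns

print(units_in_content(root, needles))     # ['centimetre']
print(units_in_attributes(root, needles))  # ['kilometre']
```

With UNITS as an element, the second function is unnecessary, which is roughly the economy argued for above.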
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From gfrer at luna.nl Sun Dec 21 11:34:02 1997
From: gfrer at luna.nl (Gerard Freriks)
Date: Mon Jun 7 16:59:38 2004
Subject: element content vs. element attribute
Message-ID: <v04002803b0c28c14aad3@[194.151.26.78]>

> My question is,
>given the "nothing to markup" scenario, which is more appropriate?, when
>is each
>more appropriate?, and why?:
>
><blah>1234</blah>
>
>or
>
><blah value="1234" />
>
>Any general guidelines?
>

In my view, XML (or any other tag language) will be used to express:

- Datamodels of a part of the universe, which handle the relationships between entities with the Model. It defines the Context of the information.
- with Terminology (a set of Tags) which gives names to the entities
- with Rules to obey.

It is likely that the above examples will be tagged like:

<blah> <number> 1234 </number> </blah>

if the rules of the model allow it, or

<blah> <ASCII> 1234 </ASCII> </blah>

<blah> will be an entity from a Model indicating a context.
<number> will be an attribute indicating how things are coded.

Attributes will be derived from other Models.

'Nothing to markup' is nothing. It equals chaos.

Greetings

Gerard Freriks

Gerard Freriks, huisarts, MD
C. Sterrenburgstr 54
3151JG Hoek van Holland
the Netherlands
Telephone: (+31) (0)174-384296 / Fax: -386249
Mobile : (+31) (0)6-54792800
ARS LONGA, VITA BREVIS
From ak117 at freenet.carleton.ca Sun Dec 21 11:42:31 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:39 2004
Subject: Undeclared attributes in Ælfred
In-Reply-To: <3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>
References: <9712201210.AA02982@lute.apsdc.ksp.fujixerox.co.jp> <199712191957.OAA04476@unready.microstar.com> <199712201829.NAA00608@unready.microstar.com> <3.0.1.16.19971221014239.0a57e018@pop3.demon.co.uk>
Message-ID: <199712210153.UAA01766@unready.microstar.com>

Peter Murray-Rust writes:

> BTW I have now hacked AElfred beta 4 under JUMBO, and it seems to
> work fine. I can extract all the DTD information I want and render
> it as a tree, as well as the conventional data. If - as David
> suggests - the current AElfred API is close to the planned
> convergence, then fine. It *did* take me longer than 15 mins - but
> I wasn't at my brightest :-). I'd still like to see the #IMPLIED
> problem clearly agreed. AElfred AIUI outputs null as the value for
> a non-existent attribute whether the attribute is declared or not

Thank you for taking the time to try out the new release.

For documents without DTDs, this is probably the only option (any attribute is potentially an #IMPLIED attribute). For documents with DTDs, I could create a query method like

  boolean isDeclaredAttribute (String elname, String aname)

but Ælfred has already grown too large (it's over 25K), so I would need evidence of a pressing need.
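The difference between an #IMPLIED attribute and an undeclared one can be pictured as a lookup in a table built from the DTD. The sketch below does not use Ælfred or its API; it assumes Python's standard xml.parsers.expat binding and its AttlistDeclHandler callback, and the sample document and the function names (declared_attributes, is_declared_attribute) are hypothetical, chosen only to mirror the method proposed above.

```python
import xml.parsers.expat

# Hypothetical sample with an internal DTD subset: "id" is #IMPLIED,
# "lang" has a default, and "title" is never declared.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE doc [
  <!ELEMENT doc (#PCDATA)>
  <!ATTLIST doc id    ID    #IMPLIED
                lang  CDATA "en">
]>
<doc>text</doc>
"""


def declared_attributes(xml_bytes):
    """Return {(element, attribute): (type, default, required)} for every
    attribute declared in the document's internal DTD subset."""
    table = {}
    parser = xml.parsers.expat.ParserCreate()

    def on_attlist(elname, attname, type_, default, required):
        # default is None when the declaration gives no default value
        table[(elname, attname)] = (type_, default, bool(required))

    parser.AttlistDeclHandler = on_attlist
    parser.Parse(xml_bytes, True)
    return table


TABLE = declared_attributes(DOC)


def is_declared_attribute(elname, aname):
    """True if the DTD declares the attribute, whether or not it is present."""
    return (elname, aname) in TABLE


print(is_declared_attribute("doc", "id"))
print(is_declared_attribute("doc", "title"))
```

A null attribute value combined with a hit in the table would then mean "declared but not specified"; a null with no hit would mean "undeclared".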
In the meantime, you can use the query method

  Enumeration declaredAttributes (String elname, String aname)

to build a hashtable, and then look in the hashtable whenever you need to know whether an attribute is #IMPLIED or simply undeclared.

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Sun Dec 21 11:58:20 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <349C8DF2.EC73820F@pacbell.net>
References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp> <349C8DF2.EC73820F@pacbell.net>
Message-ID: <199712211155.GAA00316@unready.microstar.com>

Ray Waldin writes:

> In other words, when should data be contained by elements? Or
> conversely, when should data be an attribute of an element instead
> of contained by that element?

Here's a good, general distinction:

* use elements for structurally-significant information; and
* use attributes for meta-data.

One problem, which will become more obvious when more XML tools are available, is that most WYSIWYMG (M="might" or "may") XML editing software will likely show character data (element content) on the screen by default, but will show attributes only on request, possibly in a pop-up dialog. It makes sense to have the most important information (the real content) inside elements, then, and to have the meta-data out of the way in attributes.

(Peter: how do you display attributes in Jumbo?)
Of course, what is and isn't meta-data will vary depending on the document type, but here are some common examples:

* a unique identifier
* a security level
* a revision or release level
* rendition information (yech)
* configuration information
* the preferred unit of measurement

All the best,

David

-- 
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From digitome at iol.ie Sun Dec 21 13:55:12 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
Message-ID: <199712211355.NAA08564@mail.iol.ie>

[David Megginson]
>
>One problem, that will become more obvious when more XML tools are
>available, is that most WYSIWYMG (M="might" or "may") XML editing
>software will likely show character data (element content) on the screen
>by default, but will show attributes only on request, possibly in a
>pop-up dialog.

I suspect this comment is right on the money. I have never come across a way of displaying attribute data in an SGML editor that "felt" right. For one job I was involved in, attribute editing was such a pain that we wrote "ConvertAttributesToElements" and "ConvertElementsBackToAttributes" transformations:

  SGML doc -> [ConvertAttributesToElements] -> SGML doc -> [Editing Environment] -> SGML doc -> [ConvertElementsBackToAttributes] -> SGML doc

The fact that this is doable in a lossless fashion suggests that the attribute/element decision is largely a product of taste and a pragmatic consideration of the tools you intend to use.
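The round trip Sean describes can be sketched as follows. This is not the original SGML tooling, only an illustration in Python with the standard xml.etree.ElementTree module; the wrapper element name "attributes" is a hypothetical choice, and the sketch assumes that no real child element already uses that name.

```python
import xml.etree.ElementTree as ET

WRAPPER = "attributes"  # hypothetical wrapper element name, not from the thread


def attributes_to_elements(el):
    """Move every attribute of el, and of its descendants, into child
    elements grouped under a leading <attributes> wrapper."""
    for child in list(el):
        attributes_to_elements(child)
    if el.attrib:
        wrapper = ET.Element(WRAPPER)
        for name, value in el.attrib.items():
            ET.SubElement(wrapper, name).text = value
        el.attrib.clear()
        el.insert(0, wrapper)
    return el


def elements_to_attributes(el):
    """Inverse transformation: fold each <attributes> wrapper back into
    attributes on its parent element."""
    wrapper = el.find(WRAPPER)
    if wrapper is not None:
        for item in wrapper:
            el.set(item.tag, item.text or "")
        el.remove(wrapper)
    for child in el:
        elements_to_attributes(child)
    return el


doc = ET.fromstring(
    '<boiler MinTemperature="7" MaxTemperature="320">steam water gunk</boiler>'
)
original = ET.tostring(doc)

attributes_to_elements(doc)   # the form handed to the editing environment
edited_form = ET.tostring(doc)

elements_to_attributes(doc)   # back to the attribute form
print(ET.tostring(doc) == original)  # True
```

The sketch works on an instance without a DTD. It says nothing about ID/IDREF or defaulted attributes, which is where the earlier caveat about attribute constraints applies.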
As for the philosophical difference, I dunno. I suspect that a Bertrand Russell or a Kurt Gödel or a Daniel Dennett or a Douglas Hofstadter could always rustle up a counter-example for any hypothesis. My head hurts and I am heading at full speed past the point where I know what I am talking about, but if we take the data versus meta-data distinction: is "SayingSomethingAboutThisData" data or meta-data in this case?

  <Foo SayingSomethingAboutThisData = "FALSE">

Sean Mc Grath
sean at digitome dot com

From tadmc at metronet.com Sun Dec 21 15:48:48 1997
From: tadmc at metronet.com (Tad McClellan)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <349C8DF2.EC73820F@pacbell.net> from "Ray Waldin" at Dec 20, 97 07:33:06 pm
Message-ID: <199712211435.IAA00768@magna.flash.net>

A non-text attachment was scrubbed...
Name: not available
Type: text
Size: 1668 bytes
Desc: not available
Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971221/d2dcafde/attachment.bat

From peter at ursus.demon.co.uk Sun Dec 21 16:11:58 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:39 2004
Subject: element content vs. element attribute
In-Reply-To: <199712211155.GAA00316@unready.microstar.com>
References: <349C8DF2.EC73820F@pacbell.net> <01bd0cd4$6ddc5c00$LocalHost@jeremyhp> <349C8DF2.EC73820F@pacbell.net>
Message-ID: <3.0.1.16.19971221170925.2bf7468e@pop3.demon.co.uk>

At 06:55 21/12/97 -0500, David Megginson wrote:
>Ray Waldin writes:
>
>(Peter: how do you display attributes in Jumbo?)
JUMBO uses 4 sorts of display:

- event stream ("text with embedded tags")
- tree
- X*L-predicated
- specialist (downloadable classes or user-applied)

Event Stream

This is essentially to be rendered as text. JUMBO has the following ways of dealing with tags:

- recognise them as HTML and produce HTML-compliant rendering. It's not pretty, and I've only done HTML 2.0. [I didn't set out to produce a browser, remember :-).] However it's necessary to have one, because people will start "embedding XML in HTML" so we have to have a renderer. The most likely first task is to render XML-LINKs.
- present them as text with tags. Because Java does not have a nice way of embedding buttons in text, I either have to use paint(), which is very slow and which has no defined textual semantics (e.g. Ctl-X), OR use TextArea, where the tags are simple transliterations of the input and have no clickability (because TextArea 1.02 has no clickability that *I* can see). In a better situation I would create pretty buttons for the tags and paint them nice colours according to whether they have attributes.

Note that the second model is editable, and is XML-sensitive (e.g. there are options like "JumpBalancedtags").

Tree

The Nodes have a variety of buttons in paint(). One button is "At" in cyan. Clicking it reveals a box with attributes in. This can be edited, and the editor is DTD-driven. It deals with #IMPLIED, #REQUIRED, #FIXED, etc. It does not deal with NOTATION because I don't understand it. It will deal with XML-LINK when I have written a drag and drop to add the internal links (isn't Java boring...)

XML-driven

JUMBO makes a best guess as to what the drafters of the spec expect for things like xml:link SHOW="EMBED". JUMBO has asked about this a number of times and will try not to do anything too unexpected. JUMBO has also asked about xml:space="DEFAULT", but has no default at present. There are already quite a few hardcoded attributes in X*L and all require specialist code to be written.
Specialised This requires bespoke code to be written, e.g. <ARRAY CONTENT="MATRIX" STRUCT="UpperTriangular" ROWS=3>1 2 3 4 5 6</ARRAY> has (I think) nothing displayed in the lower half. > >Of course, what is and isn't meta-data will vary depending on the >document type, but here are some common examples: > >* the preferred unit of measurement I took this view initially but (see recent posting) have changed my mind because UNITS are complex objects. As everyone agrees, the distinction is subjective BUT will be influenced by the tools that we create on this list and elsewhere. personally I am against having structure in attributes if it can be avoided because it requires additional code to be written. I have been so impressed with the economy of doing everything in XML, that I would hate to see more 'mini-languages' inside attributes. P. BTW I am working hard on a new snapshot of JUMBO. It will be the last 1.02 version, I think. There are quite a lot of new goodies, and I will try to create the distribution in smaller packets as I know it has been difficult to download. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Sun Dec 21 17:06:50 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:39 2004 Subject: element content vs. element attribute References: <199712211435.IAA00768@magna.flash.net> Message-ID: <349D4570.7188F25B@technologist.com> Tad McClellan wrote: > [ I hope I don't mispeak here. 
I haven't yet gotten my arms around all > the differences between XML and SGML. (That's why I am lurking here ;-) > > Someone please correct me if I have it wrong in an XML context > ] No, you are absolutely right. The original poster was a little confused about CDATA. I meant to point that out but forgot. Thanks for doing so. -- Paul Prescod -- http://itrc.uwaterloo.ca/~papresco Art is always at peril in universities, where there are so many people, young and old, who love art less than argument, and dote upon a text that provides the nutritious pemmican on which scholars love to chew. -- Robertson Davies in "The Cunning Man" xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Sun Dec 21 17:07:15 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:39 2004 Subject: element content vs. element attribute References: <01bd0cd4$6ddc5c00$LocalHost@jeremyhp> <349C8DF2.EC73820F@pacbell.net> Message-ID: <349D4C8A.CE7B9CD5@technologist.com> Probably the best forum for DTD questions is comp.text.sgml. After all, XML DTDs are SGML DTDs and people there have been making them for more than a decade. In fact, this very topic was covered recently. Use dejanews and look for the thread (mis!)named "Entities vs. Attributes" from around 1997/06/16 in the comp.text.sgml archive. Many of the points raised here are the same as there. This is a recurring question and perhaps deserves a section on the special topics page [1] of the SGML Web Page, maybe as part of a DTD design section. if its patron saint is willing. 
Here is what I am thinking of: <H3>On DTD Design</H3> <P>There are many heuristics for and opinions on proper DTD design. <UL> <LI>A recurring question is when to use attributes or sub-elements. This was discussed <A HREF="elements-attributes.html">in comp.text.sgml</A> and <A HREF="">in XML-DEV</A>. <LI>What constitutes an elegant DTD? In the summer of 1997, DTD designers discussed this in the comp.text.sgml thread titled "The Aesthetics of Document Type Design." </UL> Paul Prescod [1]http://www.sil.org/sgml/topics.html xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Sun Dec 21 22:38:33 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:39 2004 Subject: LISTRIVIA (SPAM) Message-ID: <3.0.1.16.19971221232717.21ef07c4@pop3.demon.co.uk> The members of this list have probably received a SPAM today. This was not posted to the list, but the spammer appears to have got a list of the subscribers' addresses. I shall discuss with Henry whether anything can be done. The same spammer obtained another set of addresses again apparently from chemime@ic.ac.uk. Please do not waste space discussing this on the list :-) If you have any special knowledge, please mail me or Henry. I doubt there is much to be done, but it would appear that membership of this list cannot be regarded as confidential. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
From SimonStL at classic.msn.com Sun Dec 21 22:45:53 1997
From: SimonStL at classic.msn.com (Simon St.Laurent)
Date: Mon Jun 7 16:59:39 2004
Subject: spam - chain letter
Message-ID: <UPMAIL17.199712212244090691@classic.msn.com>

I just got a very stupid chain letter in my mailbox from someone at Davidofvf@aol.com. They left all the addresses visible, and several of them connect back to the XML-DEV list. I'm hoping that this jerk just yanked the email addresses off the hypermail archive, but this is really obnoxious.

If they're on the list, I hope that none of their documents ever parse.

Simon St.Laurent
Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February)

From jharmon at telecnnct.com Mon Dec 22 00:52:12 1997
From: jharmon at telecnnct.com (Jim Harmon)
Date: Mon Jun 7 16:59:39 2004
Subject: spam - chain letter
References: <UPMAIL17.199712212244090691@classic.msn.com>
Message-ID: <349DB429.59E2B600@telecnnct.com>

Simon St.Laurent wrote:

I just received the same message.

I suggest everyone who received a copy reply directly to the sender with a copy to postmaster@aol.com and abuse@aol.com, mentioning that this is a Federal Crime. The electronic version of "Chain Letters".

DO NOT forward the message to ANYONE else.
(You'ld think everyone would have heard about this cr*p by now. These things go around about every 6 months.) > I just got a very stupid chain letter in my mailbox from someone at > Davidofvf@aol.com. They left all the addresses visible, and several of them > connect back to the XML-DEV list. I'm hoping that this jerk just yanked the > email addresses off the hypermail archive, but this is really obnoxious. > > If they're on the list, I hope that none of their documents ever parse. > > Simon St.Laurent > Dynamic HTML: A Primer / XML: A Primer (January) / Cookies (February) > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) -- Jim Harmon The Telephone Connection jim@telecnnct.com Rockville, Maryland xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From murata at apsdc.ksp.fujixerox.co.jp Mon Dec 22 01:32:51 1997 From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto) Date: Mon Jun 7 16:59:39 2004 Subject: text/xml vs. application/xml In-Reply-To: <199712201829.NAA00608@unready.microstar.com> Message-ID: <9712220132.AA02991@lute.apsdc.ksp.fujixerox.co.jp> David Megginson writes: > >I have two important queries: > >1) Are you certain that ignoring the encoding declaration is > conforming behaviour? Yes, I am certain that ignoring the encoding declaration for text/xml is conforming behaviour. 
This is to allow transcoding. >It seems to me that it would make more sense > to report an error if the charset parameter and the encoding > declaration differ (especially since the PR requires any document > without a BOM or encoding declaration to be in UTF-8). HTTP 1.1 (http://www.w3.org/Protocols/HTTP/1.1/draft-ietf-http-v11-spec-rev-01.txt) The "charset" parameter is used with some media types to define the character set (section 3.4) of the data. When no explicit charset parameter is provided by the sender, media subtypes of the "text" type are defined to have a default charset value of "ISO-8859-1" when received via HTTP. Data in character sets other than "ISO-8859-1" or its subsets MUST be labeled with an appropriate charset value. See section 19.8.2 for compatibility problems. >2) Why pick a default encoding that conforming XML parsers are not > required to support? Alfred does accept encoding="ISO-8859-1", but > some other parsers do not. It seems to me that either the RFC or > the PR needs to be amended. HTTP people stick to the default 8859-1 in spite of a *lot* of effort from W3C. On the other hand, IETF (RFC2130) recommends UTF-8 as a default. >I can also anticipate a different problem: few private people (as >opposed to companies or organisations) have any control at all over >what their HTTP servers send out. I am sympathetic to this. Rick Jelliffe proposed that only application/xml should be used in the XML SIG. I will follow the consensus in the XML SIG or WG. Makoto Fuji Xerox Information Systems Tel: +81-44-812-7230 Fax: +81-44-812-7231 E-mail: murata@apsdc.ksp.fujixerox.co.jp xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Mon Dec 22 02:03:47 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:39 2004 Subject: text/xml vs. application/xml In-Reply-To: <9712220132.AA02991@lute.apsdc.ksp.fujixerox.co.jp> References: <199712201829.NAA00608@unready.microstar.com> <9712220132.AA02991@lute.apsdc.ksp.fujixerox.co.jp> Message-ID: <199712220200.VAA00527@unready.microstar.com> MURATA Makoto writes: > >1) Are you certain that ignoring the encoding declaration is > > conforming behaviour? > > Yes, I am certain that ignoring the encoding declaration for text/xml > is conforming behaviour. This is to allow transcoding. Thank you again for your posting and for your work on the MIME types. I do not have access to any clarifications that may have been posted in the SIG, so I necessarily rely only on the text of the PR. The following appears in the (normative) section 4.3.3, "Character Encoding in Entities": It is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an encoding declaration to occur other than at the beginning of an external entity. On the other hand, the following appears in appendix F, "Autodetection of Character Encodings (Non-Normative)": The second possible case occurs when the XML entity is accompanied by encoding information, as in some file systems and some network protocols. When multiple sources of information are available, their relative priority and the preferred method of handling conflict should be specified as part of the higher-level protocol used to deliver XML. 
    Rules for the relative priority of the internal label and the MIME-type label in an external header, for example, should be part of the RFC document defining the text/xml and application/xml MIME types.

If "internal label" means the encoding declaration, then this note supports your statement; unfortunately, the note is non-normative, while the excerpt that I quoted first is normative, so the first must take precedence (unless I've missed something elsewhere in the PR). If the paragraph in the non-normative appendix expresses the WG's true intention, then the PR will need to be revised to support it.

I think, however, that it would be unfortunate if the charset parameter were used. Consider, for example, the following document, encoded in ASCII (despite the incorrect claim in the encoding declaration):

    <?xml version="1.0" encoding="ISO-10646-UCS-2"?>
    <doc>This is a sample XML document.</doc>

Let's say, now, that I place this document in a directory that is accessible through both HTTP and anonymous FTP, and also put a copy on my local machine. Here's what will happen:

1) java EventDemo http://www.myhost.org/texts/sample.xml
   ==> receives charset="ISO-8859-1" as the default, ignores the encoding declaration, produces correct output (accidentally), and reports no error.

2) java EventDemo ftp://ftp.myhost.org/pub/texts/sample.xml
   ==> reads the encoding declaration, realises that the document is _not_ in UCS-2, and reports an error (or worse, puts out garbage without reporting an error).

3) java EventDemo sample.xml
   ==> same as (2).

It is counter-intuitive that well-formedness depends on the transmission protocol.

> Rick Jelliffe proposed that only application/xml should be used in the XML SIG. I will follow the consensus in the XML SIG or WG.

Please feel free to repost this message to the SIG, if you think that it will be helpful there.
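
[Editorial sketch, not part of the original message.] The three cases above can be reproduced roughly as follows. This is a minimal illustration under stated assumptions, not the behaviour of any particular parser: the function names are hypothetical, the HTTP default of ISO-8859-1 is taken from the HTTP 1.1 text quoted earlier in the thread, and "ISO-10646-UCS-2" is mapped to Python's big-endian UTF-16 codec purely for illustration.

```python
import re

# Assumption for illustration: treat "ISO-10646-UCS-2" as big-endian UTF-16
# without a byte-order mark. Python has no codec under the original name.
CODEC_ALIASES = {"ISO-10646-UCS-2": "utf-16-be"}

# The sample document from the message: plain ASCII bytes with a wrong label.
SAMPLE = (b'<?xml version="1.0" encoding="ISO-10646-UCS-2"?>\n'
          b'<doc>This is a sample XML document.</doc>\n')


def declared_encoding(raw):
    """Read the encoding declaration, assuming an ASCII-compatible prolog."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return match.group(1).decode("ascii") if match else None


def effective_encoding(raw, scheme, charset=None):
    """Pick the encoding a processor would use under the rules discussed."""
    if scheme == "http":
        # text/xml over HTTP: the charset parameter wins and defaults
        # to ISO-8859-1; the encoding declaration is ignored.
        return charset or "ISO-8859-1"
    # FTP or a local file: no external label, so use the declaration,
    # falling back to UTF-8 when there is none.
    return declared_encoding(raw) or "UTF-8"


def try_decode(raw, scheme, charset=None):
    """Return (encoding name, decoded text or the decoding error)."""
    name = effective_encoding(raw, scheme, charset)
    codec = CODEC_ALIASES.get(name, name)
    try:
        return name, raw.decode(codec)
    except UnicodeDecodeError as exc:
        return name, exc


if __name__ == "__main__":
    for scheme in ("http", "ftp", "file"):
        print(scheme, try_decode(SAMPLE, scheme))
```

Over HTTP the mislabelled document decodes cleanly because the declaration is never consulted; over FTP or from a local file the declaration is believed, and the result is either a decoding error or unreadable text, which is the inconsistency the message complains about.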

I strongly support Rick's suggestion for application/xml, partly because it will avoid the requirement to make several last-minute changes to the PR, and partly because it will save XML from being trapped by some of the same constraints as HTML. If typical (private) users cannot post XML documents in their web space in languages other than English, then the whole effort will be at least a partial failure.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From ricko at allette.com.au Mon Dec 22 03:36:24 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:39 2004
Subject: text/xml vs. application/xml
Message-ID: <199712220338.OAA18436@jawa.chilli.net.au>

> From: David Megginson <ak117@freenet.carleton.ca>

> I strongly support Rick's suggestion for application/xml, partly because it will avoid the requirement to make several last-minute changes to the PR, and partly because it will save XML from being trapped by some of the same constraints as HTML. If typical (private) users cannot post XML documents in their web space in languages other than English, then the whole effort will be at least a partial failure.

No, XML the way it is will be a partial success, and a complete success for most scripts and most systems :-) And people will avoid doing things that break, so eventually mistakes will be learned from.
This is not as bad a thing as might be expected: look how much consensus there is on most of XML--many people do not trust experts when the experts come from a slightly different domain (e.g. SGML people's expertise in electronic publishing was previously sometimes regarded as being off-topic to web-publishing, whereas in fact there is a deal of overlap). This is only natural, and the way humans are. As a methodology, it is a way to get the simplest possible system: start with the easiest, see where it breaks, fix it. This is like a child playing with a knife: a parent may think themselves wise to allow a child to play with the knife to discover how sharp it is, but that parent is not being very far-sighted.

So we shouldn't despair if people are not ready to accept what I am saying (and I think Gavin Nicol and most people who have been trying to come up with useful solutions agree on this), namely that

* all documents must be adequately labelled with "prime metadata" (i.e., all the information needed to process the information without inexact heuristics), and

* all this prime metadata must be kept with the document at all stages of its transmission, in whatever form.

This is why text/xml falls down (broken as designed, as someone might say) if any intermediate transcoders do not rewrite the MIME headers correctly. Point-to-point protocols allow makers of intermediate WWW systems to fiddle with the prime metadata in unpleasant ways.

However, there is no guarantee that generators of XML will get the encoding correct in the first place (in which case a guess by the server based on locale may give better results anyway). Nevertheless, I think end-to-end service is far better in this regard, especially if we need to transmit database material. For example, suppose a database record is sent as

    <person>Pavel Ha*ek</person>

where the * character is the c hacek ligature in 8859-2 (&#*c8;) but is blindly re-labelled as being 8859-1.
In that case the * is E grave, which will be quite wrong. If there is a chance of intermediate systems around the world relabelling MIME character sets inappropriately then that is a big problem for text/xml.

Note that in the above example, a transcoder would also stuff up the data, unless it was smart enough to know that the file was XML, and so put in the correct numeric character reference.

Rick Jelliffe

From peter at ursus.demon.co.uk Mon Dec 22 08:48:02 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:39 2004
Subject: Order of Declarations in DTD subsets
Message-ID: <3.0.1.16.19971222091430.52cf4bdc@pop3.demon.co.uk>

Is it required that a declaration in the subset come before it is used? [I have tried to find this in the spec and not been able to - please forgive me if it's there].

An example:

ELEMENTs must be declared for ATTLIST in validatable documents. So is the following legal in this case:

    <!ATTLIST FOO BAR CDATA #REQUIRED>
    <!ELEMENT FOO ANY>

In a WF document this might have a different effect - the ATTLIST declaration is WF and implicitly creates an ELEMENT declaration. When this ELEMENT is declared, is it then an error?

Of course it's normal and natural to predeclare things before their use but it's not necessary in some languages.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From jjc at jclark.com Mon Dec 22 09:33:18 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:59:39 2004
Subject: Order of Declarations in DTD subsets
References: <3.0.1.16.19971222091430.52cf4bdc@pop3.demon.co.uk>
Message-ID: <349E32C6.BF754E1B@jclark.com>

Peter Murray-Rust wrote:
>
> Is it required that a declaration in the subset come before it is used?

In general no. The exceptions are that parameter entities must be declared before they are referenced and general entities must be declared before they are referenced within default attribute values (this is specified under Well-Formedness Constraint - Entity Declared).

In particular it is *not* required:

- to declare an element type before it is used in a content model

- to declare an element type before it is used in an attlist declaration

- to declare a notation before its name is used as the default value for a NOTATION attribute

- to declare a notation before it is used as the notation for an unparsed entity

- to declare an unparsed entity before its name is used in the value for an ENTITY or ENTITIES attribute

- to declare a general entity before it occurs in an EntityValue

James
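
[Editorial sketch, not part of the original message.] The ATTLIST-before-ELEMENT case from this thread can be tried with Python's bundled expat bindings. This is only a sketch: expat is a non-validating parser, so it shows that the declarations are accepted and reported in document order, not that a validating processor agrees. The function name and the sample document are made up for illustration.

```python
import xml.parsers.expat

# The example from the thread: the attribute list is declared before the
# element type it belongs to.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE FOO [
<!ATTLIST FOO BAR CDATA #REQUIRED>
<!ELEMENT FOO ANY>
]>
<FOO BAR="x"/>"""


def declaration_order(data):
    """Return the DTD declarations in the order the parser reports them."""
    seen = []

    def on_attlist(elname, attname, type_, default, required):
        seen.append(("ATTLIST", elname, attname))

    def on_element(name, model):
        seen.append(("ELEMENT", name))

    parser = xml.parsers.expat.ParserCreate()
    parser.AttlistDeclHandler = on_attlist
    parser.ElementDeclHandler = on_element
    parser.Parse(data, True)  # raises ExpatError if the document is rejected
    return seen


if __name__ == "__main__":
    for declaration in declaration_order(DOC):
        print(declaration)
```

The parse completes without raising, and the attribute-list declaration is reported ahead of the element declaration it refers to.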

From richard at light.demon.co.uk Mon Dec 22 09:44:22 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:59:39 2004
Subject: Order of Declarations in DTD subsets
In-Reply-To: <3.0.1.16.19971222091430.52cf4bdc@pop3.demon.co.uk>
Message-ID: <NixfEDAhSjn0Ewib@light.demon.co.uk>

In message <3.0.1.16.19971222091430.52cf4bdc@pop3.demon.co.uk>, Peter Murray-Rust <peter@ursus.demon.co.uk> writes

> Is it required that a declaration in the subset come before it is used. [I have tried to find this in the spec and not been able to - please forgive me if it's there].
>
> An example:
>
> ELEMENTs must be declared for ATTLIST in validatable documents. So is the following legal in this case:
>
> <!ATTLIST FOO BAR CDATA #REQUIRED>
> <!ELEMENT FOO ANY>

No, that's fine. As I understand it, the general philosophy is that the XML processor 'pauses for breath' after reading whichever DTD subsets it is asked to read (internal only, or both). It then sees what elements and attribute lists have been declared. (It is only at this point that it can 'do the right thing' as regards (a) reporting multiple element declarations and (b) merging, and possibly reporting on, multiple attribute list declarations.)

> In a WF document this might have a different effect - the ATTLIST declaration is WF and implicitly creates an ELEMENT declaration. When this ELEMENT is declared, is it then an error?

No, I don't think it's helpful to see the ATTLIST declaration as implicitly creating an ELEMENT declaration. This isn't an error.

Richard.
Richard Light
SGML/XML and Museum Information Consultancy
richard@light.demon.co.uk

From peter at ursus.demon.co.uk Mon Dec 22 11:44:54 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:39 2004
Subject: Order of Declarations in DTD subsets
In-Reply-To: <NixfEDAhSjn0Ewib@light.demon.co.uk>
References: <3.0.1.16.19971222091430.52cf4bdc@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971222123928.553f2366@pop3.demon.co.uk>

At 09:36 22/12/97 +0000, Richard Light and James Clark wrote:
[... helpful replies deleted ...]

I was asking in the context of the emerging Simple API - I don't think this gives us any problems, but I'd like to be sure that I can extract the ELEMENTs before the ATTLISTs, regardless of the order they are actually declared in.

I also note that the order of declarations of ELEMENTs, and attributes within ATTLISTs is undefined. This is an area where different implementers of the API might return different orders, so that it is marginally more difficult to check if 2 DTDs are "equivalent". Is it possible or desirable to suggest a canonical order for the output of either through the API? [Alternatively, can Java check for the equality of 2 Enumerations (other than by the user explicitly building a Hashtable)?]

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
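
[Editorial sketch, not part of the original message.] One way to make the "are these two DTDs equivalent?" check independent of declaration order is to collect the declarations into a set, or to sort them into an arbitrary but fixed order. The sketch below does this with Python's expat bindings rather than the Java API under discussion. The function names are hypothetical, and content models are deliberately ignored to keep it short, so it is a partial comparison only.

```python
import xml.parsers.expat

# Two internal subsets that declare the same things in a different order.
DTD_A = b"""<!DOCTYPE FOO [
<!ATTLIST FOO BAR CDATA #REQUIRED>
<!ELEMENT FOO ANY>
]>
<FOO BAR="x"/>"""

DTD_B = b"""<!DOCTYPE FOO [
<!ELEMENT FOO ANY>
<!ATTLIST FOO BAR CDATA #REQUIRED>
]>
<FOO BAR="x"/>"""


def declarations(data):
    """Collect element and attribute declarations as an unordered set."""
    found = set()

    def on_element(name, model):
        # Simplification: the content model itself is not compared.
        found.add(("ELEMENT", name))

    def on_attlist(elname, attname, type_, default, required):
        found.add(("ATTLIST", elname, attname, type_, default, required))

    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = on_element
    parser.AttlistDeclHandler = on_attlist
    parser.Parse(data, True)
    return found


def canonical(data):
    """Return the declarations in a fixed order, whatever the source order."""
    return sorted(declarations(data), key=repr)


if __name__ == "__main__":
    print(declarations(DTD_A) == declarations(DTD_B))
    print(canonical(DTD_A))
```

Sorting on `repr` is just a convenient total order for tuples that may contain `None`; any agreed ordering would serve as the canonical one.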

From walter.kriha at systor.com Mon Dec 22 15:05:13 1997
From: walter.kriha at systor.com (walter.kriha@systor.com)
Date: Mon Jun 7 16:59:39 2004
Subject: XML as a programming tool
Message-ID: <41256575.00433DFD.00@druide>

Hi,

I'd like to give some examples for the use of SGML/XML in software development (sorry, I never did any publishing with SGML/XML and used it for software development only).

Force: flexible and adaptive software needs meta-information:

This kind of software tends to remove definitions from source code. They are put into meta-layers, repositories or - more likely because of missing software infrastructures - into simple configuration files. These files generate a big mess pretty soon: They are changed and code breaks. The overall structure is more than unclear. Parameter definitions in configuration files are complicated and have to be parsed by every client.

example: Token = 15 somevalue 32 anothervalue

Team development gets very hard. What IS the authoritative structure and content of configuration files? The first approach is usually to come up with a class that maps ".ini" style configuration files. Still, you would like to have more: tokens in hierarchies, many tokens of the same name. Validation. And you would like to split the information into smaller, separate parts so you can avoid copying them. That means you want entity management. All of this must be programmed by your team - or must it?

Solution:

It takes about 2-3 weeks to integrate e.g. the SP parser/entity manager kit into a framework.
Most of the work goes into wrapping SP native classes from the parser API into appropriate framework classes and interfaces (this should get much better with a standard parser API). If you have a generic composite object machine built in, just map the parser events into your tree classes (nodes) and you have a representation of the configuration information in memory.

The next step is to add some wrappers for convenience, e.g. a tiny query interface (findElementByName() etc.), so that your clients can avoid hardcoding element lookups, and a value class with some conversion functions. During boot your framework pulls in the configuration documents, the parser validates the content and hands it off to "PartBuilder" instances that instantiate the proper objects, and off you go. The entity manager in the SP toolkit even enables you to pull configuration information from some server without the client even noticing.

This way you end up with defined configuration information that can still be highly specialized per customer. Its real power shows when you imagine having hundreds or thousands of installations at customer sites (some configured even there by service teams) and you want to ship a new version of your software. Can you integrate the existing information during installation? Across releases and possibly extensive modifications? If you used a copy/paste/change approach to create new customer configuration information it's now time to look for a new job...

Force: Use dynamic information safely.

Static typed languages sometimes force developers to use untyped information to avoid changes in interfaces. Examples are: getValueByName(String name) etc. In effect one is working around the static type system.

Solution:

Semantic data streams or the composite message pattern are easily implemented using the basic tree model from above. You can transfer whole trees or just parts.
The factory that generates these types (they ARE types because there is a DTD for them) makes sure that they are created properly. Due to their self-describing nature the structures can change without breaking existing clients. Applications for this are externalization, serialization, event and object bus systems.

Force: Error messages must be language independent and unique.

Solution: Describe your message catalogs in SGML/XML. Use the ID mechanism to name the programmatic tokens that show up in source code. The parser is going to tell you if somebody used the same token twice.

The same applies if you need a poor man's implementation repository with some trader functionality, e.g. to automatically load classes in factories where the client tells you what interface he wants and some hints about the properties the object should have. You could map these properties via introspection directly to beans but every once in a while an indirection is necessary, e.g. if you bought some beans whose properties have to be mapped to your system's language.

Force: avoid copying of information in your system.

Many systems duplicate a lot of information in various components or layers. Let's say there is a customer type in the analysis model. This usually turns into a customer database table schema, a GUI resource description of a customer view and some representation of customer in the "model" part which is a C++ header or a Java class. Most of this information is just a duplicate.

Solution: Use SGML/XML descriptions for all these aspects and reference customer information from one place. Write generic modules that read this information at runtime.

Force: Share information without coupling objects tightly.

Let's say you are doing some workflow. The workflow objects are part of a tree (built from SGML/XML information) and child and parent nodes can communicate with each other, using some fixed interfaces and some dynamic ones (semantic data streams using DOM).
But every once in a while some information is created in a node that is useful for some other node that is NOT directly connected to the first node. How can this node get the information without linking both nodes?

Solution: Turn some information tree into a blackboard.

Create some SGML/XML instance that models the structure of the information you want to share. The elements can be empty (there IS use for markup without content(:-)). Load this tree into memory. Make your nodes also implement an observer type interface. Now clients can do lookups and if nothing is there yet, they can register for change. This has three advantages:

- publisher and subscriber are NOT directly coupled and can change any way they want without affecting the other.

- There is no need to do sequential processing. The workflow tree will settle into a correct state but the path it takes is undetermined and decoupled from the descriptive workflow logic. (this makes some people with a strong procedural background a bit nervous).

- the blackboard is highly structured and not chaotic. Debug routines can print human readable snapshots.

Force: process error, trace and debug information automatically.

I guess everybody has seen that huge and unstructured mess created by error, trace or debugging messages. In mission critical applications agents are supposed to react on those kinds of messages.

Solution: Write error, trace or debug information in SGML/XML. This can be well formed information only. Don't allow anybody to write unstructured information anywhere. They have to go to a factory, get a special type of SGML/XML node and fill it in. Now it is easy for agents to find critical information. To get to the information they let the output go through the parser and use a SGMLApp implementation that does not build a tree but processes the parser events on the fly. (Assuming that in this case the information need not be represented as a tree).
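
[Editorial sketch, not part of the original message.] The event-driven agent described above can be illustrated with Python's SAX interface standing in for the SP SGMLApp class. The element name `error`, the `severity` attribute, and the sample log are all invented for this illustration; nothing here is taken from the author's framework.

```python
import xml.sax


class CriticalErrorAgent(xml.sax.ContentHandler):
    """Collects the text of critical <error> records as events arrive.

    No tree is built: the handler keeps only the record being read.
    """

    def __init__(self):
        super().__init__()
        self.critical = []
        self._buffer = None  # text of the critical record in progress, if any

    def startElement(self, name, attrs):
        if name == "error" and attrs.get("severity") == "critical":
            self._buffer = []

    def characters(self, content):
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "error" and self._buffer is not None:
            self.critical.append("".join(self._buffer).strip())
            self._buffer = None


# Hypothetical well-formed trace output, as produced by a logging factory.
LOG = b"""<log>
  <trace>entering boot phase</trace>
  <error severity="warning">config file is old</error>
  <error severity="critical">database unreachable</error>
</log>"""


def critical_messages(data):
    """Run the agent over a log document and return what it collected."""
    agent = CriticalErrorAgent()
    xml.sax.parseString(data, agent)
    return agent.critical


if __name__ == "__main__":
    print(critical_messages(LOG))
```

Because the agent looks records up by name and attribute rather than by position, reordering the records or adding new element types to the log does not affect it.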
Using the same convenience wrappers from above the agents are totally independent of any structural changes in the output stream (caused by different execution order etc.) and will continue working. (I have seen desperate moves to process e.g. Unix kernel and boot messages via handcoded applications...)

Force: Translate from one domain language into a different one.

I suspect that about 50% of work in business programming goes into format transformations between different COTS or other applications and databases. One can view database schemas, interfaces and protocols as little domain languages. Since SGML/XML information trees have enough descriptive power to represent those, it is possible to build automatic translator sub-frameworks for "data-schlepping".

Solution:

example: import server. Frequently information in a new format has to be imported into a system (e.g. DTA electronic commerce data, financial instruments data etc.). Storage Objects convert these formats into an SGML/XML representation. This makes further processing independent of the different physical data formats of the new format and the existing system. But it does not solve the language problem itself: one format might call the customer "customer" and the other one "BusinessPartner". A translator framework provides wrappers that wrap the new information tree (e.g. DTA info) into the internal language. Of course the mapping process is driven by mapping information specified in SGML/XML. If more than simple name mapping is necessary, the wrappers can be dynamically configured with little action objects that can compute values etc. Of course these are again configured using SGML/XML configurations.

Force: Get information from OO-Analysis into the system

In every larger framework the gap between OO-Analysis and implementation is huge. Direct mapping from an analysis class to code just leads to totally inflexible systems. That's why e.g. Enterprise Java Beans treats concurrency, persistence etc.
as being "orthogonal" to an object's implementation. This means that the implementation of these does not happen in the object. They are provided by containers etc. The next thing that's going to be pulled out of objects is business logic (our framework did this already and used SGML/XML to describe the workflow). But what does this mean for the analysis information if it doesn't get turned into code?

Solution: Use analysis information to build up a meta-information layer. Use SGML/XML to describe it. Now generic objects can interpret this information and instantiate the necessary objects for processing. The meta-layer objects are of course the same ones we used to implement the configuration information, the trace facility etc.

Conclusion:

For all these uses of SGML/XML basically the same software components were used over and over again. And the real hard ones were written by James Clark anyway(:-). This is reusable software and has the nice side effect that after a while programmers get familiar with the interfaces and don't have to learn new ones for new data formats all the time. I mean - what's the difference between configuration information, external data formats, blackboards etc? Just different DTDs.

But more important than reuse is the flexibility of software using SGML/XML to represent meta-information. Bringing a system to a new release does no longer mean: transform BLOB1 into BLOB2. It means transform XXX.dtd into YYY.dtd - a defined and traceable process. Due to the self-describing nature of SGML/XML information versioning becomes a defined and automated process too. Different versions can be detected and automatic translators can upgrade "legacy" objects. No longer do I have to have old classes in the system for backward compatibility reasons only.

The bad news:

Past (bad) experience shows that the real problem with using SGML/XML in software development is not a technical one.
Using SGML/XML only makes sense if everybody is willing to make information and assumptions EXPLICIT so they can go into DTDs and instances. This seems to be a sore point for many programmers who would rather see this hidden in code (just look at the slow progress of pre/postcondition specification or semantic interface definitions). And no, I don't have a solution for this one.

Merry Christmas and a Happy New Year,

Walter

From ak117 at freenet.carleton.ca Mon Dec 22 15:46:05 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:39 2004
Subject: text/xml vs. application/xml
In-Reply-To: <199712221500.PAA03260@nathaniel.eps.inso.com>
References: <199712220200.VAA00527@unready.microstar.com> <199712221500.PAA03260@nathaniel.eps.inso.com>
Message-ID: <199712221542.KAA01207@unready.microstar.com>

Gavin Nicol writes:

> > It is an error for an entity including an encoding declaration to be presented to the XML processor in an encoding other than that named in the declaration, or for an encoding declaration to occur other than at the beginning of an external entity.
>
> Note that this is "an error" not a fatal, or even necessarily reportable error.

Absolutely correct -- Tim Bray made the same point on this list a couple of weeks ago. The parser is not _required_ to report an error, but it is allowed to; in either case the document is still not well-formed.

> > 1) java EventDemo http://www.myhost.org/texts/sample.xml
> >    ==> receives charset="ISO-8859-1" as the default, ignores the encoding declaration, produces correct output (accidentally), and reports no error.
>
> It could report a mismatch.

In this case yes, because it was possible to parse the encoding declaration. If the document had been encoded in UCS-2, it is unlikely that the parser would even have recognised an encoding declaration if it were trying to parse with the default charset="ISO-8859-1" (the parser would have to have some very sophisticated error-recovery techniques).

> > 2) java EventDemo ftp://ftp.myhost.org/pub/texts/sample.xml
> >    ==> reads the encoding declaration, realises that the document is _not_ in UCS-2, and reports an error (or worse, puts out garbage without reporting an error).
> >
> > 3) java EventDemo sample.xml
> >    ==> same as (2).
> >
> > It is counter-intuitive that well-formedness depends on the transmission protocol.
>
> I would argue that all 3 could, and perhaps should produce similar results.

In that case, however, it will be necessary to amend the PR, so that parsers will not have the option of reporting an error, and so that the documents will qualify as well-formed.

> This has nothing to do with MIME types. The main reason for problems is that people (often unknowingly) violate the standards. HTTP is pretty clear that for anything other than ISO 8859-1, the content must be labelled correctly (i.e. it must have the correct charset).

Unfortunately the only people who have control over that labelling are the system administrators -- if Sprynet decides to return the MIME type text/xml for all *.xml files, then I probably will not have the option of posting XML documents on my personal web site in anything but ISO-8859-1.
Furthermore, the other problem remains: if text/xml uses ISO-8859-1 as the default, then the PR _must_ be amended to require XML processors to support ISO-8859-1 encoding -- after all, XML is a profile of SGML designed specifically for the Internet, and we will have a lot of explaining to do if it cannot play nicely.

> The only time application/xml really makes sense is when UCS-2 or UTF-16 data is being sent via email.

In theory, yes; in practice, no. Private users built HTML into something big enough to attract the interest of the corporate and government sectors -- using text/xml will mean that for the next several years, at least, many private users will be unable to post anything but ISO-8859-1-encoded documents in their personal web space easily (and no XML parsers are required to support that encoding).

This type of consideration does not matter so much for SGML, which is an International Standard defined independent of its media; XML, however, is a consortium standard created for a specific medium, so it cannot afford to ignore the more pragmatic concerns.

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Mon Dec 22 15:53:58 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:39 2004
Subject: XML as a programming tool
In-Reply-To: <41256575.00433DFD.00@druide>
Message-ID: <3.0.1.16.19971222164723.469f2654@pop3.demon.co.uk>

Walter,

I found your analysis fascinating - though I didn't understand all of it. A few points.

At 16:05 22/12/97 +0100, walter.kriha@systor.com wrote:

> I'd like to give some examples for the use of SGML/XML in software development (sorry, I never did any publishing with SGML/XML and used it for software development only).

Don't apologise. There is nothing to say that XML should only be used for "documents". My original posting was to suggest that *if* XML were widely used to support s/w development, then additional tools would start to become available, to all our benefit.

> Force: flexible and adaptive software needs meta-information:

Yes. My configuration for JUMBO now consists almost entirely of XML files. This means that:

(a) I can use standard XML tools for them

(b) WF meta-information is trivially easy to insert. So most of the components of the files have <HELP> or some other documentation, often written in WF HTML. There can also be RDF and other things (e.g. like authorship).

> It takes about 2-3 weeks to integrate e.g. the SP parser/entity manager kit into a framework. Most of the work goes into wrapping SP native classes from the parser API into apropriate framework classes and interfaces (This should get much better with a standard parser API). If you got a generic composite object machine built in, just map the parser events into your tree classes (nodes) and you have a representation of the configuration information in memory.
> The next step is to add some wrappers for convenience, e.g. implementing a tiny query interface (findElementByName() etc. and your clients can avoid

My query interface uses XLL TEIpointers, something like:

    Node node = tree.TEISearchFirstNode("DESCENDANT(1,MOL)");

Since XLL systems are required to provide TEIaddressing, it's trivial to use it as a search query as well.

[...]

> Force: Share information without coupling objects tightly.
> Let's say you are doing some workflow.
> The workflow objects are part of a tree (built from SGML/XML information) and child and parent nodes can communicate with each other, using some fixed interfaces and some dynamic ones(semantic data streams using DOM). But every once in a while some information is created in a node that is useful for some other node that is NOT directly connected to the first node. How can this node get the information without linking both nodes?
>
> Solution: Turn some information tree into a blackboard.

[...]

I suspect that XLL XML-LINK="EXTENDED" can be used here. This can be used to capture the non-hierarchical relations. How *generic* it can be, is something which I think we have not resolved.

[...]

> Conclusion:
> For all these uses of SGML/XML basically the same software components were used over and over again. And the real hard ones were written by James Clark anyway(:-). This is reusable software and has the nice side effect

I agree :-)

[...]

> The bad news:
>
> Past (bad) experience shows that the real problem with using SGML/XML in software development is not a technical one. Using SGML/XML makes only sense if the everybody is willing to make information and assumptions EXPLICIT so they can go into DTDs and instances. This seems to be a sore point for many programmers that rather see this hidden in code (just look at the slow progress of pre/postcondition specification or semantic interface definitions). And no, I don't have a solution for this one.

I am optimistic about this, because of the success of XML in other fields. If people start writing small configuration files in XML to support "conventional documents", then gradually these will grow into larger scale resources. (This seems inevitable - people end up writing systems based on awk, or VB, which started out as 10 lines). Because XML scales well (I take this on faith :-) it becomes attractive to start building programs round an XML core.

P.
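
[Editorial sketch, not part of the original message.] The "small configuration files in XML" idea, together with the tiny query interface (findElementByName() plus a value class with conversion functions) that Walter describes, might look roughly like this in Python using the standard ElementTree module. The element names, the sample configuration, and both function names are assumptions made for illustration; no validation against a DTD is attempted here.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration document; hierarchical, unlike a flat .ini file.
CONFIG = """<config>
  <database>
    <host>db.example.org</host>
    <port>5432</port>
  </database>
  <ui>
    <language>de</language>
  </ui>
</config>"""


def find_element_by_name(root, name):
    """Return the first element called `name` anywhere in the tree, or None."""
    return next(root.iter(name), None)


def get_value(root, name, convert=str, default=None):
    """Look an element up by name and convert its text with `convert`."""
    node = find_element_by_name(root, name)
    if node is None or node.text is None:
        return default
    return convert(node.text.strip())


if __name__ == "__main__":
    root = ET.fromstring(CONFIG)
    print(get_value(root, "host"))
    print(get_value(root, "port", int))
    print(get_value(root, "timeout", float, default=30.0))
```

Clients ask for values by name and supply a conversion, so the position of an element in the file can change without touching the calling code.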
> > >Merry Christmas and a Happy New Year, > > >Walter > > > >xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; >(un)subscribe xml-dev >To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; >subscribe xml-dev-digest >List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From crism at ora.com Mon Dec 22 16:07:30 1997 From: crism at ora.com (Chris Maden) Date: Mon Jun 7 16:59:39 2004 Subject: spam - chain letter In-Reply-To: <349DB429.59E2B600@telecnnct.com> (message from Jim Harmon on Sun, 21 Dec 1997 19:28:25 -0500) Message-ID: <199712221612.LAA22272@geode.ora.com> [Jim Harmon] > I suggest everyone who recieved a copy reply directly to the sender > with a copy to postmaster@aol.com and abuse@aol.com, mentioning that > this is a Federal Crime. The electronic version of "Chain Letters". I hate to prolong this thread, but this particular chain mail is *not* prosecutable, I believe. It wasn't an MMF, it was just an avoid-bad- luck message, and actually a pretty amusing one. Did anyone else actually read it? [Davidofvf] > John Mellor who was four years old deleted his letter and later the > same day while helping his mother prepare dinner, fed his hands > through the meat mincer, climbed into the microwave oven and > exploded himself all over his mother's shocked face. That's very _South Park_. [Jim Harmon] > DO NOT forward the message to ANYONE esle. Yeah - being amused by it doesn't mean I was glad to receive it. 
> (You'ld think everyone would have heard about this cr*p by now. > These things go around about every 6 months.) In general, yes, but I've never seen this one before. I don't know what Davidofvf's deal is, but that message was very bizarre. -Chris -- <!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN"> <!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN" "<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487 <USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Mon Dec 22 17:12:15 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:39 2004 Subject: XML as a programming tool Message-ID: <199712221712.RAA00699@mail.iol.ie> >[walter.kriha@systor.com] > >Force: flexible and adaptive software needs meta-information: > [Much interesting stuff deleted] Here is a technique we use here that I find *very* useful and *very* general. Someone out there might find it useful or know a better way to do it. When building particular configurations - either of software or documents - we build a "project file" (in XML of course). They all look at bit like this (this one is for a Document Database): <FinancialInformationDatabase> <Level title = "Company Law"> <Level title = "Companies Acts"> <Level title = "Companies Act 1985"> <file name = "uk85p1"> <Level title = "Auditing Guidelines"> <Level title = "Statements of Accounting Practice"> ... This is a "configuration" file basically. We might have a dozens of different processing scripts to run against this project to build different things. 
We don't want to have to manage oodles of little scripts all over the place, most of which need the meta-information. We tack the scripting stuff on to the configuration file (here using Python):-

<Python name = "Do Something With The Files">
<![CDATA[
# This script *knows* about the project file it itself is in
# it is in the Project variable
# Move to the Companies Act 1985 Node
Project.Seek ("Level","title","Companies Act 1985")
flist = GetChildren(...)
# Do something useful
]]>
</Python>
<Python name = "Some Other Interesting Script">
<![CDATA[
...
]]>
</Python>
</FinancialInformationDatabase>

To run a particular script we load this XML file, locate the named Python node and execute its contents. The loader ensures that the code has access to the XML tree structure in the variable "Project".

This is a bit weird at first sight. A project file that contains the meta-data about a project and also the scripting required to operate on the meta-data:-) I find this hugely useful in practice. Note that the technique relies on the fact that Python allows you to evaluate a lump of code at run-time. Perl is the same. Dunno about Java.

Sean Mc Grath sean at digitome dot com

From digitome at iol.ie Mon Dec 22 17:24:18 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:39 2004
Subject: Corrected Re-post with SGML metamorphosed to XML
Message-ID: <199712221723.RAA20948@mail.iol.ie>

>[walter.kriha@systor.com]
>
>Force: flexible and adaptive software needs meta-information:
>
[Much interesting stuff deleted]

Here is a technique we use here that I find *very* useful and *very* general.
Someone out there might find it useful or know a better way to do it. When building particular configurations - either of software or documents - we build a "project file" (in XML of course). They all look a bit like this (this one is for a Document Database):

<FinancialInformationDatabase>
  <Level title = "Company Law">
    <Level title = "Companies Acts">
      <Level title = "Companies Act 1985">
        <file name = "uk85p1"/>
      </Level>
    </Level>
    <Level title = "Auditing Guidelines">
      <Level title = "Statements of Accounting Practice">
        ...
      </Level>
    </Level>
  </Level>
</FinancialInformationDatabase>

This is a "configuration" file basically. We might have dozens of different processing scripts to run against this project to build different things. We don't want to have to manage oodles of little scripts all over the place, most of which need the meta-information. We tack the scripting stuff on to the configuration file (here using Python):-

<Python name = "Do Something With The Files">
<![CDATA[
# This script *knows* about the project file it itself is in
# it is in the Project variable
# Move to the Companies Act 1985 Node
Project.Seek ("Level","title","Companies Act 1985")
flist = GetChildren(...)
# Do something useful
]]>
</Python>
<Python name = "Some Other Interesting Script">
<![CDATA[
...
]]>
</Python>
</FinancialInformationDatabase>

To run a particular script we load this XML file, locate the named Python node and execute its contents. The loader ensures that the code has access to the XML tree structure in the variable "Project".

This is a bit weird at first sight. A project file that contains the meta-data about a project and also the scripting required to operate on the meta-data:-) I find this hugely useful in practice. Note that the technique relies on the fact that Python allows you to evaluate a lump of code at run-time. Perl is the same. Dunno about Java.

Sean Mc Grath sean at digitome dot com
xml-dev: A list for W3C XML Developers.
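The loader Sean describes (load the project file, locate the named Python node, execute its contents with the tree bound to "Project") might look something like the sketch below. It is a minimal illustration, not Sean's loader: it assumes Python's standard xml.etree.ElementTree, the names PROJECT_XML and run_script are hypothetical, and the embedded script uses ElementTree's own find()/findall() in place of the Project.Seek and GetChildren calls, whose implementations are not shown in the post. Because exec() runs whatever the file contains, this only makes sense for project files you trust.

```python
import textwrap
import xml.etree.ElementTree as ET

# A cut-down project file with one embedded script (illustrative only).
PROJECT_XML = """\
<FinancialInformationDatabase>
  <Level title="Company Law">
    <Level title="Companies Acts">
      <Level title="Companies Act 1985">
        <file name="uk85p1"/>
      </Level>
    </Level>
  </Level>
  <Python name="List The Files"><![CDATA[
    # The loader binds the parsed tree to the name Project.
    node = Project.find(".//Level[@title='Companies Act 1985']")
    flist = [f.get("name") for f in node.findall("file")]
  ]]></Python>
</FinancialInformationDatabase>
"""


def run_script(xml_text, script_name):
    """Execute the named <Python> node with the tree bound to Project.

    Returns the namespace the script ran in, so callers can read its results.
    """
    project = ET.fromstring(xml_text)
    for script in project.iter("Python"):
        if script.get("name") == script_name:
            namespace = {"Project": project}
            # CDATA arrives as ordinary text; strip the XML indentation first.
            exec(textwrap.dedent(script.text), namespace)
            return namespace
    raise KeyError(script_name)


result = run_script(PROJECT_XML, "List The Files")
print(result["flist"])
```

Keeping the scripts inside the file they operate on means a script can never be run against the wrong project; the cost is that the file must be treated as code, not just data.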
From peter at ursus.demon.co.uk Mon Dec 22 17:34:08 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:39 2004 Subject: LISTRIVIA (Spam and other things) In-Reply-To: <199712221612.LAA22272@geode.ora.com> References: <349DB429.59E2B600@telecnnct.com> Message-ID: <3.0.1.16.19971222181425.4a47090e@pop3.demon.co.uk> Please can we avoid discussion of spam on XML-DEV for whatever motives. A primary weapon against spam is ignoring it. It's easy to get sidetracked. After a few private mails the position as I understand it is: - XML-DEV (and another list that Henry moderates) had the "who" facility enabled, so that membership of the list could be requested. - I believe that Henry intends to disable this to avoid a spammer being able to access the distribution list (at least without cracking into the machine/list). - this means that people will no longer be able to read the name/address they were subscribed under and this may make unsubscription more difficult. - (un)subscribe costs Henry a lot of effort, and it is extremely boring and tedious. We owe a great debt to him for this labour underneath the surface. Anyone (un)subscribing should do as much as possible to lighten this load.
[I get a number of people writing to me to ask to unsubscribe, so I expect the number that Henry gets is larger.] I suggest we adopt the great battle cry of the late great John Major : "wait and see". If the situation continues at present it's bearable - we all get spams/MMFs most days. My main concern is that (a) the workload gets increasingly heavy anyway and (b) more moderation of some sort is required. Personally I favour moderation at subscribe-time, rather than post-time if it really has to come to that, but I hope it doesn't. P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jharmon at telecnnct.com Mon Dec 22 19:21:50 1997 From: jharmon at telecnnct.com (Jim Harmon) Date: Mon Jun 7 16:59:39 2004 Subject: spam - chain letter References: <199712221612.LAA22272@geode.ora.com> Message-ID: <349EBAAF.773C2448@telecnnct.com> Chris Maden wrote: > [Jim Harmon] > > DO NOT forward the message to ANYONE esle. > > Yeah - being amused by it doesn't mean I was glad to receive it. > > > (You'ld think everyone would have heard about this cr*p by now. > > These things go around about every 6 months.) > > In general, yes, but I've never seen this one before. I don't know > what Davidofvf's deal is, but that message was very bizarre. "Davidofvf" has flamed me numerous times today. He claims he wrote this himself, entirely. I've reported him to relevant AOL persons, and this particular abuse should end. 
This guy seems to think of himself as a "hacker" and will be rudely awakend shortly. (The previous incarnations of this letter spoke of a beekeeper in Venezuela, a housewife in California, etc. etc., and had absolutely no purpose beyond "send more letters, or you'll be sorry", via snailmail.) -- Jim Harmon The Telephone Connection jim@telecnnct.com Rockville, Maryland xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From wharold at tivoli.com Mon Dec 22 20:08:31 1997 From: wharold at tivoli.com (Ward Harold) Date: Mon Jun 7 16:59:40 2004 Subject: lex, yacc, and xml Message-ID: <349EC7CB.23212C35@tivoli.com> <question name="why hand code parsers" class="potentially stupid"> Why is it that all of the XML parsers/processors I've seen appear to be hand coded rather than generated via lex/yacc or flex/bison? I seem to recall seeing something to the effect that yacc/bison can't handle the class of grammar that XML falls into. Then again I'm not a compiler constructor, opted for the AI sequence in graduate school, so I may be imagining things. Even if there is a technical reason for eschewing parser generation surely the basic lexing and scanning could be done with lex/flex, no? </question> <signature name="Ward K. Harold" address="ward.harold@tivoli.com"/> xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Mon Dec 22 20:17:42 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:40 2004 Subject: lex, yacc, and xml Message-ID: <3.0.32.19971222121707.00a488e0@pop.intergate.bc.ca> At 02:04 PM 22/12/97 -0600, Ward Harold wrote: ><question name="why hand code parsers" class="potentially stupid"> >Why is it that all of the XML parsers/processors I've seen appear to be >hand coded rather than generated via lex/yacc or flex/bison? It could be done. At least one person I know is working on it. And in fact Norbert Mikula's NXP uses, I believe, a lex/yacc-like-thingie for Java. On the other hand, this is probably going to get you quite large code size; also there will likely be problems handling encodings; also these generators tend to make it hard to generate high-quality error messages. On the other hand, if you want a quick one-off in C, lex/yacc are proabably a very reasonable strategy. -Tim xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From papresco at technologist.com Mon Dec 22 21:39:20 1997 From: papresco at technologist.com (Paul Prescod) Date: Mon Jun 7 16:59:40 2004 Subject: JFC and SGML Message-ID: <349EA327.F5A4FA37@technologist.com> This quote from a JavaWorld article is just strange enough to make me curious... 
"Advanced text handling At the SIGS conference, Ted Faison of Faison Computing Inc. demonstrated the sophisticated text handling implemented in the JTextComponent class, which has SGML-like capabilities. JFC applications can support rich text, including multiple fonts, sizes, colors, highlighting, and embedded pictures. Complex layout will be supported, allowing developers to assign an element to a box to constrain its position and have other text elements flow around it." <SARCASM>That sounds like SGML alright!</> Does anyone know if there is a kernel of truth in the reference to SGML? Paul Prescod -- http://itrc.uwaterloo.ca/~papresco Art is always at peril in universities, where there are so many people, young and old, who love art less than argument, and dote upon a text that provides the nutritious pemmican on which scholars love to chew. -- Robertson Davies in "The Cunning Man" xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Mon Dec 22 22:10:15 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:40 2004 Subject: lex, yacc, and xml In-Reply-To: <349EC7CB.23212C35@tivoli.com> References: <349EC7CB.23212C35@tivoli.com> Message-ID: <199712222207.RAA00338@unready.microstar.com> Ward Harold writes: > <question name="why hand code parsers" class="potentially stupid"> > Why is it that all of the XML parsers/processors I've seen appear to be > hand coded rather than generated via lex/yacc or flex/bison? I seem to > recall seeing something to the effect that yacc/bison can't handle the > class of grammar that XML falls into. 
Then again I'm not a compiler
> constructor, opted for the AI sequence in graduate school, so I may be
> imagining things. Even if there is a technical reason for eschewing
> parser generation surely the basic lexing and scanning could be done
> with lex/flex, no?
> </question>

This is actually a very good question, but I will second most of Tim's comments. With Ælfred, I set out to produce a Java-based XML parser under 20K (I missed by about 6K, but I'm still working on it). A hand-crafted recursive-descent parser seemed like the only reasonable choice, and it turned out to be very fast as well.

In fact, it is not much harder to write a recursive-descent parser than it is to write out EBNF productions, at least not once you get into a rhythm and write a few helper methods for lexical scanning (like "readName()").

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From zwang at pstat.ucsb.edu Mon Dec 22 22:29:46 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 16:59:40 2004
Subject: JFC and SGML
Message-ID: <Pine.GSO.3.95.971222142308.7977B-100000@fisher>

Yes, in the JFC Swing text package, the Swing team implemented an interface called Element. They intended it to capture the spirit of an SGML element.

Zheng Wang
Department of Statistics and Applied Probability
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang
xml-dev: A list for W3C XML Developers.
From zwang at pstat.ucsb.edu Mon Dec 22 22:32:44 1997
From: zwang at pstat.ucsb.edu (Zheng Wang)
Date: Mon Jun 7 16:59:40 2004
Subject: lex, yacc, and xml
Message-ID: <Pine.GSO.3.95.971222142454.7977C-100000@fisher>

The core part of the parser is actually parsing a regular expression. I tried to use JLex, a lexer generator (http://www.cs.princeton.edu/~appel/modern/java/JLex/manual.html), to do this. But to use JLex, I have to translate the XML syntax into a JLex specification, so I don't think it is efficient to do that. Perhaps some other lex can do this easily.

Zheng Wang
Department of Statistics and Applied Probability
University of California, Santa Barbara
E-mail: zwang@pstat.ucsb.edu; http://www.pstat.ucsb.edu/~zwang
xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From norbert at datachannel.com Mon Dec 22 23:49:55 1997 From: norbert at datachannel.com (Norbert Mikula) Date: Mon Jun 7 16:59:40 2004 Subject: lex, yacc, and xml References: <3.0.32.19971222121707.00a488e0@pop.intergate.bc.ca> Message-ID: <349EFC8A.A83C0C4@datachannel.com> Tim Bray wrote: > At 02:04 PM 22/12/97 -0600, Ward Harold wrote: > ><question name="why hand code parsers" class="potentially stupid"> > >Why is it that all of the XML parsers/processors I've seen appear to be > >hand coded rather than generated via lex/yacc or flex/bison? > > It could be done. At least one person I know is working on it. And > in fact Norbert Mikula's NXP uses, I believe, a lex/yacc-like-thingie > for Java. That's correct, NXP was created using JavaCC. NXP is currently being redesigned (overdue) to conform to 1.0. Please also have a look at : http://www.datachannel.com/pressroom/releases/Press32.htm -- Norbert H. Mikula Sr. Online Information Architect Norbert@DataChannel.com DataChannel, 155 108th Avenue NE Ste 400, Bellevue, WA 98004 Phone: 425.462.1999 Fax: 425.637.1192 http://www.datachannel.com -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vcard.vcf Type: text/x-vcard Size: 428 bytes Desc: Card for Norbert Mikula Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19971222/2ab5b41f/vcard.vcf

From peter at ursus.demon.co.uk Tue Dec 23 00:21:19 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: lex, yacc, and xml
In-Reply-To: <199712222207.RAA00338@unready.microstar.com>
References: <349EC7CB.23212C35@tivoli.com> <349EC7CB.23212C35@tivoli.com>
Message-ID: <3.0.1.16.19971223011057.35df2688@pop3.demon.co.uk>

At 17:07 22/12/97 -0500, David Megginson wrote:
>Ward Harold writes:
>
> > <question name="why hand code parsers" class="potentially stupid">
> > Why is it that all of the XML parsers/processors I've seen appear to be
> > hand coded rather than generated via lex/yacc or flex/bison? I seem to

I think there are also historical reasons :-). Most (but not all) of the XML parsers have been written in Java and the lex/yacc functionality wasn't as fully available in Java. (As Tim says, Norbert has used JavaCC - which is the "right" way to do it, but it does generate a large amount of Java code/classes. This is fairly impenetrable, so in the absence of an agreed API it isn't easy to integrate into other applications if you want access to things not in the API.)

Another reason was that some early constructs in the languages (especially involving Parameter Entities) were difficult for some humans and machines to interpret :-). The current approach to PEs is considerably simpler and (I assume) can be lexed and yacc'ed OK.

In my view the difficulty in writing a parser is not the BNF or recursive descent (even I have partially written one of those) but agreeing on what the semantics are in various cases :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
xml-dev: A list for W3C XML Developers.
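David Megginson's remark in this thread, that a recursive-descent parser is little more than the EBNF productions written out as methods plus a few lexical helpers such as readName(), can be illustrated with a toy. The sketch below is not Ælfred's code and is nowhere near a conforming XML parser: it assumes a made-up two-production grammar (no attributes, entities, comments or encodings), and the names TinyParser, ParseError and parse are hypothetical.

```python
class ParseError(Exception):
    """Raised when the input does not match the toy grammar."""


class TinyParser:
    """Recursive-descent parser for a toy subset of XML.

    element ::= '<' Name '>' content '</' Name '>' | '<' Name '/>'
    content ::= (element | text)*
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self, literal):
        return self.text.startswith(literal, self.pos)

    def expect(self, literal):
        if not self.peek(literal):
            raise ParseError(f"expected {literal!r} at offset {self.pos}")
        self.pos += len(literal)

    def read_name(self):
        """Lexical helper: consume a (much simplified) name."""
        start = self.pos
        while self.pos < len(self.text) and (
            self.text[self.pos].isalnum() or self.text[self.pos] in "_-.:"
        ):
            self.pos += 1
        if self.pos == start:
            raise ParseError(f"expected a name at offset {start}")
        return self.text[start:self.pos]

    def parse_element(self):
        # One method per production: this is the 'element' rule.
        self.expect("<")
        name = self.read_name()
        if self.peek("/>"):
            self.expect("/>")
            return (name, [])
        self.expect(">")
        children = self.parse_content()
        self.expect("</")
        end_name = self.read_name()
        if end_name != name:
            raise ParseError(f"<{name}> closed by </{end_name}>")
        self.expect(">")
        return (name, children)

    def parse_content(self):
        # The 'content' rule: child elements and runs of text, in order.
        children = []
        while self.pos < len(self.text) and not self.peek("</"):
            if self.peek("<"):
                children.append(self.parse_element())
            else:
                start = self.pos
                while self.pos < len(self.text) and not self.peek("<"):
                    self.pos += 1
                children.append(self.text[start:self.pos])
        return children


def parse(text):
    """Parse a whole document and return a (name, children) tuple."""
    parser = TinyParser(text)
    tree = parser.parse_element()
    if parser.pos != len(text):
        raise ParseError(f"trailing input at offset {parser.pos}")
    return tree
```

Because each error is raised at the exact point where a production fails, the hand-written form also makes it easy to report the offset and the expected token, which is the error-message advantage Tim Bray mentions over generated parsers.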
From digitome at iol.ie Tue Dec 23 08:49:56 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:40 2004
Subject: XML as a programming language
Message-ID: <199712230849.IAA16060@GPO.iol.ie>

[Posted on behalf of Mr John Spinosa, who asked me in private correspondence to post it here]

[In relation to Time Ordering being important + useful and XML's ability to capture it]

Another good example that supports Sean's assertion is in medical care. Clinical practice guidelines (i.e. treatment protocols) almost always have a time _and_ sequence based aspect which is expressed implicitly or explicitly in the clinical narrative. Being able to model this in XML could potentially allow "engines" that could make sure things get done in the proper order and time. This is extremely hard to accomplish currently using RDBMS technology alone or programmatically, because the observations of the health care providers seldom fit the nice cubby holes needed for the program.

Sorry for the poor description but I am behind in my Holiday gift shopping.

John Spinosa john@spinosa.com

Sean Mc Grath sean@digitome.com
Digitome Electronic Publishing http://www.digitome.com
xml-dev: A list for W3C XML Developers.
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From h.rzepa at ic.ac.uk Tue Dec 23 12:55:56 1997 From: h.rzepa at ic.ac.uk (Rzepa, Henry) Date: Mon Jun 7 16:59:40 2004 Subject: Christmas Interuptions Message-ID: <v03110703b0c555541ce6@[155.198.224.86]> Dear all, During the Christmas vacation, neither I nor the local list administrators will be around to deal with any problems relating to the list. Given the most unfortunate spams that members might have received personally (ie not directly via the list itself), we have decided to suspend certain list operations during the vacation. Specifically, these are a) (un) subscription requests to the list are now fully private, ie MUST be moderated by myself. No-one will be able to subscribe (or unsubscribe, the two are linked) from the list during the vacation period. b) The hypermail archive will not be updated during this period. Postings to the list from existing members WILL continue and will NOT need to be moderated. Please accept however that we will not be able to respond to any undesirable postings during this period. It is my fervent hope that there will be none of course. Very best wishes to you all for Christmas and the new year. xml-dev: A list for W3C XML Developers. 
From walter.kriha at systor.com Tue Dec 23 13:24:53 1997
From: walter.kriha at systor.com (walter.kriha@systor.com)
Date: Mon Jun 7 16:59:40 2004
Subject: XML as a programming tool
Message-ID: <41256576.0046250B.00@druide>

wrote:
>I found your analysis fascinating - though I didn't understand all of it.
>A few points.

It wasn't really an analysis. This stuff was implemented in a large framework project about 2 years ago. And when we did it the reuse potential of SGML in connection with generic objects was much less clear than it looks now (:-)

>My query interface uses XLL TEIpointers, something like:
>Node node = tree.TEISearchFirstNode("DESCENDANT(1,MOL)");
>Since XLL systems are required to provide TEIaddressing, it's trivial to
>use it as a search query as well.
[...]

I haven't really studied TEI pointers yet. From what I remember they look like the HyTime tree addressing mode, which I understand as basically giving indices into a tree, e.g. (1,2,3) meaning from the root element the second child and from this the third child. But it looks like you can specify some properties too.

This is surely a nice way to address things but if used in queries it seems to bind the query to an exact tree layout. That's why I prefer purely property based queries because then the locations of the elements can change but my client still gets the information it wants (if it is still in the tree somewhere). And change it will...

>>Solution: Turn some information tree into a blackboard.

>I suspect that XLL XML-LINK="EXTENDED" can be used here. This can be used
>to capture the non-hierarchical relations.
How *generic* it can be, is >something which I think we have not resolved. I had something more profane in mind: if the implementation of a DOM node also implements an observer interface then clients can register for changes in elements or element content. This is a solution if you need to separate publishers from subscribers without going to an object bus technology right away. The document tree provides a structured medium for decoupled communication between objects. Thank you very much for your response and have a nice day, Walter xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 23 15:34:45 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:40 2004 Subject: XML as a programming tool In-Reply-To: <41256576.0046250B.00@druide> Message-ID: <3.0.1.16.19971223160204.2cff40f2@pop3.demon.co.uk> At 14:25 23/12/97 +0100, walter.kriha@systor.com wrote: [...] >>My query interface uses XLL TEIpointers, something like: >>Node node = tree.TEISearchFirstNode("DESCENDANT(1,MOL)"); >>Since XLL systems are required to provide TEIaddressing, it's trivial to >>use it as a search query as well. >[...] >I haven't really studied TEI pointers yet. From what I remeber they look It's worth taking the time :-) They are very powerful. And of course they are tested. >like the HyTime tree addressing mode which I understand as basically giving >indices into a tree e.g. (1,2,3) meaning from the root element the second >child and from this the third child.But it looks like you can specify some >properties too. 
I can't answer authoritatively for HyTime, but the present XLL addressing scheme owed a *great* deal to the HyTime effort and (IMO) is an excellent middle point between the total power but abstraction of HyTime and nothing at all.

>
>This is surely a nice way to address things but if used in queries it seems
>to bind the query to an exact tree layout. That's why I prefer purely

No. It has to be a tree, but the exact layout does *not* need to be known.

>property based queries because then the locations of the elements can
>change but my client still gets the information it wants (if it is still in
>the tree somewhere). And change it will...

This is precisely why it is so powerful - it does not need to know the exact structure. As an example, JUMBO uses a variety of tree-structured components and ancillary files:
- DTDs (in tree form - gurus tell me this is a simple "grove")
- namespace information
- mime types
- menus

Let's assume that this is agglomerated into a single large tree, which continues to grow as I add more bits on. And as my brain is mature and therefore full, I can never remember where the bits are. BUT suppose I know that I want to find all menu items which have the word "Print" in (I might have written a print output routine, for example). *Without* referring back to anything I can write:

NodeSet printNodes = jumboTree.TEISearch("ROOT()DESCENDANT(ALL,MENU)DESCENDANT(ALL,MENUITEM)STRING(1,\"Print\",0)");

All I have to remember is that there are things called MENUITEMs which occur *somewhere* within MENUs and I only want those with String content (PCDATA) which somewhere contains the substring "Print". Then no matter how much the tree is edited, merged, or whatever, *so long as the MENUITEMs occur within the subtree of MENUs* the search will be valid. Wow!

If you are interested, JUMBO contains this functionality which is within the letter and the spirit of the XLL spec.
It can be extended by obvious means, but I am reluctant to do anything that might compromise the boundaries of the spec. [The area that is most problematic is case-sensitivity, because I am sure that people will want it, but the spec is adamant that - at present - all components of the string above are case-sensitive. Extensions to regexps, etc. are also "obvious".]

	P.

>
>
>
>>>Solution: Turn some information tree into a blackboard.
>
>>I suspect that XLL XML-LINK="EXTENDED" can be used here. This can be used
>>to capture the non-hierarchical relations. How *generic* it can be, is
>>something which I think we have not resolved.
>I had something more profane in mind: if the implementation of a DOM node
>also implements an observer interface then clients can register for changes

This is the software implementation I assume - (e.g. the Java Observable class). You make "connections" between the observer and the observable. In XLL (AIUI) this can be done with EXTENDED links - you can link one set of "resources" to another set. That could then, of course, be implemented by the Observer mechanism. So a *document* can mandate that when document A changes, addresses B should be notified. [But I'm not an expert].

>in elements or element content. This is a solution if you need to separate
>publishers from subscribers without going to an object bus technology right
>away. The document tree provides a structured medium for decoupled
>communication between objects.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
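The observer idea discussed in the message above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: `ContentListener` and `ObservableElement` are hypothetical names, not part of any DOM draft or of JUMBO, and a small hand-written listener interface is used in place of the `java.util.Observable` class mentioned in the message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical callback for subscribers interested in content changes. */
interface ContentListener {
    void contentChanged(ObservableElement source, String oldText, String newText);
}

/** Hypothetical element node that notifies its listeners when its text changes. */
class ObservableElement {
    private final String name;
    private String text = "";
    private final List<ContentListener> listeners = new ArrayList<>();

    ObservableElement(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }

    String getText() {
        return text;
    }

    /** Registers a subscriber; the publisher never needs to know who it is. */
    void addListener(ContentListener listener) {
        listeners.add(listener);
    }

    /** Changes the content and tells every registered listener about it. */
    void setText(String newText) {
        String oldText = text;
        text = newText;
        // Iterate over a copy so a listener may register others while being notified.
        for (ContentListener listener : new ArrayList<>(listeners)) {
            listener.contentChanged(this, oldText, newText);
        }
    }
}
```

A subscriber registers once, for example `price.addListener((src, oldText, newText) -> redraw())` with a hypothetical `redraw()`, and whoever later calls `setText` needs no reference to it. That is the decoupling of publishers from subscribers that the thread describes.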

From peter at ursus.demon.co.uk Tue Dec 23 15:44:00 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: Christmas Interruptions (and SPAM)
In-Reply-To: <v03110703b0c555541ce6@[155.198.224.86]>
Message-ID: <3.0.1.16.19971223163035.32b7a244@pop3.demon.co.uk>

At 12:54 23/12/97 +0100, Rzepa, Henry wrote:
>Dear all,
>
>During the Christmas vacation, neither I nor the local list administrators
>will be around to deal with any problems relating to the list.

The list has been going nearly a year now and it is fitting to offer our thanks to Henry for all the hard work he has put in. He tells me that there are a substantial number of non-automatic things he has to do that take up a considerable amount of time. It is very much a self-sacrifice because in our dynamic view of education running the XML-DEV list does not bring Henry any benefits. [In contrast I benefit to a considerable degree.]

>
>Given the most unfortunate spams that members might have received
>personally (ie not directly via the list itself), we have decided to suspend
>certain list operations during the vacation.

XML-DEV Members will almost all have received a racist, and therefore illegal, spam today. This did not involve XML-DEV - only its members - and has not appeared on the hypermail, but we cannot take chances. If something like this were to happen, then we should have to take immediate action which might well involve us instituting more complex (and therefore costly) procedures. I have mailed abuse@aol.com about this. Please do NOT mail the list about any aspect of spam.
*I* shall be around over the next 2 weeks, so if there are any pressing problems, please send them to me, though I can't promise action.

I do not know - nor wish to know - what individual recipients do with their mail from XML-DEV. I suspect that some redistribute it further within their organisation - if so, make sure you have trapped and deleted this morning's mail.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From tbray at textuality.com Tue Dec 23 16:00:36 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:59:40 2004
Subject: Christmas Interruptions (and SPAM)
Message-ID: <3.0.32.19971223080134.00929a90@pop.intergate.bc.ca>

At 04:30 PM 23/12/97, Peter Murray-Rust wrote:
>>During the Christmas vacation, neither I nor the local list administrators
>>will be around to deal with any problems relating to the list.

Three cheers to Henry for maintaining a very high-value resource. -Tim

From srn at techno.com Tue Dec 23 19:24:01 1997
From: srn at techno.com (Steven R. Newcomb)
Date: Mon Jun 7 16:59:40 2004
Subject: Christmas Interruptions (and SPAM)
In-Reply-To: <3.0.1.16.19971223163035.32b7a244@pop3.demon.co.uk> (message from Peter Murray-Rust on Tue, 23 Dec 1997 16:30:35)
Message-ID: <199712231617.LAA01463@bruno.techno.com>

I hope this is the VERY LAST comment on the excrement in our mailboxes. (Sorry, Peter, this needs to be said. You can't shoulder the burden that we must each shoulder ourselves.)

> ... take immediate
> action which might well involve us instituting more complex (and therefore
> costly) procedures.

No. As a practical matter, where obnoxious anonymous speech is concerned, there are only two choices: respond with counteracting speech, or pretend it's not happening and wait for it to stop.

The former choice, responding with more speech, is at best a waste of our time. Nobody on this list is persuaded by this infantilism anyway, so there's nothing to counteract.

The latter choice is the best: pretend it's not happening. Delete and forget, in one quick motion. Send no replies. Do not forward anything to anyone. Do not discuss it with anyone, publicly or privately. Take no action whatsoever.

It is a scientifically proven fact that positive and negative reinforcement are both reinforcement. In other words, any reaction whatsoever, positive or negative, is likely to reinforce a childish bad behavior. The more we react to these notes, the more likely we are to receive more such notes. So far, even by discussing this, we are giving this person exactly what he wants.

Let's stop reinforcing the poor boob. The bad behavior will eventually cease, and we all will have lost the least possible time and effort. To take any other course is to consume scarce resources and produce nothing.

-Steve

--
Steven R. Newcomb, President, TechnoTeacher, Inc.
srn@techno.com http://www.techno.com ftp.techno.com
voice: +1 972 231 4098
fax +1 972 994 0087
3615 Tanner Lane
Richardson, Texas 75082-2618 USA

From jwrobie at mindspring.com Tue Dec 23 20:37:16 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:40 2004
Subject: Christmas Interruptions (and SPAM)
In-Reply-To: <199712231617.LAA01463@bruno.techno.com>
References: <3.0.1.16.19971223163035.32b7a244@pop3.demon.co.uk>
Message-ID: <3.0.3.32.19971223153637.0313b630@pop.mindspring.com>

At 11:17 AM 12/23/97 -0500, Steven R. Newcomb wrote:
>No. As a practical matter, where obnoxious anonymous speech is
>concerned, there are only two choices: respond with counteracting
>speech, or pretend it's not happening and wait for it to stop.
>
>The former choice, responding with more speech, is at best a waste of
>our time. Nobody on this list is persuaded by this infantilism
>anyway, so there's nothing to counteract.

I hope you are right. After a racist spam like that, I think it is good that Henry and Peter have treated this as a real problem, and I hope that the blacks who participate on this list know that they are welcome here.

Jonathan

jonathan@texcel.no
Texcel Research
http://www.texcel.no

From peter at ursus.demon.co.uk Tue Dec 23 21:09:51 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: Christmas Interruptions (and SPAM)
In-Reply-To: <199712231617.LAA01463@bruno.techno.com>
References: <3.0.1.16.19971223163035.32b7a244@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971223215934.2cff58bc@pop3.demon.co.uk>

At 11:17 23/12/97 -0500, Steven R. Newcomb wrote:
>I hope this is the VERY LAST comment on the excrement in our
>mailboxes. (Sorry, Peter, this needs to be said. You can't shoulder
>the burden that we must each shoulder ourselves.)
>
>> ... take immediate
>> action which might well involve us instituting more complex (and therefore
>> costly) procedures.
>
>No. As a practical matter, where obnoxious anonymous speech is
>concerned, there are only two choices: respond with counteracting
>speech, or pretend it's not happening and wait for it to stop.

I would like this to be the last public posting on this issue :-) Henry is away, so I am doing whatever is possible over the holiday period.

***IF YOU NEED TO COMMUNICATE ON NON-XML MATTERS, MAIL ME AND I WILL TRY TO DO THE APPROPRIATE THING***

Several people have mailed me privately. I'll make the following points:

(A). The list is mounted, serviced by and (implicitly) funded by Imperial College. Therefore, whatever members of this list, including Henry and myself think, we have to abide by the rules and practice of that organisation. I have spoken with Henry, since I felt that action was necessary. As a result we have disabled the hypermail server, not least for reasons in (B).

Henry is NOT sysadm of the list technology.
He is therefore bound by what is possible within their remit - nothing unusual about this - most of us work that way.

My major concern was that offensive mail could be posted to the hypermail *** and be visible to anyone on the WWW ***. Of course this could have happened any time during the year, but someone would have been available to react quickly and delete it (though I personally know that deleting mail from hypermail is non-trivial).

(B) In the UK (where Imperial, Henry and I are located), such postings *can be* illegal. That is why I used the word specifically. I have sat on our local Police and Community Consultative Committee in Harrow (London). This is a community of very widespread origins and racial harassment is taken seriously. If someone were to post such a message through a letter box it would be investigated by the police with a view to prosecution if enough strong evidence were available. If someone were to post such a message in a public electronic place (e.g. a mailing list) and it reached a UK resident, then I would not be surprised if an investigation were launched. I would not wish such a message to remain in public view for any length of time without taking action.

For the record, there is another mailing list at IC run by Henry which has suffered the same problem.

I appreciate the concern of the list members. However, PLEASE do not post to the list, whether or not it is hypermailed. I do not have to explain the benefit of NOT having these messages visible to unlimited humans and robots.

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From murata at apsdc.ksp.fujixerox.co.jp Wed Dec 24 07:24:20 1997
From: murata at apsdc.ksp.fujixerox.co.jp (MURATA Makoto)
Date: Mon Jun 7 16:59:40 2004
Subject: Shameless advertisement
Message-ID: <9712240723.AA03016@lute.apsdc.ksp.fujixerox.co.jp>

A Japanese book on XML is published today.

"Introduction to XML"
Makoto Murata, Atsuhito Momma, Kyoichi Arai
Published by Nihon Keizai Shimbun
ISBN 4-532-14610-0

This book is based on the XML proposed recommendation. For example, it uses "parsed entities" rather than "text entities".

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

From ht at cogsci.ed.ac.uk Wed Dec 24 12:59:16 1997
From: ht at cogsci.ed.ac.uk (Henry S. Thompson)
Date: Mon Jun 7 16:59:40 2004
Subject: lex, yacc, and xml
In-Reply-To: Ward Harold's message of Mon, 22 Dec 1997 14:04:27 -0600
References: <349EC7CB.23212C35@tivoli.com>
Message-ID: <f5b90ta7v8k.fsf@cogsci.ed.ac.uk>

Ward Harold <wharold@tivoli.com> asked about lex/yacc for XML

The currently available LT XML release (http://www.ltg.ed.ac.uk/software/xml/) uses a lex/yacc parser.
The division of labour is not completely comfortable, particularly in the area of the handling of parameter entities, but on balance the architecture is pretty clean. Sources are included in the distribution.

We've re-written by hand for efficiency reasons for the new release which will appear shortly.

ht
--
Henry S. Thompson, Human Communication Research Centre, University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
URL: http://www.cogsci.ed.ac.uk/~ht/

From andrewl at microsoft.com Thu Dec 25 00:45:07 1997
From: andrewl at microsoft.com (Andrew Layman)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
Message-ID: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com>

I have to agree with Paul here. If the interface is only implementable in a single language, then you've failed. Programmers need to make engineering tradeoffs among a number of factors, and will sometimes very reasonably choose one language over another. We could expediently pick one language and ignore all others in order to simplify the problem a little, but that would be letting the solution dictate the problem. I don't recommend it, since it would simply mean that the parsers written in other languages would be guaranteed incompatible.

--Andrew Layman
AndrewL@microsoft.com

> -----Original Message-----
> From: Paul Prescod [SMTP:papresco@technologist.com]
> Sent: Wednesday, December 17, 1997 12:30 PM
> To: xml-dev@ic.ac.uk
> Subject: Re: IDL?
>
> Peter Murray-Rust wrote:
> >
> > The interface has to be simple enough for people like me to understand and
> > to tell my friends what it's about. I would prefer to limit the Consumers,
> > Factories and the rest to as few as possible.
>
> An IDL interface implies no extra complication in the Java interface. It
> merely describes the Java interface in terms that are more universal
> than Java itself -- it is like a DTD for interfaces. So far nobody has
> yet proposed anything that would make an IDL description impossible. All
> I ask is that:
>
> a) nobody do so later (e.g. require runtime lookup of Java class objects
> or do something similarly brain-dead) and
>
> b) implementations in other languages be considered "successes" in terms
> of the success/failure of this project.
>
> I don't think that either of these constraints endanger the success of
> the Java-specific part of the project or make the Java-specific part
> more difficult.
>
> Paul Prescod

From peter at ursus.demon.co.uk Thu Dec 25 14:29:46 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
In-Reply-To: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com>
Message-ID: <3.0.1.16.19971225152720.20cf3544@pop3.demon.co.uk>

At 16:44 24/12/97 -0800, Andrew Layman wrote:
>I have to agree with Paul here. If the interface is only implementable in a
>single language, then you've failed. Programmers need to make engineering

I'd hate to use the word 'fail' (I'm serious about this). If we meet David Megginson's criteria, then we have succeeded, possibly only in a smallish way, but it's a success. It would be a success because:
- it would show that there is a communal willingness amongst early adopters of XML to collaborate. This, in itself, would be a remarkable achievement, since members of this list are not bound by the same commitment as members of the W3C (and de facto the XML-SIG). It would show that there are people and organisations that wish to work with interoperable software and which are - in some measure - prepared to put resource and effort into the communal pool.
- it gives experience and guidance for what works in this environment. IOW if David's criteria can be met, then they may be useful criteria for the next time round.
- it gives **working software** from which we can gain experience for the next 'virtual project', whatever that is.

>tradeoffs among a number of factors, and will sometimes very reasonably
>choose one language over another. We could expediently pick one language

I am very sympathetic to this concern. I wrote my earlier SGML system (costwish) in tcl, and I'd be far happier writing the current (JUMBO) graphics in tk than Java.

>and ignore all others in order to simplify the problem a little, but that
>would be letting the solution dictate the problem. I don't recommend it,
>since it would simply mean that the parsers written in other languages would
>be guaranteed incompatible.

OK - Andrew and Paul have made a point. The question is what is to be done, **practically**.
I assume that their postings, like all on this list, are constructive. What we cannot do, IMO, is lose the momentum that we have at present for Jan 12. Therefore I'd suggest that Andrew and Paul propose a way of taking the Jan 12th result (which is only 2 weeks away) forward.

A primary point to realise is that the Jan12 milestone will address many language-independent points. It will basically define what concepts come under the simple API and what terminology is to be used. If we agree on those then I would assume that creating a language-independent API/IDL would be relatively straightforward.

From my point of view (I can't answer for DavidM or TimB) I would see a possible way forward as:
- Jan 12: Sax-J
- shortly after: IDL for Simple API (SIDL?) based on experience with Sax-J
- formulation of SIDL-J (Java binding for SIDL)

I have got sufficient public and private mail to suggest that this list plays a useful role, but a central motivation - at least for me - is that it is more than a talking shop. Discussion is easy, implementation is much harder. Therefore, for those who propose that we should have an IDL approach (and this is not the first time it's been discussed) any implication that they would be prepared to contribute some effort would be extremely welcome :-)

	P.

BTW - and this is not meant negatively - a common interface would be very useful for MSXML. I downloaded it yesterday and I'm still trying to find out how to get off the ground with it. I'll get there, but - like all software at this level - there is quite a lot of exploration to be done.

	P.

[... redundant xml-dev signatures deleted ...]

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Fri Dec 26 15:15:57 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
In-Reply-To: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com>
References: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com>
Message-ID: <199712261255.HAA00299@unready.microstar.com>

Andrew Layman writes:

 > I have to agree with Paul here. If the interface is only
 > implementable in a single language, then you've failed.
 > Programmers need to make engineering tradeoffs among a number of
 > factors, and will sometimes very reasonably choose one language
 > over another. We could expediently pick one language and ignore
 > all others in order to simplify the problem a little, but that
 > would be letting the solution dictate the problem. I don't
 > recommend it, since it would simply mean that the parsers written
 > in other languages would be guaranteed incompatible.

I have no intention of proposing anything that precludes a compiled language (without runtime type-checking), so if that's the only barrier, then there is no need for concern.

There _is_ a reason for concern, however, with the goals and scope of this project. It will be (I hope) an easy task to design a simple API, with sample interfaces in Java, but we need to know what kind of an API we are designing, and why.

For example, when DOM interfaces are available for NXP, Ælfred, and Lark as well as MSXML, Peter may not need any other common interface for Jumbo.
If people really want the DOM, then the parser writers should work on implementing whatever the current draft defines instead of spending time on the simple interface. If they still need the simple event-based interface, then I have to understand what they need it for:

1) (My suggestion.) A pre-DOM interface, defining the events returned by an XML parser, and providing enough information to build a DOM tree (PIs, attributes, elements, data, DOCTYPE declarations, etc.).

2) (Tim's suggestion.) A post-DOM interface, for people who don't want to learn the complexity of the DOM, and providing only the minimum possible information (elements, attributes, and data).

Do we need either of these? If so, which one do we want?

All the best,

David

--
David Megginson ak117@freenet.carleton.ca
Microstar Software Ltd. dmeggins@microstar.com
http://home.sprynet.com/sprynet/dmeggins/

From mecom-gmbh at mixx.de Fri Dec 26 18:29:59 1997
From: mecom-gmbh at mixx.de (james anderson)
Date: Mon Jun 7 16:59:40 2004
Subject: attlistdecl (nmtoken vs attvalue) in enumeration
Message-ID: <34A3F790.217153DF@mixx.de>

greetings,

why are the elements of an enumerated attribute type specified to be name tokens rather than attribute values? to wit:

[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S Default
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' Name)* S? ')'
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'
[60] Default ::= '#REQUIRED' | '#IMPLIED' | (('#FIXED' S)? AttValue)

- isn't their domain actually attribute values, as in the example:
  <!ATTLIST list type (bullets|ordered|glossary) "ordered">
- since name characters constitute a smaller domain than attribute value characters don't you end up with attribute values which can't be included in the enumeration constraint?
- even without the distinction in character range, isn't this conflating two domains - that of interned tokens and that of string values - which are better off kept distinct. either the constraints look like tokens, but must be parsed as if they were strings, or the constraint evaluation must permit tokens to be compared to strings.
- wouldn't
  <!ATTLIST list type ("bullets"|"ordered"|"glossary") "ordered">
  be both clearer and easier to implement?

thanks, bye,

From peter at ursus.demon.co.uk Fri Dec 26 19:50:02 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
In-Reply-To: <199712261255.HAA00299@unready.microstar.com>
References: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com>
Message-ID: <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk>

At 07:55 26/12/97 -0500, David Megginson wrote:
[...]
>I have no intention of proposing anything that precludes a compiled
>language (without runtime type-checking), so if that's the only
>barrier, then there is no need for concern.

Good.
If I understand this, the creation of Sax-J will not only not preclude the creation of a language-independent API, but should make that easier.

>
>There _is_ a reason for concern, however, with the goals and scope of
>this project. It will be (I hope) an easy task to design a simple
>API, with sample interfaces in Java, but we need to know what kind of
>an API we are designing, and why.
>
>For example, when DOM interfaces are available for NXP, Ælfred, and
>Lark as well as MSXML, Peter may not need any other common interface
>for Jumbo. If people really want the DOM, then the parser writers
>should work on implementing whatever the current draft defines instead

("current draft" [of the DOM] I assume).

>of spending time on the simple interface. If they still need the
>simple event-based interface, then I have to understand what they need
>it for:
>
>1) (My suggestion.) A pre-DOM interface, defining the events returned
>   by an XML parser, and providing enough information to build a DOM
>   tree (PIs, attributes, elements, data, DOCTYPE declarations, etc.).
>

I wasn't aware that your suggestion and Tim's were sufficiently distinct - I thought that the two of you had enough common ground that this wasn't a concern. By "pre-DOM" I assume you mean:
- valid only until the DOM comes into effect
- (possibly) a subset of DOM functionality.

>2) (Tim's suggestion.) A post-DOM interface, for people who don't
>   want to learn the complexity of the DOM, and providing only the
>   minimum possible information (elements, attributes, and data).

By "post-DOM" I assume you mean "will not be obsoleted by the DOM", rather than "cannot be put into operation until the DOM".

>
>Do we need either of these? If so, which one do we want?

There are a number of possible axes here. However I think it is very important that we do not re-discuss this area *in such depth* that yet again it runs into the sand. (I am assuming that a DOM interface is of the order of 3-9 months away, and its finalisation date will often be uncertain. It is psychologically not a good idea to try to create something specifically for this interstitial period.)

So my axioms are:
- the DOM approach is complex for webhackers to learn. Do NOT underestimate the difficulty newcomers have in trying to interpret XML's full power.
- webhackers will NOT use parts of XML. WF documents do not need DOCTYPEs, webhackers may not understand NOTATION, PIs will be regarded as hardcoded rituals.
- the current crop of parsers show sufficient variation in terminology and approach to confuse a webhacker trying to interface to XML.
- There will shortly (we hope) be a large number of webhackers.

Some typical uses of Sax-J will be:
- "I've got this chunk of XML embedded in the middle of some HTML. How do I read it?" [This is likely to have no DOCTYPE, will not use PIs or NOTATION and will almost certainly not use entities.]
- "I want to start learning about XML. Is there a simple program [== code] that I can use to learn-as-I-go?"
- "I don't understand the XML spec. Is there a simple subset of XML that I can get started on?"

These concerns will endure after the DOM - indeed it may even highlight them more strongly. So I believe that there is a valid and useful role for an interface dealing with elements, attributes and data. And that it will endure.

So - please don't regard the David/Tim axis as significant. One advantage of limiting ourselves to E, A, D is that it is more likely that we shall agree, shall finish by Jan 12, etc.

Finally - as I said before - do not underestimate the value of producing this interface. David has pointed out that *he* needed clarification of the goals. I hope that David and Tim can agree on a revised set of goals in the next day or two and that we can go ahead on those.
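To make the "elements, attributes and data" subset concrete, here is a minimal Java sketch of an event-based consumer that listens for exactly those three kinds of information. It is not the interface being drafted in this thread: as a stand-in it uses the `org.xml.sax` and `javax.xml.parsers` classes that ship with the JDK, and `EventLog` is a hypothetical name chosen for the illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical consumer that records only element, attribute and data events. */
class EventLog extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        StringBuilder sb = new StringBuilder("start " + qName);
        for (int i = 0; i < atts.getLength(); i++) {
            sb.append(' ').append(atts.getQName(i)).append('=').append(atts.getValue(i));
        }
        events.add(sb.toString());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // A parser may deliver one run of text in several calls; each call is logged as-is.
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            events.add("data " + text);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        events.add("end " + qName);
    }

    /** Parses a well-formed document held in a string and returns the event list. */
    static List<String> parse(String xml) {
        EventLog log = new EventLog();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), log);
        } catch (Exception e) {
            throw new IllegalStateException("parse failed", e);
        }
        return log.events;
    }
}
```

Calling `EventLog.parse` on a small DOCTYPE-less fragment such as `<list type="ordered"><item>one</item></list>` involves no PIs, NOTATIONs or entity declarations, which is the webhacker case described above.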
If you post on this subject, please try to help us achieve something concrete; we know from experience that it's very easy to broaden the outlook to a stage where things become too diffuse for virtual collaboration. I think we are nearly there...

	P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From richard at cogsci.ed.ac.uk Sat Dec 27 18:11:20 1997
From: richard at cogsci.ed.ac.uk (Richard Tobin)
Date: Mon Jun 7 16:59:40 2004
Subject: lex, yacc, and xml
In-Reply-To: Henry S. Thompson's message of 24 Dec 1997 12:59:07 +0000
Message-ID: <24515.199712271811@pitcairn.cogsci.ed.ac.uk>

> The currently available LT XML release (http://www.ltg.ed.ac.uk/software/xml/)
> uses a lex/yacc parser.

The ugliest part of this code is the DTD parsing, because you want (say) SYSTEM returned in some places as a keyword, and in others as a name. To achieve this, the yacc layer has to be constantly setting the lexer mode ("lexical tie-ins"). Contrast this with C (surprise!) where you can't have a variable with the same name as a keyword.

As Henry said, the performance is one reason why we switched to a plain C parser. Another is the question of 16-bit characters, though this could probably have been kludged since all the syntactically important characters are < 128.

-- Richard
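The tie-in problem described in the message above can be sketched in a few lines of Java. This is not LT XML code and uses neither lex nor yacc: `ModalLexer`, `EntityDeclParser` and the two modes are hypothetical, and the input is a simplified entity declaration with the `<!ENTITY` and `>` delimiters already stripped.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical lexer whose classification of a word depends on a parser-set mode. */
class ModalLexer {
    enum Mode { NAME_EXPECTED, EXTERNAL_ID_EXPECTED }

    private final String[] words;
    private int pos = 0;
    Mode mode = Mode.NAME_EXPECTED; // the parser changes this before asking for a token

    ModalLexer(String input) {
        this.words = input.trim().split("\\s+");
    }

    boolean hasNext() {
        return pos < words.length;
    }

    /** Classifies the next word according to the current mode. */
    String next() {
        String word = words[pos++];
        if (mode == Mode.EXTERNAL_ID_EXPECTED
                && (word.equals("SYSTEM") || word.equals("PUBLIC"))) {
            return "KEYWORD(" + word + ")";
        }
        if (word.startsWith("\"")) {
            return "LITERAL(" + word + ")";
        }
        return "NAME(" + word + ")";
    }
}

/** Hypothetical parser layer that drives the lexer mode (the "tie-in"). */
class EntityDeclParser {
    static List<String> tokens(String decl) {
        ModalLexer lexer = new ModalLexer(decl);
        List<String> out = new ArrayList<>();
        if (lexer.hasNext()) {
            lexer.mode = ModalLexer.Mode.NAME_EXPECTED;        // first word: the entity's name
            out.add(lexer.next());
        }
        if (lexer.hasNext()) {
            lexer.mode = ModalLexer.Mode.EXTERNAL_ID_EXPECTED; // now SYSTEM is a keyword
            out.add(lexer.next());
        }
        lexer.mode = ModalLexer.Mode.NAME_EXPECTED;
        while (lexer.hasNext()) {
            out.add(lexer.next());
        }
        return out;
    }
}
```

For an entity that happens to be called SYSTEM, `EntityDeclParser.tokens("SYSTEM SYSTEM \"sys.xml\"")` classifies the first word as a name and the second as a keyword. The lexer cannot make that decision on its own, which is why the grammar layer has to keep resetting its mode.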
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Sat Dec 27 19:59:26 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:40 2004 Subject: IDL? In-Reply-To: <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> References: <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> Message-ID: <199712271242.HAA00378@unready.microstar.com> Peter Murray-Rust writes: > >1) (My suggestion.) A pre-DOM interface, defining the events returned > > by an XML parser, and providing enough information to build a DOM > > tree (PIs, attributes, elements, data, DOCTYPE declarations, etc.). [...] > By "pre-DOM" I assume you mean: > - valid only until the DOM comes into effect > - (possibly) a subset of DOM functionality. Actually, I am using it in a linear-processing model (you must view this with a monospaced font like Courier): * David's Model: PARSER --> SAX-J --> DOM --> [tree-based user application] | --------------> [event-based user application] In other words, a DOM builder would be just another an event-based SAX-J application. > >2) (Tim's suggestion.) A post-DOM interface, for people who don't > > want to learn the complexity of the DOM, and providing only the > > minimum possible information (elements, attributes, and data). > > By "post-DOM" I assume you mean "will not be onsoleted by the DOM", rather > than "cannot be put into operation until the DOM. 
In this case, I am using a slightly different model:

* Tim's Model:

  PARSER --> DOM --> SAX-J --> [event-based user application]
              |
              ---------------> [tree-based user application]

In other words, SAX-J would be just a simpler event-based API for the DOM.

I don't see a very pressing need for the latter -- tree structures are familiar to coders, whether or not they know anything about XML -- but I would be happy to implement it if requested (I do not want to exclude PIs, however, since that will exclude the possibility of using architectural forms and other standards working on top of XML).

Is this what everyone else is expecting?

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Sun Dec 28 12:48:32 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
In-Reply-To: <199712271242.HAA00378@unready.microstar.com>
References: <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk>

At 07:42 27/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > >1) (My suggestion.) A pre-DOM interface, defining the events returned
> > >   by an XML parser, and providing enough information to build a DOM
> > >   tree (PIs, attributes, elements, data, DOCTYPE declarations, etc.).
> [...]
> > By "pre-DOM" I assume you mean:
> > - valid only until the DOM comes into effect
> > - (possibly) a subset of DOM functionality.
>
>Actually, I am using it in a linear-processing model (you must view
>this with a monospaced font like Courier):

[Note for hypermail readers - hypermail destroys any space-dependent formatting. I don't think there is a way round this. In XML, of course, there will be no problem - will there?]

>
>* David's Model:
>
>  PARSER --> SAX-J --> DOM --> [tree-based user application]
>               |
>               --------------> [event-based user application]
>
>In other words, a DOM builder would be just another event-based
>SAX-J application.

This is roughly what JUMBO does at present. It uses the event-based interfaces of Ælfred, Lark and NXP and builds a tree from the result. This tree (or primitive grove) actually contains tree-structured representations of PIs and DTDs. Therefore JUMBO has no problem in using this SAX-J model. [The question is whether SAX-J would offer doPI, doAttlist, etc.]

This suggests that implementers of the DOM (or other tree-related interfaces) will build on top of SAX-J. IFF you/we can persuade them of this, then great. If not, then there might be a tendency for SAX-J to atrophy after the DOM.

>
> > >2) (Tim's suggestion.) A post-DOM interface, for people who don't
> > >   want to learn the complexity of the DOM, and providing only the
> > >   minimum possible information (elements, attributes, and data).
> >
> > By "post-DOM" I assume you mean "will not be obsoleted by the DOM", rather
> > than "cannot be put into operation until the DOM".
>
>In this case, I am using a slightly different model:
>
>* Tim's Model:
>
>  PARSER --> DOM --> SAX-J --> [event-based user application]
>              |
>              ---------------> [tree-based user application]
>
>In other words, SAX-J would be just a simpler event-based API for the
>DOM.

In this diagram there is a possible assumption that SAX-J can only be finalised after the DOM is finalised.
I hope not, because otherwise the urgency will disappear. If, however, the DOM used different *terminology* from SAX-J (and terminology is a prime concern of mine) then we should have a conflict and a confusion.

>
>I don't see a very pressing need for the latter -- tree structures are
>familiar to coders, whether or not they know anything about XML -- but I
>would be happy to implement it if requested (I do not want to exclude
>PIs, however, since that will exclude the possibility of using
>architectural forms and other standards working on top of XML).

I do not wish PIs to be excluded either, since they are the primary (suggested) mechanism for namespaces. However the problems of **knowing what to do with PIs** far outweigh the problems of *reading* them :-). IOW if I use doPI() (in Lark) or processingInstruction() in Ælfred I get a result like:

  target=BLORT PIString='snark="Boojum" vanish="softly silently"'

whose semantics are far more difficult than simply capturing the contents. I'd just like to have a single syntax for:
- the method name (e.g. doPI())
- the 'target' (e.g. "target")
- the rest-of-the-PI (e.g. PIString)

The same is even more important for NOTATION. If people are actually going to use NOTATION, then *please* give us some handles for the bits :-)

>
>Is this what everyone else is expecting?

I was actually expecting:

  PARSER --> SAX-J --> [event-based user application]

where SAX-J is a black box that emits Attributes, Elements, Data (and possibly PIs, NOTATIONs and DTD components in decreasing order of "gotta-have").

I appreciate that for those constructing parsers, the question of where SAX-J sits w.r.t. DOM is important and possibly affects their ease in implementing SAX-J. And, of course, we must make it easy for parser writers, since if it isn't they won't play. From the *user*'s point of view, the interior of SAX-J is irrelevant :-)

P.
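The point about processing instructions (reading them is easy, interpreting them is not) can be illustrated with a small sketch. It uses Python's standard `xml.sax` module purely as a stand-in for an event-based parser interface; it is not the SAX-J interface being negotiated in this thread. The `PICollector` class and `collect_pis` helper are hypothetical names made up for the example, and the BLORT instruction is the one quoted in the message.

```python
import xml.sax


class PICollector(xml.sax.ContentHandler):
    """Collects a (target, data) pair for every processing instruction."""

    def __init__(self):
        super().__init__()
        self.instructions = []

    def processingInstruction(self, target, data):
        # The parser delivers the data as one raw string. It does not
        # interpret the attribute-like pairs inside; that is left to us.
        self.instructions.append((target, data))


def collect_pis(xml_text):
    """Parse xml_text and return the processing instructions it contains."""
    handler = PICollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.instructions


doc = '<doc><?BLORT snark="Boojum" vanish="softly silently"?></doc>'
pis = collect_pis(doc)
for target, data in pis:
    print("target:", target)
    print("data:  ", data)
```

The handler receives the target and the rest of the instruction as two separate strings, which mirrors the "target" and "PIString" split asked for above. Deciding what `snark` and `vanish` mean remains entirely the application's job.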
My interpretation of "pre-" and "post-" on the time, rather than space, coordinate can be dropped, except to say that it's critical not to waste time "waiting for the DOM".

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From matthewg at poet.de Sun Dec 28 19:38:25 1997
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun 7 16:59:40 2004
Subject: IDL?
Message-ID: <01bd13bc$11d91260$0100007f@pharcyde.poetsoftware.xo.com>

Having silently followed the discussion on this list for many weeks, I am delighted to see the latest initiative to produce a parser backend specification which is both simple and elegant. I have also been following the DOM activity very closely, and I very much agree with David on this. There needs to be a clear distinction between what the goals of this list are (besides a lot of very stimulating discussion) and what the DOM is doing.

David's suggestion 1) makes this distinction very clear, whereas 2) is somewhat of a mystery to me. Please correct me if I am wrong, but couldn't the phases in the "life" of an XML document be summed up as follows:

  Text -> Events -> Grove

There is no point that I can see in going from a tree-based view back to an event stream. The event stream is merely an evolution on the path from text to a grove. Furthermore, nothing I have seen in the SAX proposal looks anything remotely like a simplified DOM. We are talking about two completely different concepts here.
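The Text -> Events -> Grove pipeline described above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses Python's standard `xml.sax` module as a stand-in for an event-based parser interface (it is not SAX-J), and the `Node` class, `TreeBuilder` handler and `build_tree` helper are hypothetical names invented here. The tree it builds is far simpler than a DOM or a grove.

```python
import xml.sax


class Node:
    """Hypothetical minimal tree node: name, attributes, children, text."""

    def __init__(self, name, attrs):
        self.name = name
        self.attrs = dict(attrs.items())
        self.children = []
        self.text = ""


class TreeBuilder(xml.sax.ContentHandler):
    """Turns the event stream into a tree using a stack of open elements."""

    def __init__(self):
        super().__init__()
        self.root = None
        self.stack = []

    def startElement(self, name, attrs):
        node = Node(name, attrs)
        if self.stack:
            # Attach to the innermost element that is still open.
            self.stack[-1].children.append(node)
        else:
            self.root = node
        self.stack.append(node)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # Character data may arrive in several pieces, so accumulate it.
        if self.stack:
            self.stack[-1].text += content


def build_tree(xml_text):
    """Text -> events -> tree, returning the root Node."""
    builder = TreeBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root


tree = build_tree('<FOO id="1"><P>A sentence.</P><P>Another.</P></FOO>')
print(tree.name, tree.attrs, [child.name for child in tree.children])
```

Because the builder depends only on the handler callbacks, the same class would work with any parser that delivers these events. The sketch is deliberately lossy: it does not record the order of text relative to child elements, which a real tree builder would need to preserve.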
I am not suggesting coupling SAX too closely with the DOM or making anything dependent on the DOM release. What needs to be made clear is that if you are going from text to events, you need SAX (and this might be all that is needed for a lot of apps which are interpreting XML as a real-time event stream), and if you want to work on a grove, you need the DOM. If you need to go from text to a grove, a single SAX implementation of a DOM builder would be sufficient to support ALL parsers using SAX. This seems to be the killer synergy we are looking for, doesn't it?

Why would you ever want to go from a nice, clean grove representation back to parser events? (This is not rhetorical -- I really think I need enlightenment here).

Anyway, chalk up one strong vote for 1).

Matthew

-----Original Message-----
From: David Megginson <ak117@freenet.carleton.ca>
To: xml-dev@ic.ac.uk <xml-dev@ic.ac.uk>
Date: Saturday, December 27, 1997 9:08 PM
Subject: RE: IDL?

>Peter Murray-Rust writes:
>
> > >1) (My suggestion.) A pre-DOM interface, defining the events returned
> > >   by an XML parser, and providing enough information to build a DOM
> > >   tree (PIs, attributes, elements, data, DOCTYPE declarations, etc.).
> [...]
>
> > By "pre-DOM" I assume you mean:
> > - valid only until the DOM comes into effect
> > - (possibly) a subset of DOM functionality.
>
>Actually, I am using it in a linear-processing model (you must view
>this with a monospaced font like Courier):
>
>* David's Model:
>
>  PARSER --> SAX-J --> DOM --> [tree-based user application]
>               |
>               --------------> [event-based user application]
>
>In other words, a DOM builder would be just another event-based
>SAX-J application.
>
> > >2) (Tim's suggestion.) A post-DOM interface, for people who don't
> > >   want to learn the complexity of the DOM, and providing only the
> > >   minimum possible information (elements, attributes, and data).
> >
> > By "post-DOM" I assume you mean "will not be obsoleted by the DOM", rather
> > than "cannot be put into operation until the DOM".
>
>In this case, I am using a slightly different model:
>
>* Tim's Model:
>
>  PARSER --> DOM --> SAX-J --> [event-based user application]
>              |
>              ---------------> [tree-based user application]
>
>In other words, SAX-J would be just a simpler event-based API for the
>DOM.
>
>I don't see a very pressing need for the latter -- tree structures are
>familiar to coders, whether or not they know anything about XML -- but I
>would be happy to implement it if requested (I do not want to exclude
>PIs, however, since that will exclude the possibility of using
>architectural forms and other standards working on top of XML).
>
>Is this what everyone else is expecting?
>
>
>All the best,
>
>
>David
>
>--
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Mon Dec 29 11:53:48 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:40 2004
Subject: Goals of XML-DEV (was Re: IDL?)
In-Reply-To: <01bd13bc$11d91260$0100007f@pharcyde.poetsoftware.xo.com>
Message-ID: <3.0.1.16.19971229125057.363f5bc2@pop3.demon.co.uk>

At 19:11 28/12/97 +0100, Matthew Gertner wrote:
>Having silently followed the discussion on this list for many weeks, I am
>delighted to see the latest initiative to produce a parser backend
>specification which is both simple and elegant. I have also been following
>the DOM activity very closely, and I very much agree with David on this.
>There needs to be a clear distinction between what the goals of this list
>are (besides a lot of very stimulating discussion) and what the DOM is
>doing.

Thanks for your support, Matthew. This initiative is very central to what I see as the aims of XML-DEV. After nearly a year it may be worth some gentle discussion of those goals.

<PREAMBLE>
XML-SIG, XML-WG, XML-DOM, etc. are all *formal* parts of the W3C process. The W3C is composed of a large number of member organisations who pay non-negligible subscriptions to support its work. XML-DEV is *not* part of the W3C process and I would see it as secondary - but hopefully complementary - to any formal W3C process. Thus if the W3C already had an initiative to produce a simple parser backend, I would recommend that it should not be pursued on XML-DEV. Similarly, if any activity on XML-DEV appeals to the W3C it would be entirely appropriate for them to subsume some or all of it.
</PREAMBLE>

XML-DEV is "a list for XML developers". Henry Rzepa maintains the list membership and mechanics and I act as unofficial "moderator" - no messages are formally moderated.
I would see the following as outside the remit of XML-DEV:
- development *of* XML(XLL/XSL) [the W3C groups are for this]
- non-XML markup languages or information management systems [comp.text.sgml or appropriate newsgroup]
- FAQs for XML [http://www.ucc.ie/xml] and answering beginners' queries [comp.text.sgml]
- current awareness for XML [http://www.sil.org/sgml/xml.html]

I would very much like XML-DEV to be responsible for creating XML resources of value to the developer community (and possibly beyond that). A resource is something tangible that can be used by more than one person. Examples (all of which have occurred on XML-DEV) could be:
- XML-compliant software
- SGML DTDs converted to XML
- protocols for dealing with whitespace, characters, etc.
- interfaces for software
- entity sets in XML
- clarification of how to *implement* the specs
- test documents (including "torture" tests)
- collections of useful resources (software, documents, etc.)
- know-how about management of distributed documents (XML with MIME, jars, etc.)
- performance considerations

I cannot speak for the WG or SIG, but it is clear that the formal processes are very busy "just" constructing the language specs. They do not have resources to create testbed systems and it has been my hope that resources made available on XML-DEV have been of value in confirming (or questioning) the *implementability* of the spec. For example, when the first generation of parsers came to be written, it became clear that the initial spec for parameter entities caused problems in interpretation and implementation; the WG subsequently revised PEs to a simpler form. XML-DEV is a place where early implementers can announce their resources, ask for volunteers to test them, give feedback to the WG/SIG.

One of my main motivations is to try to limit unnecessary "semantic drift" in XML implementations. Although the syntax of XML is very carefully defined its semantics are (?deliberately) not.
Different implementers will use their experience (or lack of) with SGML to interpret the spec in different ways. This flexibility can be valuable, but it should be clear that it is happening. Here is an example:

  <!DOCTYPE FOO SYSTEM "foo.dtd" [
  ]>
  <FOO>bar</FOO>

What would you expect a parser to do with such input? What would happen if the file/URL "foo.dtd" couldn't be found? Already the parser writers have a variety of *implied* semantics for this document. What I would hope is that we don't evolve an early *inconsistent* set of behaviours for this sort of thing so that we end up with software-dependent documents. This is a real danger with XML if there is no public analysis of implementations.

I believe that XML will need an RFC-like area for semantic and related problems - I suggested the use of a namespace "XDEV" for this. If this is provided elsewhere by a formal body, fine. Until such time, XML-DEV is available for communal systematisation if sufficient people feel it worthwhile. In a virtual environment you can never *force* people to do things they don't want to but I and others will hope to nurture any growing points we see :-)

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From ak117 at freenet.carleton.ca Mon Dec 29 12:03:42 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <Pine.GSO.3.96.971228102653.27324F-100000@calum>
References: <199712271242.HAA00378@unready.microstar.com> <Pine.GSO.3.96.971228102653.27324F-100000@calum>
Message-ID: <199712290121.UAA00293@unready.microstar.com>

Paul Prescod writes:

 > > In other words, a DOM builder would be just another event-based
 > > SAX-J application.
 >
 > I don't think that this model is possible. It would require SAX to support
 > all of the information that the DOM needs. That would knock the "Simple"
 > right out of SAX.

Not necessarily. A level-one DOM does not require that much, and we could elect not to deliver certain information (like comments). I am not suggesting that we deliver the information required for the XML-specific DTD nodes.

If we omit comments, then SAX-J would have to return only the following information:

- elements
- attributes
- PIs
- texts

This should be sufficient for building a useful DOM. Strictly by the book, we specify whether each attribute was specified or defaulted, and we should specify which text is ignorable whitespace.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Mon Dec 29 12:03:48 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk>
References: <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <199712271242.HAA00378@unready.microstar.com> <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk>
Message-ID: <199712290112.UAA00280@unready.microstar.com>

Peter Murray-Rust writes:

 > I appreciate that for those constructing parsers, the question of
 > where SAX-J sits w.r.t. DOM is important and possibly affects their
 > ease in implementing SAX-J. And, of course, we must make it easy
 > for parser writers, since if it isn't they won't play. From the
 > *user*'s point of view, the interior of SAX-J is irrelevant :-)

Unfortunately, the question matters greatly for the _design_ of SAX-J: if SAX-J has to provide enough information to build a level-one DOM, it will have to deliver _all_ of the required information; if not, then it can be somewhat simpler. (It is possible, of course, that changes to the DOM's information set will then require future changes to SAX-J, but changes to DOM terminology will not.)

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ak117 at freenet.carleton.ca Mon Dec 29 12:04:15 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <01bd13bc$11d91260$0100007f@pharcyde.poetsoftware.xo.com>
References: <01bd13bc$11d91260$0100007f@pharcyde.poetsoftware.xo.com>
Message-ID: <199712290131.UAA00297@unready.microstar.com>

Matthew Gertner writes:

 > Please correct me if I am wrong, but couldn't the phases in the
 > "life" of an XML document be summed up as follows:
 >   Text -> Events -> Grove
 > There is no point that I can see in going from a tree-based view
 > back to an event stream. The event stream is merely an evolution on
 > the path from text to a grove. Furthermore, nothing I have seen in
 > the SAX proposal looks anything remotely like a simplified DOM. We
 > are talking about two completely different concepts here.

An event-based call-back interface would be useful for automatic traversal of a DOM tree (rather than iterating through an enumeration), but the callbacks should then take DOM nodes as arguments.

Personally, I believe that an event-based interface is almost always more difficult to use and understand than a tree-based interface -- it requires the user to manage stacks and allocate objects herself. On the other hand, for advanced programmers, an event-based interface has important advantages:

- it allows linear processing of very large documents with very little memory

- it can save the waste of building two separate trees, when the user needs to build a different kind of tree from the XML document

For me, then, the advantage of a common interface was not to help naive coders, but to provide a standardised low-level access to XML documents; to strain an analogy, SAX-J would be the IP to the DOM's TCP.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From peter at ursus.demon.co.uk Mon Dec 29 13:49:48 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <199712290121.UAA00293@unready.microstar.com>
References: <Pine.GSO.3.96.971228102653.27324F-100000@calum> <199712271242.HAA00378@unready.microstar.com> <Pine.GSO.3.96.971228102653.27324F-100000@calum>
Message-ID: <3.0.1.16.19971229144650.3ef772a0@pop3.demon.co.uk>

At 20:21 28/12/97 -0500, David Megginson wrote:
[...]
>Not necessarily. A level-one DOM does not require that much, and we
>could elect not to deliver certain information (like comments). I am
>not suggesting that we deliver the information required for the
>XML-specific DTD nodes.
>
>If we omit comments, then SAX-J would have to return only the

I would strongly support omitting comments from SAX-J, if only to prevent them being used for carrying inappropriate information :-)

>following information:
>
>- elements
>- attributes
>- PIs
>- texts

I am very happy to settle for these.

>
>This should be sufficient for building a useful DOM. Strictly by the
>book, we specify whether each attribute was specified or defaulted,

This presumably requires an ATTLIST for the appropriate element (but does not require a validating parser.) The (only) difficulty is deciding on the conventions/terminology to be used. If we can agree on this it would be a useful step forward.

>and we should specify which text is ignorable whitespace.

This seems to me to require a validating parser, or at least an algorithm which maps contentDecl onto content.
Without this I can't see how you can decide whether inter-element whitespace is declared PCDATA (in the contentDecl) or ignorable.

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Mon Dec 29 13:53:04 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <199712290112.UAA00280@unready.microstar.com>
References: <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <199712271242.HAA00378@unready.microstar.com> <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971229143920.31efb234@pop3.demon.co.uk>

At 20:12 28/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > I appreciate that for those constructing parsers, the question of
> > where SAX-J sits w.r.t. DOM is important and possibly affects their
> > ease in implementing SAX-J. And, of course, we must make it easy
> > for parser writers, since if it isn't they won't play. From the
> > *user*'s point of view, the interior of SAX-J is irrelevant :-)
>
>Unfortunately, the question matters greatly for the _design_ of SAX-J:
>if SAX-J has to provide enough information to build a level-one DOM,
>it will have to deliver _all_ of the required information; if not,
>then it can be somewhat simpler. (It is possible, of course, that
>changes to the DOM's information set will then require future changes
>to SAX-J, but changes to DOM terminology will not.)

I understand. I therefore don't cast a vote as I am not able to make sufficiently valuable comments on the precise relationship of SAX-J to the DOM. I hope that you are interacting offline with Tim and others and will come up with a proposal that (at least that group) feels is workable. I shall be more than happy with whatever comes out and will pledge that JUMBO will interoperate with it. (JUMBO has been fighting AWT 1.02 and the dreaded rhinovirus recently, but should manage).

P.

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From peter at ursus.demon.co.uk Mon Dec 29 13:59:00 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <199712291342.NAA29221@nathaniel.eps.inso.com>
References: <3.0.1.16.19971225152720.20cf3544@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971229145607.364771ae@pop3.demon.co.uk>

At 13:42 29/12/97 GMT, Gavin Nicol wrote:
[...]
>
>If people can just model the interfaces in Java or C++,
>I volunteer to convert a non-IDL specification into legal CORBA IDL.

What a splendid offer, Gavin - thanks. It will be instructive for me (and probably to others) to see what a CORBA IDL looks like. I am sure DavidM or TimB can let you know their thinking up to now.

P.
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg

From matthewg at poet.de Mon Dec 29 15:38:43 1997
From: matthewg at poet.de (Matthew Gertner)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
Message-ID: <01bd146f$16bb9740$e8ecaec7@pharcyde.poetsoftware.xo.com>

David,

Thanks for the clarification. I understand the distinction a bit better now. As you say, the "events" received when traversing a DOM tree would be different from the events emitted by a parser since they would contain DOM data types. It seems to me that a standard tree iterator interface is what we are looking for in this case (this is how we perform tree traversal in our Wildflower SGML/XML repository).

It is certainly worth discussing whether such an interface could be derived from or otherwise related to an event-based parser backend. My gut tells me no, for the reason you mentioned (use of post-DOM information), as well as the practical consideration that specifying this type of interface would more logically be subsumed under the DOM.

Matthew

-----Original Message-----
From: David Megginson <ak117@freenet.carleton.ca>
To: Matthew Gertner <matthewg@poet.de>
Cc: xml-dev@ic.ac.uk <xml-dev@ic.ac.uk>
Date: Monday, December 29, 1997 1:14 PM
Subject: Re: IDL?
>Matthew Gertner writes:
>
> > Please correct me if I am wrong, but couldn't the phases in the
> > "life" of an XML document be summed up as follows:
> >   Text -> Events -> Grove
> > There is no point that I can see in going from a tree-based view
> > back to an event stream. The event stream is merely an evolution on
> > the path from text to a grove. Furthermore, nothing I have seen in
> > the SAX proposal looks anything remotely like a simplified DOM. We
> > are talking about two completely different concepts here.
>
>An event-based call-back interface would be useful for automatic
>traversal of a DOM tree (rather than iterating through an
>enumeration), but the callbacks should then take DOM nodes as
>arguments.
>
>Personally, I believe that an event-based interface is almost always
>more difficult to use and understand than a tree-based interface -- it
>requires the user to manage stacks and allocate objects herself. On
>the other hand, for advanced programmers, an event-based interface
>has important advantages:
>
>- it allows linear processing of very large documents with very little
>  memory
>
>- it can save the waste of building two separate trees, when the user
>  needs to build a different kind of tree from the XML document
>
>For me, then, the advantage of a common interface was not to help
>naive coders, but to provide a standardised low-level access to XML
>documents; to strain an analogy, SAX-J would be the IP to the DOM's
>TCP.
>
>
>All the best,
>
>
>David
>
>--
>David Megginson                 ak117@freenet.carleton.ca
>Microstar Software Ltd.         dmeggins@microstar.com
>      http://home.sprynet.com/sprynet/dmeggins/

From jwrobie at mindspring.com Mon Dec 29 16:21:13 1997
From: jwrobie at mindspring.com (Jonathan Robie)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <199712271242.HAA00378@unready.microstar.com>
References: <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk>
Message-ID: <3.0.3.32.19971229111332.03102458@pop.mindspring.com>

At 07:42 AM 12/27/97 -0500, David Megginson wrote:

>* David's Model:
>
>  PARSER --> SAX-J --> DOM --> [tree-based user application]
>               |
>               --------------> [event-based user application]
>
>In other words, a DOM builder would be just another event-based
>SAX-J application.

This is precisely the way I see it. For those who think we need something simpler than the DOM, please explain what it is that you would supply, and how it is simpler than the DOM. To me, the part of the DOM that deals with elements and attributes seems about as simple as you can get.

For those who think that web hackers can't grok the DOM, can web hackers grok dynamic HTML? Didn't the DOM start out as a way to do a browser-independent version of dynamic HTML?
Yes, it has added functionality since then, but I think that the part of the DOM that web hackers need is also easily understood by web hackers. On the other hand, better documentation for this subset would be useful, but the standard isn't finished yet... Jonathan jonathan@texcel.no Texcel Research http://www.texcel.no xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From jwrobie at mindspring.com Mon Dec 29 16:21:18 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:59:41 2004 Subject: IDL? In-Reply-To: <3.0.1.16.19971228134641.2b9fb744@pop3.demon.co.uk> References: <199712271242.HAA00378@unready.microstar.com> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> <7BB61B44F197D011892800805FD4F7920244418E@red-03-msg.dns.microsoft.com> <199712261255.HAA00299@unready.microstar.com> <3.0.1.16.19971226204232.2dcf90a6@pop3.demon.co.uk> Message-ID: <3.0.3.32.19971229111749.03101198@pop.mindspring.com> At 01:46 PM 12/28/97, Peter Murray-Rust wrote: >At 07:42 27/12/97 -0500, David Megginson wrote: >>In other words, a DOM builder would be just another an event-based >>SAX-J application. >This suggests that implementers of the DOM (or other tree-related >interfaces) will build on top of SAX-J. IFF you/we can persuade them of >this, then great. If not, then there might be a tendency for SAX-J to >atrophy after the DOM. There will always be applications that prefer an event-based API, and they may never want to or need to convert to a tree view. It would be nice to have a standardized event-based API in addition to the standardized tree-based API that the DOM provides... 
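The event-based style discussed in this thread can be sketched as follows. This is only a minimal illustration, assuming Python and its standard-library `xml.sax` module as a stand-in; it is not SAX-J itself, whose interface was still under discussion in these messages. The names `PathCollector` and `collect_text` are hypothetical. The handler keeps nothing but a stack of open element names, which is the "manage stacks yourself" burden, and also the low-memory benefit, that the thread attributes to event-based processing.

```python
import xml.sax


class PathCollector(xml.sax.ContentHandler):
    """Hypothetical handler: records text by element path using only a stack."""

    def __init__(self):
        super().__init__()
        self.stack = []          # names of the currently open elements
        self.text_by_path = {}   # "doc/p" -> list of text chunks seen there

    def startElement(self, name, attrs):
        # An element was opened: push its name onto the stack.
        self.stack.append(name)

    def endElement(self, name):
        # The innermost element was closed: pop it.
        self.stack.pop()

    def characters(self, content):
        # Text events may arrive in several chunks; ignore pure whitespace.
        text = content.strip()
        if text:
            path = "/".join(self.stack)
            self.text_by_path.setdefault(path, []).append(text)


def collect_text(xml_bytes):
    """Parse the document as a stream of events; no tree is ever built."""
    handler = PathCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.text_by_path


if __name__ == "__main__":
    print(collect_text(b"<doc><p>A sentence.</p><p>Another.</p></doc>"))
```

A tree-based application would instead load the whole document and walk it; the sketch above never holds more than the current path in memory, at the cost of the application tracking context for itself.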
Jonathan jonathan@texcel.no Texcel Research http://www.texcel.no xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From marcus at lab.com Mon Dec 29 18:44:30 1997 From: marcus at lab.com (Wendell Piez) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <34A7EFE9.1477822C@lab.com> FWIW, this is quoted from a page currently on the Microsoft site: http://www.microsoft.com/msdn/news/drgui/102097.htm -- > Tags may be ended in one of three ways. As in HTML, <TAG> is ended by > </TAG>. Since XML is strict about proper nesting, you can also end the > innermost tag with </>, which is much simpler.... I suppose this may result from nothing more nefarious than confusion within Microsoft with regard to what makes legal XML. But might not such confusion be consequential if and as it spills into the marketplace? Or is this copy merely out of date? (The file name suggests it was keyed in October.) Is it still a live question whether Microsoft applications and/or guidelines will be fostering the deployment of "XML" repositories using tag minimization? I browsed this page today, Dec. 29 1997. Respectfully, Wendell Piez HuskyLabs wendell@lab.com xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From andrewl at microsoft.com Mon Dec 29 18:49:25 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <7BB61B44F197D011892800805FD4F79202444193@red-03-msg.dns.microsoft.com> Thanks for catching the error. That copy is out-of-date. I'll send a note to Dr. Gui and ask for it to be fixed. --Andrew Layman AndrewL@microsoft.com > -----Original Message----- > From: Wendell Piez [SMTP:marcus@lab.com] > Sent: Monday, December 29, 1997 10:46 AM > To: XML Dev > Subject: Still quizzing Microsoft on minimization > > FWIW, this is quoted from a page currently on the Microsoft site: > http://www.microsoft.com/msdn/news/drgui/102097.htm -- > > > Tags may be ended in one of three ways. As in HTML, <TAG> is ended by > > </TAG>. Since XML is strict about proper nesting, you can also end the > > innermost tag with </>, which is much simpler.... > > I suppose this may result from nothing more nefarious than confusion > within Microsoft with regard to what makes legal XML. But might not such > confusion be consequential if and as it spills into the marketplace? > > Or is this copy merely out of date? (The file name suggests it was keyed > in October.) Is it still a live question whether Microsoft applications > and/or guidelines will be fostering the deployment of "XML" repositories > using tag minimization? I browsed this page today, Dec. 29 1997. > > Respectfully, > Wendell Piez > HuskyLabs > wendell@lab.com > > xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following > message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ricko at allette.com.au Tue Dec 30 02:37:44 1997 From: ricko at allette.com.au (Rick Jelliffe) Date: Mon Jun 7 16:59:41 2004 Subject: Dr GUI Message-ID: <199712300240.NAA02844@jawa.chilli.net.au> FWIW, this is quoted from a page currently on the Microsoft site: > http://www.microsoft.com/msdn/news/drgui/102097.htm -- > A DTD is used to define a grammar for the tags and attributes. > This syntax is going to be supported, but deprecated by Microsoft. > It uses a special non-XML-based grammar that looks like the following: The DTD declaration syntax is not "non-XML". (...much deleted...) Rick Jelliffe xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From peter at ursus.demon.co.uk Tue Dec 30 09:27:48 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization In-Reply-To: <7BB61B44F197D011892800805FD4F79202444193@red-03-msg.dns.mi crosoft.com> Message-ID: <3.0.1.16.19971230102454.355f3ea2@pop3.demon.co.uk> At 10:49 29/12/97 -0800, Andrew Layman wrote: >Thanks for catching the error. That copy is out-of-date. I'll send a note >to Dr. Gui and ask for it to be fixed. Many thanks for your quick and effective response, Andrew. I am sure that many other members of XML-DEV appreciate this. P. [...quoted material snipped...] Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From Patrice.Bonhomme at loria.fr Tue Dec 30 10:57:50 1997 From: Patrice.Bonhomme at loria.fr (Patrice Bonhomme) Date: Mon Jun 7 16:59:41 2004 Subject: Job announcement: network programmer within a linguistic resource application Message-ID: <199712301057.LAA05990@chimay.loria.fr> [Not exactly the subject of this list but could interest someone] Job announcement: network programmer within a linguistic resource application In the context of a European project aiming at networking linguistic resource servers, we seek a graduate software engineer with a few years' experience in network programming and/or linguistic engineering for a one year position starting in February 1997. The applicant should have a good knowledge of Java and some experience in document management using SGML/XML (e.g. using the Text Encoding Initiative guidelines). A good knowledge of English is necessary and preferably some knowledge of French. The net salary should be around 1500 Ecus per month depending on experience. Job location: LORIA, Laboratoire Lorrain de Recherche en Informatique et ses Applications B.P. 239, F-54506 Vandoeuvre Les Nancy http://www.loria.fr For further details please contact: Laurent Romary romary@loria.fr -- ============================================================== bonhomme@loria.fr | Office : B.228 http://www.loria.fr/~bonhomme | Phone : 03 83 59 20 37 -------------------------------------------------------------- * Projet Aquarelle : http://aqua.inria.fr * Serveur Silfide : http://www.loria.fr/Projet/Silfide ============================================================== xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ak117 at freenet.carleton.ca Tue Dec 30 11:42:53 1997 From: ak117 at freenet.carleton.ca (David Megginson) Date: Mon Jun 7 16:59:41 2004 Subject: IDL? In-Reply-To: <199712291428.OAA29285@nathaniel.eps.inso.com> References: <199712290121.UAA00293@unready.microstar.com> <199712291428.OAA29285@nathaniel.eps.inso.com> Message-ID: <199712291653.LAA00310@unready.microstar.com> Gavin Nicol writes: > >If we omit comments, then SAX-J would have to return only the > >following information: > > > >- elements > >- attributes > >- PIs > >- texts > > I would recommend that you also leave in comments. That would allow > you to build a DOM representation, and write it out again such that > it would be very close to the original. My guess is that the level-one DOM includes comments only to support messed-up HTML-related tools that hide what should be processing instructions in comments. I don't know if we really want to dirty ourselves with this in XML, unless we can come up with a compelling reason. All the best, David -- David Megginson ak117@freenet.carleton.ca Microstar Software Ltd. dmeggins@microstar.com http://home.sprynet.com/sprynet/dmeggins/ xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Tue Dec 30 14:03:49 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:41 2004 Subject: msxml 1.8 questions Message-ID: <199712301403.OAA13293@mail.iol.ie> Any help with the following gratefully received! 1) Is there any way to avoid having to use the /cp switch to specify the class path. Setting the CLASSPATH environment variable does not help. 2) What is happening to my whitespace:- c>type foo.xml <Document> Some content <IAmEmpty/> Some more content</Document> c>jview.exe /cp c:\m;c:\m\classes msxml -d1 foo.xml DOCUMENT |---ELEMENT Document | |---PCDATA " Some content " | |---ELEMENT IAmEmpty | +---PCDATA " Some more content" +---WHITESPACE 0xa 3) The XMLViewer applet seems to be uppercasing element type names. Can I turn this off? Sean Mc Grath sean at digitome dot com xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Tue Dec 30 14:18:04 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:41 2004 Subject: SAX-J and the DPH (DJH?) Message-ID: <199712301417.OAA14285@mail.iol.ie> > > >If we omit comments, then SAX-J would have to return only the > > >following information: > > > > > >- elements > > >- attributes > > >- PIs > > >- texts > > I am wondering where XML-Devers see SAX-J relative to DPH?. 
He is the guy staring at 10GB of corporate XML docs and just wants to change <telephone>555-1234</telephone> into <telephone>555-4321</telephone> throughout the whole lot. If he uses SAX-J he blows all the aide-memoires stuffed into his comments? He also blows his CDATA marked sections. Is the DPH destined to always use a fully blown parser to output XML? Sean Mc Grath sean at digitome dot com From crism at ora.com Tue Dec 30 15:30:30 1997 From: crism at ora.com (Chris Maden) Date: Mon Jun 7 16:59:41 2004 Subject: SAX-J and the DPH (DJH?) In-Reply-To: <199712301417.OAA14285@mail.iol.ie> (message from Sean Mc Grath on Tue, 30 Dec 1997 14:17:55 GMT) Message-ID: <199712301535.KAA08044@geode.ora.com> [Sean McGrath] > I am wondering where XML-Devers see SAX-J relative to DPH?. By definition, the DPH isn't going to have much to do with SAX-J. However, you raise a good point; the SAX-J folks should decide whether an identity transform is a goal, and if so, comments need to be included. If not, then SAX-J is not an appropriate tool to use for document transformations. > He is the guy staring at 10GB of corporate XML docs and just wants > to change > > <telephone>555-1234</telephone> > > into > > <telephone>555-4321</telephone> > > throughout the whole lot. If he uses SAX-J he blows all the > aide-memoires stuffed into his comments? He also blows his CDATA > marked sections. Is the DPH destined to always use a fully blown > parser to output XML? Hell, no! He'd be a fool to use Java for this when s/555-1234/555-4321/g will do the trick. 
(I suspect that in any real-world situation like this, *all* numbers would want to be changed, but if not, just include the markup in the sed command.) -Chris -- <!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN"> <!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN" "<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487 <USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek> From peter at ursus.demon.co.uk Tue Dec 30 15:42:35 1997 From: peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:59:41 2004 Subject: LISTRIVIA (was Re: Job announcement: network programmer within a linguistic resource application) In-Reply-To: <199712301057.LAA05990@chimay.loria.fr> Message-ID: <3.0.1.16.19971230141141.1ed7cf24@pop3.demon.co.uk> At 11:57 30/12/97 +0100, Patrice Bonhomme wrote: > >[Not exactly the subject of this list but could interest someone] We have been relaxed so far about what has been posted to XML-DEV, since the traffic has been of manageable volume and high quality. We have little experience of how it will develop. "Grey" areas include: - job postings - product announcements and other commercial information - meeting announcements I suggest that the guideline should be: "Does a posting bring useful information to a significant number of XML-developers?" Since a large number of (if not all) SGML-based groups will be actively looking at XML, the phrase "SGML/XML" will become common. 
Clearly for meetings like SGML/XML97 this list is very appropriate (since many XML-related matters will have been discussed there for the first time; in fact I'm sad that no-one has reported any feedback from this meeting). However, general SGML/XML matters are probably more appropriate to comp.text.sgml. Since XML is clearly entering an exponential phase, we can expect increasing numbers of members, postings, announcements, etc. It will be important to limit these to "matters of interest to XML developers" and so far this has worked very well. > >Job announcement: network programmer within a linguistic resource application [... deleted...] P. Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg From paul at arbortext.com Tue Dec 30 16:04:19 1997 From: paul at arbortext.com (Paul Grosso) Date: Mon Jun 7 16:59:41 2004 Subject: IDL? Message-ID: <97Dec30.110135est.18822@thicket.arbortext.com> At 11:53 1997 12 29 -0500, David Megginson wrote: >Gavin Nicol writes: > > > >If we omit comments, then SAX-J would have to return only the > > >following information: > > > > > >- elements > > >- attributes > > >- PIs > > >- texts > > > > I would recommend that you also leave in comments. That would allow > > you to build a DOM representation, and write it out again such that > > it would be very close to the original. > >My guess is that the level-one DOM includes comments only to support >messed-up HTML-related tools that hide what should be processing >instructions in comments. 
I don't know if we really want to dirty >ourselves with this in XML, unless we can come up with a compelling reason. Your guess is wrong. The DOM has several reference applications including editors and data repositories, and retention of comments is a requirement for these applications. This is true for XML and has nothing to do with "dirty" use of comments in HTML. paul xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From ebaatz at barbaresco.East.Sun.COM Tue Dec 30 16:30:11 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:59:41 2004 Subject: IDL? Message-ID: <libSDtMail.199712301128.23378.ebaatz@barbaresco> David Megginson <ak117@freenet.carleton.ca> writes: > My guess is that the level-one DOM includes comments only to support > messed-up HTML-related tools that hide what should be processing > instructions in comments. I don't know if we really want to dirty > ourselves with this in XML, unless we can come up with a compelling > reason. I can't comment on DOM motivation, but my use of XML is to transform text that has been marked up with XML for better pronunciation into text marked up with the commands of a specific speech synthesizer. Having comments in the transformed text that state the intent and techniques of the markup is invaluable in trying to figure why something does not sound right and how to fix it. Eric Baatz Sun Microsystems Laboratories 2 Elizabeth Drive, MS UCHL03-207 (978) 442-0257 Chelmsford, MA 01824 fax: (978) 250-5067 USA Internet: eric.baatz@east.sun.com xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Tue Dec 30 16:31:03 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:41 2004 Subject: SAX-J and the DPH (DJH?) Message-ID: <199712301630.QAA23686@mail.iol.ie> [Sean Mc Grath] >> He is the guy staring at 10GB of corporate XML docs and just wants >> to change >> >> <telephone>555-1234</telephone> >> >> into >> >> <telephone>555-4321</telephone> >> >> throughout the whole lot. If he uses SAX-J he blows all the >> aide-memoires stuffed into his comments? He also blows his CDATA >> marked sections. Is the DPH destined to always use a fully blown >> parser to output XML? > [Chris Maden] >Hell, no! He'd be a fool to use Java for this when >s/555-1234/555-4321/g will do the trick. (I suspect that in any >real-world situation like this, *all* numbers would want to be >changed, but if not, just include the markup in the sed command.) My example was a bit simplistic. This transformation is no more difficult in fully blown SGML than it is in XML. The fun starts for the DPH when including the markup in the SED command is not an option due to the hierarchical sensitivity of the task, e.g. just telephone numbers occurring within the appendix elements and skipping those where the client attribute has the value "Jones". That sort of thing. Maybe nothing short of a fully blown XML parser will do for these situations? Sean Mc Grath sean at digitome dot com xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From digitome at iol.ie Tue Dec 30 17:40:17 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <199712301740.RAA28723@mail.iol.ie> The </> minimisation is used in the DSO examples that come with msxml 1.8 and still parse with msxml. >Thanks for catching the error. That copy is out-of-date. I'll send a note >to Dr. Gui and ask for it to be fixed. > >--Andrew Layman > AndrewL@microsoft.com > >> -----Original Message----- >> From: Wendell Piez [SMTP:marcus@lab.com] >> Sent: Monday, December 29, 1997 10:46 AM >> To: XML Dev >> Subject: Still quizzing Microsoft on minimization >> >> FWIW, this is quoted from a page currently on the Microsoft site: >> http://www.microsoft.com/msdn/news/drgui/102097.htm -- >> >> > Tags may be ended in one of three ways. As in HTML, <TAG> is ended by >> > </TAG>. Since XML is strict about proper nesting, you can also end the >> > innermost tag with </>, which is much simpler.... Sean Mc Grath sean at digitome dot com xml-dev: A list for W3C XML Developers. 
To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From tbray at textuality.com Tue Dec 30 17:48:59 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <3.0.32.19971230095031.009b73c0@pop.intergate.bc.ca> At 05:40 PM 30/12/97 GMT, Sean Mc Grath wrote: >The </> minimisation is used in the DSO examples that come with msxml 1.8 >and still parse with msxml. On top of which, as recently as the second week in December, Microsoft people were showing "XML Examples" from the stage at Internet World in New York that had the </> minimization. At the time, they said they'd fix it Real Soon Now. I just now hit the MS site and did a search for "Data Source Object" and all the search results returned "404 not found" when I tried to get 'em, so maybe the fix is in progress as we speak. -Tim xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From andrewl at microsoft.com Tue Dec 30 21:42:18 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <7BB61B44F197D011892800805FD4F792024441B6@red-03-msg.dns.microsoft.com> As I mentioned in earlier mail, I've written to the people who manage the site with an exhaustive list of things that need fixing. It will be fixed. 
--Andrew Layman AndrewL@microsoft.com > -----Original Message----- > From: Tim Bray [SMTP:tbray@textuality.com] > Sent: Tuesday, December 30, 1997 9:51 AM > To: xml-dev@ic.ac.uk > Subject: RE: Still quizzing Microsoft on minimization > > At 05:40 PM 30/12/97 GMT, Sean Mc Grath wrote: > >The </> minimisation is used in the DSO examples that come with msxml 1.8 > >and still parse with msxml. > > On top of which, as recently as the second week in December, Microsoft > people were showing "XML Examples" from the stage at Internet World in > New York that had the </> minimization. At the time, they said they'd > fix it Real Soon Now. I just now hit the MS site and did a search for > "Data Source Object" and all the search results returned "404 not found" > when I tried to get 'em, so maybe the fix is in progress as we speak. > -Tim > > xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; > (un)subscribe xml-dev > To subscribe to the digests, mailto:majordomo@ic.ac.uk the following > message; > subscribe xml-dev-digest > List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers. To post, mailto:xml-dev@ic.ac.uk Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To (un)subscribe, mailto:majordomo@ic.ac.uk the following message; (un)subscribe xml-dev To subscribe to the digests, mailto:majordomo@ic.ac.uk the following message; subscribe xml-dev-digest List coordinator, Henry Rzepa (mailto:rzepa@ic.ac.uk) From andrewl at microsoft.com Tue Dec 30 21:48:50 1997 From: andrewl at microsoft.com (Andrew Layman) Date: Mon Jun 7 16:59:41 2004 Subject: Still quizzing Microsoft on minimization Message-ID: <7BB61B44F197D011892800805FD4F792024441B8@red-03-msg.dns.microsoft.com> Thanks. I've passed this along. 
As you can imagine, there is a substantial amount of lead time on getting examples written and checked. Particularly when specifications are not yet stable--as XML was not until recently--there are going to be deviations between what is published early and what the final specifications contain. I'm sure I'll miss a few of the errors that need correcting, so if you find additional fixes that need to be made, please send me mail at AndrewL@microsoft.com. --Andrew Layman AndrewL@microsoft.com > -----Original Message----- > From: Sean Mc Grath [SMTP:digitome@iol.ie] > Sent: Tuesday, December 30, 1997 9:40 AM > To: xml-dev@ic.ac.uk > Cc: Andrew Layman > Subject: RE: Still quizzing Microsoft on minimization > > The </> minimisation is used in the DSO examples that come with msxml 1.8 > and still parse with msxml. > > >Thanks for catching the error. That copy is out-of-date. I'll send a > note > >to Dr. Gui and ask for it to be fixed. > > > >--Andrew Layman > > AndrewL@microsoft.com > > > From dcarter at lab.com Tue Dec 30 22:04:46 1997 From: dcarter at lab.com (David Carter) Date: Mon Jun 7 16:59:41 2004 Subject: SAX-J and the DPH (DJH?) In-Reply-To: <199712301630.QAA23686@mail.iol.ie> Message-ID: <3.0.3.32.19971230170341.0091d3ac@iron.butterfly.net> >The fun starts for the DPH when including the markup in the >SED command is not an option due to the hierarchical sensitivity of >the task. e.g. just telephone numbers occurring within >the appendix elements and skipping those where the client attribute >has the value = "Jones". That sort of thing. 
>
>Maybe nothing short of a fully blown XML parser will do for these
>situations?

Actually the example you give is pretty easy to do in perl. I would presume
a wide diffused line, though, dividing tasks which require "a fully blown
XML parser" and those which can be dispatched by humbler means, and I do
think it's an interesting distinction to study.

-dc

From ak117 at freenet.carleton.ca Wed Dec 31 02:42:07 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:41 2004
Subject: IDL?
In-Reply-To: <97Dec30.110135est.18822@thicket.arbortext.com>
References: <97Dec30.110135est.18822@thicket.arbortext.com>
Message-ID: <199712310158.UAA00321@unready.microstar.com>

Paul Grosso writes:

 > >My guess is that the level-one DOM includes comments only to support
 > >messed-up HTML-related tools that hide what should be processing
 > >instructions in comments. I don't know if we really want to dirty
 > >ourselves with this in XML, unless we can come up with a compelling reason.
 >
 > Your guess is wrong.
 >
 > The DOM has several reference applications including editors and
 > data repositories, and retention of comments is a requirement for
 > these applications. This is true for XML and has nothing to do
 > with "dirty" use of comments in HTML.

Yes, but is the level-one DOM the right place for such information?
The preservation of comments is certainly important for authoring
tools, but _much_ more important is the preservation of general entity
references in data and in attribute values, neither of which the
current draft of the level-one DOM supports.
Perhaps both comments and general entity references belong more
properly in a level-two DOM rather than in level one, since they deal
with lexical issues rather than logical structure.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From ricko at allette.com.au Wed Dec 31 02:50:30 1997
From: ricko at allette.com.au (Rick Jelliffe)
Date: Mon Jun 7 16:59:41 2004
Subject: Still quizzing Microsoft on minimization
Message-ID: <199712310254.NAA03705@jawa.chilli.net.au>

> From: Sean Mc Grath <digitome@iol.ie>

> The </> minimisation is used in the DSO examples that come with msxml 1.8
> and still parse with msxml.
>
> >Thanks for catching the error. That copy is out-of-date. I'll send a note
> >to Dr. Gui and ask for it to be fixed.

Why call this an error? Why not just clearly label it as syntax which is
not valid or WF XML, but can be allowed in SGML. That way people know
it is not strange or naughty to use it if they need to, it is just not XML.
So take it out of the XML examples, but not the msxml parser (or put it in
a conditional section to allow a parser called "mssgml" as well from the
same code base).

The idea that one syntax (i.e. XML) is good for all documents is as
bad as saying that there can be one DTD for all documents. SGML allows
many variant syntaxes, and even its labyrinths don't go nearly far
enough to give everyone what they need. XML is a good markup syntax
for the rest of us, but not the answer to all needs.

Rick Jelliffe

From digitome at iol.ie Wed Dec 31 10:56:44 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:41 2004
Subject: Still quizzing Microsoft on minimization
Message-ID: <199712311056.KAA05415@mail.iol.ie>

>> From: Sean Mc Grath <digitome@iol.ie>
>>
>> The </> minimisation is used in the DSO examples that come with msxml 1.8
>> and still parse with msxml.
>>
>> >Thanks for catching the error. That copy is out-of-date. I'll send a note
>> >to Dr. Gui and ask for it to be fixed.
>
[Rick Jelliffe]
>Why call this an error? Why not just clearly label it as syntax which is
>not valid or WF XML, but can be allowed in SGML.

Of course this is valid SGML; that is not the point!

The point is that 10 examples of Dynamic HTML are used to illustrate some
pretty cool *XML* functionality using an applet called *XML*dso in
a product called ms*XML*. The applet lives in
com.ms.*XML*.dso.*XML*DSO.class. The readme HTML file is called
*XML*dso.HTM. It contains 20 references to the word *XML* and is entitled:-

"Demo: Microsoft *XML* Data Source Object Applet

The com.ms.*xml*.dso package contains an applet called *XML*DSO that can be
used as an *XML* data provider in conjunction with the data binding features
of Internet Explorer 4.0. for binding *XML* data to HTML element on the page.
---------

Are you seriously suggesting that someone new to XML will not construe </>
as legit. XML syntax as a result of being impressed (as I was) with the DSO
stuff?

NOTE: I think MSXML is a great piece of work. But as someone making
a living in this field I am kinda anxious to see the standard hold water
and get established without feature creep.
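
The status of </> is easy to check against any conforming XML parser. The following is a minimal sketch, assuming Python and the expat-based parser in its standard library rather than msxml (the parser under discussion); the function name is illustrative only. A conforming parser is expected to reject the empty end-tag as not well-formed, while the spelled-out end-tag parses.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as well-formed XML, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        # Raised by the underlying expat parser on any well-formedness error.
        return False


print(is_well_formed("<P>A sentence.</P>"))  # True
print(is_well_formed("<P>A sentence.</>"))   # False: </> is SGML minimization
```

This only shows what one XML parser does with the construct; it says nothing about how an SGML parser with SHORTTAG enabled would treat the same input.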
>That way people know
>it is not strange or naughty to use it if they need to, it is just not XML.
>So take it out of the XML examples, but not the msxml parser (or put it in
>a conditional section to allow a parser called "mssgml" as well from the
>same code base).
>
>The idea that one syntax (i.e. XML) is good for all documents is as
>bad as saying that there can be one DTD for all documents. SGML allows
>many variant syntaxes, and even its labyrinths don't go nearly far
>enough to give everyone what they need. XML is a good markup syntax
>for the rest of us, but not the answer to all needs.
>
>Rick Jelliffe

Sean Mc Grath
sean at digitome dot com

From digitome at iol.ie Wed Dec 31 10:58:34 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:41 2004
Subject: SAX-J and the DPH (DJH?)
Message-ID: <199712311058.KAA05512@mail.iol.ie>

At 17:03 30/12/97 -0500, you wrote:
>>The fun starts for the DPH when including the markup in the
>>SED command is not an option due to the hierarchical sensitivity of
>>the task. e.g. just telephone numbers occurring within
>>the appendix elements and skipping those where the client attribute
>>has the value = "Jones". That sort of thing.
>>
>>Maybe nothing short of a fully blown XML parser will do for these
>>situations?
>
>Actually the example you give is pretty easy to do in perl. I would presume
>a wide diffused line, though, dividing tasks which require "a fully blown
>XML parser" and those which can be dispatched by humbler means, and I do
>think it's an interesting distinction to study.

Care to suggest a Perl implementation? I for one would find it very
instructive to see what problems/issues/patterns emerge from studying
a live example.

Sean Mc Grath
sean at digitome dot com

From ak117 at freenet.carleton.ca Wed Dec 31 12:29:58 1997
From: ak117 at freenet.carleton.ca (David Megginson)
Date: Mon Jun 7 16:59:41 2004
Subject: DOM design
In-Reply-To: <199712310504.FAA00344@nathaniel.eps.inso.com>
References: <199712310158.UAA00321@unready.microstar.com>
	<199712310504.FAA00344@nathaniel.eps.inso.com>
Message-ID: <199712311225.HAA00405@unready.microstar.com>

Gavin Nicol writes:

 > >Perhaps both comments and general entity references belong more
 > >properly in a level-two DOM rather than in level one, since they deal
 > >with lexical issues rather than logical structure.
 >
 > I, for one, believe comments to be part of the *structure* of a document.

Technically, this is not the case. An SGML or XML document has only
two well-defined structures:

1) the logical (element/attribute/data) structure; and
2) the physical (entity) structure.

The presence or absence of a comment has no effect on either of these,
so like CDATA sections and PIs, comments are not structural.
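
The claim that a comment leaves the logical structure untouched can be illustrated with a small sketch. It assumes Python's standard-library ElementTree, whose default parser discards comments, so it shows one tool's view of the logical (element/attribute/data) structure rather than a proof; the helper name and the sample documents are invented for illustration.

```python
import xml.etree.ElementTree as ET


def logical_structure(elem):
    """Reduce an element to (tag, sorted attributes, stripped text, children)."""
    return (
        elem.tag,
        sorted(elem.attrib.items()),
        (elem.text or "").strip(),
        [logical_structure(child) for child in elem],
    )


# Two hypothetical documents that differ only by a comment.
with_comment = "<doc><!-- editorial note --><p client='Jones'>text</p></doc>"
without_comment = "<doc><p client='Jones'>text</p></doc>"

same = logical_structure(ET.fromstring(with_comment)) == logical_structure(
    ET.fromstring(without_comment)
)
print(same)  # True: the comment does not change the element tree
```

An editor or repository that must round-trip the source text would of course see the two documents as different, which is the other side of the argument in this thread.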
However, as has become clear in this discussion, many people do
believe that comments are a significant part of an XML document's
information set. That is certainly a legitimate view, but since
comments are non-structural, they should not _automatically_ qualify
for inclusion; instead, someone needs to make a strong case for them,
as I have tried to do for PIs (also non-structural) in SAX-J.

It still seems to me that it would make much more sense for the
level-one DOM to cover only logical structure + PIs (the minimum
needed to process XML documents for formatting, online transactions,
etc.), while a level-two DOM could cover the physical structure and
lexical items needed for editors and repositories (comments, entity
references, ignored whitespace, etc.). Sticking everything into the
level-one DOM muddies the whole thing unnecessarily.

All the best,

David

--
David Megginson                 ak117@freenet.carleton.ca
Microstar Software Ltd.         dmeggins@microstar.com
      http://home.sprynet.com/sprynet/dmeggins/

From papresco at technologist.com Wed Dec 31 13:21:59 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:42 2004
Subject: SAX-J and the DPH (DJH?)
Message-ID: <199712311321.IAA02684@itrc.uwaterloo.ca>

At 10:35 AM 12/30/97 -0500, Chris Maden wrote:
>[Sean McGrath]
>> I am wondering where XML-Devers see SAX-J relative to DPH?.
>
>By definition, the DPH isn't going to have much to do with SAX-J.
>However, you raise a good point; the SAX-J folks should decide whether
>an identity transform is a goal, and if so, comments need to be
>included.
>If not, then SAX-J is not an appropriate tool to use for
>document transformations.

If a byte-for-byte identity transform (and not just a grove-identity
transform) is a requirement, SAX-J would have to keep a whole pile of
information about whitespace and perhaps other normalizations that I forget
right now...

 Paul Prescod

From crism at ora.com Wed Dec 31 15:39:10 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:42 2004
Subject: SAX-J and the DPH (DJH?)
In-Reply-To: <199712311058.KAA05512@mail.iol.ie> (message from Sean Mc Grath
	on Wed, 31 Dec 1997 10:58:24 GMT)
Message-ID: <199712311543.KAA23457@geode.ora.com>

[Sean McGrath]
> >>The fun starts for the DPH when including the markup in the
> >>SED command is not an option due to the hierarchical sensitivity
> >>of the task. e.g. just telephone numbers occurring within the
> >>appendix elements and skipping those where the client attribute
> >>has the value = "Jones". That sort of thing.
> >>
> >>Maybe nothing short of a fully blown XML parser will do for these
> >>situations?
>
> Care to suggest a Perl implementation? I for one would find it very
> instructive to see what problems/issues/patterns emerge from
> studying a live example.

# Without these constants the barewords TRUE and FALSE are both
# non-empty strings, so the flag would always test as true.
use constant { TRUE => 1, FALSE => 0 };

$inappendix = FALSE;

while (<>) {
    # Track whether the current line falls inside an <appendix> element.
    if (/<appendix/) {
        $inappendix = TRUE;
    }
    if (/<\/appendix/) {
        $inappendix = FALSE;
    }
    if ((/^(.*<telephone[^>]*>)555-1234(.*)$/) && $inappendix) {
        $pre = $1;
        $post = $2;
        # Rewrite the number unless the line carries client="Jones".
        if (!(/client\s*=\s*["']Jones["']/)) {
            print $pre . "555-4321" . $post . "\n";
        }
        else {
            print $_;
        }
    }
    else {
        print $_;
    }
}

This is a somewhat simplistic implementation; for instance, it assumes
that there won't be more than one <telephone> on a single input line,
and it doesn't scale well for multiple clients that you're filtering
out. It also assumes no CDATA marked sections, and demonstrates why
</> makes the DPH's life impossible.

I haven't actually run this, but it's similar enough to other things
I've done that I think it should work.

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

From digitome at iol.ie Wed Dec 31 16:02:47 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:42 2004
Subject: SAX-J and the DPH (DJH?)
Message-ID: <199712311602.QAA07243@mail.iol.ie>

[Paul Prescod]
>
>If a byte-for-byte identity transform (and not just a grove-identity
>transform) is a requirement, SAX-J would have to keep a whole pile of
>information about whitespace and perhaps other normalizations that I forget
>right now...
>

Yes. We need some way to express levels of "logical equivalence" between
XML docs. An agreement on this would point the way to a list of what must
and what optionally may be in SAX-J.

Sean Mc Grath
sean at digitome dot com

From papresco at technologist.com Wed Dec 31 17:16:57 1997
From: papresco at technologist.com (Paul Prescod)
Date: Mon Jun 7 16:59:42 2004
Subject: Still quizzing Microsoft on minimization
Message-ID: <199712311716.MAA13645@itrc.uwaterloo.ca>

At 01:51 PM 12/31/97 +1100, Rick Jelliffe wrote:
>> >Thanks for catching the error. That copy is out-of-date. I'll send a note
>> >to Dr. Gui and ask for it to be fixed.
>
>Why call this an error? Why not just clearly label it as syntax which is
>not valid or WF XML, but can be allowed in SGML.

Obviously it is an error in the context of an article on XML. If it were an
article on SGML, or SGML/XML, it would be correct.

 Paul Prescod

From peter at ursus.demon.co.uk Wed Dec 31 18:11:41 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:42 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712171442.JAA00570@unready.microstar.com>
References: <3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
	<003f01bd0acd$a3266da0$0100007f@localhost>
	<3497ADBF.27C4F668@jclark.com>
	<3.0.1.16.19971217125845.38c7c89c@pop3.demon.co.uk>
Message-ID: <3.0.1.16.19971231190802.3457b9c0@pop3.demon.co.uk>

We are in danger of losing focus, yet again.
I am sure that recent postings are valuable, but some of the visions are
*way* beyond where we started from. I do not believe that we need
a byte-for-byte transformation of XML documents, with all the CDATA
sections, comments, parameter entities and goodness knows what else
preserved. I don't think that TimB and DavidM envisaged this at the start
either.

I believe it is important that we make progress on this - see below. We
have just about got critical mass on this if we buckle down.

There are two sorts of posting on this subject: those people who have
offered to *do* something; and those people who have offered to tell the
others what to do :-). These are all of equal value, but some are more
equal than others towards getting the project finished.

[Tim Bray - ca 1997-12-12]
>I agree with Peter that we should just buckle down and get on with what used
>to be known as XAPI.

Unless anything fundamental has changed I am still acting on this
supposition. I have made what I hope is a constructive suggestion by
offering to work with *whatever* comes out of the project. I am
deliberately keeping quiet in public on the technical aspects of the
interface.

At 09:42 17/12/97 -0500, David Megginson wrote:
>Peter Murray-Rust writes:
>
> > Yes. Let's please get this bus into the air. If it needs tweaking
> > or junking later, it's not the end of the world :-). I couldn't
> > bear it if we go down the same road as we have done 2-3 times
> > before, drawing out the process and finally running out of steam.
>
>Any project should have measurable failure criteria. Here are my
>suggestions.
>
>The Simple XML Event-Based API initiative will have failed if either
>of the following is true:
>
>1) By Monday 12 January 1998, at least three Java parser writers have
>   not agreed to support a specific set of common interfaces.

My understanding is that TimB and DavidM are on board. We also have an
offer from Gavin Nicol to turn the result into an IDL.
>
>2) By Monday 12 January 1998, at least three Java applet or
>   application authors have not agreed to use the same set of common
>   interfaces that the parser writers have agreed to support.
>

MONDO and JUMBO have publicly offered and - forgive me, but without the
hypermail I can't find it - another offer of a collaboration.

We would be delighted to have other people involved. I suggest that in
further postings you think "how can I help the project in a practical
sense?"

Reasons for SAX-J
-----------------

I am very clear that there is a need for SAX-IDL and SAX-J. [If you also
feel this, and feel that you can help the project, it would be useful to
give additional positive reasons.]

I do not know how many people have worked with more than one XML parser,
but if you haven't, take it on trust that the interfaces - though
professionally written - that are provided are sufficiently different to
cause a lot of confusion and unnecessary work. It's not trivial to find out
the structures that each return, nor necessarily the detailed syntax.
[Example - what is the structure of a PI? A string? 2 strings? a string +
nameValuePair*? a string which includes the <?...?>?, etc.] What is an
Entity and how many different sorts are there?

The consequence of this is that most people will probably start with one
parser and stick with it. The problem with this is that most parsers at
present have (a) bugs and (b) features. Example:

<!DOCTYPE CML SYSTEM "cml.dtd">

What does the parser do if
	(a) cml.dtd exists?
	(b) cml.dtd exists but does not contain an element CML
	(c) exists but is broken
	(d) does not exist.
[This may not be covered by the proposed API, but it's typical of the
*many* unresolved problems that parser-writers have to face and where
different ones have different behaviours.] I know that some people think
that XML shouldn't control anything at this level ("it's up to the
application").
That's a valid view but not everyone thinks that way and some of us need
some more communality.

In this way we shall get parser-application combinations which do not
interoperate. In a mature XML world this isn't a problem, but at this stage
there is a lot that we have yet to learn about how XML works. So I believe
it's critical to define what comes out of a parser, and SAX is the most
immediate way of doing this. As far as I know no-one has even compared the
output of different parsers to see whether they are consistent. [JUMBO does
this, but in pictorial form].

An underlying objection is that "we should simply wait for the DOM". I
don't know how far away this is, but (a) by that time we shall have
a variety of incompatible parsers and (b) if recent postings here are
relevant to the DOM there will be considerable discussion about what should
be in it. I also believe that for many applications the DOM is
inappropriate, being too complicated for what many people need.

In my naivety I thought there was value in a "Simple" interface - i.e. one
which wasn't complicated. The advantage of this is that newcomers to XML
could use it before moving on to more complex applications. "Simple" seems
to have mutated into "elementary", as in "elementary particle". My feeling
was that Lark, AElfred, et al are all - at present - simpler than the DOM
and that they represented a rough communality of a greater degree of
simplicity. In recent postings nobody seems to think there is a need for
a less complex interface - the general impression is that "it's not worth
doing anything for the DPH because they won't want anything we give them."
I'd disagree, and maybe if no-one else does it it will emerge out of the
present exercise.

-----------------

I appreciate that it's difficult to home in on exactly what subset of
functionality is required in SAX, but this now has to be left to the people
actually working on it.

	P.
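
One way to picture the comparison of parser output suggested above is to record a parser's callbacks as plain data. The sketch below is only an illustration under stated assumptions: it uses the xml.sax module from Python's standard library, a later descendant of the interface being discussed here and not the Java API under design in this thread, and the EventRecorder and record_events names are invented for the example. Note that this binding answers the "structure of a PI" question with two strings, a target and the remaining data.

```python
import xml.sax


class EventRecorder(xml.sax.ContentHandler):
    """Record parser callbacks as plain tuples so two runs can be compared."""

    def __init__(self):
        super().__init__()
        self.events = []
        self._text = []

    def _flush_text(self):
        # Parsers may deliver character data in several chunks; join them.
        text = "".join(self._text).strip()
        self._text = []
        if text:
            self.events.append(("text", text))

    def startElement(self, name, attrs):
        self._flush_text()
        self.events.append(("start", name, dict(attrs.items())))

    def endElement(self, name):
        self._flush_text()
        self.events.append(("end", name))

    def characters(self, content):
        self._text.append(content)

    def processingInstruction(self, target, data):
        self._flush_text()
        self.events.append(("pi", target, data))


def record_events(xml_bytes):
    """Parse a document held in memory and return the recorded event list."""
    handler = EventRecorder()
    xml.sax.parseString(xml_bytes, handler)
    return handler.events


for event in record_events(b'<doc><?app do-this?><p client="Jones">hi</p></doc>'):
    print(event)
```

Two parsers that expose the same callbacks could be fed the same document and their event lists compared with ==. Comments, CDATA boundaries and entity references do not appear in this handler at all, which is exactly the scope question being argued in the surrounding messages.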
Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic
net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary
http://www.venus.co.uk/vhg

From digitome at iol.ie Wed Dec 31 18:56:48 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:42 2004
Subject: SAX-J and the DPH (DJH?)
Message-ID: <199712311856.SAA20429@mail.iol.ie>

So this works if:

1) No more than 1 telephone number per line [Chris]
2) No cdata marked sections [Chris]
3) The attribute value literal for client does not have any entity
   references [Sean - suggested]
4) The target telephone number does not contain entity references
   [Sean - suggested]
5) appendix elements do not nest [Sean - suggested]
6) Telephone numbers do not nest (problem if regexp matching is greedy)
   [Sean - suggested]

Others? I think a little list of "gotchas" like this would find the way
onto many a DPH's wall (including mine!).

[Chris Maden]
>
>$inappendix = FALSE;
>
>while (<>) {
>    if (/<appendix/) {
>        $inappendix = TRUE;
>    }
>    if (/<\/appendix/) {
>        $inappendix = FALSE;
>    }
>    if ((/^(.*<telephone[^>]*>)555-1234(.*)$/) && $inappendix) {
>        $pre = $1;
>        $post = $2;
>        if (!(/client\s*=\s*["']Jones["']/)) {
>            print $pre . "555-4321" . $post . "\n";
>        }
>        else {
>            print $_;
>        }
>    }
>    else {
>        print $_;
>    }
>}
>

Sean Mc Grath
sean at digitome dot com

From crism at ora.com Wed Dec 31 19:11:29 1997
From: crism at ora.com (Chris Maden)
Date: Mon Jun 7 16:59:42 2004
Subject: SAX-J and the DPH (DJH?)
In-Reply-To: <199712311856.SAA20429@mail.iol.ie> (message from Sean Mc Grath
	on Wed, 31 Dec 1997 18:56:39 GMT)
Message-ID: <199712311916.OAA26295@geode.ora.com>

[Sean McGrath]
> So this works if:
>
> 1) No more than 1 telephone number per line [Chris]

For my trivial solution. Perl can handle multiple matches per line;
I'm just not very sophisticated yet.

> 2) No cdata marked sections [Chris]

Can be handled by looking for CDATA marked section starts and ends,
using code similar to the appendix, and adding && !$incdata to all
element-matching conditionals.

> 3) The attribute value literal for client does not have any entity
> references [Sean - suggested]
> 4) The target telephone number does not contain entity references
> [Sean - suggested ]

The two real problems in this list.

> 5) appendix elements do not nest [Sean - suggested]

Not a problem - keep a reference counter instead of my trivial boolean
approach. (Appendices rarely nest, but this is applicable to other
kinds of elements.)

> 6) Telephone numbers do not nest (problem if regexp matching is
> greedy) [Sean - suggested]

The regexp is greedy, but I can use a pattern that will only match
single elements.

> Others? I think a little list of "gotchas" like this would find the
> way onto many a DPH's wall (including mine!).

There are only two real problems here, the ones with entity
references. These are, on their face, beyond the scope of a DPH.
I would either (a) do a quick grep to see if I need to worry about it,
or (b) run my script on the output of spam or a similar normalizer.

I don't think anyone has claimed that Perl can address everything; as
David (I think) said, there is a large fuzzy gray line between
problems in the Perl domain and problems in the full XML processor
domain. (The assertion can be proven by the fact that a Perl script
can solve arbitrary XML processing problems, but will, in the course
of doing so, eventually implement a full XML processor.)

-Chris
--
<!NOTATION SGML.Geek PUBLIC "-//Anonymous//NOTATION SGML Geek//EN">
<!ENTITY crism PUBLIC "-//O'Reilly//NONSGML Christopher R. Maden//EN"
"<URL>http://www.oreilly.com/people/staff/crism/ <TEL>+1.617.499.7487
<USMAIL>90 Sherman Street, Cambridge, MA 02140 USA" NDATA SGML.Geek>

From clovett at microsoft.com Wed Dec 31 19:16:41 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:42 2004
Subject: msxml 1.8 questions
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099FCF@red-msg-56.dns.microsoft.com>

All of your questions indicate that the new MSXML 1.8 isn't installed
properly. The MSXML parser included with IE4 did not return whitespace
nodes, and was not case sensitive - so if the XMLViewer is showing tag
names in uppercase it's probably still using the old parser.

If you run http://www.microsoft.com/standards/xml/xmlinst.exe it should
install properly (it uses install.exe). If you run install.exe manually you
should do so from a DOS command prompt with the current directory set to
c:\msxml (or wherever you installed MSXML).
> -----Original Message-----
> From: Sean Mc Grath [SMTP:digitome@iol.ie]
> Sent: Tuesday, December 30, 1997 6:04 AM
> To: xml-dev@ic.ac.uk
> Subject: msxml 1.8 questions
>
> Any help with the following gratefully received!
>
> 1) Is there any way to avoid having to use the /cp switch to specify the
> class path? Setting
> the CLASSPATH environment variable does not help.
>
> 2) What is happening to my whitespace:-
>
> c>type foo.xml
> <Document>
> Some content
> <IAmEmpty/>
> Some more content</Document>
>
> c>jview.exe /cp c:\m;c:\m\classes msxml -d1 foo.xml
> DOCUMENT
> |---ELEMENT Document
> |   |---PCDATA " Some content "
> |   |---ELEMENT IAmEmpty
> |   +---PCDATA " Some more content"
> +---WHITESPACE 0xa
>
> 3) The XMLViewer applet seems to be uppercasing element type names. Can I
> turn this off?
>
> Sean Mc Grath
> sean at digitome dot com

From digitome at iol.ie Wed Dec 31 19:28:53 1997
From: digitome at iol.ie (Sean Mc Grath)
Date: Mon Jun 7 16:59:42 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
Message-ID: <199712311928.TAA22804@mail.iol.ie>

[Peter Murray-Rust]
>There are two sorts of posting on this subject: those people who have
>offered to *do* something; and those people who have offered to tell the
>others what to do :-).

Careful there. I am easily insulted:-)

I am not a Java programmer (yet). Neither am I an XML parser writer.
However, what goes on in SAX-J discussions is important to me
because it will form the basis of what goes into SAX-IDL and
thusly into C++ and Python which I *do* write XML software in.

I do a lot of report generation/data harvesting from SGML/XML. That makes my
usage of these things fundamentally different from the viewing type
applications that (I believe) you are primarily interested in.

I do not believe my worries/concerns are any less valid than yours. My need
for a simple API is no less valid than yours. If someone
had said "hey lets do a simple XML API for read-only apps" I would not have
felt as concerned about dropped comments, pis, what have you.

If that was said when SAX started, I apologise for missing it and
unreservedly apologise for un-focusing the effort.

Happy new year to everyone.

Sean Mc Grath
sean at digitome dot com

From peter at ursus.demon.co.uk Wed Dec 31 20:12:26 1997
From: peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:59:42 2004
Subject: Failure Criteria: Simple XML Event-Based API for Java
In-Reply-To: <199712311928.TAA22804@mail.iol.ie>
Message-ID: <3.0.1.16.19971231210820.1c47b7f6@pop3.demon.co.uk>

At 19:28 31/12/97 GMT, Sean Mc Grath wrote:
>[Peter Murray-Rust]
>>There are two sorts of posting on this subject: those people who have
>>offered to *do* something; and those people who have offered to tell the
>>others what to do :-).
>
>Careful there. I am easily insulted:-)

Not at all - just concentrating minds :-)

>
>I am not a Java programmer (yet). Neither am I an XML parser writer.
>However, what goes on in SAX-J discussions is important to me
>because it will form the basis of what goes into SAX-IDL and
>thusly into C++ and Python which I *do* write XML software in.

Of course. And your involvement in the IDL would be much welcomed.

>
>I do a lot of report generation/data harvesting from SGML/XML. That makes my
>usage of these things fundamentally different from the viewing type
>applications that (I believe) you are primarily interested in.

We appreciate that XML is many things to many people :-) There is clearly
not a one-serves-everyone application. (In fact my aspirations go beyond
the viewing process to the whole publication process - but publication
involving some unconventional material.) The concentration on viewing is
probably because many people will come to XML through viewing it as their
first activity. But it's not restrictive.

>
>I do not believe my worries/concerns are any less valid than yours.
>My need for a simple API is no less valid than yours. If someone

Excellent :-)

>had said "hey, let's do a simple XML API for read-only apps" I would not have
>felt as concerned about dropped comments, PIs, what have you.

I suspect that that was the primary concern that I had at the beginning. I *am* concerned about XML2XML applications but am not optimistic that we have very much experience to go on yet.

>If that was said when SAX started, I apologise for missing it and unreservedly
>apologise for un-focusing the effort.

No - I suspect that there weren't enough clear-cut goals. We need to focus on a subset. If the major demand is for document transformations, fine...

>Happy new year to everyone.

On the Gregorian calendar...

P.

>Sean Mc Grath
>sean at digitome dot com

Peter Murray-Rust, Director Virtual School of Molecular Sciences, domestic net connection
VSMS http://www.nottingham.ac.uk/vsms, Virtual Hyperglossary http://www.venus.co.uk/vhg
From jmiller at mwci.net Wed Dec 31 21:35:01 1997
From: jmiller at mwci.net (Jeremie Miller)
Date: Mon Jun 7 16:59:42 2004
Subject: JavaScript XML Parser
Message-ID: <011501bd1634$991a25a0$2c01a8c0@jeremie.dbqglass.com>

Well, I finally decided to take the plunge and learn XML. As a learning project, I decided to write a simple XML parser in JavaScript (ECMAScript). As it exists now, it doesn't handle many of the more advanced parts of the spec (PIs, CDATA, etc.) and is only trying to be a read-only well-formed parser. I feel it's at a point where I can let others play with it, and I need some good feedback on it. But remember, I have not given the XML recommendation more than a light glance-through, so the parser is fairly limited for now.

The point of it is to take XML fragments and expose them as a parsed object tree to other JavaScript code for manipulation/display. It's ~5k, it's fast, and it works with any ECMAScript-compliant browser (I hope). It will get updated often as I have time to read the spec and to learn more about DTDs.

Go play at: http://www.jeremie.com/xparse/

I really need some constructive feedback about what it needs to do, the API, possible uses for it, etc.

Thanks! This list has been a very educational tool so far! As I learn more, you'll probably be seeing more of me :)

Jeremie Miller
jer@jeremie.com
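The "parsed object tree" idea in the message above can be illustrated with a minimal sketch. It is written in Python, not JavaScript, and is not taken from the parser announced above: the function names `to_tree` and `parse_fragment` and the dictionary keys are hypothetical, and the actual parsing is delegated to Python's standard `xml.etree.ElementTree` module rather than a hand-written tokenizer.

```python
import xml.etree.ElementTree as ET


def to_tree(element):
    """Convert a parsed element into plain dicts and lists (hypothetical shape).

    Simplification: text that follows a child element (its "tail") is ignored.
    """
    return {
        "name": element.tag,
        "attributes": dict(element.attrib),
        "text": (element.text or "").strip(),
        "children": [to_tree(child) for child in element],
    }


def parse_fragment(xml_text):
    """Parse a well-formed XML fragment and expose it as a plain object tree."""
    return to_tree(ET.fromstring(xml_text))


tree = parse_fragment('<book name="Demo"><p>A sentence.</p><p>Another.</p></book>')
for child in tree["children"]:
    print(child["name"], "->", child["text"])
```

A script that only needs to read simple XML can then walk the nested dictionaries without knowing anything about the parser that produced them.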
From clovett at microsoft.com Wed Dec 31 21:40:36 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:59:42 2004
Subject: msxml 1.8 questions
Message-ID: <2F2DC5CE035DD1118C8E00805FFE354C099FD5@red-msg-56.dns.microsoft.com>

Ah, I think I found the problem. Two things seem to be going wrong:

1) The new xmldso.cab file is not installing into the classes.zip file at all.
2) It looks like jview doesn't use the ClassPath registry setting that install.exe is setting up; instead jview seems to be using only the DevClassPath, which will not be defined if you have not installed Visual J++.

So, I have some fixing to do, but in the meantime, the following workaround should get you going:

a) Launch "regedit".
b) Open up the following hierarchy:
   + HKEY_LOCAL_MACHINE
     + SOFTWARE
       + Microsoft
c) Select "Java VM".
d) If there is no DevClassPath string defined, then select "Edit/New/String value" and type in the name "DevClassPath"; then double-click the ClassPath variable and copy its contents to the new DevClassPath variable.
e) Close regedit.

Now jview should work correctly. I will fix the installer to do this for you and figure out why the new classes are not going into classes.zip.

> -----Original Message-----
> From: Chris Lovett
> Sent: Wednesday, December 31, 1997 11:16 AM
> To: 'Sean Mc Grath'; xml-dev@ic.ac.uk
> Subject: RE: msxml 1.8 questions
>
> All of your questions indicate that the new MSXML 1.8 isn't installed
> properly.
> The MSXML parser included with IE4 did not return whitespace
> nodes, and was not case sensitive - so if the XMLViewer is showing tag names
> in uppercase it's probably still using the old parser.
>
> If you run http://www.microsoft.com/standards/xml/xmlinst.exe it should
> install properly (it uses install.exe)
>
> If you run install.exe manually you should do so from a DOS command prompt
> with the current directory set to c:\msxml (or wherever you installed MSXML).
>
> > -----Original Message-----
> > From: Sean Mc Grath [SMTP:digitome@iol.ie]
> > Sent: Tuesday, December 30, 1997 6:04 AM
> > To: xml-dev@ic.ac.uk
> > Subject: msxml 1.8 questions
> >
> > Any help with the following gratefully received!
> >
> > 1) Is there any way to avoid having to use the /cp switch to specify the
> > class path. Setting the CLASSPATH environment variable does not help.
> >
> > 2) What is happening to my whitespace:-
> >
> > c>type foo.xml
> > <Document>
> > Some content
> > <IAmEmpty/>
> > Some more content</Document>
> >
> > c>jview.exe /cp c:\m;c:\m\classes msxml -d1 foo.xml
> > DOCUMENT
> > |---ELEMENT Document
> > |   |---PCDATA " Some content "
> > |   |---ELEMENT IAmEmpty
> > |   +---PCDATA " Some more content"
> > +---WHITESPACE 0xa
> >
> > 3) The XMLViewer applet seems to be uppercasing element type names. Can I
> > turn this off?
> >
> > Sean Mc Grath
> > sean at digitome dot com
From mike at datachannel.com Wed Dec 31 22:43:53 1997
From: mike at datachannel.com (Mike Dierken)
Date: Mon Jun 7 16:59:42 2004
Subject: JavaScript XML Parser
Message-ID: <01BD15FA.1E332EC0@NEMO>

Jeremie,

Beautiful.

The only two suggestions I have are:

1 Create DOM methods on the JavaScript objects. This way authors can use any parser without changing their scripts.
2 Don't go overboard with DTD/parameter entities/etc. handling. Full-blown parsers already exist, and they can be instantiated as an <OBJECT> and called from JavaScript. What is needed is a nice lightweight way to programmatically read simple XML.

If you want to really rock the world, do these two things:

1 Write an XSL processor in JavaScript.
2 Figure out how to read from multiple URLs from JavaScript, without blowing away the current page.

Mike D
DataChannel
From ahinds at poboxes.com Wed Dec 31 22:58:13 1997
From: ahinds at poboxes.com (Alexander Hinds)
Date: Mon Jun 7 16:59:42 2004
Subject: Problems with whitespace and msxml
Message-ID: <199712312252.RAA55936@mail1y-int.prodigy.net>

Forgive me if this has been discussed before, but I downloaded the latest msxml.tar.gz from Microsoft's web site (release notes dated Dec 4) and am having a devil of a time getting it to do the right thing with whitespace. For one thing, despite what the docs say, it seems to insist on:

<!ATTLIST book xml-space (DEFAULT | FIXED) 'DEFAULT' >

instead of "default | preserve". Moreover, no matter what I set it to, I always get back whitespace in my tree, even without a mixed content model (for example, for element book, its first sibling is always whitespace).

My question, basically, is: how do I eliminate whitespace from my tree entirely? Or, failing that, how do I get the current value of xml-space in my ElementImpl subclass? It appears that nameXMLSPACE is private, not protected (why?), so a subclass can't really search it. But even when I change the visibility, it's always null anyway.

Any help or suggestions would be most appreciated. Thanks in advance.

---book DTD---
<?xml version="1.0" ?>
<!DOCTYPE book [
<!ENTITY % block "p | section">
<!ENTITY % flow "#PCDATA | %block;">
<!ELEMENT book (section)+ >
<!ELEMENT section (%flow;)* >
<!ELEMENT p (#PCDATA) >
<!ATTLIST book
    name CDATA #REQUIRED
    author CDATA #REQUIRED
    xml-space (DEFAULT | FIXED) 'DEFAULT' >
<!ATTLIST section
    name CDATA #REQUIRED >
]>
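The question of how to eliminate whitespace from the tree entirely can be approached, in general DOM terms, by walking the tree after parsing and removing text nodes that contain only whitespace. The sketch below is an assumption-laden illustration in Python using the standard `xml.dom.minidom` module; it does not use msxml or its ElementImpl class, the helper name `strip_whitespace_nodes` is hypothetical, and the sample document is invented to match the shape of the book DTD.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Remove, in place, every text node that holds only whitespace."""
    # Iterate over a copy because children are removed during the walk.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


# Invented sample shaped like the book DTD (no DTD is actually loaded here).
SAMPLE = """<book name="Demo" author="Nobody">
  <section name="One">
    <p>Some text.</p>
  </section>
</book>"""

doc = minidom.parseString(SAMPLE)
strip_whitespace_nodes(doc)
book = doc.documentElement
print(book.firstChild.tagName)
```

This is only safe for element-only content. In a mixed content model such as the section element, a whitespace-only node between two inline children may be significant, so a real application would probably want to skip elements declared with mixed content.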