From eliot at isogen.com Sun Jun 1 00:40:45 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:57:53 2004
Subject: XML-LINK
Message-ID: <3.0.32.19970531173735.00c2622c@swbell.net>

At 10:51 PM 5/30/97 GMT, Peter Murray-Rust wrote:
>I am trying to understand how XML-LINK might be used and would be
>grateful for some gentle hints.

I'll try to offer some guidance. I have implemented support in ADEPT*Editor for HyTime that is roughly equivalent to the types of facilities Peter is asking about for XML Link and JUMBO. Thus, I think I can provide some insight into these issues.

Also, trying not to be too pedantic, I've tried to correct Peter's use of terminology where I think it may be leading to some of his confusion. This is intended to be generally instructive--these misuses are endemic and stem from the Web's singular focus on addressing to the exclusion of all else.

[NOTE: having written this note, I find I must warn that it is long and somewhat more theoretical than I had intended. Peter: There are useful implementation suggestions in here. Also, the end of this note includes what are effectively suggestions to the editors of the XML spec--Tim and Steve, I've copied you explicitly on this note by way of formal submission of these comments--I found my explanation of my opinions underlying the comments to be generally instructive--normally I wouldn't criticize in public without first conveying the critique directly to the editors. However, in this case my suggestions are neither indicative of serious flaws nor is the acceptance of them a necessary condition for my acceptance of XML Link as a useful spec--it is useful as written (although, like any such spec, including HyTime, it could use clarification of some of its intended semantics in places).]

[to continue...]

>A link has ends which are called resources.
>My current understanding is
>that these can be thought of as points in the structure of a document, and
>will often coincide with Elements. I am as yet unclear about the total
>number of possible topologies of a link, and ask some questions here.

I think it's most useful to think of the resources as nodes in trees ("groves" in the HyTime/DSSSL world) [see terminology discussion below]. This is because before you can resolve an address, you must parse the thing into memory so you have a literal structure your program can address (e.g., nodes in some data structure). HyTime and DSSSL codify this by defining all of their functioning in terms of operations on nodes in groves (DSSSL and HyTime are both closed over groves). I think it will be helpful to do the same thing here, although we can, for simplicity, just use the general notion of "parse trees" and avoid the complication of the grove formalism. Note that any kind of data can be parsed into a parse tree (although the tree may consist of a single node)--this is an important simplifying generalization.

>Structure and Behaviour.
>
>My understanding is that a hyperdocument can have a link structure which is
>independent of behaviour - it simply represents the structure of the
>information.

True.

> I'm happy with this - what I'm less clear about is whether
>there are *commonly agreed semantics* for this, or whether it's all
>application-dependent. [If the answer to all my concerns is 'application-
>dependent' then it will be a pity because everyone will write individual
>link processors and there will be no reusability.] I'm aware that all these
>concerns are catered for by HyTime, but since I am ignorant of HyTime,
>answers which refer to that won't be much use to me - ideally they should
>be in the context of the current spec.

There are two schools of thought on this:

1. The "links are everything" school.
This school makes no distinction between relationships that are purely structural and relationships that are annotative. In this school, all semantics are, by necessity, application dependent, because all relationships are fundamentally annotative and are only made structural by labeling them as such.

2. The "structure and annotation are different" school. In this school, a fundamental distinction is made between purely structural relationships and annotative relationships. The semantics of structural relationships (inclusion) are well defined and not open to interpretation. For example, in SGML, the markup structure defines structural relationships. HyTime augments this by providing a generalized, indirect, structural relationship called a "value reference", which lets you use any form of address to identify the effective value of something, such as an element's content or an attribute's value (as opposed to using direct containment via markup or specifying attribute values directly). Annotative relationships are created using hyperlinks. The rule of thumb for distinguishing hyperlink relationships from other relationships is that removing hyperlinks doesn't change the fundamental properties of the data linked (e.g., it doesn't change its structure, remove required property specifications, etc.).

NOTE: This issue is confused because the same addressing methods (e.g., URLs, IDREFs) may be used for both structural and annotative relationships. In addition, the styles applied to annotative relationships may make them appear to be structural (e.g., "present this anchor at the point of occurrence of this other anchor") when they are not.

A good example of this latter case is using hyperlinks to associate notes with a source document. Some systems, such as HyBrowse, let me style hyperlinks in various ways, including presenting one anchor at the point of occurrence of another anchor.
Using this facility, I can style my "annotation" links such that it appears that the annotation is part of the data annotated, even though it isn't: choose another style and you get a clickable button that takes you to the annotation. Choose a third and the annotation is hidden. Obviously, the annotation is not part of the content of the source document and styling it as though it were doesn't make it so.

THUS: the only way to know for sure if a given use of addressing is in the service of structural relationships or annotative relationships (hyperlinks) is to examine the semantics of the thing making the reference: you can't tell from the form of address. It is up to the designers of document types and architectures to define a method for distinguishing structural relationships from annotative ones. If they fail to do so, they are requiring the processors (browsers, formatters, style sheet writers) to do the defining.

[HyTime formalizes the distinction between structure and annotation with the "value reference" facility (nee conloc), which lets you define the structural semantic associated with particular references. Value reference defines structural relationships semantically rather than lexically (as SGML does with markup).]

NOTE: Text entity references in SGML are not semantic, they are lexical, being a parser-level include. Data entity references (references to graphics or subdocuments) are not lexical and may be used for either structural relationships or annotative relationships.

SGML also makes a clear distinction between addressing storage objects (entities) and addressing semantic objects inside storage objects. The URL mechanism combines storage object reference and semantic object reference into a single, inseparable syntax (one of the reasons URLs are so fragile).

>SIMPLE
>The simplest link is XML-LINK="SIMPLE" and is an analogue of HTML's
>or . My view of it is exemplified by this fictitious XML
>document:
>
>

This is resource A which points to >the foo bird (see picture >) >

>
>Here there are two links, both being unidirectional.

Any hyperlink is inherently bi-directional, in the sense that knowing where both ends are, you can traverse from one to the other. Whether traversal in both directions is *allowed* is a matter of style or the semantics of the particular link type. The directionality of hyperlinks is independent of the directionality of the addressing used to create the link. Note that XML Link does (unnecessarily in my opinion) limit simple links to traversal initiation from the SIMPLE link element.

We tend to think of simple links as being directional because, in an unbounded environment like the Web, it is impractical to resolve all links to find the other ends and so enable traversal from the non-pointing anchor. However, in a closed system (such as within an intranet or a system like Hyper-G) this need not be a problem.

In other words, while all links are inherently bi- or multi-directional, the practicalities of address resolution in some environments may preclude making both traversal directions available. If you are at the element making the reference, you know it's an end of the link; the reverse is not always true.

>I understand that the
>ends of the first link are the 'point' described by 'ID=A', and the point
>described by ID=foo (though this is still being discussed). If this is true,
>then in a **tree-based** tool like JUMBO the ends of the link correspond
>to nodes in the tree (labelled by ID=A and ID=foo). The second link is harder
>because the resource in foo.gif is not clear (perhaps it is the inode in
>the UNIX system?).

If we require that all addresses are to nodes in trees, then we have to say that the address "foo.gif" is implicitly a reference to the node in the tree created by "parsing" the gif into memory. If the GIF consists of a single image, the tree may have a single node, its root, with some properties, one of which is the image data itself.
If the GIF consists of multiple images, the tree would have a root and one child for each image. Once you've built the tree, the result of addressing is well defined (possibly through some implicit addressing rules defined for the format, such as "reference to a GIF image is really a reference to the first Image node in the tree produced by interpreting the GIF"--note that someone has to define what the rules are for parsing GIFs into trees, but this is probably part of the GIF spec, either explicitly or implicitly in the way GIF data is organized).

In HyTime and DSSSL, this concept is generalized through the notion of property sets and "grove constructors", which are nothing more than notation-specific processors that understand that notation and the rules for creating groves from it. The property set is nothing more than a formal class schema that defines the classes and properties of the nodes in the resulting grove.

>I have (I believe) implemented SIMPLE links in JUMBO. Each Node has a method
>isLink() which says whether it's the start of a SIMPLE link. (I may have to
>change this nomenclature when the other links become clearer.). So, for
>example, when process()ing a Node, JUMBO looks to see if it isLink() and if so
>what does it point at (value of HREF). It seems to work.

It might be helpful to generalize this slightly from "isLink()" to "isEndMember()". In other words, any node in any document may be a member of one or more link ends (remember that XML pointers can address multiple objects). Simple link elements are also members of at least one link end [I say "at least one" because they could themselves be linked to]. By generalizing this question, you don't need to distinguish between simple links and extended links because simple links are simply special cases of extended links.
In other words, the core processing semantics for links are the same regardless of whether the links are "simple" (that is, the link is one of its own ends) or "extended" (that is, completely "out of line"). The relationships represented are the same and are independent of both the syntax of link representation and the addressing methods used to address the members of the link ends (including the implicit address of being the link element).

[This is why it's impossible for XML Link (or HTML) to not be HyTime conformable: links are links are links, regardless of syntax or addressing. HyTime is now sufficiently general that any syntax of link representation and any form of addressing can be connected to the linking and addressing semantics defined by HyTime. &Borg-motto;]

>Note that in this model, the resource which is pointed to (ID=foo, or foo.gif)
>is not required by XML-LINK to know anything about the link. I assume it
>could be argued both ways that the pointedAt should/should_not know what is
>pointing at it. [SHOW and ACTUATE are deliberately not discussed, although I
>think they are straightforward (at least compared to EXTENDED).]

In fact, in the general case, no object can "know" that it is being pointed at--only the "link manager" knows for sure. However, the processing associated with an object should be able to ask the link manager (e.g., JUMBO) "am I being pointed at?", i.e., "am I a member of the ends of any links you know about?"

>EXTENDED
>
>EXTENDED is a container for an indefinite number of LOCATOR links.

TERMINOLOGY ALERT: LOCATOR elements are NOT (I repeat ARE NOT) links. They are addresses, semantically equivalent to the HREF attribute of SIMPLE. It is vitally important to maintain a clear distinction between linking, which is the definition of relationships, and addressing, which is the mechanics by which the things related are pointed to. This is important for at least two reasons:

1. Addressing can be used for purposes other than linking.
If you conflate linking with addressing, you will conflate linking with things that are not linking (see above).

2. It reminds you that the relationship and its definition are independent of the form of address. If you change an IDREF to a URL, you have changed the form of address but you haven't changed the relationship expressed. [If I move from place to place, my address changes, but my relationship to my wife, namely that we are married, does not change just because my address has.]

>[LOCATOR
>has exactly the same syntax as SIMPLE but has presumably different
>semantics.]

Not presumably, explicitly. SIMPLE and EXTENDED have *exactly* the same semantics (the representation of a relationship). The difference between them is the *syntax* of how the things related are addressed. For SIMPLE, the link end address is an attribute of the link element (the address of the other end, the SIMPLE element itself, is implicit and thus not specified). For EXTENDED, the addresses of the link ends are specified by subelements.

>EXTENDED does not by itself define a resource and is normally
>remote from the resources.

If my memory of the last ERB discussion of this is correct, EXTENDED will be able to be one of its own resources in the next draft of the link standard. In other words, EXTENDED can be used just as SIMPLE is, differing only in the syntax by which the other link ends are addressed.

>I can see how a bi-directional link might be constructed from EXTENDED
>[It's other multiplicities I don't feel so happy with.] Does this
>example capture it?

Yep.

>

Friends, Romans, Countrymen, lend me your >ears

.
>...
>
>
>
>
>...
>We therefore have a bidirectional link between the verb and the noun, so
>that each of them can locate the other.

Per the discussion of directionality above, it's more useful to say that the ANNOTATION link is a "two end" link, rather than "bi-directional", as the allowed directions of traversal are independent of the number of anchors.

>Therefore, in JUMBO, there
>has to be a pointer which is available to each Node. My temptation would be
>for each node to carry a hashtable of links to other nodes so that (say)
>when W1 was asked what it linked to it would come up with a list of the
>Nodes at the other end of its links. W2 would be such a node. On the other
>hand it might point to the LINK (i.e. link1, and it might be clear from the
>'contents' of link1, what the other end was. Is this too restricted?

The way I implemented this in my ADEPT code was to build the following tables in memory as a result of processing all links in all documents within a bounded document set:

1. For each node, what link ends it is a member of.
2. For each link end, what link it is an end of.
3. For each link element, what link ends it has (remembering that a link end is an abstract object listing the members of that end).
4. For each link end, its defined role (remembering that each link end has a defined role [the "anchor role" in HyTime terms]).
5. For each link end, the objects that are members of it.
6. For each link end, the values for the various HyTime-defined link end (anchor) properties: link traversal, list traversal, etc.

The key to these tables is the management of links by managing link ends as virtual objects, from which all other information can be gleaned.

From these tables, I can get from any object that is a member of any link end to any member of any of the ends of the links it is a member of. Given a node, I look it up in the "node-to-link-end" table.
For each link end the node is a member of, I then look up the link end in the "link-end-to-link" table and then look up the other link ends ("link-to-link-ends" table) of that link. For each link end, I look up the members of those link ends ("link-end-to-members") and thus get a list of all the nodes the starting node is linked to, classified by link type and anchor role. I build these tables as a start-up process applied to all documents in the set, but you could also do it only for a single document and then only enable traversal from those link end members you know about from processing the links in that document (thus the motivation in XML Link for having a document that contains nothing but links to be used as a starting point). As links are traversed to new documents, you can process the links in those documents, adding to your tables as you go. >I am not clear how this extends to 'multidirectional links' Here is a typical >problem. > >to bear the slings and >arrows of >... > > > > > >... >Here I want to indicate that the verb 'bear' links to two nouns at the >same time and that each noun points to 'bear'. But it isn't obvious that >this is the case (unless perhaps ROLE is used for that, and that doesn't >seem general). Yes--the use of ROLE is the key: all the members of ends with the same role are members of the same (virtual) link end. Thus, the above is a two-ended link relating the single verb object to the two noun objects. [See discussion below for more on this issue.] If there were three roles (noun, verb, subject), there would be three link ends. If you're interested in my data structures and algorithms, you can find my ADEPT*Editor HyTime code at http://www.isogen.com/demos/hylibcmd.html. ADEPT*Command language is very similar to Perl and C, so anyone familiar with those languages should be able to figure out what's going on. I've tried to comment the code as completely as I could, especially with respect to the data structures. 
I don't claim that my particular implementation is necessarily the best, but it seems to work so far. I think I need to augment it to better capture the stages of indirection used to address individual nodes--currently I only capture the result of addresses, which limits my ability to delay address resolution and provide complete error reporting and debugging facilities (very important in an editor, if not in a browser). Here is a brief XML-to-HyTime terminology translator (my understanding or use of XML terms may not be accurate, caveat emptor):
XML Term
resource
No direct mapping, as HyTime (and SGML) distinguish storage objects from addressable objects within storage objects. However, resource most closely maps to "node in grove", as that's what HyTime is always ultimately addressing. When storage objects are the thing named by the address syntax (e.g., a URL, entity SYSID, etc.), HyTime (or the notation itself) defines rules for getting a grove from the storage object. XML sometimes uses resource in the way that HyTime uses "anchor" or "anchor member", but doesn't make the same formal distinction between anchors and members of anchors that HyTime does (see below).
linking element
In HyTime, any element derived from any of the HyTime hyperlink forms hylink, clink, agglink, varlink, or ilink. HyTime distinguishes hyperlinks from forms of reference used to establish purely structural relationships ("value reference"). SIMPLE can be derived from hylink in the same way that clink is itself derived from hylink. SIMPLE could also be derived from clink. EXTENDED can be derived from varlink (in fact we designed varlink specifically to enable direct derivation of EXTENDED, see my recent post to the XML WG list). The only difference between these forms is the syntax by which the anchors are addressed (and, in the case of clink and agglink, the fixing of the anchor roles in the HyTime standard to reflect common practice). All HyTime linking forms are semantically identical.
locator
"Location address". HyTime defines the general notion of attributes and content as being potentially "referential", meaning that they contain what XML calls a "locator". HyTime defines a specific element-based syntax for representing indirect location addresses. HyTime also lets you use other forms of address by defining them formally as queries that return nodes in groves. (Thus, XML's locator syntax can be defined as a query notation to HyTime by formally defining how XML locators address nodes in groves--this is done already to a large extent by reference to the underlying TEI spec, which says that TEI extended pointers use the SGML property set and HyTime default grove plan for addressing SGML documents.) My personal recommendation is that the developers of HyTime-aware systems implement support for URLs, TEI extended pointers, and XML pointers as query notations that are integrated out of the box, both because they are in common use and because they provide a convenient syntax for addressing when you don't need HyTime's indirection machinery. Note that the existence of the XML link spec does not preclude the use of HyTime indirect addressing with XML documents. Having implemented support for TEI locators, supporting HyTime's indirection syntax and semantics is not that much more effort.
label
No HyTime analog. HyTime doesn't define a specific mechanism for labeling links or anchors, as it's not relevant to the level of semantics HyTime defines and should be left open to specific applications. XML's definition of such an attribute and the meaning for it is entirely appropriate and useful.
traversal
HyTime defines the same meaning. In addition, HyTime defines a default mechanism for describing the traversal constraints on anchors. However, this mechanism is probably more than XML link needs and XML Link correctly avoids it in preference to a simpler mechanism that matches the expectations of most Web users and browser vendors.
multi-directional-link
HyTime doesn't formally define this concept in isolation, although the HyTime link traversal rules do define a way to express this constraint. HyTime does make the same distinction between "go back" or "return" and bi-directionality.
in-line link
"Contextual" link. In HyTime, any link can, potentially, be one or more of its own anchors. If that anchor also allows traversal initiation, then the link is said to be "contextual" in that it presumably occurs in a context from which it could be used to initiate traversal, as opposed to being somewhere else (possibly inaccessible to users).
out-of-line link
"independent" link, i.e., a link that is not contextual (because either it is not self anchored at all or it is self anchored but the self anchor does not allow traversal initiation).
HyTime also makes a distinction that the current XML link spec appears not to make between "anchors" of links and the members of those anchors. In HyTime, a link anchor is a virtual object consisting of all the objects addressed as a given anchor role within a single link type for an instance of that type. The XML link spec appears to conflate anchors and the members of anchors into the term "resource" (in that it doesn't distinguish the objects addressed from their organization within a particular role of a link).

The current XML Link spec doesn't clearly define the meaning of having multiple locators with the same role. I've interpreted it in the only way that makes sense to me (probably because it's the HyTime way). My logic is that choosing the same role name within a link expresses common grouping under the semantic label of that role, so it follows that the objects addressed for that role should be grouped together for access. There doesn't appear to be much difference between:

  resource "W3" role: "verb"
  resource "W4" role: "noun"
  resource "W5" role: "noun"

And:

  Role "verb":
    resource "W3"
  Role "noun":
    resource "W4"
    resource "W5"

Note that, barring traversal restrictions, the traversal result (the things you can traverse to) is the same in both cases. The only difference is how the semantic groupings are organized.

The real question is not one of traversal, but one of relationship representation: can an observer of the link element tell whether the author meant for the two nouns to be grouped under a common label or was the presence of two nouns a coincidence? With formal anchors, it must be the first, because all resources with the same role are, by definition, semantically grouped under that role. Without formal anchors, it's up to the link creator to indicate what they meant.
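A minimal sketch of this role-grouping interpretation, combined with the node-to-link-end and link-to-link-ends lookups described earlier in this note. It is written in Python rather than ADEPT*Command and is not the ADEPT code itself. The names LinkEnd, LinkManager, add_locator, is_end_member and linked_from, and the link id "link2", are hypothetical and chosen only for illustration; W3, W4 and W5 are the resources from the example above.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class LinkEnd:
    """A virtual object: every member addressed under one role of one link."""
    link_id: str
    role: str
    members: list = field(default_factory=list)


class LinkManager:
    """Hypothetical in-memory tables keyed on link ends (illustration only)."""

    def __init__(self):
        self.link_to_ends = defaultdict(dict)  # link id -> {role: LinkEnd}
        self.node_to_ends = defaultdict(list)  # node id -> [LinkEnd, ...]

    def add_locator(self, link_id, role, node_id):
        # Locators sharing a role within one link land in the same link end.
        ends = self.link_to_ends[link_id]
        end = ends.get(role)
        if end is None:
            end = ends[role] = LinkEnd(link_id, role)
        if node_id not in end.members:
            end.members.append(node_id)
            self.node_to_ends[node_id].append(end)

    def is_end_member(self, node_id):
        # "Am I a member of the ends of any links you know about?"
        return bool(self.node_to_ends.get(node_id))

    def linked_from(self, node_id):
        # Node -> its link ends -> their links -> the other ends -> members.
        result = {}
        for own_end in self.node_to_ends.get(node_id, []):
            for role, other in self.link_to_ends[own_end.link_id].items():
                if other is not own_end:
                    result[(other.link_id, role)] = list(other.members)
        return result


manager = LinkManager()
manager.add_locator("link2", "verb", "W3")
manager.add_locator("link2", "noun", "W4")
manager.add_locator("link2", "noun", "W5")
print(manager.linked_from("W3"))  # {('link2', 'noun'): ['W4', 'W5']}
```

Under this reading, W4 and W5 share one "noun" end, so traversal from W4 yields only W3: members of the same end are grouped together, not linked to each other.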
If your addressing method is incapable of addressing multiple objects (e.g., normal URLs), then you can't depend on addressing multiples from a single Locator to indicate the intended role grouping. Thus, in my opinion, the only reliable interpretation is that roles define semantic groups (anchors) independent of how they are specified syntactically.

FWIW.

Cheers,

E.

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

From jtauber at jtauber.com Sun Jun 1 07:12:19 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun 7 16:57:53 2004
Subject: Entity replacement
Message-ID: <01BC6E8D.A7E3CEA0@dial126.cygnus.uwa.edu.au>

Am I correct in thinking that one benefit of requiring entity and element structure to be synchronized is that a well-formed document is also well-formed before general entity replacement; i.e., you can parse the document before entity replacement, parse the entities and then just insert the parse tree of the latter into the former?

If this is true, then is there some similar constraint that could be applied to use of parameter entities? This might already have been done in the choice of where to use % in the productions but I can't quite work out the pattern.

It would be nice if parameter entity replacement could be described without recourse to the % notation in the spec's productions. Would it be helpful, for example, to increase the number of non-terminal symbols in the grammar and then specify which non-terminal symbols can be replaced by parameter entity reference?

PS Can people check out http://www.jtauber.com/xml/ and let me know (off-list) what they think and what could be added?

Thanks

James

--
James K. Tauber / jtauber@jtauber.com
Perth, Western Australia

From peat at erols.com Sun Jun 1 18:13:30 1997
From: peat at erols.com (Peat)
Date: Mon Jun 7 16:57:53 2004
Subject: A few thoughts on XML and EDI
Message-ID: <199706011613.MAA10043@smtp2.erols.com>

XML/EDI

Advantages of including Electronic Data Interchange (EDI) entities with eXtensible Markup Language (XML)

The advantages of including Electronic Data Interchange (EDI) entities with eXtensible Markup Language (XML) differ for each camp.

-- For the EDI camp the unification means making application implementation easier, allowing for quicker reach into vertical markets, reduced message stores when processing transactions, and most importantly enabling document-centric tools such as search engines and Internet "push" products to supplement database mechanisms.

-- By assuring EDI compatibility, the XML camp gains almost instant use among thousands of companies. XML will gain a common extensible data entity definition which has undergone the test of time.

The bottom line: the XML camp gains Fortune 1000 support and the EDI camp gains a common presentation protocol.

If the combination can bring this much to the table why hasn't it been done before now?

The attempt to combine structured presentation with structured data for transactions is not new. The last attempt ended a little over a year ago. At that time the researchers of the Joint Electronic Document Interchange (JEDI) project, which was managed through the Division of Learning Development Research Group at De Montfort University Leicester, Computer Science at University College London, and the Document Interchange project at UKERNA, completed their study.
The project's intent was to analyze the current international and industry de facto standards that are in use for electronic document creation, transfer and presentation. The project was to identify the set of common elements that would allow the conversion of both logical and layout aspects of a document. The documents would then be viewed using a WWW type browser that was available for common computer platforms. The JEDI project concluded that SGML is ideally suited for EDI as it is text based and is independent of platform and operating system. The actual results were a little disappointing in that the world was and is still not ready for an SGML/DSSSL implementation.

What has changed, for us to try again?

It is a year later, and in the Internet timeframe this is plenty for momentum to shift. Due for release sometime this summer is an important specification for WWW browser-based applications - the eXtensible Markup Language (XML). The intent is to make the rather rich HTTP protocol even richer. It is a scaled-down, simpler version of SGML; in fact one of the goals of the specification is to "...be straightforwardly usable over the Internet." The key here is "straightforwardly usable." This flavor in the design of XML is why the specification will succeed for transactions where SGML/DSSSL failed. This is not to say that SGML/DSSSL wouldn't work, but more a reflection on us accepting change. Change sometimes needs to be taken in a series of steps - XML is the next step.

What about the momentum with XML?

XML, managed by the World Wide Web Consortium (W3C) working group, will no doubt become the next significant enabling technology for the Web. XML will provide Web publishers and consumers with unprecedented power, flexibility and control over the creation of and access to Internet and intranet content.
To date the XML specification is backed by SoftQuad, Adobe, IBM, HP, Microsoft, Netscape, Lockheed Martin, NCSA, Novell, Sun, Boston University, Oxford University, and the Universities of Illinois and Waterloo. In addition to the authors of the specification, about 30 companies already support the Channel Definition Format (CDF), an XML application which brings to the Internet various "push" operations. Netscape and Microsoft have already pledged XML support in their future WWW browser releases. And many corporations are being added to the list as they learn of the specification's existence and capability.

What could the EDI entities look like?

The general format of the transaction would be described in HTML. The EDI segments and elements could go something like this...

....
DUNS Number: FR1123456]]>
DUNS Number: FR*1*123456]]>

The above items are just a thought. Hopefully, when both camps view the above lines, they see only a slight modification to the methods implemented today. To include the right hooks, CDATA or other XML entities might have to include some specific syntax for EDI. The details, though not many, can be ironed out by the excellent authors of both camps.

So then XML documents are really just EDI templates, Right?

Yes and no. Yes, the documents can be used as templates. But in addition to this application, the XML document can also be a transaction itself. XML/EDI would allow, in a non-proprietary way, a structured presentation format to be included in the transaction. Combined effort in template or application form creation and development is estimated in the thousands of man-years, not hundreds. Soon there will be a standard with which to share the work others have done; applications need only access WWW browser objects. This object-based approach to applications will make document transaction exchange even easier.

Bottom line: The EDI camp could leverage XML to aid in lowering implementation costs.
In addition to templates and transactions, tools are available today to store, search, route, narrowcast and maintain information in document form. By adding defined data entities, these tools can be enhanced to make EDI processing and integration much easier. Database, EDI-specific, and application programming tools were for the longest time the only options for EDI administrators. XML/EDI will give the EDI administrator more choices.

If presentation elements are included in the transaction what happens to our transmission bandwidth?

The transaction would certainly require more bandwidth compared to the EDI specifications of today. The additional strain on a corporation's infrastructure must be weighed, case by case, against the advantages gained by the use of XML/EDI. It is estimated that XML/EDI-based transactions would add about 15% to the size of the current transactions. In the cases where this increase is significant, the XML/EDI standard documents can replace proprietary templates, which would still allow for use of document-based tools internal to the organization.

Where do we go from here?

- Introduction of the two camps - XML and EDI
- Education of both camps of the other's existence, tools and implementation methods
- Assure that the proper hooks are in XML to support EDI
- Create an EDI application for the Extensible Markup Language (XML)
- EDI "mappers" must add XML parsing to their front-end logic.
Please reference Joint Electronic Document Interchange (JEDI) http://www.sil.org/sgml/gen-apps.html#jedi EC/EDI References Electronic Commerce Resource Center http://www.ecrc.ctc.com EC/EDI Jumpstation http://www.premenos.com/Resources/Organization Overview of EC can be found at http://www.dmx.com Listing EC sites: http://planetx.bloomu.edu/~jsdutt/EC-urls.html Mailing list devoted to issues of EDI: To subscribe to the list, called EDI-L, send an Email message to listserv@uccvma.ucop.edu with the line subscribe edi-l firstname lastname (l for List, not numeric 1) in the message area; To send a message to the mail list, address messages to edi-l@uccvma.ucop.edu XML References XML Press Release (SoftQuad) http://www.sq.com/press/releases/prmar1197.htm The XML W3C Working Draft is at http://www.w3.org/pub/WWW/TR/WD-xml-961114.html eXtensible Markup Language Site http://www.jtauber.com/xml/ Channel Definition Format application for XML http://www.microsoft.com/standards/cdf.htm Mailing list devoted to issues of XML: To subscribe to the list, called XML-DEV, send an Email message to majordomo@ic.ac.uk with the line subscribe xml-dev name@address (where name@address is your actual email address) in the body of the message. To send a message to the mail list, address messages to xml-dev@ic.ac.uk Bruce Peat peat@erols.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sun Jun 1 20:04:06 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:53 2004 Subject: XML-LINK Message-ID: <7417@ursus.demon.co.uk> Many thanks indeed Eliot, This is extremely useful and I think I follow everything you have put forward. 
[It's taken me several months to get to that stage, so it's clear that for webhackers, rather than rocket scientists, there is a longish learning curve - at least until real applications become common]. I will come back to some of the detailed points shortly.

Eliot's reply confirmed my suspicion that there could be different interpretations of the role of XML-LINK ('structure' and 'annotation'). If this is not realised then it would be easy to create software for XML-LINK which was inappropriate in some contexts, and it could be very confusing for newcomers. There is also the strong likelihood that *some* XML-LINK processors will be tightly bound to particular applications (browsers, database engines, etc.). The converse may be that a general XML-LINK engine (covering both approaches above) might be described in language sufficiently abstract that newcomers to XML might fail to understand its purpose and value.

It would be extremely useful to see where XML-DEV readers see a link-processor in the XML architecture. From Eliot's reply I see it as a browser-independent engine which answers queries about links regardless of what use they are to be put to. (It presumably *holds* the traversal information, but simply hands it to the querying engine.) So JUMBO should be independent of the link processor. When JUMBO was acting as a generic browser and a node was actuated/processed/arrived_at, etc., JUMBO would query the link processor as to whether it had information about this node. If so, it would decide whether to act upon it. However, I assume an application could instruct JUMBO that certain Nodes (or collections of nodes) required information from the link processor, such as whether they were part of a DAG, linked list or whatever. Then it could extract the whole structure independently of behaviour (haven't thought this through in detail :-).

So I am extremely wary of starting to code anything more in this area until its limits become clear.
It's clearly venturing into rocket science territory. By comparison XML-LINK=SIMPLE is relatively straightforward. Therefore I suspect there will be implementers (like myself) who find that full XML-LINK implementation is too difficult/expensive/undefined, whilst SIMPLE is useful and doable. IMO it will be valuable to have an application-independent link processor for XML-LINK. Is this likely to happen? Or is this only really conceivable in very large organisations? Once again many thanks P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Sun Jun 1 22:06:55 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:57:53 2004 Subject: XML-LINK Message-ID: <3.0.32.19970601150222.006c3028@swbell.net> At 09:53 AM 6/1/97 GMT, Peter Murray-Rust wrote: >So I am extremely wary of starting to code anything more in this area until >its limits become clear. It's clearly venturing into rocket science >territory. By comparison XML-LINK=SIMPLE is relatively straightforward. >Therefore I suspect there will be implementers (like myself) who find that >full XML-LINK implementation is too difficult/expensive/undefined, whilst >SIMPLE is useful and doable. Remember that there are two basic "modes" of link processing: 1. The "I know everything mode", in which you need both to know the boundaries of the documents you need to know about (i.e., a "bounded object set") and a general processor capable of doing the processing and holding the result for some reasonable length of time. 
This is the HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are fairly well defined and you have the infrastructure you need to manage all the link information more or less persistently.

2. The "I only know about what I have seen or am now seeing" mode. This is the normal "Web" mode or the Panorama mode (except that Panorama only remembers what it knows about the current document, unfortunately).

If you constrain your links to be in line ("contextual"), whether using the SIMPLE or EXTENDED syntax, then the second mode can always be applied and any XML browser should be capable of handling it (if you can handle SIMPLE you can handle inline EXTENDED links and EXTENDED links in the document you're processing). This is not an unreasonable constraint to have, as it greatly simplifies processing (the Web is forced to impose this constraint for the general case, as the Web is so large it is effectively unbounded).

Note that the two modes can be combined, such that you might know everything about one set of documents but only about pointers to another set. The data structures needed to manage knowledge of the links are the same in both modes; the only question is when you gather the knowledge.

Cheers,

E.

From cbullard at hiwaay.net Mon Jun 2 01:47:01 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:57:53 2004
Subject: XML-LINK
References: <3.0.32.19970601150222.006c3028@swbell.net>
Message-ID: <339209C3.3F5C@hiwaay.net>

W. Eliot Kimber wrote:
> Remember that there are two basic "modes" of link processing:
>
> 1.
The "I know everything mode", in which you need both to know the > boundaries of the documents you need to know about (i.e., a "bounded object > set") and a general processor capable of doing the processing and holding > the result for some reasonable length of time. This is the > HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are > fairly well defined and you have the infrastructure you need to manage all > the link information more or less persistenty. Isn't this also the mode of WinHelp? WinHelp uses a restricted target set (footnotes in page delimited chunks), but the rest of the hyperlinking is topical based on the project file (defines the objects whose topics are targets) and the #define files (string and paired ID) for the software to use for contextual help, and the compiled files. > 2. The "I only know about what I have seen or am now seeing" mode. This is > the normal "Web" mode or the Panorama mode (except that Panorama only > remembers what it knows about the current document, unfortunately). How does Panorama store linking information? > Note that the two modes can be combined, such that you might know > everything about one set of documents but only about pointers to another > set. The data structures needed to manage knowledge of the links is the > same in both modes, the only question is when do you gather the knowledge? Well, not just when, but how is it packed if it is in separate files for different processors? Consider the winHelp model. I ask because my guess is a very high percentage of the legacy hypertext in the world right now in need of conversion is WinHelp. That means taking the separate pieces and mapping them to the XML-n models. 
len xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Mon Jun 2 05:57:14 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:57:53 2004 Subject: XML-LINK Message-ID: <3.0.32.19970601225027.006aa5e0@swbell.net> At 06:46 PM 6/1/97 -0500, len bullard wrote: >> 1. The "I know everything mode", in which you need both to know the >> boundaries of the documents you need to know about (i.e., a "bounded object >> set") and a general processor capable of doing the processing and holding >> the result for some reasonable length of time. This is the >> HyBrowse/Hyper-G/Intranet approach, in which the bounds of the system are >> fairly well defined and you have the infrastructure you need to manage all >> the link information more or less persistenty. > >Isn't this also the mode of WinHelp? WinHelp uses a restricted target >set (footnotes in page delimited chunks), but the rest of the >hyperlinking >is topical based on the project file (defines the objects whose topics >are targets) and the #define files (string and paired ID) for the >software to use for contextual help, and the compiled files. Relating parts of a program to entries in help files is really nothing more than providing a query interface to your browser that lets you find things (the entries) by property (the program object name or menu hierarchy or whatever). Not really hyperlinking in that sense, just plain old access. 
You could, however, represent the relationship between the program objects and the code objects by creating a hyperlink that used queries that were interdependent, e.g.:

  select(entries_with_IDs(help_file($help.filename)))
  for_each(locaddr(has_id("entries")),
           select(code_objects($code.set), code_object_with_id(current_node())))

Because the "code-object" role's query depends on the "help-entry" role's query (the second iterates over the first), this single link defines the association between all code objects and all help entries for a single help/program pairing. You could also create one such link for each pair, but we really just need to express the intent, as the implementation will probably be hard coded.

Note that this is very similar to using a DynaText style sheet to associate hyperlinking style with element types, except that here the relationship is defined more abstractly and distinct from any particular implementation of it (apart from interpretation of the query notation itself, which, in this case, I've made up for this example). For example, given the link above, I could use it as a specification to guide me in creating the equivalent DynaText style functions and SDK extensions to make DynaText into a help system. The link above defines the relationship semantics; the programmer of the DynaText customization implements it.

>> 2. The "I only know about what I have seen or am now seeing" mode. This is
>> the normal "Web" mode or the Panorama mode (except that Panorama only
>> remembers what it knows about the current document, unfortunately).
>
>How does Panorama store linking information?

For documents read in, it just keeps it in memory. For Webs you create with it, it creates HyTime documents with the necessary location addresses. Apart from Webs, it never keeps linking information around from documents opened prior to opening the current document in the same Panorama session.
Panorama does provide bi-directional traversal for contextual (in-line) links, but only within a single document--having used a contextual link to traverse from one document to another, it doesn't remember the links that started in the first document in order to provide the back links to any clinks in the first that point to the second. It does, of course, provide a "go-back" feature, but that's different, as going back is not traversal (as both XML Link and HyTime are careful to point out).

>> Note that the two modes can be combined, such that you might know
>> everything about one set of documents but only about pointers to another
>> set. The data structures needed to manage knowledge of the links is the
>> same in both modes, the only question is when do you gather the knowledge?
>
>Well, not just when, but how is it packed if it is in separate files for
>different processors?

Who cares? We're talking about implementation, not data representation. If you want to interchange the result of building these data structures, use the HyTime property set, build HyTime semantic groves, and interchange those, either as objects using some object interchange standard (e.g., CORBA) or spit out the equivalent canonical grove documents. Or define a standard relational schema. Or define a document type that expresses those tables.

We should expect some standardization of the APIs for communicating about hyperlinks and their properties, but not standards for the representation of the internal data structures programs really use (I wouldn't expect most implementors to use the HyTime property set as their object model directly--there's too much room for optimization, especially with respect to tool-specific features).

Cheers,

E.
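The point made in this exchange, that the same data structures serve both modes and only the moment of gathering differs, can be illustrated with a small sketch. This is a minimal illustration in Python, not any real link processor: the names LinkStore, Link, CORPUS and the document and anchor names are all hypothetical, and anchors are simplified to (document, node id) pairs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """One link between two anchors, each a (document, node id) pair."""
    source: tuple
    target: tuple


class LinkStore:
    """Holds link knowledge; the structure is the same however it is filled."""

    def __init__(self):
        self._by_anchor = defaultdict(set)  # anchor -> links touching it
        self.seen_documents = set()

    def add_document(self, doc_name, links):
        """Record every link found while processing one document."""
        self.seen_documents.add(doc_name)
        for link in links:
            self._by_anchor[link.source].add(link)
            self._by_anchor[link.target].add(link)

    def links_at(self, anchor):
        """Answer a browser's query: which known links involve this anchor?"""
        found = self._by_anchor.get(anchor, ())
        return sorted(found, key=lambda l: (l.source, l.target))


# Hypothetical corpus: document name -> links declared inside it.
CORPUS = {
    "a.xml": [Link(("a.xml", "intro"), ("b.xml", "ch5"))],
    "b.xml": [Link(("b.xml", "ch5"), ("a.xml", "intro"))],
}

# Mode 1 ("I know everything"): index the whole bounded set up front.
eager = LinkStore()
for name, links in CORPUS.items():
    eager.add_document(name, links)

# Mode 2 ("only what I have seen"): index documents as they are opened.
lazy = LinkStore()
lazy.add_document("a.xml", CORPUS["a.xml"])
```

Under these assumptions the eager store knows both links touching ("b.xml", "ch5"), while the lazy store knows only the one it met in a.xml; the query interface is identical in both cases.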
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Mon Jun 2 13:14:57 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:53 2004 Subject: XML-LINK and IDREF In-Reply-To: <3.0.32.19970601225027.006aa5e0@swbell.net> Message-ID: <0Zm96CAybpkzEwi6@light.demon.co.uk> While we're on the subject of linking, I'm intrigued about the status of the IDREF attribute type in XML. In the SGML world, simple links within a document are mostly done with IDREF -> ID, e.g.: New Beginnings

... ...

As we saw earlier in Chapter 5, ... (where the TARGET attribute has type IDREFS). Browsers such as Panorama recognise and support these links natively. Is the intention in XML that IDREF(S) attributes are only supported "for compatibility", and that the XML simple link should be used instead, i.e.: ...

As we saw earlier in Chapter 5, ... (where REF is now defined as having attribute XML-LINK="simple") ? This obviously has implications for XML processors. Richard Light. >xml-dev: A list for W3C XML Developers >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To unsubscribe, send to majordomo@ic.ac.uk the following message; >unsubscribe xml-dev >List coordinator, Henry Rzepa (rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Mon Jun 2 14:22:01 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:54 2004 Subject: XML-LINK and IDREF Message-ID: <7439@ursus.demon.co.uk> In message <0Zm96CAybpkzEwi6@light.demon.co.uk> Richard Light writes: > While we're on the subject of linking, I'm intrigued about the status of > the IDREF attribute type in XML. Agreed. I had thought about this as well. It's not easy to see how both might be used fruitfully at the same time without confusion. Formally, my understanding of ID/IDREF is that it is part of XML-LANG and must be supported by XML-LANG processors ([50] Validity checks). The IDREF can only point to an ID in the same document (at least how I read it). Therefore one option is for implementers not to use XML-LINK and to use ID/IDREF for whatever purposes they wish (structural, annotation, and with whatever behaviour.) Although I'm not an SGML expert I imagine this is frequently done already. The advantage/characteristic of ID/IDREF is that checking is at syntactic level (i.e. parsers are required to analyse it). [I am not sure what status ID/IDREF has in WF documents, as it will only know which the IDs are if there is an ATTLIST, i.e. a DTD or DTD fragment is included]. 
> > In the SGML world, simple links within a document are mostly done with > IDREF -> ID, e.g.: It is part of the language. In XML-LINK we are introducing another part of the 'language.' > New Beginnings >

... > ... >

As we saw earlier in Chapter 5, > ... > > (where the TARGET attribute has type IDREFS). Browsers such as Panorama > recognise and support these links natively. 'Support' is presumably application-dependent. Presumably browsers have something like: ACTUATE="USER" SHOW="REPLACE" as default - if an IDREF is discovered it is announced to the user who can navigate from there. > > Is the intention in XML that IDREF(S) attributes are only supported "for > compatibility", and that the XML simple link should be used instead, > i.e.: > > ... >

As we saw earlier in Chapter 5,
> ...

XML-LINK can also point outside the document.

>
> (where REF is now defined as having attribute XML-LINK="simple") ?

The implementer and author are obviously given more help and guidance for implementing links with XML-LINK, since there are a number of attributes which will be well documented and where usage will develop. [There would be nothing to stop DTD authors including ROLE, ACTUATE, SHOW with IDREF, but it would not be likely to be standard practice, whilst the usage of these with HREF presumably will have a good communality of purpose and implementation.]

> This obviously has implications for XML processors.

Yes! The full XML-LINK spec is rather daunting for an implementer. In principle it could require writing something part of the way towards Hyper-G or HyTime. *How* far is what concerns me at present :-)

So my present approach is to:
- ignore ID/IDREF for my own DTDs
- enable JUMBO to locate IDs. [Note this is not trivial, because not all parsers provide this information at present.]
- not provide special support in JUMBO [the application programmer can find the IDREFs and build their own stuff if required.]
- fully implement XML-LINK=SIMPLE (hopefully more or less on track at present.)
- think furiously about EXTENDED. From what Eliot has written, I suspect it will be a lazy implementation - i.e. storing links as they are 'discovered' in documents and adding this information to nodes. It will be Web-like and unlikely to be a complete linkset unless it becomes very clear how these are created and used.

P.
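The "locate IDs" step in the list above, and the IDREFS resolution an application programmer might build on top of it, could look roughly like this. It is a minimal sketch in Python using the standard xml.etree.ElementTree module, not JUMBO code. It assumes the application already knows (for example from the DTD) that an attribute named ID has type ID and that TARGET has type IDREFS; the BOOK, CHAPTER, TITLE and P element names and the "ch5" value are hypothetical stand-ins, since the markup of the example under discussion did not survive in the archive.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only REF and TARGET are named in the discussion.
DOC = """
<BOOK>
  <CHAPTER ID="ch5"><TITLE>New Beginnings</TITLE></CHAPTER>
  <P>As we saw earlier in <REF TARGET="ch5">Chapter 5</REF>, ...</P>
</BOOK>
"""


def build_id_index(root, id_attr="ID"):
    """Map each ID value to its element; IDs must be unique in a document."""
    index = {}
    for elem in root.iter():
        value = elem.get(id_attr)
        if value is None:
            continue
        if value in index:
            raise ValueError(f"duplicate ID: {value}")
        index[value] = elem
    return index


def resolve_idrefs(elem, index, ref_attr="TARGET"):
    """Resolve a whitespace-separated IDREFS value to the target elements.

    A KeyError here means a dangling reference (no such ID in the document).
    """
    return [index[name] for name in elem.get(ref_attr, "").split()]


root = ET.fromstring(DOC)
ids = build_id_index(root)
ref = root.find(".//REF")
targets = resolve_idrefs(ref, ids)
```

Note that the index only covers the one document parsed, which matches the observation that an IDREF can only point to an ID in the same document.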
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Mon Jun 2 15:06:37 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:54 2004 Subject: A few thoughts on XML and EDI In-Reply-To: <199706011613.MAA10043@smtp2.erols.com> Message-ID: In message <199706011613.MAA10043@smtp2.erols.com>, Peat writes >XML/EDI >Advantages of including Electronic Data Interchange (EDI) >entities with eXtensible Markup Language (XML) > >What could the EDI entities look like? > >The general format of the transaction would be described in HTML. The EDI >segments and elements could go something like this... >.... >DUNS Number: >FR1123456]]> > > > >DUNS Number: >FR*1*123456]]> Bruce, Another approach (which is compatible with that used for the XML linking specification) is to have attributes which identify certain elements as holding EDI information. That way, the EDI information is explicitly labelled, and an XML processor can be asked to return it to an application using standard API calls. This approach means that the EDI information forms part of the logical structure of the XML document, rather than being a CDATA 'implant'. It also means that users can define their own element types to hold EDI information, so long as they label them with the agreed attributes. Furthermore, it allows them to use XML's built-in validation facilities to check for structurally valid input, e.g.: in the DTD: ... in the document: .... FR1123456 Note that the EDI-TYPE information is declared once and once only, in the DTD, and does not add to the markup overhead in the actual document. 
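The attribute-labelling approach described above can be tried out with a short sketch. This is an illustration in Python, not part of any EDI or XML specification: the EDI-TYPE attribute name and the DUNS value come from the message, but the ORDER and DUNS element names and the "duns-number" label are hypothetical, since the original markup example was lost in archiving. The sketch relies on the expat parser behind xml.etree.ElementTree supplying attribute defaults declared in the internal DTD subset.

```python
import xml.etree.ElementTree as ET

# The label is declared once, as a #FIXED attribute in the DTD; the
# document instance itself carries no extra markup for it.
DOC = """<!DOCTYPE ORDER [
  <!ELEMENT ORDER (DUNS)>
  <!ELEMENT DUNS (#PCDATA)>
  <!ATTLIST DUNS EDI-TYPE CDATA #FIXED "duns-number">
]>
<ORDER><DUNS>FR1123456</DUNS></ORDER>
"""


def edi_fields(root, label_attr="EDI-TYPE"):
    """Collect the text of every element labelled as holding EDI data."""
    fields = {}
    for elem in root.iter():
        label = elem.get(label_attr)
        if label is not None:
            fields[label] = (elem.text or "").strip()
    return fields


root = ET.fromstring(DOC)
fields = edi_fields(root)
```

Because the application looks elements up by label rather than by element type name, users remain free to choose their own element names, as the message suggests.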
Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067

From lex at www.copsol.com Mon Jun 2 18:02:14 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun 7 16:57:54 2004
Subject: ANNOUNCE: DSSSLTK 1.0 Available
Message-ID: <199706021559.KAA07346@copsol.com>

ANNOUNCEMENT: DSSSLTK Available

The DSSSL Developer's Toolkit (DSSSLTK) from Copernican Solutions Incorporated is now available for download. A distribution may be obtained from:

http://www.copsol.com/products/index.html

This is the first of a set of DSSSL technology releases from Copernican Solutions Incorporated.

What is the DSSSL Developer's Toolkit?
==========================================================================

This toolkit is similar in nature to the applet or servlet architectures developed by Sun Microsystems/JavaSoft. This toolkit is a set of abstract interfaces written in Java to allow application developers to work with different Java-based DSSSL environments.

What does it do?
===========================================================================

This toolkit serves as an interface between different DSSSL components. It represents an architecture for building DSSSL-oriented systems using the Java programming language.

What is available?
==========================================================================

The DSSSL Developer's Toolkit contains the following:

* Full source code to the interfaces and classes.
* Javadoc for the API reference.
* Configuration and makefile utilities for building the distribution.
* A prebuilt zip file containing all the classes.
What is the purpose of the DSSSL Developer's Toolkit ========================================================================== The DSSSL Developer's Toolkit was developed as part of the Seng DSSSL Environment. One of the design constraints for the Seng engine was a completely componentized system such that developers could integrate their own implementations of components such as parsers, grove, processing engines and the other components would not be affected. In solving this problem, Copernican Solutions developed the DSSSL Developer's Toolkit as a set of abstract interfaces for accessing DSSSL constructs. These interfaces were developed under the premise that they should be standardized and include the requirements of more than the development efforts at Copernican Solutions. Developers interested in standardizing the DSSSLTK should contact Alex Milowski at alex@copsol.com. What are the licensing restrictions for the toolkit? =========================================================================== All the source is available free of charge and may be integrated into other systems without licensing. Is there an implementation of this toolkit? =========================================================================== Yes, our Seng DSSSL engine implements this toolkit. Included in Seng is the Java SGML Parser Interface (JSPI) which builds groves from SGML document sources using a native library based on James Clark's SP SGML parser. Both will be available for download soon. ============================================================================== R. 
Alexander Milowski http://www.copsol.com/ alex@copsol.com
Copernican Solutions Incorporated (612) 379 - 3608

From eliot at isogen.com Tue Jun 3 16:58:07 1997
From: eliot at isogen.com (W. Eliot Kimber)
Date: Mon Jun 7 16:57:54 2004
Subject: XML-LINK and IDREF
Message-ID: <3.0.32.19970603084223.00f4ed80@mail.swbell.net>

At 12:58 PM 6/2/97 GMT, Peter Murray-Rust wrote:
>In message <0Zm96CAybpkzEwi6@light.demon.co.uk> Richard Light writes:
>> While we're on the subject of linking, I'm intrigued about the status of
>> the IDREF attribute type in XML.
>
>Agreed. I had thought about this as well. It's not easy to see how
>both might be used fruitfully at the same time without confusion.
>
>Formally, my understanding of ID/IDREF is that it is part of XML-LANG and
>must be supported by XML-LANG processors ([50] Validity checks). The IDREF
>can only point to an ID in the same document (at least how I read it).
>Therefore one option is for implementers not to use XML-LINK and to use
>ID/IDREF for whatever purposes they wish (structural, annotation, and
>with whatever behaviour.) Although I'm not an SGML expert I imagine this
>is frequently done already.

There are two problems with IDREFS and XML:

1. Without DTDs, it may not be possible to know which attributes are IDs and which are references.

2. IDREFs provide no direct way to address elements in other documents. Therefore, if you want to enable IDREFs, you have to provide some indirection mechanism that can transform an IDREF to an address into other documents. This is what HyTime and the TEI do by providing various location address element forms. If you don't do this, then you require documents to have different element types for elements that use IDREFs and elements that don't.
This has the effect of necessarily binding element types to the forms of address they use, which should not normally be necessary (because addressing is distinct from the semantics of reference and therefore shouldn't necessarily influence the element type).

Unless I've misunderstood the current spec, XML Link doesn't provide any ID-based indirection method, so that pretty much eliminates direct ID reference in the general case. [However, using the pointer syntax, you can address elements with IDs, but only through the use of an XML Link URL.]

Indirect addressing certainly complicates the processing--it requires you to build recursive processes and may impose significant processing overhead. On the other hand, indirect addressing is very powerful and lets you do things that are difficult or impossible otherwise, especially in terms of managing links and addresses automatically, largely because you can isolate initial references from the details of the addresses of the things referenced.

Cheers,

E.
--

W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com
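The recursive processing that indirect addressing calls for, as described in the message above, can be sketched in a few lines. This is an illustration in Python only, not HyTime or TEI location-address semantics: the table layout, the "direct"/"indirect" tags and all identifiers are hypothetical.

```python
def resolve(address, table, _seen=None):
    """Follow location addresses until a direct target is reached.

    `table` maps an address name to ("direct", target) or
    ("indirect", other_address_name). Circular chains are rejected.
    """
    seen = set() if _seen is None else _seen
    if address in seen:
        raise ValueError(f"circular location address: {address}")
    seen.add(address)
    kind, value = table[address]
    if kind == "direct":
        return value
    return resolve(value, table, seen)


# Hypothetical address table: references point at a location address,
# and only the location address knows where the target really lives.
TABLE = {
    "ref-ch5": ("indirect", "loc-ch5"),
    "loc-ch5": ("direct", ("b.xml", "ch5")),
    "loop-a": ("indirect", "loop-b"),
    "loop-b": ("indirect", "loop-a"),
}

target = resolve("ref-ch5", TABLE)
```

In this sketch, moving the chapter to another document means changing only the "loc-ch5" entry; every reference that goes through it is retargeted without being edited, which is the isolation benefit the message describes. The cost is the extra lookup and the need to guard against cycles.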
From nmikula at edu.uni-klu.ac.at Wed Jun 4 17:02:15 1997
From: nmikula at edu.uni-klu.ac.at (Norbert Mikula)
Date: Mon Jun 7 16:57:54 2004
Subject: (NXP)/Java/XML parser : Passing Error Information
Message-ID:

NXP, as of today, simply prints error messages to stderr. This is fine for now, but it is certainly not the best way to do things.

There was a suggestion made to me to throw an exception, but I think exceptions are not the best solution, as recovery from them is practically not possible (from the level of the application program).

To my understanding there are several classes of errors that can be passed along:

1.) Warnings
2.) WF violations
3.) Violations with respect to the DTD
4.) In general these errors that are reportable - if the user wishes

Should they be handled differently?

I was thinking in terms of "callback" functions, like I do it right now with the "Esis" interface.

How would you, as the user/developer community, envision handling this? What information would you like to have passed along to an application? Error code, textual description (what about localization ..).

Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

From jtauber at jtauber.com Wed Jun 4 17:23:45 1997
From: jtauber at jtauber.com (James K.
Tauber) Date: Mon Jun 7 16:57:54 2004 Subject: Entity replacement Message-ID: <01BC713E.361E55C0@dial130.cygnus.uwa.edu.au> > The % notation does, in effect, specify which non-terminal symbols > can be replaced by p.e. references. Not really because you get cases of %(...) Now admittedly, productions that include this in their RHS could be rewritten with an additional non-terminal symbol, so that production [43] could be written choice::='(' S? choicelist S? ')' choicelist::=cps ('|' cps)+ And this is exactly what I would like to see done because you could then simply list (apart from the productions themselves) those non-terminal symbols that can be replaced by PEs. Do other developers feel this would make it easier to go from spec to implementation? Now, relating my previous parsing/GE query to PEs: Is it easy, given the current syntax spec, to build a correct parse tree of a DTD before PE replacement? If not, should it be? James K. Tauber / jtauber@jtauber.com Perth, Western Australia xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ddb at criinc.com Wed Jun 4 22:30:13 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:57:54 2004 Subject: (NXP)/Java/XML parser : Passing Error Information Message-ID: <3.0.32.19970604132934.00a1a290@mailhost.criinc.com> At 05:02 PM 6/4/97 +0200, Norbert Mikula wrote: >To my understanding there are several classes >of errors that can be passed along > >1.) Warnings >2.) WF violations >3.) Violations with respect to the DTD >4.) In general these errors that are reportable - if the user wishes > >Should they be handled differently ? > >I was thinking in terms of "callback" functions. Like >I do it right now with the "Esis" interface. 
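The callback idea quoted above, with one class per kind of error, might be sketched as follows. This is a minimal illustration in Python rather than NXP's Java, and none of it is NXP's actual interface: ErrorClass, ErrorReporter, the error codes and the argument list are all hypothetical. Passing an error code alongside the text is one way to leave localization to the application.

```python
from enum import Enum


class ErrorClass(Enum):
    """The four classes of reportable condition listed in the message."""
    WARNING = 1
    WELL_FORMEDNESS = 2
    VALIDITY = 3
    OPTIONAL = 4


class ErrorReporter:
    """Dispatches each reported error to callbacks registered for its class."""

    def __init__(self):
        self._callbacks = {}

    def on(self, error_class, callback):
        """Register a callback; classes with no callback are simply ignored."""
        self._callbacks.setdefault(error_class, []).append(callback)

    def report(self, error_class, code, message, line, column):
        """Called by the parser; returns how many callbacks handled the error."""
        handlers = self._callbacks.get(error_class, [])
        for callback in handlers:
            callback(code, message, line, column)
        return len(handlers)


# An application that only cares about well-formedness violations.
reporter = ErrorReporter()
collected = []
reporter.on(
    ErrorClass.WELL_FORMEDNESS,
    lambda code, message, line, column: collected.append((code, line, column)),
)

handled = reporter.report(ErrorClass.WELL_FORMEDNESS, "WF001", "missing end tag", 12, 7)
ignored = reporter.report(ErrorClass.WARNING, "W001", "unused entity", 3, 1)
```

Unlike an exception, a callback returns control to the parser, so parsing can continue after the application has recorded the problem.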
I would tend to agree that a callback mechanism would be the most useful way to accomplish error handling. I strongly believe that it should be possible to break up error messages into different types, with all the necessary information for an application to process the error itself (and build its own error message, open an editor to the correct file & character in that file, etc). It should be easy for an application to ignore certain types of error messages (or, reverse that, it should be easy to only pay attention to the ones it cares about).

Not having looked at NXP's code, I am not sure how its current ESIS interface works, but this is also a reasonable application of inheritance. There might be a general HandleError() method, which (by default) is just a multiplexor to call separate methods for each type of error. This is akin to the AWT 1.0 event model. Alternatively you could have a more callback-like (AWT 1.1 like) model, which would be marginally more difficult to code (for Norbert) but is slightly more elegant. From my point of view it is an even call, since both work and neither is particularly horrid. (Most of the issues for why the AWT event model was changed do not apply here since this is a very specific case, not a general event model.)

My main interest, with regard to what I would like in any parser I would use, would be a clean mechanism to be able to handle errors in an application specific way. This is more than just, 'where do I print the error message?' and includes the ability to write an editor application which could use the parser to validate and then jump directly to the line/character of any errors. SP goes a ways toward that, but I would prefer a hierarchy of errors, similar to some of the object oriented event models that I have seen recently.

-derek
--------------------------------------------------------------
ddb@criinc.com || software-engineer || www/sgml/java/perl/etc.
"Just go that way, really fast.
When something gets in your way, turn." -- _Better_Off_Dead_ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From peat at erols.com Wed Jun 4 23:20:48 1997 From: peat at erols.com (Peat) Date: Mon Jun 7 16:57:54 2004 Subject: XML and EDI Message-ID: <199706042120.RAA12840@smtp2.erols.com> I wanted to share this with you. It is from David Weber posted early May on the EDI-L mailing list. - Bruce Peat ---------------------------------------------------------------------------- ------------------ BOO!!! ARE WE ALL HISTORY??? This message is in two parts, in the first part I argue why DISA is about to become history, hit by a truck marked 'Microsoft - Internet EC', just another piece of road kill along the path of computing history. Therefore we should just switch tacks and pack our bags and figure out how to go and build EC based systems. In the second part I argue the opposite and introduce my vision of a 4 Tier EDI that can become an active part of EC, and migrate from traditional EDI to the new EC model and retain a role for DISA as a facilitator of future commerce. OK - first lets look at some recent press from the 10th DISA conference: >>>>>> "EDI Expansion" "The EDI industry is currently growing at over 40 percent annually," says Harvey Seegers, president and chief operating officer of Rockville, Md.-based GE Information Services. Seegers is a keynote speaker at the conference. Seegers' message is "EDI has a future -- particularly if it embraces Internet technology -- to extend its capabilities." An important component of a successful future for EDI is Web technologies such as TCP/IP, which enable data to be moved across diverse computing environments: from desktops to mainframes, from Windows to UNIX. 
Browser interfaces are another innovation that can breathe new life into EDI by giving the Internet "a much friendlier interface to EDI," Seegers says. Other industry leaders are seeing examples of EDI growth as well. For instance, Stamford, Conn.-based Frontec is growing at a rate of 40 percent annually, says Peter Stiles, Frontec's senior vice president of consulting, whose presentation at EC/EDI `97 is about using EDI in transportation planning and supply chain management. Wilton, Conn.-based APL Group reports "a tremendous" growth spurt in the past six months, especially in its consulting business. A software and services provider, APL specializes in personal computer-based EDI translation software and is preparing to move into offering Internet and Web-based EDI systems. APL is one of the exhibitors at the DISA show, where it plans to announce a new electronic commerce initiative with Washington-based telecommunications giant, MCI.
<<<<<<

Wake up and smell the Roses! This says to me that all the big players couldn't give a fig about old EDI. New EC based stuff is going to use HTML, CGI and even more importantly the upcoming XML, extendable next generation HTML, along with perhaps some Java objects to exchange business data, business facts, and processes. (XML - eXtensible Markup Language : http://www.gca.org/conf/xml/xml_what.htm)

This will make DISA style traditional EDI obsolete within a year. Especially as the delivery mechanism between trading partners now becomes an NT box running a Web Server and a hook to your friendly local ISP for $20 a month. Add in Microsoft's 'Normandy' MCIS product and now you can click, drag-drop out of Access or VB into your Web Browser and your data is delivered. Total integration of MS Office and the backend delivery channels. (Full details at www.microsoft.com, search on MCIS).
XML provides the glue to allow the receiving system to successfully interpret the received information and store it in the correct places in their own databases. Cross-platform is a snap, since if you need to go to a mainframe, the Web server gives you that easily. You can pass the data to a COBOL or CICS process over on the big iron to do final updates. Also - you can query in real time. Your Web page can send an Account # to a remote Web Server for validation, before continuing the data entry process off your local Web Server with the end user. Totally seamless.

Where does traditional EDI fit into this picture? It DOES NOT! It just gets in the way. It's faster to put a Web page and some HTML together to capture the information that you need, and get people to interface that way EVEN if they are using a BATCH process to generate HTML from their database, instead of an expensive translation into EDI format.

So, OK, there is message overhead this way, but who cares? Bandwidth is cheap these days. When EDI was first done 2400 bps was state of the art. Now ISDN costs less than 2400 did 10 years ago. 115200 bps can take a lot of HTML overhead. Better still, even new programmers straight out of college know HTML inside and out, and there are great GUI tools for slapping fields onto forms and creating the programs to drive them so easily. Plus, when you send someone your sample HTML form they can bring it up in their FREE browser and understand it instantly. (Try that with your proprietary format EDI Implementation Guideline and EDI message format that only works with the EDI tool you have purchased).

OK - so let's read some more news - smell some more Roses ->

>>>>>>>>>>" Like many companies, Mobil Corp. was tantalized by the idea of using electronic data interchange to swap information with business partners.
But because of the pitfalls--high costs, burdensome software maintenance and lack of real-time information--Mobil is eschewing traditional hard-wired EDI and using an intranet for electronic commerce in what analysts call a "leading-edge approach" to an area of EDI that's now under intense scrutiny. After a successful pilot, Mobil this month began rolling out the new system to its more than 300 lube distributors, the independent businesspeople who handle Mobil's "heavy products"--packaged or bulk industrial oils and greases. The intranet integrates Mobil's mainframe data with an Oracle Corp. database that holds product information. <<<<<<<<<<< Can I call them or what? There's more: >>>>>>>>>>> Previously, Mobil had implemented two different EDI systems--one DOS-based and the other Windows-based--that transmitted business documents over a VAN (value-added network). Mobil encountered many of the problems that have stymied the growth of EDI. VAN charges for using the hard-wired networks topped $100,000 a year. Maintenance was burdensome: Every time Mobil changed a business rule, new software had to be sent to each dealer and installed on their desktops. Inventory information was updated only once a week. Because of this, dealers communicated with Mobil through a hodgepodge of EDI, phone calls and faxes: a system of redundant data input that led to many time-consuming errors. Mobil began looking for a new system when its lube group made improving communications with dealers a top priority. The Internet was an obvious solution. An intranet approach for EDI didn't require a hard-wired network, so VAN charges disappeared. When business rules are altered, Mobil only has to make changes once, test the new rules and put them on the server--they become immediately available. 
"Our customer support people are excited because distributors can look up the information and make the transaction electronically, so the support people's phones won't be ringing off the hook with questions," said Hawkins. Mobil's business rules are embedded in the system's Java applets. The system immediately alerts a distributor if he or she is entering an incorrect order--say, asking for a product in an unavailable package size or making an order that is too large for a truck shipment. <<<<<<<<<<<<<<< Just what I've been saying. This last piece is key. The Java is the transport layer that links business rules and data handling into the whole. OK, but the NAY sayers can still point to some holes " >>>>>>>>>>>>> "Mobil's approach is a leading-edge way to do this kind of application," agreed Rick Drummond, a consultant in Forth Worth, Texas, who has been helping develop security standards for EDI over the Internet. Yet he still has reservations. Said Drummond: "The impact outside this limited application is pretty low because it's not clear if it's dealing with the interoperative issues" raised by EDI systems that are not as closed as Mobil's. <<<<<<<<<<<<<<<<< Yep, but guess what, this is just a matter of time, NOT technology. XML is with us, and it will provide that missing piece, along with use of Java, to be able to link components of one system to another. So - I can embed rules in XML or Java into my LOCAL data processing system, and have Mobil et al send me those when they update them. Thus allowing this thing to move to the next level in a way that EDI was never able to. In fact it gets even worse. The new XML provides all that missing Object Orientated transport layer that DISA has been haggling over, built right into your Web Browser and Web server. (As an aside DCOM and or CORBA? Who cares? 
As an end user, there is no need to worry about such MIDDLEWARE issues, since the browser companies will always have to provide a layer that can transport your objects at the Web server end. DCOM, CORBA? You never see this! Your JAVA or XML toolkit and execution environment handle all this 'bits and bytes' stuff). Microsoft and Netscape have had to address these issues for their new V4 browsers, not because of EC or EDI but for distributed programming and client/server deployment reasons. Just so happens they nailed the EC and EDI side too.

So where does DISA fit into this model? It does not! If I'm Mobil and I'm implementing the next application, I just create my HTML, XML, Java and do it. Then my trading partners collect those components off my Web server and use them to exchange the information we need. I can even generate all this stuff straight off the SQL database definitions I already have loaded up in my CASE tools and database dictionaries. DISA, who him??

==========================================================================
The Second Part of this story: 4 Tier EDI to the Rescue!

From DISA's 10th conference I see a lot of haggling over who is right. (Kind of like Nero fiddling while Rome burns around him?) Let's roll the video tape:

>>>>>>>>>>>>>
Modelling a Better EDI

Many in the EDI standards community hope the creation of a new type of EDI standard will open up big new markets for the standardized technology among the nation's estimated 4 million small and medium-sized businesses. The new EDI would have to be simple to implement at little cost. An important aspect of that future EDI standard will be that it must also maintain backwards compatibility with the existing versions of EDI standards, including the standards of ASC X12 and the United Nations. Standards discussions at the DISA show include one by Klaus-Dieter Naujok, standards manager at Concord, Calif.-based Premenos Corp.
His subject is Object Oriented EDI, a new proposal for future EDI developed under the auspices of the United Nations EDIFACT's CEFACT. The proposal, made public for the first time at the conference, makes use of object-oriented techniques to model business scenarios into business objects. Naujok says object-oriented modeling holds the possibility of making EDI easier to use and less costly. More Than One Way Dan Codman, co-founder of the Wilton, Conn.-based APL Group and chairman of X12C, the communications and controls subcommittee of ASC X12, says he is not confident object-oriented techniques are relevant to EDI standards. However, ASC X12's Strategic Implementation Task Group, formed to represent U.S. positions on EDI to the CEFACT, will look at modeling techniques to aid the development of a new EDI standard. It will have to be able to be used around the world, cheaply implemented, and understood by anyone who uses it, Codman says. David Files, the leader of ASC X12's Business Information Modeling Group, says Object Oriented EDI fails to address the needs of large companies doing high-volume EDI. Such companies, the majority of those already EDI-enabled, will find that OOEDI is less efficient for large volumes, he warns. Estimates put EDI usage in the United States at only 5 percent of the potential. Reaching the hundreds of thousands of small and mid- sized companies not yet doing EDI is a crucial factor in future growth, industry experts say, and a new EDI model is one of the key linchpins in that endeavor. And, of course, different alternatives will benefit different segments of the industry. <<<<<<<<<<<<<<<< So we have at least three camps! Well, what if all three can live together under the one roof? And the fourth 'grizzly bear' called Internet EC can also be made a player too? Enter 4 Tier EDI. Here's a picture of this, and here's how it works. 
Layer 1 - Traditional EDI         |
-------------------------------   |
Layer 2 - Rule Based EDI/EC       |
-------------------------------   |
Layer 3 - Process Based EDI       |
-------------------------------   |
Layer 4 - Object Based EDI        |
-------------------------------   V

Now, what this means is that your total EDI message can consist of some or all of these components AS YOUR BUSINESS NEEDS require. Layer 2 is absolutely the KEY LAYER. (Layer 3 and Layer 4 are in fact implemented and done with the tools in Layer 2). Layer 2 supports both XML and Java as the means to define your complete EDI message, including the data. You can either embed an EDI message itself using the standard HTML comment token, i.e. :

This is just some text


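As a rough illustration of the comment-embedding idea, the following Java sketch pulls payloads out of HTML comments. The `EDI:` marker, the class name `CommentPayloads`, and the sample segment `SEG*A*B~` are all made up for this example; the post does not specify any such convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds HTML comments that carry an embedded payload.
final class CommentPayloads {
    // Matches <!-- ... --> non-greedily, including across line breaks.
    private static final Pattern COMMENT =
            Pattern.compile("<!--(.*?)-->", Pattern.DOTALL);

    // Assumed marker that distinguishes payload comments from ordinary ones.
    static final String MARKER = "EDI:";

    // Returns the payload of every comment whose text starts with MARKER.
    static List<String> extract(String html) {
        List<String> payloads = new ArrayList<>();
        Matcher m = COMMENT.matcher(html);
        while (m.find()) {
            String body = m.group(1).trim();
            if (body.startsWith(MARKER)) {
                payloads.add(body.substring(MARKER.length()).trim());
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        String page = "<p>This is just some text</p>\n"
                + "<!-- EDI: SEG*A*B~ -->\n"
                + "<!-- an ordinary comment -->";
        // Only the marked comment is reported; the browser shows neither.
        System.out.println(extract(page));
    }
}
```

A browser ignores the comment, so the visible form is unchanged, while a receiving program can recover the payload with a single pass over the page source.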
Or, you can use the newer HTML/XML methods of identifying data fields and their content within your business forms. What is more, HTML already has a convenient 'Process' level mechanism - the URL to the next, or previous linked form, and the POST/SEND mechanism as a way of telling you where the message came from, plus status information. XML of course allows you to roll your own more extensive features. XML also allows you to transport binary information, and is also fully multi-lingual compliant.

OK - so you get the picture. The ability to define your own message sets, structure, rules, objects, whatever grabs your fancy. And then send this to your nearest neighbour via Web Server technology. Also, LAYERS, so that if an Object Orientated approach is meaningless for your business needs you DO NOT need to use it, or burden your messaging with any overhead or complexity associated with OO needs. By defining Layers 3 and 4 using Layer 2, you allow people to choose to what level they wish to use each messaging component.

So - how does DISA get into the middle of all this? Two ways. First, XML is virgin territory, therefore DISA can step in and define XML components and standards for enclosing EDI fields, and mapping Traditional EDI to XML/HTML. Then DISA can define process components in XML that facilitate EDI, for example an applet or object called date_format(), or entity_characteristics(), and so on, that reference the existing EDI data entity rules that have been lovingly crafted over the last twenty years. This means DISA can publish a CDROM of XML/Java components that describe and reference existing EDI entities. This will make everyone's life easier, and speed implementation of EC. How? Because right now, programmers are using Sun's Java library as the next best thing, plus whatever they can grab off the Java/JavaScript language sites of 'canned' code. I.e.
I need to check two dates are valid in my Web form, and so I hook in either an Applet, or JavaScript module that does this, pass the dates to the applet as parameters, and presto, my dates are OK. Well, instead, on the CDROM is a nice set of routines called Java_valid_EDI_date(), and XML_valid_EDI_date(), that do this for me, but now I know my dates are also fully EDI compliant. And so on. Tool vendors can build plug-ins to Web Browsers that automatically associate these kinds of properties to EDI compliant fields. Also the Web Servers provide services such as Transaction Logging (a lot of them can do that now) and of course message routing to different backend servers based off content and addressing, security, database interfacing. In short everything one would expect from a mature communications server platform.

The second piece is then obvious for DISA. Having stepped into the middle of the EC process by extending XML, and migrating EDI standards over to this transport layer, DISA then has an ongoing role. Providing both support for this, and also maintaining products such as the Universal Entity Dictionary, and also defining new methods, processes, and objects that are specific to EDI for use with XML/Java.

OK, I hope this is fairly clear. 4 Tier EDI, founded on merging EDI formatting and transport methods with EC Web based methods, where DISA provides the lead in this, and then maintains the standards and certifies vendors' products as being compliant, et al. A migration to Web EC based messaging standards as the foundation for future EDI.

Understandably there is a section of the existing DISA membership that has to be resistant to this, because their entrenched commercial position is exposed to newer Web vendors and a completely different business and trading model. A brief consideration of the alternatives however should bring everyone into focus. This is not really a choice!
If Microsoft and Netscape saw a role for DISA in the future of EC they would already be very active members of this forum. As they are NOT, I can only conclude that DISA needs to quickly make itself part of the mainstream EC agenda, before it vanishes into the mists of time. Otherwise I foresee that Microsoft will shortly be hosting its own Electronic Commerce Symposium for EC Business Partners, and signing up vendors to support its MMSP (Microsoft Messaging Standards Protocol) that it has built in to the Microsoft Web Server engines and Browsers.

Certainly one EDI stalwart has already made the move: two months ago "EDI World" magazine was renamed "Electronic Commerce International".

=========================================================================
David Webber.

p.s.
----------
I shall also be cross posting this to EDI-L, as the broader issues fall into their bag.

p.p.s.
-------
Just went up on the Microsoft Site to verify some details. Their product is MICS, now (changed from MCIS), and they have a 570K White Paper you can download. Microsoft Internet Commerce Server. One paragraph in it says "but this does not mean the end of traditional EDI". Yeah, right, one paragraph in 570k, and no mention of why not, or how to link the two! Excuse me for not believing that for one second, and for believing that really Microsoft is now setting the agenda.

The URL is : http://www.microsoft.com/commerce/whitepaper.htm

David Webber.

From Peter at ursus.demon.co.uk Wed Jun 4 23:51:09 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:57:54 2004
Subject: XML and EDI
Message-ID: <7596@ursus.demon.co.uk>

In message <199706042120.RAA12840@smtp2.erols.com> "Peat" writes:
> I wanted to share this with you.
It is from David Weber posted early May on
> the EDI-L mailing list.
>
> - Bruce Peat

Thanks for your posting. I don't want to sound negative, but pieces like this need some tailoring before posting to this list, which is aimed at developers. It's important that general news stories do not get posted to XML-DEV - if RobinC doesn't pick them up on www.sil.org, then they should either go to c.t.s. or you should make stronger bids for comp.text.xml.

A cursory reading (I find the metaphors tough going :-) suggests that EDI (about which I know nothing :-) can provide information objects for direct realisation in XML (?and Java), and the piece could perhaps have been condensed to show this.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/

From jimg at digitalthink.com Thu Jun 5 01:46:11 1997
From: jimg at digitalthink.com (Jim Gindling)
Date: Mon Jun 7 16:57:54 2004
Subject: (NXP)/Java/XML parser : Passing Error Information
In-Reply-To:
Message-ID:

Norbert,

I am very happy to hear that you are thinking about adding better error notification to NXP. I like your suggestion of adding callback methods to the "Esis" interface for error notification. However, I would really like the interfaces defined such that we can throw exceptions from these notification methods, which would then bubble up to the place where the parser is invoked. For example, I suggest methods similar to the following be added to the "Esis" interface:

    public void onWarning(ErrorInfo pInfo) throws ParseException;

Then modify XMLParser.startParsing() so that it also throws ParseException. ErrorInfo should contain all relevant information, such as file name, line number, column number, ...
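A minimal sketch of how this callback-plus-exception arrangement might fit together is given here. Everything in it is an assumption made for illustration: the fields of `ErrorInfo`, the `ErrorCallbacks` interface, and the `DemoParser` stand-in are hypothetical, and none of it is NXP's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical carrier for error details; the field set is an assumption.
final class ErrorInfo {
    enum Kind { WARNING, WELL_FORMEDNESS, VALIDITY }
    final Kind kind;
    final String fileName;
    final int line;
    final int column;
    final String message;

    ErrorInfo(Kind kind, String fileName, int line, int column, String message) {
        this.kind = kind;
        this.fileName = fileName;
        this.line = line;
        this.column = column;
        this.message = message;
    }

    @Override public String toString() {
        return fileName + ":" + line + ":" + column + ": " + kind + ": " + message;
    }
}

// Checked exception that a callback may throw to abort the parse.
class ParseException extends Exception {
    final ErrorInfo info;
    ParseException(ErrorInfo info) { super(info.toString()); this.info = info; }
}

// Hypothetical callback interface, standing in for additions to "Esis".
interface ErrorCallbacks {
    void onWarning(ErrorInfo info) throws ParseException;
    void onError(ErrorInfo info) throws ParseException;
}

// Stand-in for a parser: it only replays a prepared list of problems.
final class DemoParser {
    static void startParsing(List<ErrorInfo> problems, ErrorCallbacks cb)
            throws ParseException {
        for (ErrorInfo p : problems) {
            if (p.kind == ErrorInfo.Kind.WARNING) cb.onWarning(p);
            else cb.onError(p);
        }
    }
}

// One possible application policy: collect warnings, stop at the first error.
final class CollectingCallbacks implements ErrorCallbacks {
    final List<ErrorInfo> warnings = new ArrayList<>();
    public void onWarning(ErrorInfo info) { warnings.add(info); }
    public void onError(ErrorInfo info) throws ParseException {
        throw new ParseException(info);
    }
}

final class ErrorCallbackDemo {
    public static void main(String[] args) {
        List<ErrorInfo> problems = new ArrayList<>();
        problems.add(new ErrorInfo(ErrorInfo.Kind.WARNING, "doc.xml", 1, 1, "example warning"));
        problems.add(new ErrorInfo(ErrorInfo.Kind.VALIDITY, "doc.xml", 3, 7, "example violation"));
        CollectingCallbacks cb = new CollectingCallbacks();
        try {
            DemoParser.startParsing(problems, cb);
        } catch (ParseException e) {
            // The exception bubbles up to the place where parsing was started.
            System.out.println("aborted at line " + e.info.line + ", column " + e.info.column);
        }
        System.out.println(cb.warnings.size() + " warning(s) collected");
    }
}
```

The point of the design is that the application, not the parser, decides which classes of problem are fatal: a callback that simply returns lets parsing continue, and one that throws stops it.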
Maybe it makes sense to have a hierarchy of ErrorInfo classes for different types of errors. Just my 2 cents. Thanks again for making such a wonderful tool available to the XML community. Jim >NXP, as of today, simply prints error messages to Stderr. >This is fine for now, but it is certainly not the best >way to do things. > >There was a suggestion made to me, to throw >an exception, but I think exceptions are not >the best solution as recovery from them is practically >not possible (From the level of the application >programm) > >To my understanding there are several classes >of errors that can be passed along > >1.) Warnings >2.) WF violations >3.) Violations with respect to the DTD >4.) In general these errors that are reportable - if the user wishes > >Should they be handled differently ? > >I was thinking in terms of "callback" functions. Like >I do it right now with the "Esis" interface. > >How would you, as the user/developer community envision >handling this. > >What information would you like to have passed along >to an application ? Error code, textual description (what >about localization ..). > >Best regards, >Norbert H. 
Mikula > >===================================================== >= SGML, XML, DSSSL, Intra- & Internet, AI, Java >===================================================== >= mailto:nmikula@edu.uni-klu.ac.at >= http://www.edu.uni-klu.ac.at/~nmikula >===================================================== > > >xml-dev: A list for W3C XML Developers >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To unsubscribe, send to majordomo@ic.ac.uk the following message; >unsubscribe xml-dev >List coordinator, Henry Rzepa (rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From srn at techno.com Thu Jun 5 17:48:34 1997 From: srn at techno.com (Steven R. Newcomb) Date: Mon Jun 7 16:57:54 2004 Subject: XML and EDI In-Reply-To: <199706042120.RAA12840@smtp2.erols.com> (peat@erols.com) Message-ID: <199706051543.LAA02284@bruno.techno.com> Dear Developers of XML : XML really, really needs notation data attributes. Without them, you can't do object inheritance from architecture (DTD) to architecture (document, whether it has its own DTD or not). An inheritable architecture is, in fact, a notation. We already support notations in XML. What we don't have is the ability to declare the mappings between the inherited architecture's objects (elements and attributes) and the document's objects (elements and attributes). For that, we need notation data attributes. It's a small thing, really, but, wow, what a difference it makes! The usefulness of inheritance for all kinds of purposes (and not least for EDI) is too great to ignore; it is one of the most useful and attractive aspects of SGML. There is no good reason not to do it in XML. So, how about it, ERB? 
For a discussion of why architectural inheritability is overwhelmingly important, you may want to read my (now slightly dated) paper, "SGML Architectures: Implications and Opportunities for Industry" at http://www.techno.com/sgmlarchitecture.html. Best regards, --Steve Steven R. Newcomb President voice +1 716 271 0796 TechnoTeacher, Inc. fax +1 716 271 0129 (courier: 23-2 Clover Park, Internet: srn@techno.com Rochester NY 14618) FTP: ftp.techno.com P.O. Box 23795 WWW: http://www.techno.com Rochester, NY 14692-3795 USA xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jeanpa at microsoft.com Sat Jun 7 05:50:11 1997 From: jeanpa at microsoft.com (Jean Paoli) Date: Mon Jun 7 16:57:55 2004 Subject: Microsoft XML Parser in Java is Available Message-ID: <78DFE33066ABD0118B9200805FD431BA5EC1EB@RED-16-MSG.dns.microsoft.com> ANNOUNCEMENT: Microsoft XML Parser in Java is Available I am *really* pleased to announce : The XML Parser in Java (MSXML) from Microsoft Corporation is now available for download from: http://www.microsoft.com/standards/xml/xmlparse.htm This is the second piece of XML technology from Microsoft, the first being the Channel Definition Format support in Internet Explorer 4.0. The Microsoft XML Parser is a validating XML parser written in Java. Once parsed, the XML document is exposed as a tree through a simple set of Java methods. We are actively working with the W3C to standardize an XML API (See the W3C overview page for the Document Object Model http://www.w3.org/MarkUp/DOM/. The DSSSL/grove Object Model is carefully studied by the DOM group). These methods support reading and/or writing XML structures, such as the Channel Definition Format (CDF) or other text formats based on XML. 
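To illustrate what "exposed as a tree" can mean in practice, here is a small Java sketch. The announcement does not list MSXML's own method names, so the sketch uses the W3C DOM interfaces that later shipped with the JDK (`javax.xml.parsers`, `org.w3c.dom`) as a stand-in; the class name `TreeWalkDemo` and the sample document are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;

// Stand-in example using the JDK's DOM parser, not the MSXML API.
final class TreeWalkDemo {
    // Parses an XML string into an in-memory document tree.
    static Document parse(String xml) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Appends one line per element, depth-first, indented two spaces per level.
    static void outline(Node node, int depth, StringBuilder out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(node.getNodeName()).append('\n');
            depth++;
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            outline(c, depth, out);
        }
    }

    public static void main(String[] args) throws Exception {
        // Made-up channel-like document, for illustration only.
        String xml = "<channel><title>News</title>"
                + "<item><title>Story</title></item></channel>";
        StringBuilder out = new StringBuilder();
        outline(parse(xml), 0, out);
        System.out.print(out);
    }
}
```

Once the document is a tree of nodes, reading a structure is a matter of walking children and siblings, and writing one is a matter of creating and attaching nodes before serializing the result.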
This version (Alpha 1.0) of the parser implements the W3C working draft of the XML specification dated March 31, 1997 (http://www.w3.org/TR/WD-xml-961114.html) and will be revised to reflect future W3C changes to the specifications. The following components of the XML spec have not yet been implemented (but will be soon) : * XML-SPACE (for control over white space handling) * XML encoding declaration () * Conditional sections in the DTD (INCLUDE & IGNORE keywords) * Required Markup Declaration 'RMD' Full source code is provided, royalty free, and will be updated frequently to fix bugs and to reflect future W3C changes to the specifications.(read the Microsoft XML Parser in Java license agreement http://www.microsoft.com/standards/xml/xmllic.htm). Bugs should be sent to Istvan Cseri (istvanc@microsoft.com) or Chris Lovett (clovett@microsoft.com). Enjoy, and let us make XML a success story! -Jean Paoli jeanpa@microsoft.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat Jun 7 07:16:09 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:55 2004 Subject: Microsoft XML Parser in Java is Available Message-ID: <7729@ursus.demon.co.uk> Jean In message <78DFE33066ABD0118B9200805FD431BA5EC1EB@RED-16-MSG.dns.microsoft.com> Jean Paoli writes: > ANNOUNCEMENT: Microsoft XML Parser in Java is Available > > I am *really* pleased to announce : I am *really* pleased to read your announcement! (and am replying even before downloading your parser). This will be a tremendous boost towards an API for XML-* modules and their interoperation. I shan't go back to bed until I have looked at it! > [...] 
> Full source code is provided, royalty free, and will be updated > frequently to fix bugs > and to reflect future W3C changes to the specifications.(read the This is a very constructive and laudable approach. > Microsoft XML Parser in Java license agreement > http://www.microsoft.com/standards/xml/xmllic.htm). > > Bugs should be sent to Istvan Cseri (istvanc@microsoft.com) or Chris > Lovett (clovett@microsoft.com). > > Enjoy, and let us make XML a success story! I am very pleased that this has been announced on XML-DEV as it encourages us all to promote an open approach to software development. [...] -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tallen at sonic.net Sat Jun 7 20:46:57 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:57:55 2004 Subject: MSXML, WF, and Validity Message-ID: <199706071847.LAA31370@bolt.sonic.net> Jean Paoli wrote: | The Microsoft XML Parser is a validating XML parser written in Java. | Once parsed, the XML document is exposed as a tree through a simple set | of Java methods. After playing with it for awhile this morning I found myself wondering about WF and validity; I don't know if the following counts as a bug, but it would be useful to hear what other think. My input is: ]> Palmy Days One Frond at a Time It was a dark and stormy night. The crows clattered amongst the fronds. &foo; I stuck the DTD in the internal subset because I couldn't get the parser to find an external DTD. The output of jview msxml -d palmy is ]> Palmy Days One Frond at a Time It was a dark and stormy night. The crows clattered amongst the fronds. 
bar Now the declarations in the internal subset have been read (and munged), and the foo:bar entity expansion has been performed. Yet the instance does not conform to the "DTD" in the internal subset, although taken on its own it is well formed. Is the input file "palmy" a valid XML document? The VC comment following [36] indicates not. Is it WF? I can't find a WF comment indicating that the document must conform to the DTD (which is reasonable, although perhaps this point should be covered explicitly). Is MSXML only parsing "palmy" as WF? If not, is this error recovery? These (real, not rhetorical) questions are of interest whether or not this is the intended behavior of MSXML. Regards, Terry Allen Electronic Publishing Consultant tallen[at]sonic.net http://www.sonic.net/~tallen/ Davenport and DocBook: http://www.ora.com/davenport/index.html T.A. at Passage Systems: terry.allen[at]passage.com From Peter at ursus.demon.co.uk Sun Jun 8 20:33:05 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:55 2004 Subject: Microsoft XML Parser in Java is Available Message-ID: <7758@ursus.demon.co.uk> There are a few minor tweaks required to run or compile MSXML in a Solaris environment - I have posted these to JeanP. [The filenames need to be case sensitive and correspond to the class names; the JDK is stricter on casting; and it also requires the constants to be declared before use.] I'd be grateful for any pointers on Java portability, and it's a good place to re-emphasise the value of test data. I've been porting JUMBO to run under J++, and am running into a number of problems that don't arise in W95 browsers.
These primarily include the use of '/' or '\' in addressing files, but I also have a feeling that some static initialisation may occur differently. Any pointers to experience on this or WWW pages would be valuable. The '/' problem causes me some confusion. When addressing a File, I appear to end up with constructs like:

    URL context;
    ...
    URL u = new URL(context, "jumbo.gif");

I find I have to replace it with

    URL u = new URL(context+File.separator+"jumbo.gif");

to get it working under J++. The question as to when separators are governed by URL syntax, and when by file syntax is a difficult borderline. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From Peter at ursus.demon.co.uk Sun Jun 8 20:33:14 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:55 2004 Subject: MSXML, WF, and Validity Message-ID: <7760@ursus.demon.co.uk> In message <199706071847.LAA31370@bolt.sonic.net> Terry Allen writes: > > Jean Paoli wrote: > | The Microsoft XML Parser is a validating XML parser written in Java. > | Once parsed, the XML document is exposed as a tree through a simple set > | of Java methods. > > After playing with it for awhile this morning I found myself wondering > about WF and validity; I don't know if the following counts as a bug, > but it would be useful to hear what other think. I have worried about this as well - I may have mentioned it on the XML-WG. I don't think it's a bug, but rather that the spec does not give a clear guideline on *when* validation is expected. I am sure some ERB members will see this discussion. > > My input is: > > > > > ]> > Palmy Days > One Frond at a Time > It was a dark and stormy night.
The crows clattered > amongst the fronds. > > &foo; > > IMO this is a WF document, but not a valid one. > > I stuck the DTD in the internal subset because I couldn't get the > parser to find an external DTD. The output of > > jview msxml -d palmy > > is [... normalised expanded prettyprinted output deleted...] > > Now the declarations in the internal subset have been read (and munged), > and the foo:bar entity expansion has been performed. Yet the instance > does not conform to the "DTD" in the internal subset, although taken > on its own it is well formed. Is the input file "palmy" a valid > XML document? The VC comment following [36] indicates not. Is it > WF? I can't find a WF comment indicating that the document must ^^^ It's certainly WF as far as I see it. > conform to the DTD (which is reasonable, although perhaps this point > should be covered explictly). Is MSXML only parsing "palmy" as WF? > If not, is this error recovery? > > These (real, not rhetorical) questions are of interest whether or > not this is the intended behavior of MSXML. > My view is based on Norbert's NXP which has a commandline switch -v (i.e. require validation). This is run clientside. IOW if the document above had been run through NXP it would have passed it as WF, but failed it IFF the -v flag was set. There are three possible places to request validation: - at author level (i.e. some instruction in the document stating that the document is validatable. The ERB may wish to include this as a component in the XMLDecl or RMDecl (or elsewhere) - at human client level (e.g. -v in NXP) - at software/application level (i.e. this software will ONLY work with valid documents Note that an internal subset may be present for other reasons than validation (adding attribute values and types, as required for XML-LINK, for example). Therefore I do not think the author's intentions can be deduced from the presence of an internal subset. 
Presumably a pointer (SYSTEM) to an external DTD is likely to refer to a DTD which can be used for validation, but I'm not sure whether this is explicit. In summary I think that MSXML is capable of validation - I'm not clear whether it *always* tries to validate, and if it can't decides simply to check for WF. I think we need guidance on this. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tallen at sonic.net Sun Jun 8 22:17:06 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML Message-ID: <199706081956.MAA31488@bolt.sonic.net> Peter Murray-Rust wrote: >Note that an internal subset may be present for other reasons than validation (adding attribute values and types, as required for XML-LINK, for example). Therefore I do not think the author's intentions can be deduced from the presence of an internal subset. Presumably a pointer (SYSTEM) to an external DTD is likely to refer to a DTD which can be used for validation, but I'm not sure whether this is explicit. Yes, I think there is a somewhat different information model in XML than in SGML, and this parser (whether it's doing all the right things or not) is useful for learning and thinking about the differences. I, too, think that my "palmy" input document is invalid but WF. Thus, if MSXML is parsing to validate, it is (due to a bug or two) doing error recovery (and should be fixed on this point not to do so). I can also see some gotchas for early adopters, such as that a WF document that makes reference to the wrong DTD is still WF. 
And the WF-parser will check the WFness of the element declarations (even in the right DTD) even if it isn't going to use them, at least in the internal subset. Also, the internal subset is part of the XML document, and, as the spec is written, the parser must parse the subset and deliver it as part of the output (as MSXML does), even though the same is not true of an external subset. (Right?) Doesn't it seem as though the reasons for conveying the internal subset information to the application (such as those you mention) are also reasons for extracting the same information from the external subset and conveying it to the application, too? whether the document is dealt with as WF or not? IOW, an SGML parser such as nsgmls combines both subsets into a DTD and deals with information following as another unit, the "document instance set" (if I have the terminology right, per 8879 production 2), which is the part of an SGML document entity *following* the prologue. But for an XML parser, the boundaries are shifted, because it has to deal with an XML document that *includes* the prologue (XMLlang production 23, where "element" corresponds to the SGML "document instance set", I think). I don't know whether this is a good idea or not, just trying to understand it as an early adopter. (I also notice now that per productions 23 and 27, white space after the end of the end-tag of the root element is also part of the document, which is okay by me; but this seems not to be dealt with explicitly s.v. 2.8, "White Space Handling." I read that section to mean that such white space must be passed to the application by a WF-parser [the language referring to "processors which ... read the DTD" or not should be changed, because, as we see, a WF parser must read at least the internal subset part of the DTD], whereas a validating parser must not pass such white space to the application.) 
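The white-space point above can be tried out with a present-day parser. What follows is only a minimal sketch, assuming the JDK's built-in JAXP parser rather than MSXML, NXP or Lark, and a hypothetical document (the class name WhitespaceDemo and the document text are invented for illustration). It counts the children of the root element when the parser is run without validation, and again when it validates and is asked to drop white space in element content.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical illustration; not taken from any parser discussed in the thread.
class WhitespaceDemo {
    // Invented document: valid against its internal subset, with white
    // space between the tags inside the element-only <book>.
    static final String DOC =
          "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "]>\n"
        + "<book>\n"
        + "  <title>Palmy Days</title>\n"
        + "</book>\n";

    // Number of child nodes of the root element. When 'validating' is
    // true the parser also drops white space in element content.
    static int rootChildCount(String xml, boolean validating) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(validating);
            f.setIgnoringElementContentWhitespace(validating);
            DocumentBuilder b = f.newDocumentBuilder();
            Document d = b.parse(new InputSource(new StringReader(xml)));
            return d.getDocumentElement().getChildNodes().getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Without validation the white space arrives as text nodes.
        System.out.println("non-validating: " + rootChildCount(DOC, false));
        // With validation only the <title> element is left.
        System.out.println("validating:     " + rootChildCount(DOC, true));
    }
}
```

Whether the white space reaches the application is here a choice made by whoever configures the parser, which is one reading of the passage in 2.8; the draft itself does not prescribe an API for it.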
Regards, Terry Allen Electronic Publishing Consultant tallen[at]sonic.net http://www.sonic.net/~tallen/ Davenport and DocBook: http://www.ora.com/davenport/index.html T.A. at Passage Systems: terry.allen[at]passage.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sun Jun 8 23:32:38 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML Message-ID: <7764@ursus.demon.co.uk> In message <199706081956.MAA31488@bolt.sonic.net> Terry Allen writes: [...] > > Yes, I think there is a somewhat different information model in XML > than in SGML, and this parser (whether it's doing all the right things > or not) is useful for learning and thinking about the differences. My problem is more basic - I don't think that there are (yet) 'right and wrong things'. That is why I have been so keen on implementation, because it's only when we get to this stage that the problems of the WF/V boundary come out. > I, too, think that my "palmy" input document is invalid but WF. Thus, > if MSXML is parsing to validate, it is (due to a bug or two) doing > error recovery (and should be fixed on this point not to do so). I think this is more a question of terminology. NXP (Norbert Mikula) is a 'validating parser', but the validation can be switched off. This is a client-side decision. So with NXP 'palmy' could be either invalid or WF according to the reader's wishes > > I can also see some gotchas for early adopters, such as that a WF document > that makes reference to the wrong DTD is still WF. And the WF-parser will ^^^^^^^^^^^^^^^^^^^^^ I'd agree with this, and I don't necessarily think it's wrong until the ERB tells us it is. 
Its validity is presently decidable from axioms and we are waiting for the ERB to think about the problem. > check the WFness of the element declarations (even in the right DTD) even > if it isn't going to use them, at least in the internal subset. Also, the > internal subset is part of the XML document, and, as the spec is > written, the parser must parse the subset and deliver it as part of > the output (as MSXML does), even though the same is not true of an > external subset. (Right?) I don't think so. My formal reading of the spec is that no 'output' is defined. [After all, processing of an XML document can be done by a human reader :-)]. I think the ERB has been careful to say nothing about output, implementation, APIs, etc. My own view has been that the scope for confusion has been sufficient (as in the present case) that guidance is important. At present we do not know what documents are validatable, what the validity criterion can be computed to be, etc. Note that NXP and Lark do not have 'outputs', they have APIs. NXP allows the programmer to subclass at the ESIS level, whilst Lark provides a tree of Elements. Neither passes any DTD information. In Lark I suspect this is discarded - in NXP it requires a bit of digging to extract. MSXML comes closer to delivering the whole grove, I think. (It subclasses PIs and DOCTYPE from Element). > > Doesn't it seem as though the reasons for conveying the internal > subset information to the application (such as those you mention) > are also reasons for extracting the same information from the external > subset and conveying it to the application, too? whether the document > is dealt with as WF or not? Again, the spec (and the ERB) are unclear about conveying this information to the application at all.
> > IOW, an SGML parser such as nsgmls combines both subsets > into a DTD and deals with information following as another unit, > the "document instance set" (if I have the terminology right, per > 8879 production 2), which is the part of an SGML document entity > *following* the prologue. nsgmls attempts to validate *every* document it receives. XML parsers need not. It's not clear whether an XML parser can insist on validating every document. [The spec says nothing about *parsers* - again I have been asking for more concrete terminology than 'processor']. > > But for an XML parser, the boundaries are shifted, because > it has to deal with an XML document that *includes* the prologue > (XMLlang production 23, where "element" corresponds to the SGML > "document instance set", I think). I don't know whether this is a good > idea or not, just trying to understand it as an early adopter. > > (I also notice now that per productions 23 and 27, white space > after the end of the end-tag of the root element is also part > of the document, which is okay by me; but this seems > not to be dealt with explicitly s.v. 2.8, "White Space Handling." > I read that section to mean that such white space must be passed > to the application by a WF-parser [the language referring to > "processors which ... read the DTD" or not should be changed, > because, as we see, a WF parser must read at least the internal I am actually unclear whether a WF-only parser (e.g. Lark) has to read the internal subset at all, other than skipping to the ']>' at the end. If it *does* read and parse it, what does it do with the information? For example, what is the implied structure of the document in: ]> Can we assume that FOO (which has no Element declaration) has an ATTLIST as given, and that therefore it inherits the SHOW and ACTUATE attributes? IOW *must* a parser decorate all matching elements with the ATTLISTS in the internal subset?
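The question of whether elements get decorated from an ATTLIST in the internal subset can at least be observed with one present-day parser. This is a minimal sketch, assuming the JDK's built-in JAXP parser with validation switched off. The markup of the example document did not survive in this archive, so the document below is a hypothetical stand-in: an ATTLIST for FOO with a #FIXED attribute named XML-LINK, no element declaration, and an undeclared HREF attribute on the instance. The class name AttlistDefaults is also invented.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical illustration; the document text is a stand-in, not the original.
class AttlistDefaults {
    // FOO has an ATTLIST but no ELEMENT declaration; HREF is undeclared.
    static final String DOC =
          "<!DOCTYPE FOO [\n"
        + "  <!ATTLIST FOO XML-LINK CDATA #FIXED \"SIMPLE\">\n"
        + "]>\n"
        + "<FOO HREF=\"http://example.org/\"/>";

    // Value of the named attribute on the root element after a
    // non-validating parse; the DOM returns "" for an absent attribute.
    static String attributeOfRoot(String xml, String name) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(false);
            DocumentBuilder b = f.newDocumentBuilder();
            Element root = b.parse(new InputSource(new StringReader(xml)))
                            .getDocumentElement();
            return root.getAttribute(name);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("XML-LINK = " + attributeOfRoot(DOC, "XML-LINK"));
        System.out.println("HREF     = " + attributeOfRoot(DOC, "HREF"));
        System.out.println("SHOW     = '" + attributeOfRoot(DOC, "SHOW") + "'");
    }
}
```

In this sketch the fixed value declared in the internal subset appears on FOO even though nothing was validated, while SHOW, which is declared nowhere in the document, does not appear. That shows what one parser does; it does not settle what the draft requires.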
> subset part of the DTD], whereas a validating parser must not > pass such white space to the application.) My confusion on this issue is well publicised :-) P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tallen at sonic.net Mon Jun 9 01:54:56 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:57:55 2004 Subject: (correction) Matching is defined Message-ID: <199706082348.QAA08892@bolt.sonic.net> "Match" is defined in the Terminology section, 1.3, contrary to what I wrote. "A string matches a grammatical production if it belongs to the language generated by that production." So if "the l g by that p" means that you expand all the tokens it contains, and an XML document is a string, then WFness applies to the internals of prolog. Perhaps a clause here to deal specifically with documents would be a good idea. Regards, Terry Allen Electronic Publishing Consultant tallen[at]sonic.net http://www.sonic.net/~tallen/ Davenport and DocBook: http://www.ora.com/davenport/index.html T.A. at Passage Systems: terry.allen[at]passage.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tallen at sonic.net Mon Jun 9 01:55:19 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML Message-ID: <199706082339.QAA08654@bolt.sonic.net> Peter Murray-Rust replying to me to him etc. 
[Terry:] | > Yes, I think there is a somewhat different information model in XML | > than in SGML, and this parser (whether it's doing all the right things | > or not) is useful for learning and thinking about the differences. | | My problem is more basic - I don't think that there are (yet) 'right and | wrong things'. That is why I have been so keen on implementation, because it's | only when we get to this stage that the problems of the WF/V boundary come out. Right. That's why the IETF assigns such importance to running code. | > I, too, think that my "palmy" input document is invalid but WF. Thus, | > if MSXML is parsing to validate, it is (due to a bug or two) doing | > error recovery (and should be fixed on this point not to do so). | | I think this is more a question of terminology. NXP (Norbert Mikula) is a | 'validating parser', but the validation can be switched off. This is a | client-side decision. So with NXP 'palmy' could be either invalid or WF | according to the reader's wishes Agreed, but from the viewpoint of the document preparer, it is both. MSXML needs the switch NXP has. I think the behavior is unintentional, but I would be alarmed at a processor/parser (they mean the same to me in this context) that attempted to parse for validity, and if it found an error, silently switched to WF-parse mode. | > I can also see some gotchas for early adopters, such as that a WF document | > that makes reference to the wrong DTD is still WF. And the WF-parser will | ^^^^^^^^^^^^^^^^^^^^^ | I'd agree with this, and I don't necessarily think it's wrong until the ERB | tells us it is. Its validity is presently decidable from axioms and we are | waiting for the ERB to think about the problem. Agreed, it's just something to watch out for and perhaps to guard against (by not reusing entity names in different DTDs, etc.) | > check the WFness of the element declarations (even in the right DTD) even | > if it isn't going to use them, at least in the internal subset. 
Also, the | > internal subset is part of the XML document, and, as the spec is | > written, the parser must parse the subset and deliver it as part of | > the output (as MSXML does), even though the same is not true of an | > external subset. (Right?) | | I don't think so. My formal reading of the spec is that no 'output' is | defined. [After all, processing of an XML document can be done by a human | reader :-)]. I think the ERB has been careful to say nothing about output, | implementation, APIs, etc. My own view has been that the scope for | confusion has been sufficient (as in the present case) that guidance is | important. At present we do not know what documents are validatable, what | the validity criterion can be computed to be, etc. Point taken; but the spec is not entirely clean on this point. If the application requests the processor to process, the processor must inform the application of certain things. And it is hard to get around "*An XML processor which does not read the DTD must always pass all characters in a document that are not markup through to the application.* An XML processor which does read the DTD must always pass all characters in mixed content that are not markup through to the application. It may also choose to pass white space occurring in element content to the application; if it does so, it must signal to the application that ..." [2.8, truncated para, emphasis added] | Note that NXP and Lark do not have 'outputs', they have APIs. NXP allows | the programmer to subclass at the Esis level, whilst lark provides a | tree of Elements. Neither passes any DTD information. In Lark I suspect this | is discarded - in NXP it is requires a bit of digging to extract. NSXML comes | closer to delivering the whole grove, I think. (It subclasses PIs and DOCTYPE | from Element). Right. My problem as a document preparer is that I don't know what an application may request the processor to do, so I must guard against any kind of failure. ...
| > IOW, an SGML parser such as nsgmls combines both subsets | > into a DTD and deals with information following as another unit, | > the "document instance set" (if I have the terminology right, per | > 8879 production 2), which is the part of an SGML document entity | > *following* the prologue. | | nsgmls attempts to validate *every* document it receives. XML parsers need | not. It's not clear whether an XML parser can insist on validating every | document. [The spec says nothing about *parsers* - agina I have been asking | for more concrete terminology than 'processor']. | | > But for an XML parser, the boundaries are shifted, because | > it has to deal with an XML document that *includes* the prologue | > (XMLlang production 23, where "element" corresponds to the SGML | > "document instance set", I think). I don't know whether this is a good | > idea or not, just trying to understand it as an early adopter. ... | I am actually unclear whether a WF-only parser (e.g. Lark) has to read the | internal subset at all, other than skipping to the ']>' at the end. If it | *does* read and parse it, what does it do with the information. For example, The soft spot here is the first line of 2.2, where "match" is not defined except that later in that section it "implies" a few things, which are not apparently meant to be a complete set. What the WF document matches is production 23, Prolog element Misc*. As the processor attempting to determine WFness must look inside element to determine WFness, presumably the same is true of prolog. ... unless I determine WFness by *parsing* with a *real parser* which the processor is not meant to be ... | what is the implied structure of the document in: | | | ]> | | | Can we assume that FOO (which has no Element declaration) has an ATTLIST as | given, and that therefore it inherits the SHOW and ACTUATE attributes? | IOW *must* a parser decorate all matching elements with the ATTLISTS in the | internal subset? No, not per XMLlang alone. 
FOO's only declared attribute has as its name the unreserved string "XML-LINK" although it uses an undeclared attribute name "HREF". So it is WF but not valid. As for whether you can have attlists without element decls, the 2nd sentence following production 47 (emended for entity>element) reads "At user option, an XML processor may issue a warning if attributes are declared for an [element] type not itself declared, but this is not an error", so the document is still WF but not valid per XMLlang alone. Were the XMLlink spec to contain language such that the processor is supposed to go out and fetch the attribute declarations implied by the use of the FIXED attribute (implied by the XMLlink spec, that is), then the document shown is not only WF but perhaps even valid! But it doesn't, and barely talks of validity and processing by a *processor*. That's my take, anyway. Maybe the SGML ERB will want to revise the language about validity in XMLlang, or create new concepts of validity in XMLlink. Regards, Terry Allen Electronic Publishing Consultant tallen[at]sonic.net http://www.sonic.net/~tallen/ Davenport and DocBook: http://www.ora.com/davenport/index.html T.A. at Passage Systems: terry.allen[at]passage.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Mon Jun 9 11:30:07 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML Message-ID: <7771@ursus.demon.co.uk> In message <199706082339.QAA08654@bolt.sonic.net> Terry Allen writes: > > Peter Murray-Rust replying to me to him etc. [... and hoping the WG/ERB are reading this ...] > [Terry:] [...] > > Right. That's why the IETF assigns such importance to running code. Good point. 
That is why XML-DEV is important and why we need people to create prototypes at this stage. [Most XML-related software and documents come into this category because the problems we are encountering may have implications on the language.] [...] > | > | I think this is more a question of terminology. NXP (Norbert Mikula) is a > | 'validating parser', but the validation can be switched off. This is a > | client-side decision. So with NXP 'palmy' could be either invalid or WF > | according to the reader's wishes > > Agreed, but from the viewpoint of the document preparer, it is both. MSXML > needs the switch NXP has. I think the behavior is unintentional, but > I would be alarmed at a processor/parser (they mean the same to me in > this context) that attempted to parse for validity, and if it found > an error, silently switched to WF-parse mode. I'd agree with this analysis, and haven't been silent on the issue. IMO it is more important for the WG/ERB to address *this* problem than some of the proposed extensions. The concept of WFness is NEW!! It is more subtle than people realise. A fundamental problem is that there is no clear internal flag in the document stating what the validity/WFness of the current document is, is meant to be, was, etc. As Terry says, it's particularly likely that a WF document could (possibly erroneously) mutate into a valid one. I am sure that any confusion about MSXML is not intentional and is due to the issue not being prominent in the spec. All parsers (i.e. tools that take XML documents and apply the criteria in XML-LANG only) should state their attitude and behaviour to WFness and validity. The possible options include at least: - nsgmls-like. Full validation is the only option. Any non-valid document is flagged and appropriate error messages or error action is initiated. - Lark-like (at least V0.88 - I think there is another coming). No validation can be attempted. Any 'output' can only be WF or in error.
NOTE: what does Lark do with the internal subset? - NXP-like. Validation can be switched on or off by the 'client'. How this is transmitted to the application is application dependent at present. - MSXML-like. Undocumented at present. Possibly [though Terry and I hope not] validating by default, and changing to WF if this fails. > [...] > Point taken; but the spec is not entirely clean on this point. If the > application requests the processor to process, the processor must > inform the application of certain things. And it is hard to get > around > > "*An XML processor which does not read the DTD must always pass all > characters in a document that are not markup through to the application.* Ah! I had assumed the internal subset as 'markup' - you see it as part of the document. We need a ruling on this :-). Obviously if the DTD appears ***in the processed document***, then it could be interpreted as having been read and used for validation. [...] > > | what is the implied structure of the document in: > | > | | > | ]> > | > | > | Can we assume that FOO (which has no Element declaration) has an ATTLIST as > | given, and that therefore it inherits the SHOW and ACTUATE attributes? > | IOW *must* a parser decorate all matching elements with the ATTLISTS in the > | internal subset? > > No, not per XMLlang alone. FOO's only declared attribute has as its name My mistake. I shouldn't have brought the others in. > the unreserved string "XML-LINK" although it uses an undeclared attribute > name "HREF". So it is WF but not valid. Agreed. P. 
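The NXP-like option, where the client switches validation on or off, can be sketched as follows. This is only an illustration under stated assumptions: it uses the JDK's built-in JAXP parser, not NXP or MSXML, and since the markup of "palmy" did not survive in this archive the document below is a hypothetical stand-in with the same property (well-formed, but not conforming to its own internal subset). The names ValidationSwitch, PALMY_LIKE and validityErrors are invented.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

// Hypothetical illustration of a client-side validation switch.
class ValidationSwitch {
    // Stand-in for "palmy": <para> is not declared and is not allowed
    // by the content model of <book>, so the document is WF but invalid.
    static final String PALMY_LIKE =
          "<!DOCTYPE book [\n"
        + "  <!ELEMENT book (title)>\n"
        + "  <!ELEMENT title (#PCDATA)>\n"
        + "  <!ENTITY foo \"bar\">\n"
        + "]>\n"
        + "<book><title>Palmy Days</title><para>&foo;</para></book>";

    // Parses the text and returns the validity errors reported.
    // With validate == false the list stays empty for any WF document.
    // A document that is not well-formed raises IllegalStateException.
    static List<String> validityErrors(String xml, boolean validate) {
        final List<String> problems = new ArrayList<>();
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setValidating(validate);
            DocumentBuilder b = f.newDocumentBuilder();
            b.setErrorHandler(new ErrorHandler() {
                public void warning(SAXParseException e) { /* ignored here */ }
                public void error(SAXParseException e) {
                    problems.add(e.getMessage()); // validity error: record, continue
                }
                public void fatalError(SAXParseException e) throws SAXParseException {
                    throw e; // well-formedness error: stop
                }
            });
            b.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("not well-formed or unreadable", e);
        }
        return problems;
    }

    public static void main(String[] args) {
        System.out.println("WF-only parse, errors:    "
                + validityErrors(PALMY_LIKE, false).size());
        System.out.println("validating parse, errors: "
                + validityErrors(PALMY_LIKE, true).size());
    }
}
```

The caller decides the mode and sees the outcome explicitly, so nothing switches silently. A tool that wanted the "validate, then fall back to WF" behaviour could be built on the same method, provided it reports which mode it ended up in.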
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From michael at textscience.com Mon Jun 9 12:43:54 1997 From: michael at textscience.com (Michael Leventhal) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML In-Reply-To: <7771@ursus.demon.co.uk> Message-ID: <3.0.1.32.19970609184342.006aca40@aimnet.com> At 10:00 AM 6/9/97 GMT, Peter Murray-Rust wrote: >> I would be alarmed at a processor/parser (they mean the same to me in >> this context) that attempted to parse for validity, and if it found >> an error, silently switched to WF-parse mode. > >I'd agree with this analysis, and haven't been silent on the issue. IMO it >is more important for the WG/ERB to address *this* problem than some of the >proposed extensions. The concept of WFness is NEW!! It is more subtle than >people realise. A fundamental problem is that there is no clear internal >flag in the document stating what the validity/WFness of the current document >is, is meant to be, was, etc. As Terry says, it's particularly likely that >a WF document could (possibly erroneously) mutate into a valid one. I am >sure that any confusion about MSXML is not intentional and is due to the issue >not be prominent in the spec. But "silently switching" is exactly the behavior that is wanted for most output-oriented operations, e.g., browsing. WF is only new formally but informally it has been the default mode of operation for HTML. I don't think a flag stating the intention of the author could ever be supposed to actually represent the wishes of the current user of the document or that we could expect the majority of users to understand the underlying concept.
It is up to the user of the tool to select the mode they want if a choice exists. Validate and switch to well-formed "silently" is a possible mode of operation. But I agree on requesting that each application formally state its possible modes of operation. > > >All parsers (i.e. tools that take XML documents and apply the criteria in >XML-LANG only) should state their attitude and behaviour to WFness and validity. > > >The possible options include at least: > - nsgmls-like. Full validation is the only option. Any non-valid > dcoument is flagged and appropriate error messages or error > action is initiated. > - Lark-like (at least V0.88 - I think there is another coming). No > validation can be attempted. Any 'output' can only be WF or > in error. NOTE: what does Lark do with the internal subset? > - NXP-like. Validation can be switched on or off by the 'client'. > How this is transmitted to the application is application > dependent at present. > - MSXML-like. Undocumented at present. Possibly [though Terry and I > hope not] validating by default, and changing to WF if this > fails. From richard at light.demon.co.uk Mon Jun 9 12:51:24 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:55 2004 Subject: Re WF, V, and MSXML In-Reply-To: <199706082339.QAA08654@bolt.sonic.net> Message-ID: In message <199706082339.QAA08654@bolt.sonic.net>, Terry Allen writes >| >| > But for an XML parser, the boundaries are shifted, because >| > it has to deal with an XML document that *includes* the prologue >| > (XMLlang production 23, where "element" corresponds to the SGML >| > "document instance set", I think). I don't know whether this is a good >| > idea or not, just trying to understand it as an early adopter.
I don't see _any_ difference between SGML and XML on this front. SGML parsers also have to deal with the prolog: the formal syntax of an "SGML document entity" is: S SGML declaration, prolog, document instance set, 'entity end' signal (so in fact they also have to deal with the SGML declaration as well!) The fact that the default ESIS output from the parser doesn't include any DTD-related information shouldn't be taken to mean the parser hasn't processed this information. >| I am actually unclear whether a WF-only parser (e.g. Lark) has to read the >| internal subset at all, other than skipping to the ']>' at the end. If it >| *does* read and parse it, what does it do with the information. For example, > >The soft spot here is the first line of 2.2, where "match" is not >defined except that later in that section it "implies" a few things, >which are not apparently meant to be a complete set. What the >WF document matches is production 23, Prolog element Misc*. As >the processor attempting to determine WFness must look inside element to >determine WFness, presumably the same is true of prolog. > > ... unless I determine WFness by *parsing* with a *real parser* which >the processor is not meant to be ... I would read the existing XML spec in a stricter spirit than you have done. To me, "match" means just that, i.e. that _if_ a WF document has an internal or an external DTD, these should be parsed as though for a valid XML document. Any _syntactic_ errors in the DTD should be flagged, even in 'WF' mode. (Bear in mind that no-one is forcing WF documents to have a DTD at all, except for entity declarations.) If you try to adopt a 'don't care' mode of parsing for the DTD when dealing with WF documents, you probably create many more problems than you solve. The only difference is the use that is made of the DTD information: in a WF document only the entity declarations matter to the parser. 
>| what is the implied structure of the document in: >| >| | >| ]> >| >| >| Can we assume that FOO (which has no Element declaration) has an ATTLIST as >| given, and that therefore it inherits the SHOW and ACTUATE attributes? >| IOW *must* a parser decorate all matching elements with the ATTLISTS in the >| internal subset? > >No, not per XMLlang alone. FOO's only declared attribute has as its name >the unreserved string "XML-LINK" although it uses an undeclared attribute >name "HREF". So it is WF but not valid. .. and since it is only well-formed and not valid, it cannot (in my view) partake in any operations that require knowledge of Message-ID: <339BE74F.4015@hiwaay.net> Peter Murray-Rust wrote: > > IMO it > is more important for the WG/ERB to address *this* problem than some of the > proposed extensions. That is right. Until the core is worked out and clearer, proposing extensions is premature. > The concept of WFness is NEW!! No it is not. This is the way that IADS and IDE/AS work today and have since 1990. The question is, what does one do with the DTD. In these products, parsing for the instance is internal to the product. DTD-centric parsing is done in batch. That may not be the solution people want, but it is one way. Well-formedness also has precedents in Xerox systems of the period. T'is new to thee, Miranda. > > All parsers (i.e. tools that take XML documents and apply the criteria in > XML-LANG only) should state their attitude and behaviour to WFness and validity. > You mean you REALLY want interoperable tools? How quaint. > The possible options include at least: > - nsgmls-like. Full validation is the only option. Any non-valid > dcoument is flagged and appropriate error messages or error > action is initiated. IOW, always parse using a DTD. Does the presence of the DOCTYPE indicate that one exists, and maybe, where to find it? Is the presence of the DOCTYPE enough to tell the system that one ought to exist? I don't want to always send a DTD. 
I do want to be able to use SGML techniques that worked in the past and still work sensibly. > - Lark-like (at least V0.88 - I think there is another coming). No > validation can be attempted. Any 'output' can only be WF or > in error. NOTE: what does Lark do with the internal subset? Nyet. > - NXP-like. Validation can be switched on or off by the 'client'. > How this is transmitted to the application is application > dependent at present. This is the best approach if the flag is clear to all. > - MSXML-like. Undocumented at present. Possibly [though Terry and I > hope not] validating by default, and changing to WF if this > fails. Ok. len xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Mon Jun 9 15:14:05 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:56 2004 Subject: Re WF, V, and MSXML Message-ID: <3.0.32.19970609150843.00b53528@pop.intergate.bc.ca> The fact that this debate can exist is kind of puzzling to me. Check out section 5, "Conformance". A processor can either be validating or non-validating. At no point in the spec does anything say or suggest that whether or not the processor validates has anything to do with what is in the document being processed. I haven't looked at MSXML closely, but NXP's behavior is obviously correct in this respect - it validates or not at user request. What am I missing? 
-Tim xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tallen at sonic.net Mon Jun 9 17:08:23 1997 From: tallen at sonic.net (Terry Allen) Date: Mon Jun 7 16:57:56 2004 Subject: Re WF, V, and MSXML Message-ID: <199706091508.IAA30900@bolt.sonic.net> Peter Murray-Rust wrote re me: | > Point taken; but the spec is not entirely clean on this point. If the | > application requests the processor to process, the processor must | > inform the application of certain things. And it is hard to get | > around | > | > "*An XML processor which does not read the DTD must always pass all | > characters in a document that are not markup through to the application.* | | Ah! I had assumed the internal subset as 'markup' - you see it as part | of the document. We need a ruling on this :-). Obviously if the DTD appears | ***in the processed document***, then it could be interpreted as having been | read and used for validation. No, I agree it's markup; the quote is meant to establish the point that the spec does talk about the processor sending stuff (output) to the application (in response to your statement that the spec was neutral on this issue). Tim Bray asked, without specific context: | | The fact that this debate can exist is kind of puzzling to me. Check | out section 5, "Conformance". A processor can either be validating | or non-validating. At no point in the spec does anything say or suggest | that whether or not the processor validates has anything to do with | what is in the document being processed. I haven't looked at MSXML | closely, but NXP's behavior is obviously correct in this respect - | it validates or not at user request. | | What am I missing? -Tim Clarity in writing. If a processor is nonvalidating, must it examine the document for WFness? may it? may it not? 
I understood (part of) what Peter and I were discussing to be whether and what the XMLlang spec requires a processor to send to an application, and under what conditions. MSXML sends a munged version of the infernal subset, which I first thought must be required by the spec. I now see it doesn't.

We also pondered whether a processor that is nonvalidating must examine for WFness (a) the internal subset and, or, (b) the external subset. I am pretty sure that (a) is required, but don't know about (b). The spec speaks of processors that don't "read the DTD", yet the internal subset is part of the DTD and apparently must "match" the prolog production. I suggest that all passages mentioning "processors" and "DTDs" be reviewed for consistency.

Regards,
Terry Allen
Electronic Publishing Consultant
tallen[at]sonic.net
http://www.sonic.net/~tallen/
Davenport and DocBook: http://www.ora.com/davenport/index.html
T.A. at Passage Systems: terry.allen[at]passage.com

From clovett at microsoft.com Tue Jun 10 04:08:09 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:57:56 2004
Subject: WF, V, and MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8A5@RED-17-MSG.dns.microsoft.com>

> I stuck the DTD in the internal subset because I couldn't get the parser
> to find an external DTD.

I think there's a problem with resolving relative URLs when the XML file is local and this only happens under certain versions of the Java VM. I never have a problem if the DTD is specified with a full URL. A fix will be posted when one is found.

> WF versus Validity...

I agree with what seems to be a general consensus that DTD compliance should be switchable.
Currently the MSXML parser handles internal subsets the same as external DTD's, and we decided not to try and do any error recovery, so it is possible that there are also bugs in the MSXML validity code. These will be fixed promptly. > Outputting the internal subset... The thinking here is that the MSXML "Document" and "Element" classes should represent a complete object model for tools and applications that wish to manipulate XML documents, which means being able to recreate a complete XML file after being manipulated. This is different from the traditional "filter" approach where the XML processor is a one-way filter. I think the "object model" approach is a good one for the encouragement of an XML-based application development environment. It just so happens that the command line "msxml" tool that we shipped with the parser (so people could easily play with the parser) does a full dump of the XML document - which includes any internal subset. In fact, the Document class separates out the DTD from the XML Data. If you want to get the XML data only, call Document.getRoot. Eventually if people want to build tools to manipulate the external DTD, it should also be possible to re-publish that DTD using the Object Model API. Currently the Document.save method doesn't do this, but eventually we may add that feature. People have also requested other options on the Document.save method, like whether to pretty-print or not. See http://www.w3.org/MarkUp/DOM/ for more on this topic. 
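
The "object model" style described above (parse into a tree, manipulate it, write a complete file back out) can be illustrated with a small sketch. It uses Python's standard `xml.etree.ElementTree` as a stand-in, not the MSXML `Document` and `Element` classes, and the `order`/`item` document is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document, for illustration only.
SOURCE = '<order><item sku="A1"/></order>'

# Object-model style: build an in-memory tree, change it, serialize it again.
root = ET.fromstring(SOURCE)
ET.SubElement(root, "item", {"sku": "B2"})   # add a second item element
root.set("status", "open")                   # add an attribute on the root
rewritten = ET.tostring(root, encoding="unicode")
print(rewritten)

# Re-parse the saved text to confirm the edits survived the round trip.
round_trip = ET.fromstring(rewritten)
print([item.get("sku") for item in round_trip.findall("item")])
```

One difference from the behaviour described in the message: ElementTree does not keep the DOCTYPE or the internal subset, so a document re-saved this way would lose them. A one-way "filter" processor would hand the application events and keep no tree to save at all.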
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From cbullard at hiwaay.net Tue Jun 10 05:50:21 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:57:56 2004 Subject: WF, V, and MSXML References: <41135C785691CF11B73B00805FD4D2D702A3F8A5@RED-17-MSG.dns.microsoft.com> Message-ID: <339CCEE1.6CF3@hiwaay.net> Chris Lovett wrote: > People have also requested other options on > the Document.save method, like whether to pretty-print or not. See > http://www.w3.org/MarkUp/DOM/ for more on this topic. I haven't seen this before. It adds a few wrinkles. What does this mean? "5.Events will bubble through the structural hierarchy of the document." len bullard xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Tue Jun 10 17:19:32 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:56 2004 Subject: Re WF, V, and MSXML Message-ID: <011290D45A8ACF119B8B00805FD471D6033DA955@RED-24-MSG.dns.microsoft.com> > > > > > >All parsers (i.e. tools that take XML documents and apply the > criteria in > >XML-LANG only) should state their attitude and behaviour to WFness > and > validity. > > > > > >The possible options include at least: > > - nsgmls-like. Full validation is the only option. Any > non-valid > > dcoument is flagged and appropriate error messages or > error > > action is initiated. > > - Lark-like (at least V0.88 - I think there is another coming). > No > > validation can be attempted. Any 'output' can only be > WF or > > in error. NOTE: what does Lark do with the internal > subset? > > - NXP-like. Validation can be switched on or off by the > 'client'. 
> > How this is transmitted to the application is > application > > dependent at present. > > - MSXML-like. Undocumented at present. Possibly [though Terry > and I > > hope not] validating by default, and changing to WF if > this > > fails. > [David Schach] The XML spec seems to address this issue in section 2.20 Required Markup Declaration. In an RMD, the value NONE indicates that an XML processor can parse the document correctly without first reading any part of the DTD. The value INTERNAL indicates that the XML processor must read and process the internal subset of the DTD, if provided, to parse the containing document correctly. The value ALL indicates that the XML processor must read and process the declarations in both the subsets of the DTD, if provided, to parse the containing document correctly. ... If no RMD is provided, an XML processor must behave as though an RMD had been provided with the value ALL. [David Schach] (emphasis added) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Tue Jun 10 23:36:06 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:56 2004 Subject: Re WF, V, and MSXML Message-ID: <7833@ursus.demon.co.uk> In message <011290D45A8ACF119B8B00805FD471D6033DA955@RED-24-MSG.dns.microsoft.com> David Schach writes: [...] > [David Schach] The XML spec seems to address this issue in > section 2.20 Required Markup Declaration. My problem is with the equivalence or not of the words 'parse', 'process' and 'validate'. I hope this isn't being seen as mindless pickiness. > > In an RMD, the value NONE indicates that an XML > processor can parse the document correctly without first reading any ^^^^^ If RMD=NONE then the document cannot be validated. Therefore "parse"!="validate" > part of the DTD. 
The value INTERNAL indicates that the XML processor
> must read and process the internal subset of the DTD, if provided, to
                ^^^^^^^
Presumably means extract the structure of the DTD for 'processing' the document.
> parse the containing document correctly. The value ALL indicates that
> the XML processor must read and process the declarations in both the
                                  ^^^^^^^
i.e. interpret the DTD subset(s)
> subsets of the DTD, if provided, to parse the containing document
                                      ^^^^^
> correctly.
>
> ...
>
> If no RMD is provided, an XML processor must behave as
> though an RMD had been provided with the value ALL. [David Schach]
> (emphasis added)

Here is a possible document ]>

Now, on the argument above (document is in control) the processor parses the document. It cannot be valid, but does the processor try? If yes, it fails. The result is either a null document, *or* error recovery to WF parsing. If the parser does not try to validate, the result is

However, although the spec [5] mentions processors that validate and processors that do not, in other places (e.g. [2.8]) it uses the phrase 'reads the DTD'. This implies that there are (possibly) three classes of processor:
 - a validator (which must always read the DTD)
 - a busy non-validator (which reads the DTD not for validation, but for extracting DTD-based markup)
 - a lazy non-validator (which does not read the DTD).

The lazy non-validator will produce a different output from the busy non-validator, i.e.:

The lazy non-validator could be in violation of the spec if the RMD requires it to parse the DTD subset(s). Maybe it parses them but throws them away (i.e. 'does not read' == 'reads and forgets').

P.
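
The difference in output between a "busy" and a "lazy" non-validator can be sketched with Python's standard expat binding, which is a non-validating parser that does read the internal subset. The document below is a hypothetical stand-in, not the example from the message, and expat's `specified_attributes` switch is used only to imitate a processor that reads the declarations and then forgets them.

```python
import xml.parsers.expat

# Hypothetical document: FOO has no element declaration, only an ATTLIST
# in the internal subset that gives BAR a default value.
DOC = '<!DOCTYPE FOO [<!ATTLIST FOO BAR CDATA "XYZZY">]><FOO/>'


def start_tags(doc, apply_dtd_defaults):
    """Return the (name, attributes) pairs reported to the application."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # With specified_attributes set, expat reports only the attributes
    # written in the start-tag and drops those derived from declarations.
    parser.specified_attributes = 0 if apply_dtd_defaults else 1

    def on_start(name, attrs):
        seen.append((name, dict(attrs)))

    parser.StartElementHandler = on_start
    parser.Parse(doc, True)
    return seen


busy = start_tags(DOC, apply_dtd_defaults=True)    # "busy" non-validator
lazy = start_tags(DOC, apply_dtd_defaults=False)   # "reads and forgets"
print(busy)
print(lazy)
```

Neither run validates anything, yet the application sees a BAR attribute in one case and not in the other, which is the kind of divergence the three-way classification is meant to expose.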
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jjc at jclark.com Wed Jun 11 04:34:28 1997 From: jjc at jclark.com (James Clark) Date: Mon Jun 7 16:57:56 2004 Subject: Re WF, V, and MSXML Message-ID: <2.2.32.19970611021638.017716d4@jclark.com> At 10:00 09/06/97 GMT, Peter Murray-Rust wrote: >The possible options include at least: > - nsgmls-like. Full validation is the only option. Any non-valid > dcoument is flagged and appropriate error messages or error > action is initiated. The current version of nsgmls (the one in jade 0.8) supports a -wno-valid which disables most validation. With this option it doesn't complain about undeclared element types and attributes. However, - if you supply an attribute definition, then it will check that instances of that attribute conform (it will of course continue parsing even if they don't) - if you declare a content model for an element type, then it will check that the content of the element matches the content model, except that it will not complain about the occurrence of any element types for which no content model has been declared (again it will recover from errors of this sort). 
James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Wed Jun 11 14:48:28 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:56 2004 Subject: Repeating attribute specifications Message-ID: Hi, Is there anything in the XML spec which corresponds to the SGML stricture that "there can only be one attribute specification for each attribute definition", i.e. that you can't have repeated attribute specifications within a single start-tag? If not, XML will allow e.g. while SGML won't. Which would be a 'for compatibility' issue. Richard Light SGML and Museum Information Consultancy richard@light.demon.co.uk 3 Midfields Walk Burgess Hill West Sussex RH15 8JA U.K. tel. (44) 1444 232067 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Wed Jun 11 18:04:05 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:56 2004 Subject: Repeating attribute specifications Message-ID: <011290D45A8ACF119B8B00805FD471D60341356F@RED-24-MSG.dns.microsoft.com> See Section 3.1 Start and End Tags Validity Constraint - Unique Att Spec: No attribute may appear more than once in the same start-tag. > -----Original Message----- > From: Richard Light [SMTP:richard@light.demon.co.uk] > Sent: Wednesday, June 11, 1997 4:55 AM > To: xml-dev@ic.ac.uk > Subject: Repeating attribute specifications > > Hi, > > Is there anything in the XML spec which corresponds to the SGML > stricture that "there can only be one attribute specification for each > attribute definition", i.e. 
that you can't have repeated attribute > specifications within a single start-tag? > > If not, XML will allow e.g. > > > > while SGML won't. Which would be a 'for compatibility' issue. > > Richard Light > SGML and Museum Information Consultancy > richard@light.demon.co.uk > 3 Midfields Walk > Burgess Hill > West Sussex RH15 8JA > U.K. > tel. (44) 1444 232067 > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Wed Jun 11 23:34:45 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:56 2004 Subject: Repeating attribute specifications Message-ID: <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca> >Is there anything in the XML spec which corresponds to the SGML >stricture that "there can only be one attribute specification for each >attribute definition", i.e. that you can't have repeated attribute >specifications within a single start-tag? No. This is legal in XML. And in SGML, with the recent TC. -T. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From clovett at microsoft.com Thu Jun 12 00:32:47 1997 From: clovett at microsoft.com (Chris Lovett) Date: Mon Jun 7 16:57:56 2004 Subject: Event Bubbling... Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8B3@RED-17-MSG.dns.microsoft.com> > > People have also requested other options on > > the Document.save method, like whether to pretty-print or not. 
See > > http://www.w3.org/MarkUp/DOM/ for more on this topic. > > I haven't seen this before. It adds a few wrinkles. > > What does this mean? > > "5.Events will bubble through the structural hierarchy of the > document." > See http://www.microsoft.com/workshop/prog/inetsdk/docs/inet0505.htm xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From bdonoghoe at spin.net.au Thu Jun 12 01:14:53 1997 From: bdonoghoe at spin.net.au (Bill Donoghoe) Date: Mon Jun 7 16:57:57 2004 Subject: Repeating attribute specifications Message-ID: <199706112312.JAA19010@spin.net.au> Hello, I believe the answer to the question appears in Section 3.3 of the XML syntax spec When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, and at most one attribute definition for a given attribute name. An XML processor may, at user option, issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition for a given attribute, but this is not an error. Therefore, in the DTD you can have multiple attribute list declarations for an element (even multiple declarations of the same attribute). However, in XML documents an attribute can only occur once inside a start tag. Example: In the DTD the following is valid but the second declaration of the attribute role will be ignored. The SGML (before the changes from the HyTime T.C. 
flow through) equivalent is: At 09:03 11/06/97 -0700, you wrote: >See Section 3.1 Start and End Tags > > Validity Constraint - Unique Att Spec: > > No attribute may appear more than once in the same start-tag. > >> -----Original Message----- >> From: Richard Light [SMTP:richard@light.demon.co.uk] >> Sent: Wednesday, June 11, 1997 4:55 AM >> To: xml-dev@ic.ac.uk >> Subject: Repeating attribute specifications >> >> Hi, >> >> Is there anything in the XML spec which corresponds to the SGML >> stricture that "there can only be one attribute specification for each >> attribute definition", i.e. that you can't have repeated attribute >> specifications within a single start-tag? >> >> If not, XML will allow e.g. >> >> >> >> while SGML won't. Which would be a 'for compatibility' issue. >> >> Richard Light >> SGML and Museum Information Consultancy >> richard@light.demon.co.uk >> 3 Midfields Walk >> Burgess Hill >> West Sussex RH15 8JA >> U.K. >> tel. (44) 1444 232067 >> >> xml-dev: A list for W3C XML Developers >> Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >> To unsubscribe, send to majordomo@ic.ac.uk the following message; >> unsubscribe xml-dev >> List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > >xml-dev: A list for W3C XML Developers >Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ >To unsubscribe, send to majordomo@ic.ac.uk the following message; >unsubscribe xml-dev >List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > > > Bill Donoghoe email: bdonoghoe@acslink.net.au Systems Analyst & SGML Consultant "Do you want some information or all of the data?" 
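
The start-tag side of this can be probed with a non-validating parser. The sketch below uses Python's `xml.etree.ElementTree` (expat underneath); the names FOO and BAR are illustrative only, and the behaviour shown is that of this particular parser, not a statement about the draft spec text.

```python
import xml.etree.ElementTree as ET


def is_accepted(doc):
    """Return True if the parser accepts the document, False on a parse error."""
    try:
        ET.fromstring(doc)
    except ET.ParseError:
        return False
    return True


# Hypothetical examples: the same attribute once, then twice, in one start-tag.
single = '<FOO BAR="a"/>'
repeated = '<FOO BAR="a" BAR="b"/>'

print(is_accepted(single))
print(is_accepted(repeated))
```

No DTD is involved in either document, so this parser is rejecting the repeated attribute as a well-formedness problem rather than as a validity error.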
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Thu Jun 12 18:02:32 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:57 2004 Subject: Repeating attribute specifications In-Reply-To: <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca> Message-ID: In message <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>, Tim Bray writes >>Is there anything in the XML spec which corresponds to the SGML >>stricture that "there can only be one attribute specification for each >>attribute definition", i.e. that you can't have repeated attribute >>specifications within a single start-tag? > >No. This is legal in XML. And in SGML, with the recent TC. -T. The other answer I got to this question quoted the XML Lang spec (section 3.1): "Validity constraint - Unique Att Spec: No attribute may appear more than once in the same start-tag." This seemed to deal with the issue pretty conclusively: I had just failed to look under "start-tags" while thinking about attributes ;-) Is this all about to change with the 30 June update? Richard Light SGML and Museum Information Consultancy richard@light.demon.co.uk 3 Midfields Walk Burgess Hill West Sussex RH15 8JA U.K. tel. (44) 1444 232067 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Thu Jun 12 19:15:33 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:57 2004 Subject: Repeating attribute specifications Message-ID: <011290D45A8ACF119B8B00805FD471D60344E247@RED-24-MSG.dns.microsoft.com> I think Tim misunderstood your question. 
In the XML DTD, it is legal to have multiple AttlistDecl's for a given element type (see section 3.3). This doesn't change the validity constraint of section 3.1. Attributes in tags have to be unique.

> -----Original Message-----
> From: Richard Light [SMTP:richard@light.demon.co.uk]
> Sent: Thursday, June 12, 1997 1:19 AM
> To: xml-dev@ic.ac.uk
> Subject: Re: Repeating attribute specifications
>
> In message <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>, Tim
> Bray writes
> >>Is there anything in the XML spec which corresponds to the SGML
> >>stricture that "there can only be one attribute specification for each
> >>attribute definition", i.e. that you can't have repeated attribute
> >>specifications within a single start-tag?
> >
> >No. This is legal in XML. And in SGML, with the recent TC. -T.
>
> The other answer I got to this question quoted the XML Lang spec
> (section 3.1):
>
> "Validity constraint - Unique Att Spec:
> No attribute may appear more than once in the same start-tag."
>
> This seemed to deal with the issue pretty conclusively: I had just
> failed to look under "start-tags" while thinking about attributes ;-)
>
> Is this all about to change with the 30 June update?
>
> Richard Light
> SGML and Museum Information Consultancy
> richard@light.demon.co.uk
> 3 Midfields Walk
> Burgess Hill
> West Sussex RH15 8JA
> U.K.
> tel.
(44) 1444 232067 > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu Jun 12 20:01:01 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:57 2004 Subject: Repeating attribute specifications Message-ID: <7923@ursus.demon.co.uk> In message <011290D45A8ACF119B8B00805FD471D60344E247@RED-24-MSG.dns.microsoft.com> David Schach writes: > I think Tim misunderstood your question. In the XML DTD, it is legal to > have multiple AttistDecl's for a given element type (see section 3.3). > This doesn't change the validity constraint of section 3.1. Attributes > in tags have to be unique. I think I have misunderstood the answers as well :-) I'd be grateful for a very simple explanation. I assumed that the multiple attributes was so that if (say) occurs in the external DTD and occurs in the internal subset then this is now legal whereas it wasn't before. But what is now the default value of BAR? I assumed it was the later declaration ("XYZZY"). Please disabuse me if this is wrong. [I assume that is illegal, still. If not we have some software to rewrite.] P. 
> > > -----Original Message----- > > From: Richard Light [SMTP:richard@light.demon.co.uk] > > Sent: Thursday, June 12, 1997 1:19 AM > > To: xml-dev@ic.ac.uk > > Subject: Re: Repeating attribute specifications > > > > In message <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>, Tim > > Bray writes > > >>Is there anything in the XML spec which corresponds to the SGML > > >>stricture that "there can only be one attribute specification for > > each > > >>attribute definition", i.e. that you can't have repeated attribute > > >>specifications within a single start-tag? > > > > > >No. This is legal in XML. And in SGML, with the recent TC. -T. > > > > The other answer I got to this question quoted the XML Lang spec > > (section 3.1): > > > > "Validity constraint - Unique Att Spec: > > No attribute may appear more than once in the same start-tag." > > > > This seemed to deal with the issue pretty conclusively: I had just > > failed to look under "start-tags" while thinking about attributes ;-) > > > > Is this all about to change with the 30 June update? > > > > Richard Light > > SGML and Museum Information Consultancy > > richard@light.demon.co.uk > > 3 Midfields Walk > > Burgess Hill > > West Sussex RH15 8JA > > U.K. > > tel. 
(44) 1444 232067
>
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From tbray at textuality.com Thu Jun 12 20:24:02 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <3.0.32.19970612111857.00a54820@pop.intergate.bc.ca>

At 06:53 PM 12/06/97 GMT, Peter Murray-Rust wrote:
>[I assume that
>
>is illegal, still. If not we have some software to rewrite.]

Yes, it still is. Yes, I screwed up. Sigh. As Michael pointed out, we have a spec bug in that this is a WFC, not a VC. -T.

From clovett at microsoft.com Thu Jun 12 21:18:51 1997
From: clovett at microsoft.com (Chris Lovett)
Date: Mon Jun 7 16:57:57 2004
Subject: Re WF, V, and MSXML
Message-ID: <41135C785691CF11B73B00805FD4D2D702A3F8B9@RED-17-MSG.dns.microsoft.com>

Regarding all the discussion about the RMD attribute and switching validation on and off and error recovery and so on....
The reason MSXML doesn't implement RMD yet is that there are problems with the RMD=IGNORE concept: ignoring the DTD can result in different data being given to the application, which generally is a bad thing. The spec says it is an error to specify RMD=IGNORE if the DTD contains any declarations of:

1) attributes with default values, if elements to which these attributes apply appear in the document instance without specifying values for these attributes, or
2) entities (other than the built-in entities), if references to those entities appear in the document instance, or
3) element types with element content, if white space occurs in the document instance directly within any instance of those types.

The problem is that if the parser ignores the DTD, how can it detect #1 above? Also, the white space handling can be ambiguous.

So, MSXML currently takes the following approach:

- The RMD attribute is not implemented yet, so if a DTD is there it uses it.
- If an error is found it stops. No error recovery is attempted.
- If you don't want validation, remove the DTD.
- It is OK not to define some of the elements in the DTD. This simply means that in the same document there is certain data that you want to guarantee to be correct, and other data that is more unknown in structure (but still well-formed). This is simply a side effect of being able to parse a document without a DTD.
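
Point 1 in the list above can be made concrete. What follows is a minimal sketch, not MSXML's behaviour: it assumes Python's standard xml.etree.ElementTree (which reads the internal DTD subset through expat), and the names FOO and BAR and the default "XYZZY" are illustrative only. The same empty element yields different attribute data depending on whether the declaration is seen.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents: FOO, BAR and "XYZZY" are illustrative names,
# not MSXML test data.
WITH_DTD = '<!DOCTYPE FOO [<!ATTLIST FOO BAR CDATA "XYZZY">]><FOO/>'
WITHOUT_DTD = '<FOO/>'


def attributes(xml_text):
    """Return the attributes the application sees on the root element."""
    return dict(ET.fromstring(xml_text).attrib)


with_dtd = attributes(WITH_DTD)        # the declared default is supplied
without_dtd = attributes(WITHOUT_DTD)  # no declaration, so no attribute
print(with_dtd, without_dtd)
```

A parser that skipped the declaration would hand the application the second result for the first document, which is the "different data" problem described above.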
From paul at arbortext.com Thu Jun 12 23:49:50 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun 7 16:57:57 2004
Subject: Repeating attribute specifications
Message-ID: <3.0.32.19970612174836.006d473c@pophost.arbortext.com>

At 18:53 1997 06 12 GMT, Peter Murray-Rust wrote:
>I assumed that the multiple attributes was so that if (say)
>
>
>
>occurs in the external DTD and
>
>
>
>occurs in the internal subset
>then this is now legal whereas it wasn't before. But what is now the default
>value of BAR? I assumed it was the later declaration ("XYZZY"). Please
>disabuse me if this is wrong.

Your answer seems to be in 3.3 of the XML-lang spec:

    When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, and at most one attribute definition for a given attribute name. An XML processor may, at user option, issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition for a given attribute, but this is not an error.
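
The rule quoted from section 3.3 can be tried out directly. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree (expat underneath), it puts both declarations in the internal subset because the standard library does not fetch an external subset by default, and the second default "PLUGH" and the extra attribute BAZ are made-up stand-ins.

```python
import xml.etree.ElementTree as ET

# Two attribute-list declarations for the same element type FOO.
# "PLUGH" and BAZ are hypothetical; only FOO, BAR and "XYZZY" come
# from the thread.
DOC = """<!DOCTYPE FOO [
  <!ATTLIST FOO BAR CDATA "XYZZY">
  <!ATTLIST FOO BAR CDATA "PLUGH" BAZ CDATA "merged">
]>
<FOO/>"""

root = ET.fromstring(DOC)
bar_default = root.get("BAR")  # from the first definition of BAR
baz_default = root.get("BAZ")  # merged in from the second declaration
print(bar_default, baz_default)
```

The second definition of BAR is ignored, while BAZ from the same later declaration is still merged in.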
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Thu Jun 12 23:57:39 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:57 2004 Subject: Repeating attribute specifications Message-ID: <011290D45A8ACF119B8B00805FD471D603459316@RED-24-MSG.dns.microsoft.com> Per section 3.3 When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and the later declarations are ignored. In your example, the definition in the internal DTD, is processed first so it takes precedence over the definition in the external DTD. > -----Original Message----- > From: Peter@ursus.demon.co.uk [SMTP:Peter@ursus.demon.co.uk] > Sent: Thursday, June 12, 1997 11:53 AM > To: xml-dev@ic.ac.uk > Subject: RE: Repeating attribute specifications > > In message > <011290D45A8ACF119B8B00805FD471D60344E247@RED-24-MSG.dns.microsoft.com > > David Schach writes: > > I think Tim misunderstood your question. In the XML DTD, it is > legal to > > have multiple AttistDecl's for a given element type (see section > 3.3). > > This doesn't change the validity constraint of section 3.1. > Attributes > > in tags have to be unique. > > I think I have misunderstood the answers as well :-) I'd be grateful > for a > very simple explanation. > > I assumed that the multiple attributes was so that if (say) > > > > occurs in the external DTD and > > > > occurs in the internal subset > then this is now legal whereas it wasn't before. But what is now the > default > value of BAR? I assumed it was the later declaration ("XYZZY"). > Please > disabuse me if this is wrong. [I assume that > > > > is illegal, still. 
If not we have some software to rewrite.] > > P. > > > > > > > -----Original Message----- > > > From: Richard Light [SMTP:richard@light.demon.co.uk] > > > Sent: Thursday, June 12, 1997 1:19 AM > > > To: xml-dev@ic.ac.uk > > > Subject: Re: Repeating attribute specifications > > > > > > In message <3.0.32.19970611161725.00b540ac@pop.intergate.bc.ca>, > Tim > > > Bray writes > > > >>Is there anything in the XML spec which corresponds to the SGML > > > >>stricture that "there can only be one attribute specification > for > > > each > > > >>attribute definition", i.e. that you can't have repeated > attribute > > > >>specifications within a single start-tag? > > > > > > > >No. This is legal in XML. And in SGML, with the recent TC. -T. > > > > > > The other answer I got to this question quoted the XML Lang spec > > > (section 3.1): > > > > > > "Validity constraint - Unique Att Spec: > > > No attribute may appear more than once in the same start-tag." > > > > > > This seemed to deal with the issue pretty conclusively: I had just > > > failed to look under "start-tags" while thinking about attributes > ;-) > > > > > > Is this all about to change with the 30 June update? > > > > > > Richard Light > > > SGML and Museum Information Consultancy > > > richard@light.demon.co.uk > > > 3 Midfields Walk > > > Burgess Hill > > > West Sussex RH15 8JA > > > U.K. > > > tel. 
(44) 1444 232067 > > > > > > xml-dev: A list for W3C XML Developers > > > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > > > To unsubscribe, send to majordomo@ic.ac.uk the following message; > > > unsubscribe xml-dev > > > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > > > > xml-dev: A list for W3C XML Developers > > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > > To unsubscribe, send to majordomo@ic.ac.uk the following message; > > unsubscribe xml-dev > > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > > > > > > -- > Peter Murray-Rust, domestic net connection > Virtual School of Molecular Sciences > http://www.vsms.nottingham.ac.uk/ > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri Jun 13 00:14:38 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:57 2004 Subject: Re WF, V, and MSXML Message-ID: <7934@ursus.demon.co.uk> In message <41135C785691CF11B73B00805FD4D2D702A3F8B9@RED-17-MSG.dns.microsoft.com> Chris Lovett writes: > > Regading all the discussion about the RMD attribute and > switching validation on and off and error recovery and so on.... I think MSXML has taken a reasonable position given the ambiguities... > > The reason MSXML doesn't implement RMD yet is because there are > problems with the RMD=IGNORE concept since ignoring the DTD can result Agreed. [I'm working from XML-lang-970331, which doesn't use RMD="IGNORE". Is this the same as "NONE"?] > in different data being given to the application - which generally is a > bad thing. 
The spec says it is an error to specify RMD=IGNORE if the
^^^^^^^^^^^
I would have said it was always a bad thing!

> DTD contains any declarations of:
> 1) attributes with default values, if elements to which
> these attributes apply appear in the document instance without
> specifying values for these attributes, or
> 2) entities (other than the built in entities), if
> references to those entities appear in the document instance, or
> 3) element types with element content, if white space
> occurs in the document instance directly within any instance of those
> types.
>
> The problem is that if the parser ignores the DTD, how can it
> detect #1 above ? Also, the white space handling can be ambiguous.

Agreed. I think the ERB have to consider this. I cannot see how a parser (even with RMD="NONE") can avoid reading the DTD. I think the option is really related only to #3.

>
> So, MSXML currently takes the following approach:
> - RMD attribute is not implemented yet, so if a DTD is
> there it uses it.
...........^^^^
This is an ambiguous word :-) It can mean either the creation of the proper document content and/or validation.

> - If an error is found it stops. No error recovery is
> attempted.

:-)

> - If you don't want validation, remove the DTD.

Ah, but you cannot use entities or default attribute values.

> - It is ok to not define some of the elements in the
> DTD. This simply means that in the same document there is certain data
> that you want to guarantee to be correct, and other data that is more
> unknown in structure (but still well-formed). This is simply a side
> effect of being able to parse a document without a DTD.

This implies partial validation, which we don't have. There is no reason for defining any ELEMENTs if the document is not validated (and the element content not analysed).

P.
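
The remark that removing the DTD rules out entities can be checked with a short sketch. It assumes Python's standard xml.etree.ElementTree rather than MSXML, and the entity name "greeting" and its replacement text are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical entity name and replacement text, for illustration only.
WITH_DTD = '<!DOCTYPE doc [<!ENTITY greeting "hello">]><doc>&greeting;</doc>'
WITHOUT_DTD = '<doc>&greeting;</doc>'


def parse_text(xml_text):
    """Return the root element's text, or the parse error if parsing fails."""
    try:
        return ET.fromstring(xml_text).text
    except ET.ParseError as err:
        return err


with_dtd = parse_text(WITH_DTD)        # entity is declared and expanded
without_dtd = parse_text(WITHOUT_DTD)  # same reference, no declaration
print(with_dtd)
print(without_dtd)
```

With the declaration stripped, the reference is no longer resolvable and the parse is rejected, so "remove the DTD" is only an option for documents that use neither entities nor defaulted attributes.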
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From davidsch at microsoft.com Fri Jun 13 04:27:39 1997
From: davidsch at microsoft.com (David Schach)
Date: Mon Jun 7 16:57:57 2004
Subject: Entities in Attribute Values
Message-ID: <011290D45A8ACF119B8B00805FD471D603463877@RED-24-MSG.dns.microsoft.com>

Because entity references are allowed inside of attribute values, it is not possible to store an unmodified URL with data in an attribute. For example, the following XML is not valid because the '&'s are not escaped inside of SELF's HREF value.

<SELF HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics=on&dilbert=on&calvin=on&peanuts=on" />
Daily Comics

This makes it inconvenient to store URLs in XML files. Would anyone be interested in changing entity processing to fix this?

From aray at q2.net Fri Jun 13 06:51:45 1997
From: aray at q2.net (Arjun Ray)
Date: Mon Jun 7 16:57:57 2004
Subject: Entities in Attribute Values
In-Reply-To: <011290D45A8ACF119B8B00805FD471D603463877@RED-24-MSG.dns.microsoft.com>
Message-ID:

On Thu, 12 Jun 1997, David Schach wrote:
> Because entity references are allowed inside of attribute values, it is
> not possible to store an unmodified URL with data in an attribute. For
> example, the following XML is not valid because the '&'s are not
> escaped inside of SELF's HREF value.

This is a problem only if '&' *must* be the field separator. Why not something else, like ';'?
> HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics=o
> n&dilbert=on&calvin=on&peanuts=on" />
>
> This makes it inconvenient to store URL's in XML files. Would anyone
> be interested in changing entity processing to fix this?

IMHO, there's no need for that. Or, at any rate, there shouldn't be. Using '&' as a field separator in "query URLs" is a historical artefact of lack of RTFM. The problem was recognized reasonably early too, and a fix was proposed, but no HTML browser implementor of, ah, consequence ever got a Round Tuit.

From RFC 1866, Section 8.2.1 "The form-urlencoded Media Type":

    NOTE - The URI from a query form submission can be
    used in a normal anchor style hyperlink.
    Unfortunately, the use of the `&' character to
    separate form fields interacts with its use in SGML
    attribute values as an entity reference delimiter.
    For example, the URI `http://host/?x=1&y=2' must be
    written `<a href="http://host/?x=1&amp;y=2">'.

    HTTP server implementors, and in particular, CGI
    implementors are encouraged to support the use of
    `;' in place of `&' to save users the trouble of
    escaping `&' characters this way.

We're not committed to perpetuating mistakes, are we?

Arjun

From jtigue at datachannel.com Fri Jun 13 06:57:47 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun 7 16:57:57 2004
Subject: CDF DTD
Message-ID: <33A0D377.7FE9707F@datachannel.com>

I've got an incomplete DTD for CDF. I'd like to check some CDF files for validity. Does anyone know of a complete version?

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

From jjc at jclark.com Fri Jun 13 14:43:21 1997
From: jjc at jclark.com (James Clark)
Date: Mon Jun 7 16:57:57 2004
Subject: Re WF, V, and MSXML
Message-ID: <199706131242.GAA21549@jclark.com>

>The reason MSXML doesn't implement RMD yet is because there are
>problems with the RMD=IGNORE concept since ignoring the DTD can result
>in different data being given to the application - which generally is a
>bad thing. The spec says it is an error to specify RMD=IGNORE if the
>DTD contains any declarations of:
>1) attributes with default values, if elements to which
>these attributes apply appear in the document instance without
>specifying values for these attributes, or

>The problem is that if the parser ignores the DTD, how can it
>detect #1 above ?

Obviously it can't. If a parser wants to fully validate an XML document it has to read the entire DTD. One of the things it must validate, if the document has RMD=IGNORE, is that the DTD could be ignored without changing the data the application received. A parser that is not validating, on the other hand, can choose to take advantage of the RMD decl and not parse the DTD. Provided that the document has been validated, the non-validating parser will be guaranteed to get the same results as the validating parser.
James xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Fri Jun 13 18:21:58 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:57 2004 Subject: Entities in Attribute Values Message-ID: <011290D45A8ACF119B8B00805FD471D60346D795@RED-24-MSG.dns.microsoft.com> The use of the & as the CGI separator character is a well established convention and unlikely to change. It will continue whether we support it or not. > -----Original Message----- > From: Arjun Ray [SMTP:aray@q2.net] > Sent: Thursday, June 12, 1997 10:03 PM > To: xml-dev@ic.ac.uk > Subject: Re: Entities in Attribute Values > > > > On Thu, 12 Jun 1997, David Schach wrote: > > > Because entity references are allowed inside ot attribute values, it > is > > not possible to store an unmodified URL with data in an attribute. > For > > example, the following XML is not valid because the '&''s are not > > escaped inside of SELF's HREF value. > > This is a problem only if '&' *must* be the field separator. Why not > something else,, like ';' ? > > > > > HREF="http://someserver/scripts/oleisapi2.dll/comics.custom.cdf?comics > =o > > n&dilbert=on&calvin=on&peanuts=on" /> > > > > This makes it inconvenient to store URL's in XML files. Would it > anyone > > be interested in changing entity processing to fix this? > > IMHO, there's no need for that. Or, at any rate, there shouldn't be. > Using > '&' as a field separator in "query URLs" is a historical artefact of > lack > of RTFM. The problem was recognized reasonably early too, and a fix > was > proposed, but no HTML browser implementor of, ah, consequence ever got > a > Round Tuit. > > From RFC 1866, Section 8.2.1 "The form-urlencoded Media Type": > > NOTE - The URI from a query form submission can be > used in a normal anchor style hyperlink. 
> Unfortunately, the use of the `&' character to
> separate form fields interacts with its use in SGML
> attribute values as an entity reference delimiter.
> For example, the URI `http://host/?x=1&y=2' must be
> written `<a href="http://host/?x=1&amp;y=2">'.
>
> HTTP server implementors, and in particular, CGI
> implementors are encouraged to support the use of
> `;' in place of `&' to save users the trouble of
> escaping `&' characters this way.
>
> We're not committed to perpetuating mistakes, are we?
>
>
> Arjun

From richard at light.demon.co.uk Fri Jun 13 21:06:36 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:57:57 2004
Subject: Repeating attribute specifications
In-Reply-To: <7923@ursus.demon.co.uk>
Message-ID:

In message <7923@ursus.demon.co.uk>, Peter Murray-Rust writes
>
>I assumed that the multiple attributes was so that if (say)
>
>
>
>occurs in the external DTD and
>
>
>
>occurs in the internal subset
>then this is now legal whereas it wasn't before. But what is now the default
>value of BAR? I assumed it was the later declaration ("XYZZY"). Please
>disabuse me if this is wrong.

Yes, you're right, but the reason why should be made clear. (It is _not_ because it's "the later declaration"!):

The attribute-list declaration for element type FOO:

<!ATTLIST FOO BAR CDATA "XYZZY">

is read _first_ because it is in the internal DTD subset, which is processed before the external DTD subset. So its definition of the attribute BAR takes precedence over that given in the other attribute-list declaration for FOO, because "the first declaration is binding and later declarations are ignored".

Note that it is the _whole_ attribute definition:

BAR CDATA "XYZZY"

which is used, not just the default value as you suggest. (The two attribute declarations might have specified different attribute types.)

Sorry to have caused confusion in the first place!

Richard Light
SGML and Museum Information Consultancy
richard@light.demon.co.uk
3 Midfields Walk
Burgess Hill
West Sussex RH15 8JA
U.K.
tel. (44) 1444 232067

From cbullard at hiwaay.net Sat Jun 14 03:01:22 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:57:57 2004
Subject: Entities in Attribute Values
References: <011290D45A8ACF119B8B00805FD471D60346D795@RED-24-MSG.dns.microsoft.com>
Message-ID: <33A1E667.49B8@hiwaay.net>

David Schach wrote:
>
> The use of the & as the CGI separator character is a well established
> convention and unlikely to change. It will continue whether we support
> it or not.

The use of the & character is a well established convention and was before the query URL designers made their mistake. It will continue to be so. Impasse. It happens in all cases of failure to RTFM. The question now is what to do about it. Since, as Arjun has shown, it is a documented mistake, now is the time to fix that.
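
The two positions in this thread can be compared side by side. This is a sketch only, assuming Python's standard xml.etree.ElementTree and xml.sax.saxutils; the URI is the one from the RFC 1866 note, and the variable names are illustrative.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# URI taken from the RFC 1866 note quoted earlier in the thread.
uri = "http://host/?x=1&y=2"

# Writing the URI unescaped is rejected by an XML parser.
try:
    ET.fromstring('<a href="%s"/>' % uri)
    raw_is_well_formed = True
except ET.ParseError:
    raw_is_well_formed = False

# Option 1: keep '&' as the separator and escape it in the attribute.
escaped = "<a href=%s/>" % quoteattr(uri)
round_trip = ET.fromstring(escaped).get("href")

# Option 2: the RFC's suggestion, ';' as separator, needs no escaping.
fields = [("x", "1"), ("y", "2")]
semicolon_uri = "http://host/?" + ";".join("%s=%s" % pair for pair in fields)
semicolon_round_trip = ET.fromstring('<a href="%s"/>' % semicolon_uri).get("href")

print(escaped)
print(semicolon_round_trip)
```

Option 1 works with any existing server at the cost of escaping on the way in; option 2 avoids the escaping but depends on the server accepting ';', which is exactly the point of disagreement above.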
len bullard xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From indigo at MIT.EDU Sat Jun 14 08:01:23 1997 From: indigo at MIT.EDU (Hyung-Jin Kim) Date: Mon Jun 7 16:57:58 2004 Subject: hi! Message-ID: <9706140601.AA10887@MIT.MIT.EDU> I'm new to this list so I apologize if this question has been answered already: I was wondering if anyone knew of an parser that made well-formed XML files from HTML files. I know of a few tools that can DETECT mal-formed tags in HTML (i.e. weblint) but is there a tool that will do the conversion? Thanks! Please reply directly to me. -jim ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Hyung-Jin Kim 407 Memorial Dr. M.I.T. ,,, Cambridge, MA 02139 Cambridge, MA (o o) (617)494-9907 ~~~~~~~~~~~~~~~~~~~~~~~~~~oOOo(_)oOOo~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat Jun 14 11:05:44 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:58 2004 Subject: hi! Message-ID: <7993@ursus.demon.co.uk> In message <9706140601.AA10887@MIT.MIT.EDU> Hyung-Jin Kim writes: > I'm new to this list so I apologize if this question has been answered already: We have not had enough discussion about HTML on this list - and I, for one, would like version(s) of XMLised DTDs and documents. > > I was wondering if anyone knew of an parser that made well-formed XML files > from HTML files. I know of a few tools that can DETECT mal-formed tags in > HTML (i.e. weblint) but is there a tool that will do the conversion? > Thanks! Please reply directly to me. Mal-formed HTML (i.e. 
non-conforming SGML) is outside the scope of this list :-). However, converting legal HTML (i.e. conforming SGML) to XML is a valid activity and it could be useful to get feedback. It normally requires a DTD (for example means that tags are frequently omitted. There is also the question of what to do with EMPTY tags such as
. Does it matter if they are rendered as
or, say
? I convinced myself that it did, in that the first has no child, while the second could have a PCDATA child of value "\n" - at least in WF documents? What is its value in
? Could someone more authoritative give an overview of the XML-isation of HTML? I need HT(X)ML to provide the text sections for CML... P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat Jun 14 12:50:37 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:58 2004 Subject: hi! Message-ID: <8005@ursus.demon.co.uk> Welcome, In message ross@mpce.mq.edu.au (Ross Moore) writes: [...] > > Currently I'm putting the finishing touches on the latest version of LaTeX2HTML. This is a noble effort. > > Later this year I hope to tackle LaTeXML for which I would like to be > able to use > existing DTDs as much as possible --- especially for portions of MathML --- > rather than having to write my own. I have always admired the (La)TeX virtual community of volunteers and presumably they will be keen to learn about XML and how it applies to LaTeX. In which case this represents a significant pool of potential XML-friendly hackers :-) I'm thinking as I write, but it seems as if there should be 'a' LaTeX DTD (possibly modular), which interoperates with the MathML DTD. I think it's important to keep them distinct because there are many people who don't use LaTeX for maths, but as a general authoring tool. Since MathML specifically mentions TeX as a NOTATION, and as isomorphic to mathML in some parts, the clear separation of all components (LaTeXML, MathML, TeX) is critical. > > Having a reliable HTML --> XML ought to be an option too. > > Indeed this would probably be the easiest way to go for a first working version, > given the effort that has already gone into LaTeX2HTML . I'd agree. 
LaTeX is an excellent tool, but it doesn't have the full structuring power of XML unless it's specifically thought of at the start. I speak from experience as I wrote a complex book in LaTeX, with outputs as *.dvi, *.html, and several implied conditional sections. That was before I discovered the point of SGML - I spent many midnights writing programs to restructure the book :-(

>
> Ultimately a scheme will be needed whereby (partial) DTDs can be
> constructed automatically
> from any \newenvironment commands that the user devises for the LaTeX
> typeset version.

Yes - I think that a current LaTeX user can probably devise structuring like this that makes the transformation much easier. Among the things that are difficult to convert are paragraph/line breaks (when not explicitly marked up).

>
>
> I'd love to hear from anyone else interested in:
>
> 1. converting existing LaTeX documents into XML ;

I'd agree that LaTeX->HTML/XML is a useful start. One discussion would be whether one had to have a DTD that supported all constructs in the LaTeX manual, or whether there was a more generic DIV-like container. Another would be how to support user-defined macros. Also, would you work on the authored document, or some later normalised/expanded version? (I've lost touch with LaTeX2HTML, but I assume that it works on some normalised version which has lost the author's macros.) For scientific technical documents this is a highly desirable goal :-)

>
> 2. using LaTeX syntax as a front-end to XML for documents on the Web .

Do you mean transforming XML documents into LaTeX (I tend to think of this as a back-end) or as a way of authoring XML documents using LaTeX? The latter is rather similar to (1). The second will require a transformation engine which most people would approach through DSSSL stylesheets, I imagine.

P.
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From tikvas at agentsoft.com Sun Jun 15 07:55:17 1997
From: tikvas at agentsoft.com (Tikva Schmidt)
Date: Mon Jun 7 16:57:58 2004
Subject: White spaces in Dtd.
Message-ID: <33A3808F.2841@agentsoft.com>

I'm a developer at AgentSoft Ltd. We create tools for Web Automation and are now trying to make our intelligent agents work with XML.

In trying to create a DTD parser I came across several unclear things, for example the usage of white space in the DTD. It seems like the grammar has set rules for where spaces are allowed or needed, and when they can be replaced by an entity referring to space. I also thought the DTD was supposed to be easy to parse. I came across something which is either a mistake or an unclear design. For example the rule for elementdecl is " '' ". This looks like an entity reference: the %S following the name would have to be directly after the name with no space between them. This makes parsing more difficult for machines and the human eye. Perhaps there is a mistake in defining the meaning of %a, or perhaps the % shouldn't appear before the S...

What should I expect for %S ???

Tikva Schmidt.

--------------------------------------------------------------------
Tikva Schmidt. email: tikvas@agentsoft.co.il
corp: Agentsoft Ltd.
http://www.agentsoft.co.il Phone: 972-2-6480573 --------------------------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sun Jun 15 13:56:53 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:58 2004 Subject: White spaces in Dtd. Message-ID: <8028@ursus.demon.co.uk> In message <33A3808F.2841@agentsoft.com> "Tikva Schmidt" writes: > I'm a developer at AgentSoft Ltd. We create tools for Web > Automation and are now trying to make our inteligant agents work > with XML. > In trying to create a Dtd parser I came across several unclear things > for example the usage of white spaces in the dtd.It seems like the > grammer has set rules for where space are allowed or needed,and when > they can be replaced by an entity refering to space.I also thought the > dtd was supposed to be easy to parse.I came across something which .........................^^^^^^^^^^^^^ I sympathise with this :-). Parameter Entities (PEs) initially gave parser writers and the ERB/WG a lot of problems. The current rules (I refer to 970331) are simpler than initially, but I must admit that I don't find that particular part of the spec easy to understand. I think it's fair to say that **apart from PEs** the DTD is easy to parse. It may also be that the current rules for PE substitution can be described in a simple way and I just haven't picked this up. I am sure that you will get answers from people more knowledgeable than me, but also remember that a revision of the spec is due on July 1. What is in it is determined by the ERB, but I would be surprised if there were not clarifications relating to PEs :-). P. 
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From antheae at wrox.com Mon Jun 16 17:46:59 1997
From: antheae at wrox.com (Anthea Elston)
Date: Mon Jun 7 16:57:58 2004
Subject: Developing a book on XML
Message-ID:

Hi

I'm a development editor with Wrox Press, based in Birmingham. I head up a team which tries to produce books on the latest developments in programming, particularly web based programming. XML looks like being the next big thing, so if anyone out there is interested in writing a Programmer's Reference book on XML, please contact me for further details.

Anthea

Anthea Elston
Wrox Press Ltd, 30 Lincoln Road, Olton, Birmingham UK
Tel: 0121 706 6826
http://www.wrox.com

From Jon.Bosak at Eng.Sun.COM Tue Jun 17 08:19:35 1997
From: Jon.Bosak at Eng.Sun.COM (Jon Bosak)
Date: Mon Jun 7 16:57:58 2004
Subject: Reminder: XML online
Message-ID: <199706170617.XAA00821@boethius.eng.sun.com>

I announced this about three months ago and am just repeating it for the benefit of anyone who recently subscribed to the list.

If you are looking for generated XML to test your XML application, you can find it at our Sun documentation server, docs.sun.com. This site (not yet widely publicized) exists primarily to serve out HTML generated on the fly from our SolBook (DocBook) database of Solaris manuals.
To see it operating in normal mode, just point your Web browser at http://docs.sun.com.

In addition to its normal HTML output, our AnswerBook2 team has rigged docs.sun.com to generate an unsophisticated but copious alternative XML data stream if you know how to ask for it.

HOW TO GET XML

The AnswerBook2 (ab2) manuals on docs.sun.com are organized into several large categories (alluser, sysadmin, etc.) with a number of books in each category. Thus, the Solaris Advanced User's Guide is referred to in URLs as /ab2/alluser/ADVOSUG. Two forms of XML access are currently supported: TOCs and document chunks. TOCs are accessed via the @xmlToc template, and chunks are accessed via the @xmlChunk template. The @xmlToc template always shows a table of contents down to the chapter level, no matter what level it is invoked at.

Some examples:

1. To get a chapter-level TOC of the entire contents of the server:

   http://docs.sun.com/ab2/@xmlToc

2. To get a chapter-level TOC of the manuals in the alluser category:

   http://docs.sun.com/ab2/alluser/@xmlToc

3. To get a chapter-level TOC of the Solaris Advanced User's Guide:

   http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlToc

4. To get a particular chapter from the manual (as listed in the TOC):

   http://docs.sun.com/ab2/alluser/ADVOSUG/@xmlChunk/1120

Jon

----------------------------------------------------------------------
Jon Bosak, Online Information Technology Architect, Sun Microsystems
----------------------------------------------------------------------
2550 Garcia Ave., MPK17-101, Mountain View, California 94043
Davenport Group::SGML Open::NCITS V1::ISO/IEC JTC1/SC18/WG8::W3C XML
If a man look sharply and attentively, he shall see Fortune; for
though she be blind, yet she is not invisible.
-- Francis Bacon ---------------------------------------------------------------------- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tikvas at agentsoft.com Tue Jun 17 08:50:07 1997 From: tikvas at agentsoft.com (Tikva Schmidt) Date: Mon Jun 7 16:57:58 2004 Subject: White spaces in Dtd. References: <199706161530.KAA91668@tigger.cc.uic.edu> Message-ID: <33A63054.3AB@agentsoft.com> C M Sperberg-McQueen wrote: > > On Sun, 15 Jun 1997 08:41:35, "Tikva Schmidt" > wrote: > > > In trying to create a Dtd parser I came across several unclear things > >for example the usage of white spaces in the dtd.It seems like the > >grammer has set rules for where space are allowed or needed,and when > >they can be replaced by an entity refering to space.I also thought the > >dtd was supposed to be easy to parse.I came across something which > >is either a mistake or an unclear design.For example the rule for > >elementdecl is " '' " > >this looks like an entity reference for the %S following the name would > >have to be directly after the name with no space between them. This > >makes parsing more dificult for machines and the human eye.Perhaps there > >is a mistake in defining the meaning of %a,or perhaps the % shouldn't > >appear before the S... > > > > What should I expect for $S ??? > > Thanks for the observation. If the S following Name is replaced by > a parameter entity reference, it should *not* be required to come > immediately after the Name. Declarations of the form > > > > should be legal. The intention, in allowing this particular S to be > parameterized, is to make it possible to parameterize the tag > omissibility indications needed in most production Full-SGML DTDs > while being able to use the same DTD source also for XML. It was > added very late, and in my haste I made a mistake. 
(Tim is wholly
> blameless in this.)
>
> We should be able to fix this error in the next release of the spec.
>
> -C. M. Sperberg-McQueen

Thanks. This means the following examples are legal as well. Is that going to change?

Tikva.

From Peter at ursus.demon.co.uk Tue Jun 17 13:21:52 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:57:58 2004 Subject: XML, HTML and LaTeX2HTML Message-ID: <8093@ursus.demon.co.uk>

Just to add my own interest in this (having written a book in LaTeX and used LaTeX2HTML. That was 3yrB4XML, so now I would use XML :-)).

LaTeX is an established and powerful authoring tool, especially for scientific, mathematical, and technical documents. TeX has been unrivalled as a typesetting language in these disciplines. LaTeX2HTML (which Ross is looking after) is a useful tool for publishing HTML. TeX provides a widely supported output format (*.dvi) for many systems.

So my interests are:

what is the role for LaTeX2XML (sic)?
what is the role for XML2TeX?

and has anyone been developing tools in this area?

P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From norm at berkshire.net Tue Jun 17 15:58:53 1997 From: norm at berkshire.net (Norman Walsh) Date: Mon Jun 7 16:57:58 2004 Subject: XML, HTML and LaTeX2HTML In-Reply-To: Peter@ursus.demon.co.uk's message of Tue, 17 Jun 1997 10:42:00 GMT References: <8093@ursus.demon.co.uk> Message-ID: <6906-Tue17Jun1997095643-0400-norm@berkshire.net> > what is the a role for LaTeX2XML (sic)? Conversion of legacy to XML? ;-) > what is the role for XML2TeX? > > and has anyone been developing tools in this area? JadeTeX provides a TeX backend for XML documents. I wrote a suite of tools for doing SGML publishing that would do SGML (and trivially XML) to LaTeX, but I've abandoned them in favor of Jade. Faster, more portable, and easier to explain ;-) Cheers, norm -- Norman Walsh | Whatever you do may seem Senior Application Analyst | insignificant, but it is most ArborText, Inc. (www.arbortext.com) | important that you do it -- Ghandi 413.549.3868 Voice/FAX | xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Tue Jun 17 18:33:55 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:58 2004 Subject: XML online and Entities Message-ID: <011290D45A8ACF119B8B00805FD471D6034D890E@RED-24-MSG.dns.microsoft.com> I noticed that XMLToc contains unescaped &'s inside of PCData. This is legal SGML but prohibited in XML per section 2.4. I think this example shows the need to revisit the entity expansion rules in XML. 
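For anyone testing against data with this problem, here is a minimal sketch in Python (not anything used by docs.sun.com) of a filter that rewrites bare '&' characters in character data as '&amp;' while leaving anything already shaped like an entity or character reference alone. The function name and the regular expression are illustrative assumptions: the name pattern is an ASCII-only simplification of an XML Name, and the filter cannot tell a genuine reference from text that merely looks like one, so it should not be run over CDATA sections.

```python
import re

# Matches '&' that does NOT begin something shaped like a reference:
#   &name;   &#123;   &#x1F;
# The name pattern is a simplified, ASCII-only stand-in for an XML Name
# (an assumption made for this sketch).
_BARE_AMP = re.compile(
    r"&(?!(?:[A-Za-z_:][A-Za-z0-9_.:\-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text):
    """Hypothetical helper: escape ampersands that are not part of a reference."""
    return _BARE_AMP.sub("&amp;", text)


print(escape_bare_ampersands("Find & Replace"))
print(escape_bare_ampersands("AT&T &amp; B&#38;C"))
```

The negative lookahead keeps the filter safe to run more than once: text that has already been escaped is left unchanged on a second pass.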
From richard at light.demon.co.uk Tue Jun 17 18:52:29 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:57:58 2004 Subject: HTML2_X.DTD Message-ID: Hi, I'm probably not the only person to have done this, but I had a go at XML-izing the HTML 2.0 DTD. Most of the job was straightforward (although a recent exchange suggests that I would have been better advised to leave the tag omission rules in as parameter entities!). However, two issues that remain are the use of '&' in the content model for , and the liberal use of inclusion and exclusion exceptions. Both are invalid in XML, and neither can be trivially re-mapped to an XML-compliant equivalent. Is anyone else interested in this sort of issue? Any thoughts on how these problems should be addressed?
I don't want to waste bandwidth by copying the whole DTD, but if anyone wants it, I'll happily forward a copy offline. Here are the relevant sections:

1) This is the relevant fragment for the first issue (the '&' content models have not been changed): ]]>

2) ... and this goes on to show a couple of the exceptions: These are the others (all of them, I think): ... ... ...

Richard Light SGML and Museum Information Consultancy richard@light.demon.co.uk 3 Midfields Walk Burgess Hill West Sussex RH15 8JA U.K. tel. (44) 1444 232067

From lee at sq.com Tue Jun 17 19:30:25 1997 From: lee at sq.com (lee@sq.com) Date: Mon Jun 7 16:57:58 2004 Subject: HTML2_X.DTD Message-ID: <9706171730.AA22656@sqrex.sq.com>

Richard Light wrote:
> 1) This is the relevant fragment for the first issue (the '&' content
> models have not been changed):
>
>

Well, I originally wrote the & content model for HTML 2.0. We had to have a model that reflected the idea that you could have

* zero or one BASE
* zero or one ISINDEX
* exactly one TITLE
* any number of META elements

interspersed in any order.

You could try

    META*, ( (ISINDEX, META*, TITLE) | (TITLE, META*, ISINDEX?) ), META*

but this is ambiguous in SGML and requires lookahead, because if you get a META after a TITLE, you don't know if there is going to be an ISINDEX following.

The following might work:

    META*, ( (ISINDEX, META*, TITLE, META*) | (TITLE, META*, (ISINDEX, META*)?) )

but this doesn't allow for BASE or NEXTID.

I am not sure how to write a content model for HTML's HEAD in XML that allows for all the things you might want to put in it. The trouble is that & isn't a very good way to say "I want one of these anywhere in this soup", because it can connect any two expressions of arbitrary complexity.
The best thing to do is probably and require the application to do the additional checking.

Lee

From jenglish at crl.com Tue Jun 17 21:42:45 1997 From: jenglish at crl.com (Joe English) Date: Mon Jun 7 16:57:59 2004 Subject: HTML2_X.DTD In-Reply-To: References: Message-ID: <199706171927.AA04297@mail.crl.com>

Richard Light wrote:
> I'm probably not the only person to have done this, but I had a go at
> XML-izing the HTML 2.0 DTD. [...]
>
> However, two issues that remain are the use of '&' in the content model
> for , and the liberal use of inclusion and exclusion exceptions.
>
> Both are invalid in XML, and neither can be trivially re-mapped to an
> XML-compliant equivalent. Is anyone else interested in this sort of
> issue? Any thoughts on how these problems should be addressed?

For the HEAD content model:

    (TITLE & ISINDEX? & BASE?) +(META|LINK)

you can get rid of the inclusion exceptions by changing this to:

    ( (meta|link)*, ( (TITLE, (meta|link)*) & (ISINDEX, (meta|link)*)? & (BASE, (meta|link)*)? ) )

then use the standard transformation on AND groups to get:

(A question of my own: Why does SP complain about e.g., "%base;?" but not "(%base;)?" I can't find the reason for this in the Standard.)

Addition of NEXTID, SCRIPT, and STYLE is left as an exercise to the reader (GAAAH!).

Or, more sensibly, you can follow Naggum's First Law of AND groups: If the order doesn't matter, you might as well pick one and stick with it:

In this case the order does matter to some degree, since there are metadata schemes which require groups of METAs and LINKs to appear in a certain order, so this is probably better:

This is stricter than HTML 2, but most HTML will need to be modified anyway to be XMLized.
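One option raised in this thread is to declare a loose content model for HEAD and leave the remaining constraints to the application. What follows is a minimal sketch of such a check in Python, assuming the application can already list the element names of HEAD's children. The function name, the upper-casing of names, and the acceptance of LINK alongside META are assumptions made for illustration; the counts for TITLE, ISINDEX and BASE follow the constraints listed earlier in the thread.

```python
from collections import Counter

# Cardinality rules taken from the thread; the set names are hypothetical.
EXACTLY_ONE = {"TITLE"}
AT_MOST_ONE = {"BASE", "ISINDEX"}
ANY_NUMBER = {"META", "LINK"}  # LINK is included as an assumption


def check_head(child_names):
    """Return a list of problems found among HEAD's child element names.

    Order is deliberately ignored; only the counts are checked.
    Names are upper-cased first, which assumes HTML-style case folding.
    """
    counts = Counter(name.upper() for name in child_names)
    problems = []
    for name in sorted(EXACTLY_ONE):
        if counts[name] != 1:
            problems.append(f"expected exactly one {name}, found {counts[name]}")
    for name in sorted(AT_MOST_ONE):
        if counts[name] > 1:
            problems.append(f"expected at most one {name}, found {counts[name]}")
    allowed = EXACTLY_ONE | AT_MOST_ONE | ANY_NUMBER
    for name in sorted(set(counts) - allowed):
        problems.append(f"unexpected element {name}")
    return problems


print(check_head(["META", "TITLE", "META", "ISINDEX"]))
print(check_head(["META"]))
```

Because only counts are examined, META elements placed before, after, or on both sides of TITLE all pass, which is the flexibility the loose model is meant to give.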
Inclusion and exclusion exceptions have to be treated on a case-by-case basis. The exclusion exceptions in HTML 2.0 are used primarily to limit recursion (e.g., to make sure that an "A" element can't appear inside another "A"), and in some cases to undo the effects of inclusion exceptions (e.g., on TITLE and SELECT to undo the inclusions on HEAD and FORM, respectively). For the FORM elements you should do what HTML 3.2 does: Instead of making (INPUT|SELECT|TEXTAREA) inclusions on the FORM element and then excluding them from SELECT and TEXTAREA, just add them to the '%text;' parameter entity so they can appear anywhere in content. (That they must appear inside a FORM element is still enforced, but as an application convention rather than by the DTD). Once the inclusions are taken care of, all the exclusions can be safely removed, since this yields a less restrictive DTD. --Joe English jenglish@crl.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lee at sq.com Tue Jun 17 21:57:00 1997 From: lee at sq.com (lee@sq.com) Date: Mon Jun 7 16:57:59 2004 Subject: HTML2_X.DTD Message-ID: <9706171956.AA26341@sqrex.sq.com> Joe English wrote: > Or, more sensibly, you can follow Naggum's First Law of AND groups: > If the order doesn't matter, you might as well pick one and stick > with it: > > > > In this case the order does matter to some degree, since there > are metadata schemes which require groups of METAs and LINKs > to appear in a certain order, so this is probably better: > > It turns out that some of the widely used HTML authoring tools (no, not HoTMetaL!) automatically add one or more adverts for their manufacturers by adding META elements, usually immediately after the title or right before it or, in at least one case (Microsoft's) on either side of the title. 
I stand by Lee xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From dlapeyre at mulberrytech.com Tue Jun 17 23:19:11 1997 From: dlapeyre at mulberrytech.com (Deborah Aleyne Lapeyre) Date: Mon Jun 7 16:57:59 2004 Subject: Call for Participation SGML/XML'97 Message-ID: Yet Another CALL FOR PARTICIPATION (YACP) I beg the indulgence of this list to post this call. Not all of you are on the regular SGML mailing lists and I wanted to make sure the word got out. If you've seen it, this announcement is just like all the others, delete with my apologies. Otherwise, please pass this announcement along to your staff, your friends, and any mailing lists you feel might appreciate the news. Then come to SGML/XML'97 in Washington in December and help make this year the biggest and most exciting SGML conference ever! --Debbie Lapeyre ----------------------------------------------------- ****** Call for Participation for SGML/XML'97 ******* ----------------------------------------------------- Soliciting presentations on SGML and XML theory, tools, techniques, and experience for the annual SGML technical conference. WHEN: December 8-11, 1997 WHERE: Sheraton Washington Hotel, Washington D.C. 
USA (near the zoo and on Metro's Red line) SPONSOR: Graphic Communications Association (GCA) WHAT: Request for proposals to speak, give a poster, present an evening session, or participate in the New Technology Nursery HOW: Submit proposals via HTML form at http://www.mulberrytech.com/sgml97 or in SGML according to the submission DTD and sent via email to: sgml97@mulberrytech.com Guidelines for Submission and the DTD for are available by email: sgml97@mulberrytech.com or at http://www.mulberrytech.com/sgml97 (If you do not have access to the Web, cannot create a proposal in SGML, or need to ftp the DTD, contact Tommie Usdin by phone at +1 301/231-6934, or by fax at +1 301/231-6935.) SCHEDULE: Proposals Due...............30 JUN, 1997 Speakers Notified...........30 AUG, 1997 Preliminary Program.........15 SEPT, 1997 Full papers due.............17 OCT, 1997 Poster abstracts due........21 NOV, 1997 QUESTIONS: Email to sgml97@mulberrytech.com or call Tommie Usdin +1 301/231-6930 MORE INFORMATION: For participation details and current information on the conference, see http://www.mulberrytech.com/sgml97 To receive an Advance Program and Registration Information when they are available, send email to sgml97@gca.org or call the Graphic Communications Association at +1 703/519-8160 or 1-888-SGMLGCA (1-888/746-5422). ------------ End SGML/XML'97 Announcement ----------- ===================================================================== SGML/XML'97 Conference Committee Chair: B. Tommie Usdin, Mulberry Technologies Co-Chairs: Deborah A. Lapeyre, Mulberry Technologies C. M. Sperberg-McQueen, University of Illinois at Chicago Email: sgml97@mulberrytech.com Phone: 301/231-6930 Fax: 301/231-6935 Registration & Vendor Information: Marion Elledge, GCA, 703/519-8160 ===================================================================== ====================================================================== Deborah A. Lapeyre Phone: 301/231-6933 Mulberry Technologies, Inc. 
Fax: 301/231-6935 6010 Executive Blvd., Suite 608 E-mail: dalapeyre@mulberrytech.com Rockville, MD 20852 WWW: http://www.mulberrytech.com ====================================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at Eng.Sun.COM Wed Jun 18 06:26:46 1997 From: Jon.Bosak at Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:57:59 2004 Subject: XML online and Entities In-Reply-To: <011290D45A8ACF119B8B00805FD471D6034D890E@RED-24-MSG.dns.microsoft.com> (message from David Schach on Tue, 17 Jun 1997 09:27:10 -0700) Message-ID: <199706180425.VAA08044@boethius.eng.sun.com> [David Schach:] | I noticed that XMLToc contains unescaped &'s inside of PCData. This | is legal SGML but prohibited in XML per section 2.4. Yup. This is (as we say in the software business) a known bug in the process used to compile the DynaText binaries from the DocBook SGML source. It will get fixed the next time the books are rebuilt, which unfortunately may be a little while. | I think this example shows the need to revisit the entity expansion | rules in XML. Maybe. Or maybe it just means that we have to live with a slightly buggy test bed for the moment. The team in charge of this is in the toils of Solaris 2.6 finalization and probably won't be able to deal with this for the next month or so. 
Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From michael at textscience.com Wed Jun 18 16:05:12 1997 From: michael at textscience.com (Michael Leventhal) Date: Mon Jun 7 16:57:59 2004 Subject: Comercial XML editor recommendations In-Reply-To: <7353@ursus.demon.co.uk> Message-ID: <3.0.1.32.19970618220714.007bd7f0@aimnet.com> Grif has announced that it will have XML-related extensions in the next release of our HTML editor Symposia Pro and Symposia Doc+ at the end of this month. While Symposia is a commercial-grade product the XML extensions are primarily designed to give our customers the opportunity to begin experimenting with XML. Our hope is that this will help to give a larger audience a concrete idea of what XML is all about. We have chosen to introduce XML in an HTML product in the same spirit, we think, with which Yuri Rubinsky introduced SGML to HTML users in his book "SGML on the Web". And in the spirit of the undertaking we cordially invite your comments on the proposed XML extensions to our product. 1. Read, either off the Web or locally, edit, and create well-formed XML documents. The DTD, if present, is not read. The document should use the ASCII character set and HTML character entities. UTF-8 encodings above 127 will be preserved but may not display correctly. 2. Save a document either as XML or HTML format with respect to the syntax of empty tags and other syntactical differences. Both types of saved documents will be ASCII with HTML character entities except that UTF-8 encodings that were present in the text as it was read in will be unchanged. Note that nothing prevents the user from mixing HTML and XML, as is currently done in many applications already on the Web. But the user must decide which one it is when the document is saved. 3. 
Create new element and attribute definitions. These are "definitions" in the simple syntax implied by the concept of "well-formedness", not DTD fragments. These definitions may be saved in project folders and used in documents at will. 4. Add new XML elements and attributes, either from a set stored in a project folder or ad-hoc. 5. Create CSS stylesheets, and CSS definitions for any element, HTML or XML. Symposia uses CSS as its own stylesheet language and will display XML CSS specifications correctly. In effect, it is an XML browser for the Web, albeit without certain functionality such as JavaScript interpretation. We have decided not to offer XML-LINK in this version even though we have completed an early implementation. Michael Leventhal ______________________________________________________________________ Michael Leventhal Internet : http://www.grif.fr G R I F , S. A. Email : Michael.Leventhal@grif.fr VP, Technology Telephone : 510-444-2962 1800 Lake Shore Ave Ste 14 Fax : 510-444-1672 Oakland, California 94606 France : (011) 33 1 30121430 (fr US) ______________________________________________________________________ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From davidsch at microsoft.com Wed Jun 18 19:16:36 1997 From: davidsch at microsoft.com (David Schach) Date: Mon Jun 7 16:57:59 2004 Subject: XML online and Entities Message-ID: <011290D45A8ACF119B8B00805FD471D603500B30@RED-24-MSG.dns.microsoft.com> This kind of bug will be common because SGML allows the & to be used this way. I think this difference in entity processing unnecessarily complicates SGML to XML conversion. 
From nmikula at edu.uni-klu.ac.at Wed Jun 18 19:26:58 1997 From: nmikula at edu.uni-klu.ac.at (Norbert Mikula) Date: Mon Jun 7 16:57:59 2004 Subject: XML online and Entities References: <011290D45A8ACF119B8B00805FD471D603500B30@RED-24-MSG.dns.microsoft.com> Message-ID: <33A81A4E.59E2@edu.uni-klu.ac.at> David Schach wrote: > This kind of bug will be common because SGML allows the & to be used > this way. I think this difference in entity processing unnecessarily > complicates SGML to XML conversion.
Rather this way, than having to deal with context dependencies in a language :) The ERB did a great job when thinking about how to define the language to ease the construction of lightweight and fast parsers (in a contemporary fashion).

-- Best regards, Norbert H. Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula =====================================================

From tbray at textuality.com Wed Jun 18 20:38:24 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:57:59 2004 Subject: XML online and Entities Message-ID: <3.0.32.19970618113611.00a4de30@pop.intergate.bc.ca>

At 10:09 AM 18/06/97 -0700, David Schach wrote:
>This kind of bug will be common because SGML allows the & to be used
>this way. I think this difference in entity processing unnecessarily
>complicates SGML to XML conversion.

Yes, this kind of bug will be common. The chance of changing this in XML-lang is very small. One of the bogosities that make SGML parsers hard to write is that entity references are nontrivial to recognize. When I'm teaching XML, it's *so nice* to be able to say: all markup without exception starts with '<' or '&', and anything that starts with '<' or '&', without exception, is markup. End of story. Users like it, programmers like it.

So, what's the solution? If all else fails, a postprocessor that takes
and changes the ;'s to &'s on the way off to the server? -Tim xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jtigue at datachannel.com Thu Jun 19 03:28:18 1997 From: jtigue at datachannel.com (John Tigue) Date: Mon Jun 7 16:57:59 2004 Subject: XML Java API Standardization Message-ID: <33A88B52.9FCC78DC@datachannel.com> Now that the number of XML processor implementations is increasing rapidly, I would like to continue the subject of API standardization. I have written a document which discusses the issue and presents an informal proposal which continues the discussion of API standardization for Java. The document is located at: http://www.datachannel.com/ChannelWorld/XML/dev The first goal is to find a lowest common denominator for the current implementations and abstract that to a set of interfaces such that a developer could use this new API independent of an underlying implementation of the XML processor and/or invest in learning the particular benefits a specific implementation provides. I hope the site will serve as a convenience to the community and I will maintain it as a summary of what is going on in this list. Any feedback would be greatly appreciated. This is a work in progress. The greater the contributions, the better it will serve its purpose. -- John Tigue Programmer jtigue@datachannel.com DataChannel (http://www.datachannel.com) 206-462-1999 -------------- next part -------------- A non-text attachment was scrubbed... 
Name: vcard.vcf Type: text/x-vcard Size: 316 bytes Desc: Card for John Tigue Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970619/b229574b/vcard.vcf From tbray at textuality.com Thu Jun 19 06:54:44 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: <3.0.32.19970618215226.00a4dde0@pop.intergate.bc.ca> At 06:28 PM 18/06/97 -0700, John Tigue wrote: I would like to say that modulo a few quibbles, it seems that John's proposal is sensible. It would send a REALLY STRONG message to the world if all the XML parsers just happened to interoperate effortlessly. So I hereby commit to changing Lark's interface to be compatible with this, after it's been kicked around here for a while and any technical gotchas have been aired out. I'd love to see similar commitments from the other parser builders. Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From nmikula at edu.uni-klu.ac.at Thu Jun 19 09:02:46 1997 From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization References: <33A88B52.9FCC78DC@datachannel.com> Message-ID: <33A95563.71CA@edu.uni-klu.ac.at> John Tigue wrote: > > Now that the number of XML processor implementations is increasing > rapidly, I would like to continue the subject of API standardization. I > have written a document which discusses the issue and presents an > informal proposal which continues the discussion of API standardization > for Java. 
> > The document is located at: > http://www.datachannel.com/ChannelWorld/XML/dev Following John's original posting and Tim's "call for commitments (CFC)", I first want to say that I applaud John for having started this initiative. I certainly will contribute to this effort as much as I can. I also will be happy to modify NXP so that it follows a *standardized* and well-designed API. I hereby also invite all users of NXP to express on this list what experience they have made with my approach to this issue. You application developers are the real experts on this. Please share with us your thoughts ! -- Best regards, Norbert H. Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lex at www.copsol.com Thu Jun 19 16:00:24 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization In-Reply-To: <33A88B52.9FCC78DC@datachannel.com> from "John Tigue" at Jun 18, 97 06:28:50 pm Message-ID: <199706191357.IAA14480@copsol.com> > > Now that the number of XML processor implementations is increasing > rapidly, I would like to continue the subject of API standardization. I > have written a document which discusses the issue and presents an > informal proposal which continues the discussion of API standardization > for Java. 
> > The document is located at: > http://www.datachannel.com/ChannelWorld/XML/dev > > The first goal is to find a lowest common denominator for the current > implementations and abstract that to a set of interfaces such that a > developer could use this new API independent of an underlying > implementation of the XML processor and/or invest in learning the > particular benefits a specific implementation provides. > > I hope the site will serve as a convenience to the community and I will > maintain it as a summary of what is going on in this list. Any feedback > would be greatly appreciated. This is a work in progress. The greater > the contributions, the better it will serve its purpose. After having read the above document, I'd like to say: "You missed one!" The DSSSL Developer's Toolkit covers some of what the above document is trying to address (actually more, since it is standardizing DSSSL). I developed this toolkit to be standardized and to serve as a standard DSSSL API. The dsssl.grove package is intended to provide standardized programmatic access to groves--the result of processing an SGML document. IMHO, it would be ideal if XML processors could produce a grove that a DSSSL processor could use. What is not contained in the current DSSSLTK distribution, but will be in the next, is a standardized parser interface: that is, access to some implementation that can be told to parse some system identifier and produce a grove. Also, note that in DSSSLTK there is a construct called a "Grove Constructor". This interface provides a means for groves to be built on different implementation technologies and used by the same parser without changing the interface. It is different from the "event handler" model but it shares some similarities. Essentially, the parser is abstracted from grove construction. Hence, you can build groves in databases as well as in-memory or whatever technology you choose without changing the parser.
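A minimal Java sketch of the "Grove Constructor" idea described above, under stated assumptions. The names GroveConstructor, InMemoryGrove, CountingGrove and ToyParser are hypothetical and are not taken from the DSSSLTK distribution, and the toy parser handles only bare start tags, end tags and text in well-formed input. It is meant only to show the parser talking to an interface, so that the technology behind the grove can change without touching the parser.

```java
import java.util.ArrayList;
import java.util.List;

class GroveConstructorDemo {
    public static void main(String[] args) {
        ToyParser parser = new ToyParser();
        String xml = "<doc><p>hello</p></doc>";

        InMemoryGrove tree = new InMemoryGrove();
        parser.parse(xml, tree);
        System.out.println(tree.documentElement().name);

        CountingGrove counter = new CountingGrove();
        parser.parse(xml, counter);   // same parser, different grove technology
        System.out.println(counter.elements);
    }
}

// Hypothetical builder interface (NOT the DSSSLTK signatures): the parser
// reports structure; the implementation decides how the grove is stored.
interface GroveConstructor {
    void startElement(String name);
    void characters(String text);
    void endElement(String name);
}

// Backing technology 1: a plain in-memory tree.
class InMemoryGrove implements GroveConstructor {
    static class Node {
        final String name;   // element type name, or "#text" for character data
        final String text;   // character data for "#text" nodes, otherwise null
        final List<Node> children = new ArrayList<>();
        Node(String name, String text) { this.name = name; this.text = text; }
    }

    private final Node root = new Node("#root", null);
    private final List<Node> open = new ArrayList<>();  // stack of open elements

    InMemoryGrove() { open.add(root); }

    private Node current() { return open.get(open.size() - 1); }

    @Override public void startElement(String name) {
        Node node = new Node(name, null);
        current().children.add(node);
        open.add(node);
    }

    @Override public void characters(String text) {
        current().children.add(new Node("#text", text));
    }

    @Override public void endElement(String name) {
        open.remove(open.size() - 1);   // pop the innermost open element
    }

    Node documentElement() { return root.children.get(0); }
}

// Backing technology 2: no tree at all, only a count. A database-backed
// implementation would plug in the same way.
class CountingGrove implements GroveConstructor {
    int elements;
    @Override public void startElement(String name) { elements++; }
    @Override public void characters(String text) { }
    @Override public void endElement(String name) { }
}

// Toy parser: understands only <name>, </name> and text, and assumes
// well-formed input. It knows nothing about how the grove is stored.
class ToyParser {
    void parse(String xml, GroveConstructor out) {
        int i = 0;
        while (i < xml.length()) {
            if (xml.charAt(i) == '<') {
                int close = xml.indexOf('>', i);
                String tag = xml.substring(i + 1, close);
                if (tag.startsWith("/")) {
                    out.endElement(tag.substring(1));
                } else {
                    out.startElement(tag);
                }
                i = close + 1;
            } else {
                int next = xml.indexOf('<', i);
                if (next < 0) next = xml.length();
                out.characters(xml.substring(i, next));
                i = next;
            }
        }
    }
}
```

In this sketch ToyParser.parse is identical for both runs; only the object passed as the second argument differs, which is the sense in which the parser is abstracted from grove construction.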
Also, all constructs in the DSSSLTK are based on interfaces. This allows different inheritance hierarchies to be used within the same distribution or for different class libraries to be mixed without getting into multiple inheritance issues. A node in a grove must implement two interfaces: node and its specific class. For example, an Element node *must* implement the dsssl.grove.node and dsssl.grove.Element interfaces. Remember, the DSSSL standard *has* a data model for SGML that can be pruned to provide a "lowest common denominator" data model for XML. Full source code and javadoc are available in the DSSSLTK distribution located at: http://www.copsol.com/products/ This is a start at standardization for DSSSL from my point of view. I put this distribution together to allow others to contribute and create a standard API governed by some "higher body" and not Copernican Solutions or myself. ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 From Jon.Bosak at Eng.Sun.COM Thu Jun 19 16:29:59 1997 From: Jon.Bosak at Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:00 2004 Subject: Tcl XML parser Message-ID: <199706191427.HAA00683@boethius.eng.sun.com> A Tcl-based package for parsing XML documents and DTDs has just been made available.
See http://tcltk.anu.edu.au/XML/ Jon From nmikula at edu.uni-klu.ac.at Thu Jun 19 16:57:46 1997 From: nmikula at edu.uni-klu.ac.at (Norbert Mikula) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization References: <199706191357.IAA14480@copsol.com> Message-ID: <33A948D6.41C6@edu.uni-klu.ac.at> Alex Milowski wrote: > The dsssl.grove package is intended to provide standardized programmatic access > to groves--the result of processing an SGML document. IMHO, it would be ideal > if XML processors could produce a grove that a DSSSL processor could use. Alex, I certainly agree that a (complete) grove is probably the most powerful and complete way of accessing a document's data. I am not convinced, however, that it is always necessary to build a grove. My view on this is:

-----------------------------------------------
- application                                 -
----------------------------------            -
- grove/tree builder             -            -
-----------------------------------------------
- event stream (Esis++)                       -
----------------------------------    NXP     -
- core parser                                 -
-----------------------------------------------

You can always build a more powerful layer on top of an event stream. Furthermore we should also consider the work of the DOM group. Their results will have a considerable impact on our work as well. If we can provide a flexible low level layer, we can always add more fancy and specialized post-processors on top of it. -- Best regards, Norbert H.
Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== From lex at www.copsol.com Thu Jun 19 17:57:04 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization In-Reply-To: <33A948D6.41C6@edu.uni-klu.ac.at> from "Norbert Mikula" at Jun 19, 97 04:57:26 pm Message-ID: <199706191554.KAA14554@copsol.com> > Alex Milowski wrote: > > The dsssl.grove package is intended to provide standardized programmatic access > > to groves--the result of processing an SGML document. IMHO, it would be ideal > > if XML processors could produce a grove that a DSSSL processor could use. > > Alex, > > I certainly agree that a (complete) grove is probably > the most powerful and complete way of accessing > a document's data. > > I am not convinced, however, that it is always necessary > to build a grove. In my experience, the need for a grove arises more often than the need for an event stream. I have rarely built SGML applications where a grove did not simplify the processing. Event streams are good for extracting simple information or processing documents in a linear fashion. I like to view a document as a data structure that I can manipulate. I would be open to having a two-tiered API where one tier was an event-oriented API. In fact, the GroveConstructor interface could be considered to be this kind of low-level API. If we don't standardize grove access, we will all have to build our own grove implementations at some point in time.
In addition, can you imagine the possibilities if simple applets could turn around to a server, load a grove, and receive structured information rather than name-value pairs? We need to address issues beyond "quick browsing/processing" in a standardized API. So, essentially, I agree. It is not always necessary to have a grove. In a complex application, it is most certainly necessary. Hence, we should standardize that as well.

> My view on this is:
>
> -----------------------------------------------
> - application                                 -
> ----------------------------------            -
> - grove/tree builder             -            -
> -----------------------------------------------
> - event stream (Esis++)                       -
> ----------------------------------    NXP     -
> - core parser                                 -
> -----------------------------------------------

I can envision a similar but more complete structure:

            DSSSL Application
            ----------------
            - DSSSL API    -          Complex Application
------------------------------------------------
- Standard Grove API                           -
------------------------------------------------
- Grove Implementation                         -
- (Implementation dependent)                   -
------------------------------------------------
- Grove Builder API                            -   Simple Application
---------------------------------------------------------------------
- Event Stream API                                                  -
---------------------------------------------------------------------
- Parser API                                                        -
---------------------------------------------------------------------
- Parser Implementation                                             -
- (Implementation dependent)                                        -
---------------------------------------------------------------------

The DSSSLTK covers most of the above with the exception of an event-oriented API. The GroveConstructor interface is really the Grove Builder API in the above diagram. Hence, what should be standardized is:
* Parser API
* Event Stream API
* Grove Builder API
* Standard Grove API
* DSSSL API
> You can always build a more powerful layer > on top of an event stream. Furthermore we > should also consider the work of the DOM > group.
Their results will have a considerable > impact on our work as well. Yes, but only if we properly componentize the APIs. I'm not so certain about DOM. It would be nice if it was a little more open of a process. DOM essentially means grove to me. > If we can provide a flexible low level > layer, we can always add more fancy and > specialized post-processors on top > of it. Yes, but I want to standardize the "specialized" processors. For example, consider the situation where as an application developer you could be assured that a grove implementation is available in a browser framework. In that situation, you could deliver an applet (or whatever) that loaded a grove but relied on the browser to provide the infrastructure for "knowing" how to load/store/etc. a grove. We are developing and standardizing infrastructure as well as APIs. ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ebaatz at barbaresco.East.Sun.COM Thu Jun 19 17:57:47 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: Alex Milowski wrote: > ...it would be ideal if XML processors could produce a grove... Alex, Do you mean that the only output of an XML parser would be a grove? My use of XML is very lightweight and, from my position of minimal knowledge about groves, seems like I would have to pay some price in processing time or system resources for an XML parser to produce a grove for one of my "documents" when some very simple output would do. 
From peter at techno.com Thu Jun 19 18:24:06 1997 From: peter at techno.com (Peter Newcomb) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: <199706191620.MAA14915@exocomp.techno.com> > Date: Thu, 19 Jun 1997 16:57:26 +0200 > From: Norbert Mikula > > Alex Milowski wrote: > > The dsssl.grove package is intended to provide standardized programmatic access > > to groves--the result of processing an SGML document. IMHO, it would be ideal > > if XML processors could produce a grove that a DSSSL processor could use. > > I certainly agree that a (complete) grove is probably > the most powerful and complete way of accessing > a document's data. > > I am not convinced, however, that it is always necessary > to build a grove. [snip] > You can always build a more powerful layer > on top of an event stream. Furthermore we > should also consider the work of the DOM > group. Their results will have a considerable > impact on our work as well. > > If we can provide a flexible low level > layer, we can always add more fancy and > specialized post-processors on top > of it. I believe it is important not only to design the low-level interface such that a grove (or other high-level interface) can be implemented on top of it, but also to design the low-level interface such that _it_ (at least the relevant portions of it: i.e. the event stream and associated classes) can be implemented on top of a grove interface. Another concern I have is that the terminology used for the two interfaces (low and high) be consistent. A programmer who learns one interface should not have to learn a different vocabulary in order to use the other.
This is also true across languages: a person using an XML parser in Java should not have to learn a different vocabulary in order to use an XML parser from C++ or Perl. As the SGML property set has already been published (in DSSSL, and soon in the HyTime 2nd Edition) and is in use, I suggest that it be used as a terminology reference for new SGML and XML interface design. -peter -- Peter Newcomb TechnoTeacher, Inc. 233 Spruce Avenue P.O. Box 23795 Rochester, NY 14611-4041 USA Rochester, New York 14692-3795 USA +1 716 529 4303 (home) +1 716 464 8696 (direct) +1 716 755 8698 (cell) +1 716 271 0796 (main) +1 716 529 4304 (fax) +1 716 271 0129 (fax) peter@petes-house.rochester.ny.us peter@techno.com http://www.petes-house.rochester.ny.us http://www.techno.com From ebaatz at barbaresco.East.Sun.COM Thu Jun 19 18:57:52 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: John Tigue, Although it isn't part of the abstract XML parsing issue, I was struck by your proposal's use of streams for input. Given the Unicode-ness of Java and XML's explicit support for multiple character encodings, the JDK 1.1 Reader class seems like a perfect fit for XML parsers. Readers are more efficient than byte streams, and they handle conversion between local encodings and Unicode. With streams, it seems like your document also needs to prescribe how Unicode characters (or other multi-byte encodings) are encoded in byte streams or how to identify the encoding of a stream, so I can use the same input for different parsers. P.S.
Yes, my current XML "documents" include characters outside of Latin-1, so I have to convert them before passing them through the parser I've been using, NXP. Eric Baatz Sun Microsystems Laboratories 2 Elizabeth Drive, MS UCHL03-207 (508) 442-0257 Chelmsford, MA 01824 fax: (508) 250-5067 USA Internet: eric.baatz@east.sun.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lex at www.copsol.com Thu Jun 19 19:21:55 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization In-Reply-To: from "Eric Baatz - Sun Microsystems Labs BOS" at Jun 19, 97 11:56:09 am Message-ID: <199706191719.MAA14647@copsol.com> > Alex Milowski wrote: > > > ...it would be ideal if XML processors could produce a grove... > > Alex, > > Do you mean that the only output of an XML parser would be a grove? Well, no, not necessarily. I think that such an API standardization should also standardize grove production/use. I would like to be able to guarantee that any conformant XML/Java/API environment is able to produce groves if I *need* them. > My use of XML is very lightweight and, from my position of minimal > knowledge about groves, seems like I would have to pay some > price in processing time or system resources for an XML parser to > produce a grove for one of my "documents" when some very simple > output would do. Yes, you pay *some* price. There is a point in which the grove-based processing paradigm is far more efficient than event oriented for more complex tasks. The definition of "more complex" isn't that big of a leap. 
Simply put: If you want to do *any* non-linear processing of XML, you are going to find groves *far* easier and, with SDQL (Standard Document Query Language -- from DSSSL), potentially more efficient than building ancillary data structures in addition to handling the events being received. In a previous e-mail, I detailed an API architecture that I think would work. Essentially, it is DSSSLTK with another couple of APIs on the bottom of the stack. In my development, I made the design decision that groves were what I needed to standardize since everything in DSSSL is groves. I'm certainly willing to add to this and standardize everything before that as well. ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 From tbray at textuality.com Thu Jun 19 19:38:56 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca> If you want a full-featured API that is going to interoperate for SGML and XML docs as well, the grove is the only way to go, so there is no need to have this discussion here on that subject. What we're trying to do, specifically for the case of Java XML processors, which evidence would suggest are going to be large in number and relatively lightweight, is simply to give them some shared machinery as regards elements and attributes.
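One way to picture "shared machinery as regards elements and attributes" is a pair of small Java interfaces that every parser implements in its own way. This is a sketch only: the interface names and methods below are assumptions made for illustration and do not come from John Tigue's proposal, Lark, NXP or any other parser discussed here.

```java
import java.util.ArrayList;
import java.util.List;

class SharedInterfacesDemo {
    public static void main(String[] args) {
        SimpleElement doc = new SimpleElement("doc");
        doc.addAttribute("lang", "en");
        doc.addChild(new SimpleElement("p"));
        System.out.println(ElementPrinter.describe(doc));
    }
}

// Hypothetical "lowest common denominator" interfaces.
interface Attribute {
    String getName();
    String getValue();
}

interface Element {
    String getName();
    List<Attribute> getAttributes();
    List<Element> getChildren();
}

// What one parser might use internally; another parser could back the same
// interfaces with completely different classes.
class SimpleAttribute implements Attribute {
    private final String name;
    private final String value;
    SimpleAttribute(String name, String value) { this.name = name; this.value = value; }
    @Override public String getName() { return name; }
    @Override public String getValue() { return value; }
}

class SimpleElement implements Element {
    private final String name;
    private final List<Attribute> attributes = new ArrayList<>();
    private final List<Element> children = new ArrayList<>();
    SimpleElement(String name) { this.name = name; }
    void addAttribute(String attrName, String value) {
        attributes.add(new SimpleAttribute(attrName, value));
    }
    void addChild(Element child) { children.add(child); }
    @Override public String getName() { return name; }
    @Override public List<Attribute> getAttributes() { return attributes; }
    @Override public List<Element> getChildren() { return children; }
}

// Application code depends only on the interfaces, never on SimpleElement.
class ElementPrinter {
    static String describe(Element e) {
        StringBuilder sb = new StringBuilder("<").append(e.getName());
        for (Attribute a : e.getAttributes()) {
            sb.append(' ').append(a.getName())
              .append("=\"").append(a.getValue()).append('"');
        }
        sb.append('>');
        for (Element child : e.getChildren()) {
            sb.append(describe(child));
        }
        return sb.append("</").append(e.getName()).append('>').toString();
    }
}
```

ElementPrinter would work unchanged against any parser whose element and attribute classes implement these two interfaces, which is the kind of interoperability being asked for.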
For this kind of purpose, I think the grove formalism is massive overkill; right now people can whip off XML parsers in a week, if we require them to master grove plans and property sets and so on, we're tripling the amount of time that has to be invested. At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote: >As the SGML property set has already been published (in DSSSL, and >soon in the HyTime 2nd Edition) and is in use, I suggest that it be >used as a terminology reference for new SGML and XML interface >design. This is part of the problem; last time I looked, the SGML property set was over 75 pages in length, and most of what it contains is just not interesting for XML parsers. If we could just agree, specifically for Java, how to talk to a few basic things (Element, Attribute, etc), this would be a huge step forward. -Tim xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lex at www.copsol.com Thu Jun 19 20:02:55 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization In-Reply-To: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca> from "Tim Bray" at Jun 19, 97 10:36:42 am Message-ID: <199706191800.NAA14675@copsol.com> Ok, I'm going to write about the "vision" thing... so you have been warned! ;-) > If you want a full-featured API that is going to interoperate for > SGML and XML docs as well, the grove is the only way to go, so there > is no need to have this discussion here on that subject. What we're > trying to do is, specifically for the case of Java XML processors, which > evidence would suggest are going to be large in number and relatively > lightweight, is simply to give them some shared machinery as regards > elements and attributes. 
> > For this kind of purpose, I think the grove formalism is massive > overkill; right now people can whip off XML parsers in a week, if > we require them to master grove plans and property sets and so on, > we're tripling the amount of time that has to be invested. Agreed. My real point is that we have to have a vision for where such APIs are going. The absolute *last* thing I want to have happen is to get a low-level parser/event API and not be able to implement the more basic grove on top of that. Hence we need a vision of where such APIs are going and what they will grow into. I see a parser and event API as being the foundation of a much larger set of APIs for XML, SGML, and DSSSL. In light of this, here are some of my requirements:
1. The API should be componentized such that parser access and configuration is separated from event delivery and use.
2. Event APIs should be constructed in a way such that new properties of events and new events can be delivered within the same interface. This will allow support of additional grove plans within the same interface.
3. There is a minimal set of grove plans from a DSSSL perspective that we should conform to. (I have a good idea of what these grove plans are but I don't have the DSSSL spec in front of me). These grove plans will help define what events to deliver and what properties the events should have.
Suggestions:
1. Interfaces (sub-typing) are a preferred way to deliver such APIs. We do not want to enforce an inheritance hierarchy. Also, interfaces can easily be made cross-language.
2. We should define the APIs within one or more reference architectures rather than just focusing on the communication between a parser and an arbitrary application. By using many common architectures we can understand the use-case scenarios for the API. This is a similar exercise to the CRC cards in object-oriented design.
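Requirements 1 and 2 above can be sketched in a few lines of Java. Everything here is hypothetical: ParseEvent, EventHandler, StubParser, the "report-comments" feature name and the "line" property are invented for illustration, and the stub emits a fixed event sequence rather than parsing anything. The idea shown is that events carry an open-ended property map, so a handler written earlier keeps working when the parser later delivers new event kinds or new properties.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ExtensibleEventsDemo {
    public static void main(String[] args) {
        StubParser parser = new StubParser();
        parser.setFeature("report-comments", true);  // configuration ...
        NameCollector names = new NameCollector();
        parser.parse(names);                          // ... kept apart from delivery
        System.out.println(names.names);
    }
}

// Hypothetical event: a kind plus an open-ended property map, so new event
// kinds and new properties need no change to the handler interface.
final class ParseEvent {
    private final String kind;
    private final Map<String, Object> properties = new HashMap<>();
    ParseEvent(String kind) { this.kind = kind; }
    ParseEvent with(String name, Object value) {
        properties.put(name, value);
        return this;
    }
    String kind() { return kind; }
    Object property(String name) { return properties.get(name); }  // null when absent
}

// Event delivery: the only thing an application has to implement.
interface EventHandler {
    void handle(ParseEvent event);
}

// Parser access and configuration, kept separate from event delivery.
// This stub emits a fixed event sequence instead of parsing anything.
class StubParser {
    private final Map<String, Boolean> features = new HashMap<>();

    void setFeature(String name, boolean enabled) { features.put(name, enabled); }

    void parse(EventHandler handler) {
        handler.handle(new ParseEvent("start-element").with("name", "doc").with("line", 1));
        if (Boolean.TRUE.equals(features.get("report-comments"))) {
            handler.handle(new ParseEvent("comment").with("text", "draft"));
        }
        handler.handle(new ParseEvent("start-element").with("name", "p").with("line", 2));
        handler.handle(new ParseEvent("end-element").with("name", "p"));
        handler.handle(new ParseEvent("end-element").with("name", "doc"));
    }
}

// A handler written before "comment" events or the "line" property existed:
// it ignores what it does not understand and keeps working.
class NameCollector implements EventHandler {
    final List<String> names = new ArrayList<>();
    @Override public void handle(ParseEvent event) {
        if (event.kind().equals("start-element")) {
            names.add((String) event.property("name"));
        }
    }
}
```

The trade-off in this design is type safety: a string-keyed property map gives room for additional grove plans, but typos in property names are only caught at run time. Typed sub-interfaces per event kind would be the stricter alternative.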
> At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote: > >As the SGML property set has already been published (in DSSSL, and > >soon in the HyTime 2nd Edition) and is in use, I suggest that it be > >used as a terminology reference for new SGML and XML interface > >design. > > This is part of the problem; last time I looked, the SGML property > set was over 75 pages in length, and most of what it contains is > just not interesting for XML parsers. > > If we could just agree, specifically for Java, how to talk to a > few basic things (Element, Attribute, etc), this would be a huge > step forward. -Tim Yes, but we should start with the DSSSL specification. Not to mention this *yet* again today, but the DSSSLTK implements about five grove plans. I'll get the list tomorrow when I have the reference information on hand and post it here. If we spend the time working from the DSSSL grove specification, we can ensure grove production. If you want/need a more readable grove specification, try the grove guide that I built. The HTML version is at: http://www.copsol.com/sgmlimpl/standards/gguide.html and more generally at: http://www.copsol.com/sgmlimpl/standards/ The grove guide re-orients the SGML property set from DSSSL in the opposite way that it is specified. In the DSSSL standard, each grove plan is listed and within the grove plan either new classes are defined or properties are added to previously defined classes. In the grove guide, each class is defined and the properties are listed by grove plan. What we design and engineer today and label a standard may stick around longer than we expect. We shouldn't take too minimalist of an approach. My compromise is for a *reasonable* solution that has growth of the API designed into it. ============================================================================== R. 
Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Thu Jun 19 23:52:00 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:00 2004 Subject: XML Java API Standardization Message-ID: <8169@ursus.demon.co.uk> In message <33A88B52.9FCC78DC@datachannel.com> jtigue@datachannel.com (John Tigue) writes: [...] > > Now that the number of XML processor implementations is increasing > rapidly, I would like to continue the subject of API standardization. I > have written a document which discusses the issue and presents an > informal proposal which continues the discussion of API standardization > for Java. > > The document is located at: > http://www.datachannel.com/ChannelWorld/XML/dev This is a really first-class approach to the subject and I welcome it. John has taken the time to summarise 4 parsers including his/datachannel's own and this is an excellent starting point. As an 'xmlProcessorConsumer' JUMBO will adopt this approach as soon as I work it out. > > The first goal is to find a lowest common denominator for the current > implementations and abstract that to a set of interfaces such that a > developer could use this new API independent of an underlying > implementation of the XML processor and/or invest in learning the > particular benefits a specific implementation provides. > > I hope the site will serve as a convenience to the community and I will > maintain it as a summary of what is going on in this list. Any feedback > would be greatly appreciated. This is a work in progress. The greater > the contributions, the better it will serve its purpose. This is really great. 
I'm in a rush, but at this stage standard terminology for the XML-related terms (Element, Attribute, etc.) and standard terminology for the Java-related stuff (Stream, Factory, etc.) is exactly what is required. P. > > -- > John Tigue > Programmer > jtigue@datachannel.com > DataChannel (http://www.datachannel.com) > 206-462-1999 -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ From peter at techno.com Fri Jun 20 00:53:59 1997 From: peter at techno.com (Peter Newcomb) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization In-Reply-To: <3.0.32.19970619103633.00886180@pop.intergate.bc.ca> (message from Tim Bray on Thu, 19 Jun 1997 10:36:42 -0700) Message-ID: <199706192250.SAA15086@exocomp.techno.com> > For this kind of purpose, I think the grove formalism is massive > overkill; right now people can whip off XML parsers in a week, if > we require them to master grove plans and
property sets and so on, > we're tripling the amount of time that has to be invested. I am not suggesting that writers of parsers learn or implement anything about groves. I am suggesting that the writers of the standard XML parser interface should learn and use the names defined by the SGML property set for those things (i.e. elements, attributes, etc.) that XML and SGML have in common. > At 12:20 PM 19/06/97 -0400, Peter Newcomb wrote: > >As the SGML property set has already been published (in DSSSL, and > >soon in the HyTime 2nd Edition) and is in use, I suggest that it be > >used as a terminology reference for new SGML and XML interface > >design. > > This is part of the problem; last time I looked, the SGML property > set was over 75 pages in length, and most of what it contains is > just not interesting for XML parsers. The SGML property set source (the 75+ pages of SGML) is best read by a machine and formatted for human consumption. Alex's grove guide is an example of this. Also, the parts of the SGML property set that do not apply to XML parsers are easily pruned. A grove plan that specifies this pruning is in the works, but as a start, try ignoring everything after the first three modules (baseabs, prlgabs0, and instabs). I've temporarily created a browseable rendition of these modules (and only these modules) at "http://www.techno.com/~peter/sgml-esis/". The HTML generation software I used to do this is not quite done yet, but I hope these pages will be useful anyway. The most notable problem is that I have not yet written the code to generate descriptive pages for modules. (If there are other problems, or you have suggestions on how to improve the format or enhance its usefulness, please tell me; I'll be using this software to produce browseable renditions of the complete SGML and HyTime property sets for the upcoming HyTime user's group site.) -peter -- Peter Newcomb TechnoTeacher, Inc. 233 Spruce Avenue P.O. 
Box 23795 Rochester, NY 14611-4041 USA Rochester, New York 14692-3795 USA +1 716 529 4303 (home) +1 716 464 8696 (direct) +1 716 755 8698 (cell) +1 716 271 0796 (main) +1 716 529 4304 (fax) +1 716 271 0129 (fax) peter@petes-house.rochester.ny.us peter@techno.com http://www.petes-house.rochester.ny.us http://www.techno.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri Jun 20 01:58:08 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: <8223@ursus.demon.co.uk> In message <199706191800.NAA14675@copsol.com> lex@www.copsol.com (Alex Milowski) writes: > Ok, I'm going to write about the "vision" thing... so you have been warned! ;-) [... lots of other contributions and the vision thing read, hopefully understood, and snipped...] > > What we design and engineer today and label a standard may stick around longer > than we expect. We shouldn't take too minimalist of an approach. My > compromise is for a *reasonable* solution that has growth of the API > designed into it. We have had a previous discussion on this list (ca. 2+ months ago) and we got quite close to getting an API and then everyone went off to other things. It's even more urgent now, because if we don't close on something, then in 2 more months there will be 16 incompatible parsers ... [a tcl one was anounced today, and we can assume there are others which don't come near XML-DEV...] It seems as if out of the spectrum of possibilities at one end is a 'golden grove' solution where every possible property set, etc. is included. And at the other the reasonable minimum is close to what John has put up. There is also something 'in the middle' which may be more difficult to hit precisely. 
I do not want to have a say in this (I don't even know what a grove *is* - even after having had it explained more than once), but I'll try to work with whatever comes out. However, I think we ought to aim for something within the next few days or we run the risk of losing momentum again. I am particularly impressed by the spirit of collaboration in this discussion and the willingness of current authors to recraft their code (OK they have to do it again on July 1 anyway :-). If we can agree where in the spectrum we wish to end up (there could be more than one place, as suggested), then it may take a few days to flesh out the details. Would it be reasonable to aim for having something roughly concurrent with July 1?? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Fri Jun 20 04:18:06 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: <3.0.32.19970619210506.00b4cefc@swbell.net> At 11:49 PM 6/19/97 GMT, Peter Murray-Rust wrote: >I do not want to have a say in this (I don't even know what a grove *is* - >even after having had it explained more than once), A grove is nothing more than a directed graph of objects whose classes and properties are formally defined in a "property set", where a "property set" is nothing more than an object schema definition defined according the (small set of) rules defined in the Property Set Definition Requirements annex of the (very soon to be released) HyTime standard (Second Edition). 
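Read that way, a grove can be sketched in a few lines. The following is only a minimal illustration of "a graph of objects whose classes and properties are defined by a schema". The `PROPERTY_SET` table, the `Node` class and all class and property names in it are hypothetical and are not taken from the SGML property set.

```python
from dataclasses import dataclass, field

# Hypothetical, heavily reduced "property set": class name -> allowed property names.
# These names are illustrative only, not the names used by the SGML property set.
PROPERTY_SET = {
    "element": {"gi", "attributes", "content"},
    "attribute": {"name", "value"},
    "text": {"data"},
}


@dataclass
class Node:
    """One grove node: a class name plus a dictionary of properties."""
    node_class: str
    properties: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject classes and properties that the property set does not define.
        allowed = PROPERTY_SET.get(self.node_class)
        if allowed is None:
            raise ValueError(f"unknown class: {self.node_class}")
        unknown = set(self.properties) - allowed
        if unknown:
            raise ValueError(f"{self.node_class} has no properties {sorted(unknown)}")


# Property values may themselves be nodes or lists of nodes, which is what
# makes the whole thing a graph rather than a flat record.
title = Node("element", {
    "gi": "TITLE",
    "attributes": [Node("attribute", {"name": "ID", "value": "t1"})],
    "content": [Node("text", {"data": "Groves"})],
})
```

The point of the schema check is that two programs agreeing on the same table can exchange or navigate each other's nodes without further negotiation.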
The only thing that distinguishes a grove from any other graph-based object representation is a few unique object characteristics that happen to make representing SGML documents a lot easier. So saying "you should have a grove" is really saying "you should make your in-memory data structures follow the object schema defined by the SGML property set." There's really not that much to it. There's no reason to duplicate the person years of work that have gone in to defining the SGML property set, unless you enjoy the exercise of beating your head against that wall. The *only* reason groves are called groves, instead of "the directed graph, in-memory representation of a parsed SGML document" is that we didn't want to have to keep saying the latter. If it helps, substitute "parse tree" for "grove" and you'll be close enough to the truth that it won't matter for the purpose of discussion. Cheers, E. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From akirkpatrick at ims-global.com Fri Jun 20 14:15:19 1997 From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: Tim, For this kind of purpose, I think the grove formalism is massive overkill; right now people can whip off XML parsers in a week, if we require them to master grove plans and property sets and so on, we're tripling the amount of time that has to be invested. I think it's true to say that soon all SGML/XML applications will be working on the parse tree (or grove) rather than the raw events. 
With this in mind, I would be quite willing to wait two weeks longer for a good XML grove API (after all, we've been waiting years for SGML tools :) If nothing comes up, I will have to write my own "parse tree builder" and you can bet it won't be compatible with anyone else's, beyond the simple notions of "element" and "tree". This is part of the problem; last time I looked, the SGML property set was over 75 pages in length, and most of what it contains is just not interesting for XML parsers. As someone's already said, we need to define the reduced property set for XML and make it easier to understand. If we could just agree, specifically for Java, how to talk to a few basic things (Element, Attribute, etc), this would be a huge step forward. -Tim I don't agree. XML isn't just a "Web thing". It has the potential to change the way applications communicate and store information. The XML is actually unimportant, it is the structure (represented in memory by the parse tree) which counts. We need common APIs to manipulate and query the structure. An interesting discussion either way... Alfie. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ddb at criinc.com Sat Jun 21 01:35:51 1997 From: ddb at criinc.com (Derek Denny-Brown) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: <3.0.32.19970620163449.00a5bda0@mailhost.criinc.com> >I've temporarily created a browseable rendition of these modules (and >only these modules) at "http://www.techno.com/~peter/sgml-esis/". The >HTML generation software I used to do this is not quite done yet, but >I hope these pages will be useful anyway. The most notable problem is >that I have not yet written the code to generate descriptive pages for >modules. sounds familiar. 
-derek -------------------------------------------------------------- ddb@criinc.com || software-engineer || www/sgml/java/perl/etc. "Just go that way, really fast. When something gets in your way, turn." -- _Better_Off_Dead_ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From peat at erols.com Sat Jun 21 15:10:54 1997 From: peat at erols.com (Peat) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: <199706211310.JAA17653@smtp2.erols.com> If the document is very large, and the parser is required to maintain the grove, we would then require the parser to also then include some type of defined memory management. Can this be a problem, where different parsers implement resource management differently? I would think if this burden is on the application layer, then knowledge of the application can be used to optimize resources. Grove standardization is a good idea. Any ideas on how the grove standardization can be implemented up one layer? - Bruce Peat peat@erols.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat Jun 21 17:59:05 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API Standardization Message-ID: <8335@ursus.demon.co.uk> In message <199706211310.JAA17653@smtp2.erols.com> "Peat" writes: > If the document is very large, and the parser is required to maintain the > grove, we would then require the parser to also then include some type of > defined memory management. Can this be a problem, where different parsers > implement resource management differently? 
This is an important point and one which I've been conscious of but ignored so far. JUMBO is quite large (with all the MOL classes in, there's about half a megabyte of classes) and I have had out-of-memory failures with large files (ca. 1 Mbyte of legacy input and its translation into a tree). I don't know whether there is a generic solution to this. I tried running the garbage collector (JDK 1.0.2) occasionally and this helps, but since parser, browser and document all have to be in memory, large docs are a problem. Presumably in an application subtrees can be saved to disk (serialized?)

> I would think if this burden is on the application layer, then knowledge of
> the application can be used to optimize resources.

I would think that if the author uses entities, then knowledge of the entity structure would help. In the browser the entities could be treated as 'pointers' and resolved only when required.

> Grove standardization is a good idea. Any ideas on how the grove
> standardization can be implemented up one layer?
                                     ^^ ??? ^^^

Again, I reiterate that I'd like to see something concrete in a few days and not to lose the momentum again.

P.
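The suggestion above of treating entities as 'pointers' that are resolved only when required can be sketched as follows. This is a minimal illustration, not JUMBO's code: the `LazyEntity` class, the `load_chapter` function and the `loads` list are all hypothetical names, and the loader simply returns a string where a real application would read a file or fetch a URL.

```python
class LazyEntity:
    """Stands in for an entity until its content is first requested."""

    def __init__(self, name, loader):
        self.name = name
        self._loader = loader      # zero-argument callable returning the content
        self._content = None
        self.resolved = False

    def content(self):
        # Fetch and cache the replacement text on first use only.
        if not self.resolved:
            self._content = self._loader()
            self.resolved = True
        return self._content


loads = []                         # records each real fetch, for demonstration


def load_chapter():
    loads.append("chapter1")
    return "<CHAPTER>...</CHAPTER>"


# Creating the placeholder costs nothing; the loader has not run yet.
chapter = LazyEntity("chapter1", load_chapter)
```

Calling `chapter.content()` triggers the single fetch; later calls reuse the cached text. A memory-constrained application could go one step further and drop the cached content again when the subtree is no longer displayed.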
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From clloyd at gorge.net Sat Jun 21 20:01:28 1997 From: clloyd at gorge.net (Chris Lloyd) Date: Mon Jun 7 16:58:01 2004 Subject: XML Java API - An Idea(*) Message-ID: <01BC7E32.22CDABA0@chaosmobile.com.chaos> -----Original Message----- From: Peter Murray-Rust [SMTP:Peter@ursus.demon.co.uk] Sent: Saturday, June 21, 1997 11:31 AM To: xml-dev@ic.ac.uk Subject: Re: XML Java API Standardization In message <199706211310.JAA17653@smtp2.erols.com> "Peat" writes: > If the document is very large, and the parser is required to maintain the > grove, we would then require the parser to also then include some type of > defined memory management. Can this be a problem, where different parsers > implement resource management differently? Memory management issues shouldn't be an issue in the API standardization. If you are using a parser that cannot serialize the tree, then you are certainly going to be limited by memory. If you are using an object database to implement the grove, then you don't have size limitations but speed may become an issue. This is an important point and one which I've been conscious of but ignored so far. JUMBO is quite large (with all the MOL classes in there's about half a megabyte of classes and I have had outOfmem failures with large files (ca. 1 Mbyte legacy input and translation into a tree). I don't know whether there is a generic solution to this. I tried to run the garbage collector (JDK1.02) occasionally and this helps, but since parser and browser and document all have to be in memory then large docs are a problem. Presumably in an application subtrees can be saved to disk (serialized?) 
> > I would think if this burden is on the application layer, then knowledge of > the application can be used to optimize resources. I would think that if the author uses entities, then knowledge of the entity structure would help. In the browser the entities could be treated as 'pointers' and resolved only when required. Yes this is how other groves have been implemented > > Grove standardization is a good idea. Any ideas on how the grove > standardization can be implemented up one layer? ^^ ??? ^^^ I'm just entering this thread so I don't know what solutions have been discussed. There is already an API to draw from in the DSSSL spec and a definition of the SGML property set which gives us a common language to work from. The problem is that an XML API to a grove should be simple with a small interface and should leverage the object-oriented power and syntax of Java. Personally, when working with groves I find some abstractions very useful in an API. I would rather have an API based on iterators than one based on a set of navigation function calls. I'm talking about navigating the grove rather than building the grove. An iterator API would be extremely simple, well abstracted and more inline with patterns of C++ and Java programming than the SDQL API found in DSSSL. They could also maintain an adherence to the syntax of the SGML property set. Here is an example although my naming syntax probably does not correspond to the SGML property set here. // Assuming we have a object provided by the parser that is a grove, instantiate an iterator and navigate to the first element that is a TITLE tag // A Factory is an object that defines what SGML/XML constructs the iterator knows how to iterate. It provides the grove iterator with a different node iterator for each property node that it knows how to walk. 
    ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);
    while (XMLIter++ != XMLIter.end())
    {
        XMLBaseProperty Prop = XMLIter.Object();
        // in C++ we would use the dereference operator like this:
        //     XMLBaseProperty Prop = *XMLIter;
        if (Prop.GetClass() == Element.Class)   // is this an element?
        {
            // let's convert the property from a base class object to its concrete class
            Element aElement = Prop;
            // Now we have an element object and can call all its member functions
            if (aElement.GetIdent() == String("TITLE"))
                break;
        }
    }

    // OK, let's instantiate a new iterator to walk back up to the root of the grove
    // use the copy constructor to produce a reverse iterator from our
x and functions of individual properties in the grove. Hence we can use the SGML property set or another property set with the same code.

6.) Iterators work well in different memory models and garbage collection schemes.
7.) Iterators, Factories, and Algorithms can be combined in very powerful and flexible ways.
8.) Finally, Iterators are fun!!

Chris Lloyd
clloyd@gorge.net

Again, I reiterate that I'd like to see something concrete in a few days and not to lose the momentum again.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From jtauber at jtauber.com Sat Jun 21 20:49:36 1997
From: jtauber at jtauber.com (James K.
Tauber) Date: Mon Jun 7 16:58:01 2004 Subject: XML Property Set Message-ID: <01BC7EB7.10467860.jtauber@jtauber.com> About a month ago I started making a list of those classes and properties from the SGML Property Set that were appropriate to XML. I got through about half of the classes the first night and then didn't touch it until now. With all this talk about XML groves, it is worth me finishing off the list? James K. Tauber xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Sat Jun 21 21:10:10 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:01 2004 Subject: XML Property Set Message-ID: <8346@ursus.demon.co.uk> In message <01BC7EB7.10467860.jtauber@jtauber.com> "James K. Tauber" writes: > > About a month ago I started making a list of those classes and properties > from the SGML Property Set that were appropriate to XML. I got through > about half of the classes the first night and then didn't touch it until > now. > > With all this talk about XML groves, it is worth me finishing off the list? In my grove-illiterate opinion, yes! The PropertySet is a sword of Damocles hanging over these discussions. It's clear that we can't have all 70+ properties. IF (and I hope it's not a big IF) we can agree on a subset of the property set then we don't have this problem dissipating the discussion every time we get close :-) James Clark came up with a grove subset about 3 months back (have a look in March xml-dev) in response to one of my typical blunderings for information. It looked simple (I can't tell if it was comprehensive) and imagine it is fairly close to what we require. Unfortunately no-one seemed to take it up. So do JamesC and JamesT converge on a common solution? 
If they do, why not freeze this as an alpha version of the propertySubSet...?? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jtauber at jtauber.com Sat Jun 21 22:17:15 1997 From: jtauber at jtauber.com (James K. Tauber) Date: Mon Jun 7 16:58:01 2004 Subject: XML Property Set Message-ID: <01BC7EC3.70CDF6C0.jtauber@jtauber.com> On Saturday, June 21, 1997 1:01 PM, Peter Murray-Rust [SMTP:Peter@ursus.demon.co.uk] wrote: > In my grove-illiterate opinion, yes! The PropertySet is a sword of Damocles > hanging over these discussions. It's clear that we can't have all 70+ > properties. IF (and I hope it's not a big IF) we can agree on a subset > of the property set then we don't have this problem dissipating the > discussion every time we get close :-) It shouldn't be a big IF at all. Deciding what to rip out isn't too difficult. The only problem lies in agreeing on how to do the additional classes (like XMLDECL) needed and how (or if) the properties should be modularised. > James Clark came up with a grove subset about 3 months back (have a look in > March xml-dev) in response to one of my typical blunderings for information. I'll go back and check that. JamesC would be in a MUCH better position to write an XML property set than me! 
James 'the other James' Tauber :-) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From clloyd at gorge.net Sat Jun 21 22:38:32 1997 From: clloyd at gorge.net (Chris Lloyd) Date: Mon Jun 7 16:58:01 2004 Subject: REPOST: XML Java API-an idea Message-ID: <01BC7E48.096832A0@chaosmobile.com.chaos> This is a repost because some of the original post was clipped. I'm just entering this thread so I don't know what solutions have been discussed. There is already an API to draw from in the DSSSL spec and a definition of the SGML property set which gives us a common language to work from. The problem is that an XML API to a grove should be simple with a small interface and should leverage the object-oriented power and syntax of Java. Personally, when working with groves I find some abstractions very useful in an API. I would rather have an API based on iterators than one based on a set of navigation function calls. I'm talking about navigating the grove rather than building the grove. An iterator API would be extremely simple, well abstracted and more inline with patterns of C++ and Java programming than the SDQL API found in DSSSL. They could also maintain an adherence to the syntax of the SGML property set. Here is an example although my naming syntax probably does not correspond to the SGML property set here. // Assuming we have a object provided by the parser that is a grove, instantiate an iterator and navigate to the first element that is a TITLE tag // A Factory is an object that defines what SGML/XML constructs the iterator knows how to iterate. It provides the grove iterator with a different node iterator for each property node that it knows how to walk. 
    ForwardGroveIterator XMLIter(OurGrove, XMLPropertySetFactory(), StartNodePropertyHandle);
    while (XMLIter++ != XMLIter.end())
    {
        XMLBaseProperty Prop = XMLIter.Object();
        // in C++ we would use the dereference operator like this:
        //     XMLBaseProperty Prop = *XMLIter;
        if (Prop.GetClass() == Element.Class)   // is this an element?
        {
            // let's convert the property from a base class object to its concrete class
            Element aElement = Prop;
            // Now we have an element object and can call all its member functions
            if (aElement.GetIdent() == String("TITLE"))
                break;
        }
    }

    // OK, let's instantiate a new iterator to walk back up to the root of the grove
    // use the copy constructor to produce a reverse iterator from our forward iterator
    ReverseGroveIterator XMLReverseIter(XMLIter);
    while (XMLReverseIter++ != XMLReverseIter.end())
    {
        // do stuff here
    }

The navigation itself is not the same as defined in SDQL, but the property set could be made to conform to the SGML property set. This might offer a compromise.

The factory concept is very powerful because extending an iterator is as simple as adding a new factory class and a node iterator class for each new property being added to the grove. If someone wanted to inherit from the XML property set and put metadata in their grove, they could easily extend the functionality of the base iterators to support their new properties. Because the iterator class has a small interface, it's easy to plug and play new iterators into existing code. You can read more about iterators and factories in Design Patterns (Gamma, Helm, Johnson, Vlissides; Addison-Wesley).
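The forward and reverse walks described above can be sketched in runnable form. This is a minimal illustration under stated assumptions: it uses Python generators in place of the iterator classes, leaves out the factory layer entirely, and every name in it (`Node`, `forward`, `reverse`, `find`) is hypothetical rather than part of any proposed API.

```python
class Node:
    """A bare-bones tree node with a class name, an optional name and children."""

    def __init__(self, node_class, name=None, children=()):
        self.node_class = node_class
        self.name = name
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def forward(node):
    """Yield node and all its descendants in document (pre-)order."""
    yield node
    for child in node.children:
        yield from forward(child)


def reverse(node):
    """Yield the ancestors of node, nearest first, ending at the root."""
    node = node.parent
    while node is not None:
        yield node
        node = node.parent


def find(iterator, predicate):
    """Return the first node the iterator produces that satisfies predicate."""
    return next((n for n in iterator if predicate(n)), None)


doc = Node("element", "DOC", [
    Node("element", "FRONT", [
        Node("element", "TITLE", [Node("text")]),
    ]),
])

# Walk forward to the first TITLE element, then back up to the root.
title = find(forward(doc), lambda n: n.node_class == "element" and n.name == "TITLE")
ancestors = [n.name for n in reverse(title)]
```

The calling code only ever sees "give me the next node", so the same loop would work whether the nodes live in memory or are fetched from a database behind the generator.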
Once we have the appropriate iterators, we can create an API of Functions and Algorithms, maybe based on SDQL, that can do higher-level operations like this:

    // Find the first parent object that is an element
    Algorithm::find( ReverseIter, classid());          // C++ syntax with templates
    Algorithm::find( ReverseIter, classid(ELEMENT));   // Java syntax without templates

    // Find the first object that is an element and whose name is TITLE
    if (Algorithm::find( ReverseIter, AND(classid(), name("TITLE"))))
    {
        Element aElementFound = *ReverseIter;   // get the element and use it
    }

Why we need iterators

1.) Iterators hide the details of how a grove is actually linked together, whether in memory or in an object database, etc.
2.) Iterators have the same interface regardless of the types of properties in the grove
3.) Iterators are extensible and can provide read-only functionality as well as read-write functionality
4.) Iterators are a well known and accepted design pattern and are

From jtigue at datachannel.com Sun Jun 22 01:24:38 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun 7 16:58:01 2004
Subject: JAX [was: XML Java API Standardization]
Message-ID: <33AC62E1.55731DC9@datachannel.com>

I have updated the site which discusses XML Java API Standardization in order to reflect the feedback of the last few days. The site is located at:

http://www.datachannel.com/channelworld/xml/dev/

The most significant change has been the inclusion of event stream stuff. Event streams are lower level than parse trees, so they can't be ignored. DSSSL grove work is being studied for its relevant influence on terminology and future work. I'd like to leave the actual grove work for a later version.
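The layering mentioned above, with event streams sitting below parse trees, can be illustrated with a small sketch. It is not based on the proposed Java API: the event tuples, the `build_tree` function and the nested-list tree shape are all assumptions made for the example, and the event list is written by hand where a real parser would produce it.

```python
def build_tree(events):
    """Fold a flat, well-formed event stream into nested [name, attrs, children] lists."""
    root = None
    stack = []                      # currently open elements, innermost last
    for event in events:
        kind = event[0]
        if kind == "start":
            node = [event[1], dict(event[2]), []]
            if stack:
                stack[-1][2].append(node)
            else:
                root = node
            stack.append(node)
        elif kind == "text":
            # Text is stored as a plain string among the children.
            stack[-1][2].append(event[1])
        elif kind == "end":
            if not stack or stack[-1][0] != event[1]:
                raise ValueError(f"unexpected end tag: {event[1]}")
            stack.pop()
    if stack:
        raise ValueError("unclosed element: " + stack[-1][0])
    return root


# Hand-written events standing in for parser output.
events = [
    ("start", "DOC", {}),
    ("start", "TITLE", {"ID": "t1"}),
    ("text", "Hello"),
    ("end", "TITLE"),
    ("end", "DOC"),
]
tree = build_tree(events)
```

Because the tree builder consumes nothing but events, any parser that emits the same event vocabulary could feed it, which is one argument for standardising the lower layer first.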
So the work has been repositioned as the lowest level (event streams) plus some of the next level (parse tree but not full grove).

Also it seems the best thing to do would be to target JDK 1.1 because it has java.io.Reader, which makes Unicode and internationalization much easier. JDK 1.0.2 will also be supported but "deprecated." Every implementation has been building some sort of UnicodeInputStream and it seems that Reader is the way to go. I want JAX now but I don't want to blow the i18n stuff.

I have become tired of typing "XML Java API Standardization" so I propose we rename it to "Java API for XML" or JAX for short. If anyone has a better idea I'd like to hear it.

--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999

From cbullard at hiwaay.net Sun Jun 22 02:27:56 1997
From: cbullard at hiwaay.net (len bullard)
Date: Mon Jun 7 16:58:01 2004
Subject: JAX [was: XML Java API Standardization]
References: <33AC62E1.55731DC9@datachannel.com>
Message-ID: <33AC7165.C@hiwaay.net>

John Tigue wrote:
> I have become tired of typing "XML Java API Standardization" so I
> propose we rename it to "Java API for XML" or JAX for short. If anyone
> has a better idea I'd like to hear it.

That's great. With Java Jumpin' Beans we can now add Java Jumpin' JAX.

i love it.

len

From digitome at iol.ie Sun Jun 22 12:51:43 1997
From: digitome at iol.ie (Digitome Ltd.)
Date: Mon Jun 7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <199706221051.LAA26889@mail.iol.ie>

(I am not a Java person so I don't know the syntax for doing the following in Java. I presume it is possible. I think the approach might be useful though so here goes)

The idea is to

1) have a textual representation of an XML document as a Python program
2) be able to re-create textual representations of XML document structures as Python programs

The following is a Python representation of a simple XML doc:-

    from XMLStructures import *

    x = XMLTree (
        XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")),
            (
                XMLElement("BAR",(),())
            )
        )
    )

The nice thing about this is that it is both data file and parser rolled into one. A simple "import" statement recreates the in-memory representation of this data structure.

Having created/manipulated an XMLTree the textual representation can be created with a single print statement:-

    x = XMLTree (
        XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")),
            (
                XMLElement("BAR",(),())
            )
        )
    )

    # Change x here
    # ...

    print x

    XMLTree (XMLElement ("FOO",(('ATTR1', 'VALUE1'), ('ATTR2', 'VALUE2')),XMLElement ("BAR",(),())))

1) Such structures give an immediate API in the form of Lispy list processing stuff.
2) Such structures allow parsers to be compared / checked for correct interpretation of XML.
3) Such structures give developers something to aim at when developing XML markup aware tools.
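Points 1) and 2) above can be sketched with nothing but built-in types. This is a minimal illustration that deliberately uses plain nested tuples rather than the XMLStructures classes; the `(gi, attributes, children)` layout and the helper names `elements` and `gis` are assumptions made for the example.

```python
import ast

# Each element is a (gi, attributes, children) tuple; text is a plain string.
doc = ("FOO", (("ATTR1", "VALUE1"), ("ATTR2", "VALUE2")), (
    ("BAR", (), ()),
    ("BAR", (("ID", "b2"),), ("some text",)),
))


def elements(node):
    """Yield every element tuple in document order (strings are skipped)."""
    if isinstance(node, tuple):
        yield node
        for child in node[2]:
            yield from elements(child)


def gis(node):
    """List the generic identifiers of all elements, Lisp-style."""
    return [e[0] for e in elements(node)]


# Round trip: repr() produces program text and literal_eval() rebuilds the
# structure from it without executing arbitrary code.
rebuilt = ast.literal_eval(repr(doc))
```

Two parsers that emit this representation can be compared with a single `==`, and `rebuilt == doc` is the same check applied to a save-and-reload cycle.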
Just in case anyone is interested, here is the Python code for the classes :-

    class XMLTree:
        # Wraps the root element of a document
        def __init__ (self,r):
            self.root = r
        def __repr__ (self):
            return "XMLTree (%s)" % (self.root,)

    class XMLElement:
        # An element: generic identifier, attribute pairs and children
        def __init__(self,gi,attlist,children):
            self.GI = gi
            self.XMLAttlist = attlist
            self.Children = children
        def __repr__(self):
            return "XMLElement (\"%s\",%s,%s)" % (self.GI,self.XMLAttlist,self.Children)

    class XMLPcdata:
        # Character data; prints as the raw text
        def __init__(self,dat):
            self.data = dat
        def __repr__(self):
            return self.data

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From digitome at iol.ie Sun Jun 22 12:51:49 1997
From: digitome at iol.ie (Digitome Ltd.)
Date: Mon Jun 7 16:58:01 2004
Subject: JAX
Message-ID: <199706221051.LAA26896@mail.iol.ie>

JAX is Irish slang for toilet! :-(

Sean

Sean Mc Grath
sean@digitome.com
Digitome Electronic Publishing
http://www.digitome.com

From Peter at ursus.demon.co.uk Sun Jun 22 15:43:16 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:01 2004
Subject: XML Java API Standardization
Message-ID: <8361@ursus.demon.co.uk>

In message <199706221051.LAA26889@mail.iol.ie> digitome@iol.ie (Digitome Ltd.) writes:
> (I am not a Java person so I don't know the syntax for doing the following
> in Java.

Just to reassure the membership - XML-DEV is not Java-only - anything goes :-)

[...]
> The following is a Python representation of a simple XML doc:- > > from XMLStructures import * > > x = XMLTree ( > XMLElement("FOO",(("ATTR1","VALUE1"),("ATTR2","VALUE2")), > ( > XMLElement("BAR",(),()) > ) > ) > ) > > The nice thing about this is that it is both data file and parser rolled > into one. > Presumably this is similar to a serialised object (except that I believe that Java serialisation will not give a very readable file.) A possible attraction of serialised XML objects (e.g. at grove level) is that they would read into memory more rapidly, bother because no parsing was required and presumably because there are tricks for allocating memory. Obviously different parsers/applications would have different serialisations but if we had a standard grove it *might* be possible to have agreed serialisations of it. Or is this off track? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From peat at erols.com Sun Jun 22 16:14:30 1997 From: peat at erols.com (Peat) Date: Mon Jun 7 16:58:02 2004 Subject: JAX Message-ID: <199706221414.KAA15189@smtp2.erols.com> Oh oh, We can't have that !!! Here is a suggestion as an alternative.. XAPI-J pronounced "Zapi-J", which allows for XAPI-C or XAPI-Prolog, etc. and therefore extendible for whatever language which comes down the line. - Bruce Peat ---------- > From: Digitome Ltd. > To: xml-dev@ic.ac.uk > Subject: JAX > Date: Sunday, June 22, 1997 6:26 AM > > JAX is Irish slang for toilet! 
:-( > > Sean > > Sean Mc Grath > > sean@digitome.com > Digitome Electronic Publishing > http://www.digitome.com > > > xml-dev: A list for W3C XML Developers > Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ > To unsubscribe, send to majordomo@ic.ac.uk the following message; > unsubscribe xml-dev > List coordinator, Henry Rzepa (rzepa@ic.ac.uk) > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jtigue at datachannel.com Sun Jun 22 19:00:39 1997 From: jtigue at datachannel.com (John Tigue) Date: Mon Jun 7 16:58:02 2004 Subject: XAPI-J [was: JAX] References: <199706221414.KAA15189@smtp2.erols.com> Message-ID: <33AD5A34.13777A1E@datachannel.com> For the benefit of XML conversations in Ireland, let's change to XAPI. Now Extensible Markup has an extensible API name. I'm focused on XAPI-J; is there any work in other language that I should be aware of? > Oh oh, We can't have that !!! Here is a suggestion as an > alternative.. > > XAPI-J pronounced "Zapi-J", which allows for XAPI-C or XAPI-Prolog, > etc. > and therefore extendible for whatever language which comes down the > line. > > JAX is Irish slang for toilet! :-( -- John Tigue Programmer jtigue@datachannel.com DataChannel (http://www.datachannel.com) 206-462-1999 -------------- next part -------------- A non-text attachment was scrubbed... Name: vcard.vcf Type: text/x-vcard Size: 316 bytes Desc: Card for John Tigue Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970622/1566f95b/vcard.vcf From tbray at textuality.com Sun Jun 22 21:08:12 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:02 2004 Subject: MCF again Message-ID: <3.0.32.19970622120552.00a799d0@pop.intergate.bc.ca> MCF is Meta Content Framework, an application of XML proposed by Netscape. 
The drafts have been heavily reworked based on early feedback, check the spec out at: http://www.textuality.com/mcf/NOTE-MCF-XML.html If (like a lot of other people) you found MCF a little daunting first time around, you might want to check out the new tutorial at: http://www.textuality.com/mcf/MCF-tutorial.html I understand this is now going to migrate over to a just-now-forming new working group in W3C that is going to try to co-ordinate all the disparate metadata activities. Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From steve at cs.anu.edu.au Mon Jun 23 01:56:53 1997 From: steve at cs.anu.edu.au (Steven Ball) Date: Mon Jun 7 16:58:02 2004 Subject: XML Java API Standardization In-Reply-To: Your message of "Sun, 22 Jun 1997 11:26:23 +0100." <199706221051.LAA26889@mail.iol.ie> Message-ID: <199706222356.JAA09014@tcltk.anu.edu.au> > (I am not a Java person so I don't know the syntax for doing the following > in Java...) I'm no Java-phile either ;-) > The idea is to > > 1) have a textual representation of an XML document as a Python program > 2) be able to re-create textual representations of XML document structures > as Python programs I've done essentially the same thing for Tcl. My XML parser emits a "Heirarchical Tcl List Representation" of an XML document. For example: set doc { Audience Steve This is XML! 
} XML::parse $doc returns ==> parse:pi ?XML {VERSION 1.0} {} parse:pi !DOCTYPE {SYSTEM memo.dtd} {} parse:elem MEMO {REF 1234} { parse:elem TO {} { parse:text Audience {} {} } parse:elem FROM {} { parse:text Steve {} {} } parse:elem MESSAGE {} { parse:text {This is XML!} {} {} } } (above has been edited slightly for email-readability) This representation has two features: it can be easily manipulated as a list, especially with the dummy arguments to parse:pi and parse:text, and it can be passed to the `eval' command for execution - the element contents are themselves scripts. > 1) Such structures give an immediate API in the form of Lispy list > processing stuff. > 2) Such structures allow parsers to be compared / checked for correct > interpretation of XML. > 3) Such structures give developers something to aim at when developing XML > markup aware tools. Agreed, and the similarity of our (independent) approaches is noteworthy. My only comment is that (2) is modulo list syntax. Cheers, Steve Ball xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at Eng.Sun.COM Mon Jun 23 07:29:24 1997 From: Jon.Bosak at Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:02 2004 Subject: XAPI-J [was: JAX] In-Reply-To: <33AD5A34.13777A1E@datachannel.com> (jtigue@datachannel.com) Message-ID: <199706230527.WAA04374@boethius.eng.sun.com> [John Tigue:] | For the benefit of XML conversations in Ireland, let's change to XAPI. | Now Extensible Markup has an extensible API name. Great. | I'm focused on XAPI-J; is there any work in other language that I | should be aware of? There's already a validating XML parser in Tcl, and versions in other languages are bound to follow. 
Even so (and even trying to compensate for my bias as a Sun employee), I think that there is, and is going to be, such a powerful connection between XML and Java on the Web that the default name for the Java XML API should be simply XAPI ("zappy"), and all other versions should use the qualified names (Tcl-XAPI or whatever). Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jeanpa at microsoft.com Mon Jun 23 07:38:10 1997 From: jeanpa at microsoft.com (Jean Paoli) Date: Mon Jun 7 16:58:02 2004 Subject: XML-Data Message-ID: <78DFE33066ABD0118B9200805FD431BA932987@RED-16-MSG.dns.microsoft.com> I am pleased to present XML-Data, a Position Paper from Microsoft. XML-Data is an application of XML for exchanging structured data and metadata on the Internet. This position paper is sent to multiple working groups in the W3C dealing with this subject (XML, meta-data) and we expect this paper to be discussed and improved by these working groups. The current proposal needs namespaces and uses the Layman/Bray proposal. The URL of this paper (on the Microsoft site) will be posted tomorrow. -Jean Paoli ---------------- XML-Data

XML-Data.html

Position Paper from Microsoft
20 June 1997

XML-Data

Authors:
Andrew Layman, Microsoft Corporation
Jean Paoli, Microsoft Corporation
Steve De Rose, Inso Corporation
Henry S. Thompson, University of Edinburgh
Acknowledgements:
We thank Paul Grosso (Arbortext), Sharon Adler (Inso Corporation), Anders Berglund (Inso Corporation), François Chahuneau (AIS/Berger-Levrault), and Edward Jung (Microsoft) for their help and contributions to this proposal.

Copyright (c) 1997 Microsoft Corp.


Abstract

This document provides the specification for exchanging structured and networked data on the Web. This specification uses XML, the Extensible Markup Language for describing data as well as data about data. We expect this specification to be useful for a wide range of applications such as describing database transfers, digital signatures or remotely-located web resources.

1. Introduction

The Internet holds the potential to integrate all information in a global network (with many private but integrated domains). The Internet promises access to information any time and, with wireless technology, anywhere. Today, however, the Internet is merely an access medium to text and pictures. To actualize the Internet's potential, we need to add intelligent search, data exchange, adaptive presentation, and personalization. The Internet must go beyond setting an information access standard, and must set an information understanding standard, which means: a standard way of representing data so that software can better search, move, display, and otherwise manipulate information currently hidden in contextual obscurity.

XML is an important step in this direction. It offers a standard syntax for textual structure of tagged data, based on extensive industry and theoretical experience. Its lexical format easily depicts a tree structure. A tree is a natural format that is richer than a simple flat list, yet (compared to a generalized graph) also respectful of cognitive and data processing requirements for economy and simplicity.

Looking at this point in more detail, there are several ways of structuring data. One is a flat tagging system. In this system, sets of keywords are applied to data elements. This is a simple form of data structure, but it does not capture any relationships between the keywords.

A more advanced means of structuring information is a tree. A tree allows expression of subsumption, containment, or any other single (contextual) relationship such as "manages." Trees correspond to object-oriented class hierarchies, file system hierarchies, organizational hierarchies and so forth. Trees are relatively easy to understand and to construct. Trees are efficient to process, and there is a linear (e.g. textual) structure that a program can parse incrementally, and determine when it is finished. This makes trees particularly useful as a transmission format for asynchronous, distributed systems such as the Internet, and also for display purposes where the single relationship (usually visual containment) enables incremental display.
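
The point about incremental parsing can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: it uses the standard library's `xml.etree.ElementTree.XMLPullParser`, the function name `parse_incrementally` and the sample chunks are hypothetical, and the chunk boundaries are deliberately placed in the middle of tokens. The reader knows the tree is complete when the nesting depth returns to zero.

```python
import xml.etree.ElementTree as ET

def parse_incrementally(chunks):
    """Feed text chunks to a pull parser; return (closed tags, finished flag)."""
    parser = ET.XMLPullParser(events=("start", "end"))
    state = {"depth": 0, "finished": False}
    closed = []  # element names, in the order their end tags were seen

    def drain():
        # Consume whatever events the parser has produced so far.
        for event, elem in parser.read_events():
            if event == "start":
                state["depth"] += 1
            else:
                state["depth"] -= 1
                closed.append(elem.tag)
                if state["depth"] == 0:
                    state["finished"] = True  # the root element has closed

    for chunk in chunks:
        parser.feed(chunk)
        drain()
    parser.close()  # flushes anything the underlying parser was still holding
    drain()
    return closed, state["finished"]

# Hypothetical transmission, split at arbitrary points.
chunks = ["<ORDER><SOLD-ON>1997", "0317</SOLD-ON><ITEM><PRICE>5.95</PR", "ICE></ITEM></ORDER>"]
closed, finished = parse_incrementally(chunks)
```

Elements are reported innermost first, which is what lets a display or a downstream consumer act on each subtree as soon as it closes.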

A still more elaborate structure is a directed graph. A graph allows expression of arbitrary binary relationships, that is, many relationships between two things. A graph can express subsumption, containment, and any number of other relationships simultaneously. It is therefore a superset of a tree. This makes graphs very expressive for real-world semantics, but it also makes them harder to understand, more difficult to construct, and less efficient to process than trees. There is no efficient linear (e.g. textual) structure of a graph that can be incrementally processed. Therefore, while they are particularly useful for representing (and instrumenting) the complete semantics of a system, they are typically not suitable for transmission, display, or immediate processing.

The tree structure has proved broadly implementable and easy to deploy, not just in theory but also widely in practice. Industrial implementations, in the SGML community and elsewhere, demonstrate its intrinsic quality and industrial strength, e.g. aircraft (ATA), automotive (J2008), banking (OFX), and semiconductors (Pinnacles PCIS).

This proposal shows how to add a single convention to XML so that graph arcs are easily added into a lexical tree structure, without requiring decomposition of tree format into a "lowest common denominator" nodes-and-arcs structure. (For a quick look at the difference, see the XML-Data versus MCF in XML comparison.)

XML-Data consists of a collection of related technologies. First, it unifies lexical trees with graph structures. Second, it builds on this to define a representation for schemata based on XML instance syntax. It offers a mechanism to organize element types into a hierarchy, and proposes a small set of basic types. Finally, it adds facilities for lexical typing and proposes a small collection of lexical types.

XML-Data can encode the content, semantics and schemata for a gamut of cases, from simple and prosaic to complex and sophisticated:

  • An ordinary document
  • A structured record, such as an appointment record or purchase order
  • An object, with data and methods
  • A data record, such as the result set of a query
  • Information in a database or a web site (e.g. CDF)
  • Graphical presentation (e.g. an application user interface)
  • Upper ontology (standard schema entities and types)
  • UberWeb (all the links between information and people on the web)

The resulting flexibility of a single homogeneous data representation system allows any reader to uniformly determine the structural semantics of a data element. Information can then be reused for new purposes and in novel contexts. For example, a record from a database of restaurants and a record from a client contact database might be reused in the context of an appointment, say in setting a lunch date with a client. The relationships between the restaurant and contact data do not reside in the schema data described by either database individually, but are extensions defined by the instance of the appointment.

This proposal, building on the earlier Web Collections in XML proposal, shows how to use a single syntax for a broad range of data, using that syntax for data and schemata, permitting the expressiveness of graph data when such power is required, but retaining the benefits of lexical trees.

2. Examples of XML-Data

Data

The following example shows a simple order from a bookstore for several books, a record, and a cup of coffee.

<ORDER>
  <SOLD-TO>
    <PERSON>
      <LASTNAME>Layman</LASTNAME>
      <FIRSTNAME>Andrew</FIRSTNAME>
    </PERSON>
  </SOLD-TO>
  <SOLD-ON>19970317</SOLD-ON>
  <ITEM>
    <PRICE>5.95</PRICE>
    <BOOK>
      <TITLE>Number, the Language of Science</TITLE>
      <AUTHOR>Dantzig, Tobias</AUTHOR>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <BOOK>
      <TITLE>Introduction to Objectivist Epistemology</TITLE>
      <AUTHOR>Rand, Ayn</AUTHOR>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <RECORD>
      <TITLE><COMPOSER>Tchaikovsky's</COMPOSER> First Piano Concerto</TITLE>
      <ARTIST>Janos</ARTIST>
    </RECORD>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE>
      <SIZE>small</SIZE>
      <STYLE>cafe macchiato</STYLE>
    </COFFEE>
  </ITEM>
</ORDER>

XML-Data is flexible enough to encode heterogeneous structures, for example books, records and coffee all within one sales order. These different kinds of items do not need to all have the same internal parts. For example, books have titles, coffee generally doesn't. XML-Data allows values to be expressed as element content (for example the book titles shown) or with a value attribute (for example the author and artist elements). Properties of elements can be expressed as attributes (e.g. size and style of coffee) or as sub-elements (e.g. author, artist). XML-Data can appear in separate documents or within other documents (such as HTML pages).
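
As a rough illustration of working with such a heterogeneous order, the following Python sketch totals the prices and lists the kinds of things sold. It assumes the element-only form of the order shown above, abbreviated to three items; the names `ORDER_XML` and `summarize` are hypothetical, and only the standard library is used.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Abbreviated copy of the order above: one book, one record, one coffee.
ORDER_XML = """
<ORDER>
  <SOLD-ON>19970317</SOLD-ON>
  <ITEM>
    <PRICE>5.95</PRICE>
    <BOOK><TITLE>Number, the Language of Science</TITLE></BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <RECORD><ARTIST>Janos</ARTIST></RECORD>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE><SIZE>small</SIZE></COFFEE>
  </ITEM>
</ORDER>
"""

def summarize(order_xml):
    """Return (total price, list of item kinds) for an ORDER document."""
    order = ET.fromstring(order_xml)
    total = Decimal("0")
    kinds = []
    for item in order.findall("ITEM"):
        total += Decimal(item.findtext("PRICE"))
        # Everything in the ITEM other than PRICE is the thing being sold.
        kinds.extend(child.tag for child in item if child.tag != "PRICE")
    return total, kinds

total, kinds = summarize(ORDER_XML)
```

The code never needs to know in advance that books have titles and coffee has a size; it relies only on the parts that every ITEM shares.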

Data about Other Data

XML-Data is suitable for complex, self-contained data structures such as the book order, and also for information such as the Channel Definition Format, which describes remotely-located web resources, many of which are themselves data:

<CHANNEL>
  <ITEM HREF="http://www.zoosports.com/intro.htm" level="2" precache="NO">
    <A HREF="http://www.zoosports.com/page1.htm">This is a link to page 1.</A>
    <TITLE>Welcome to ZooSports!</TITLE>
    <ABSTRACT>ZooSports articles, news, and promotional offers</ABSTRACT>
  </ITEM>
  <SCHEDULE ENDDATE="1994-11-05">
    <INTERVALTIME DAY="1"/>
    <EARLIESTTIME HOUR="12"/>
    <LATESTTIME HOUR="18"/>
  </SCHEDULE>
</CHANNEL>

PICS-NG Labels

XML-Data can express PICS-NG Labels:

(This uses the Layman-Bray proposal for namespaces.)

<xml>
  <xml:schema>
    <namespaceDcl href="http://purl.org/Schemas" name="purl"/>
    <namespaceDcl href="http://www.foo.com" name="foo"/>
  </xml:schema>
  <xml:data>
    <purl:description1 href="http://purl.color.org/document.html">
      <title>Light and Dark: A study of color</title>
      <subject><LCSH>
          <for>Color and Color Palettes</for></LCSH></subject>
      <author>
        <foo:author>
          <name>John Smith</name>
          <affiliation>thedarkside</affiliation>
          <email>john@thedarkside</email>
        </foo:author>
        <foo:author>
          <name>Smith, Jane Q.</name>
          <affiliation>thelightregion</affiliation>
          <email>jane@thelightregion</email>
        </foo:author>
      </author>
    </purl:description1>
  </xml:data>
</xml>

Digital Signatures, Security & Authentication

Returning to the bookstore example, this is the same order with a digital signature added. The structured nature of XML-Data makes it easy to sign whole elements or parts of them.

<ORDER>
  <dsig:DSIG>
    <MANIFEST>80183589575795589189518915</MANIFEST>
    <SIG href="http://XYX/Joe@company.com"/>
  </dsig:DSIG>
  <SOLD-TO>
    <PERSON>
      <LASTNAME>Layman</LASTNAME>
      <FIRSTNAME>Andrew</FIRSTNAME>
    </PERSON>
  </SOLD-TO>
  <SOLD-ON>19970317</SOLD-ON>
  <ITEM>
    <PRICE>5.95</PRICE>
    <BOOK>
      <TITLE>Number, the Language of Science</TITLE>
      <AUTHOR>Dantzig, Tobias</AUTHOR>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <BOOK>
      <TITLE>Introduction to Objectivist Epistemology</TITLE>
      <AUTHOR>Rand, Ayn</AUTHOR>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <RECORD>
      <TITLE><COMPOSER>Tchaikovsky's</COMPOSER> First Piano Concerto</TITLE>
      <ARTIST>Janos</ARTIST>
    </RECORD>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE>
      <SIZE>small</SIZE>
      <STYLE>cafe macchiato</STYLE>
    </COFFEE>
  </ITEM>
</ORDER>

Database Information

While XML-Data can represent complex structures, it can also represent simple ones, for example a simple list of database records:

<BOOK-MASTER-LIST>
  <BOOK id="book1">
    <TITLE>Number, the Language of Science</TITLE>
    <AUTHOR>Dantzig, Tobias</AUTHOR>
  </BOOK>

  <BOOK id="book2">
    <TITLE>Introduction to Objectivist Epistemology</TITLE>
    <AUTHOR>Rand, Ayn</AUTHOR>
  </BOOK>

  <BOOK id="book3">
    <TITLE>I, The Jury</TITLE>
    <AUTHOR>Spillane, Mickey</AUTHOR>
  </BOOK>

  <BOOK id="book4">
    <TITLE>Half Magic</TITLE>
    <AUTHOR>Eager, Edward</AUTHOR>
  </BOOK>

  <BOOK id="book5">
    <TITLE>QED</TITLE>
    <AUTHOR>Feynman, Richard P.</AUTHOR>
  </BOOK>
</BOOK-MASTER-LIST>

Graph Structures

An XML-Data element may include links to resources outside the immediate tree. When it meets application needs, this href facility can be used to break up a single structure into multiple parts, with relations among them indicated by Universal Resource Identifier (URI) links. The references can be local or remote. In this example, they are inventory records from the database table we just looked at.

<ORDER id="order1">
    <dsig:DSIG>
      <MANIFEST>80183589575795589189518915</MANIFEST>
      <SIG href="http://XYX/Joe@company.com"/>
    </dsig:DSIG>
    <SOLD-TO>
      <PERSON>
        <LASTNAME>Layman</LASTNAME>
        <FIRSTNAME>Andrew</FIRSTNAME>
      </PERSON>
    </SOLD-TO>
    <SOLD-ON>19970317</SOLD-ON>
    <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book1">
      <PRICE>5.95</PRICE>
    </ITEM>
    <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book2">
      <PRICE>12.95</PRICE>
    </ITEM>
    <ITEM href="http://bigbookstore.com/data/musicmaster?XML-XPTR=cd1">
      <PRICE>12.95</PRICE>
    </ITEM>
    <ITEM>
      <PRICE>1.50</PRICE>
      <COFFEE>
        <SIZE>small</SIZE>
        <STYLE>cafe macchiato</STYLE>
      </COFFEE>
    </ITEM>
</ORDER>

Notice that each of the ITEM elements establishes a relationship between the ORDER and a BOOK, and that the relationship itself has attributes, in this case the price at which the book was sold. Relations can have attributes, can contain elements and the process can be carried to any needed level of detail.
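
One way a reader of such an order might follow these links is sketched below in Python. This is an illustration under several assumptions, not part of the proposal: the `DOCUMENTS` dictionary stands in for fetching the remote document, only the simple `XML-XPTR=<id>` form used in this example is handled, and the names `resolve` and `relation_target` are hypothetical. Treating PRICE as a property of the relation rather than as its target is also an interpretation of the paragraph above.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit, parse_qs

BOOKMASTER = ET.fromstring("""
<BOOK-MASTER-LIST>
  <BOOK id="book1"><TITLE>Number, the Language of Science</TITLE></BOOK>
  <BOOK id="book2"><TITLE>Introduction to Objectivist Epistemology</TITLE></BOOK>
</BOOK-MASTER-LIST>
""")

# Stand-in for retrieving documents over the network (assumption).
DOCUMENTS = {"http://bigbookstore.com/data/bookmaster": BOOKMASTER}

ORDER = ET.fromstring("""
<ORDER id="order1">
  <ITEM href="http://bigbookstore.com/data/bookmaster?XML-XPTR=book1">
    <PRICE>5.95</PRICE>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE><SIZE>small</SIZE></COFFEE>
  </ITEM>
</ORDER>
""")

def resolve(href):
    """Return the element named by an href of the form URL?XML-XPTR=id."""
    parts = urlsplit(href)
    doc = DOCUMENTS[f"{parts.scheme}://{parts.netloc}{parts.path}"]
    target_id = parse_qs(parts.query)["XML-XPTR"][0]
    for elem in doc.iter():
        if elem.get("id") == target_id:
            return elem
    raise KeyError(target_id)

def relation_target(item):
    """Target is the href'd element if href is present, else the content."""
    href = item.get("href")
    if href is not None:
        return resolve(href)
    # No href: take the first child that is not the relation's own PRICE.
    return next(child for child in item if child.tag != "PRICE")

targets = [relation_target(item) for item in ORDER.findall("ITEM")]
```

Either way the caller ends up holding an element, so code that processes the order does not need to care whether the book was stored inline or in a separate table.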

Discontiguous Information (propertyOf)

Information about an element can be contained in the element, but also can sit outside it. For example, the following applies a digital signature to a sales order without actually modifying the order:

<dsig:DSIG>
  <xml:propertyOf href="http://bigbookstore.com/data/orders?XML-XPTR=order1"/>
  <MANIFEST>80183589575795589189518915</MANIFEST>
  <SIG href="http://XYX/Joe@company.com"/>
</dsig:DSIG>

Schema

Every data object, such as a purchase order, contains certain parts, such as sold-to, sold-on date, items, etc. We can write a formal description of what these parts are and which are allowed where. This is called a "schema" and is written using a form of XML-Data:

<xml:schema ID="BookOrderSchema">
  <!-- This schema is digitally signed. Schemas are a form of data,
       so they, too, can be signed. -->
  <dsig:DSIG>
    <MANIFEST>*(&#&$&@*$&%*&@*$&$*@</MANIFEST>
    <SIG href="http://XYX/Jane@company.com"/>
  </dsig:DSIG>

  <!-- Here are all the element types, their contents,
       attributes and relations. -->
  <elementType id="ORDER">
    <relation href="#SOLD-TO"/>
    <relation href="#SOLD-ON"/>
    <relation href="#ITEM" occurs="STAR"/>
  </elementType>
  <relationType id="SOLD-TO">
    <elt href="#PERSON"/>
  </relationType>
  <relationType id="SOLD-ON">  
    <pcdata/>
    <!-- Date is YYYYMMDD -->
    <attribute name="lextype" default="DATE.ISO8061" presence="fixed"/>
  </relationType>
  <elementType id="PERSON">
    <relation href="#LASTNAME"/>
    <relation href="#FIRSTNAME"/>
  </elementType>
  <elementType id="LASTNAME">
    <pcdata/>
  </elementType>
  <elementType id="FIRSTNAME">
    <pcdata/>
  </elementType>
  <relationType id="PRICE">
    <pcdata/>
  </relationType>
  <relationType id="ITEM">
    <any/>
    <relation href="#PRICE"/>
    <range href="#BOOK"/>
    <range href="#RECORD"/>
    <range href="#COFFEE"/>
  </relationType>
  <elementType id="BOOK">
    <relation href="#TITLE"/>
    <relation href="#AUTHOR"/>
  </elementType>
  <elementType id="RECORD">
    <relation href="#TITLE"/>
    <relation href="#ARTIST"/>
  </elementType>
  <relationType id="SIZE">
    <pcdata/>
  </relationType>
  <relationType id="STYLE">
    <pcdata/>
  </relationType>
  <elementType id="COFFEE">
    <relation href="#SIZE"/>
    <relation href="#STYLE"/>
  </elementType>
  <elementType id="TITLE">
    <mixed><elt href="#COMPOSER"/></mixed>
  </elementType>
  <relationType id="AUTHOR">
    <pcdata/>
  </relationType>
  <relationType id="ARTIST">
    <pcdata/>
  </relationType>
  <relationType id="COMPOSER">
    <pcdata/>
  </relationType>
</xml:schema>

Type Extension

Sometimes some elements are variants of others, in which case we can organize the element types into a genus-species hierarchy using the extends attribute:

<xml:schema ID="ArtSchema">
  <elementType id="artistic-work">
    <relation href="#TITLE"/>
  </elementType>
  <elementType id="BOOK" extends="#artistic-work">
    <relation href="#AUTHOR"/>
  </elementType>
  <elementType id="RECORD" extends="#artistic-work">
    <relation href="#ARTIST"/>
    <relation href="#COMPOSER" occurs="OPTIONAL"/>
  </elementType>
  <relationType id="AUTHOR">
    <pcdata/>
  </relationType>
  <relationType id="COMPOSER" extends="#AUTHOR"/>
  <relationType id="ARTIST">
    <pcdata/>
  </relationType>
</xml:schema>

Here we see that books and records are both types of artistic work, and that a composer is a type of author.
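
A small Python sketch of how a processor might walk the extends chain follows. It assumes that a subtype carries the relations of its supertype in addition to its own, which the proposal implies but does not spell out at this point. The root element is written as plain `schema` for brevity, and the name `all_relations` is hypothetical.

```python
import xml.etree.ElementTree as ET

ART_SCHEMA = ET.fromstring("""
<schema>
  <elementType id="artistic-work"><relation href="#TITLE"/></elementType>
  <elementType id="BOOK" extends="#artistic-work">
    <relation href="#AUTHOR"/>
  </elementType>
  <elementType id="RECORD" extends="#artistic-work">
    <relation href="#ARTIST"/>
    <relation href="#COMPOSER" occurs="OPTIONAL"/>
  </elementType>
</schema>
""")

def all_relations(schema, type_id):
    """List relation names for type_id, most general supertype first."""
    types = {t.get("id"): t for t in schema.iter("elementType")}
    chain = []
    current = type_id
    while current is not None:
        if current in chain:
            raise ValueError("cycle in extends chain at " + current)
        chain.append(current)
        parent = types[current].get("extends")
        current = parent.lstrip("#") if parent else None
    relations = []
    for tid in reversed(chain):
        relations += [r.get("href").lstrip("#") for r in types[tid].findall("relation")]
    return relations

book_relations = all_relations(ART_SCHEMA, "BOOK")
record_relations = all_relations(ART_SCHEMA, "RECORD")
```

Only local `#id` references are followed here; a reference into another schema would need to be fetched first.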

Schema Extension

We can also use this ability to customize a schema that has useful features, but which is too general. In this example, we show a general schema for orders, then another one that is customized for our bookstore:

<xml:schema ID="GenericOrderSchema">
  <elementType id="ORDER">
    <relation href="#SOLD-TO"/>
    <relation href="#SOLD-ON"/>
  </elementType>
  <relationType id="SOLD-TO">
    <elt href="#PERSON"/>
  </relationType>
  <elementType id="PERSON">
    <relation href="#LASTNAME"/>
    <relation href="#FIRSTNAME"/>
  </elementType>
  <relationType id="LASTNAME">
    <pcdata/>
  </relationType>
  <relationType id="FIRSTNAME">
    <pcdata/>
  </relationType>
</xml:schema>  


<xml:schema id="BookOrderSchema">
  <elementType id="ORDER" extends="http://generic.com/genericOrder?XML-XPTR=ID(ORDER)">
    <relation href="#ITEM" occurs="STAR"/>
  </elementType>

  <relationType id="ITEM">
    <any/>
    <relation href="http://generic.com/genericOrder?XML-XPTR=ID(ORDER)"/>
    <range href="http://art.com/schemata?XML-XPTR=ID(BOOK)"/>
    <range href="http://art.com/schemata?XML-XPTR=ID(RECORD)"/>
    <range href="#COFFEE"/>
  </relationType>

  <relationType id="SIZE">
    <pcdata/>
  </relationType>

  <relationType id="STYLE">
    <pcdata/>
  </relationType>

  <elementType id="COFFEE">
    <relation href="#SIZE"/>
    <relation href="#STYLE"/>
  </elementType>
</xml:schema>

3. XML-Data Schema

The XML-Data schema language defines element types, attributes, relations, and which of these can be used in which combinations with others. It also provides features for organizing element types into a genus-species hierarchy, a basic set of element types, and a small set of lexical types. The schema contains other features from XML Document Type Definition (DTD) language, such as entity and notation declarations. The XML-Data schema covers all the features of XML DTDs and can express the same structural information and constraints, so an XML DTD can be mechanically converted to an XML-Data schema.

Schemata are composed principally of declarations for:

  • element types, represented by elementType
  • attributes of elements, represented by attribute
  • relations among elements, represented by relationType
  • rules governing the valid combinations of the above, represented by any, mixed and pcdata; also by elt, group, relation, and range
  • internal and external entities, represented by intEntityDcl and extEntityDcl
  • notations, represented by notationDcl

Comments can be interspersed as usual in XML, and there is provision for using references to external schemata or schema fragments.

3.1. The schema document element type: schema

All schema elements are contained within a schema element, like this:

<?XML version='1.0' rmd='all'?>
<!doctype schema SYSTEM "http://www.w3c.org/pub/sotr/schema.dtd">
<xml:schema id='ExampleSchema'>
  <!-- schema goes here. -->
</xml:schema>

3.2. The element type declaration element type: elementType

Key terms used here: element, elementType, empty, any, mixed, pcdata, content model.

The heart of an XML-Data schema is the elementType declaration which defines a class of elements, gives them attributes, establishes a grammar of which other element types and character data are allowed in their contents and defines their allowable relationships to elements of other classes. (The allowable content, including relations, is called "content model.")

<elementType id="example">  <!-- element example (p*) -->
    <elt href="#p" occurs="STAR"/>
</elementType>
<elementType id="p">        <!-- element p ((#PCDATA|p)*) -->
    <mixed><elt href="#p"/></mixed>
</elementType>

The name attribute is optional if id is present, in which case the id is used as the name.

Within an elementType, elt indicates that instances are permitted to only have a single element type in their content. The occurs attribute of elt specifies whether this content is optional, and gives its cardinality.

Empty and any content are expressed using predefined elements empty and any. (Empty may be omitted. Any signals that any mixture of elements and parsed character data is legal.) Parsed character data content is similarly expressed with a pcdata item. Mixed content (a mixture of parsed character data and one or more element types), is identified by a mixed element, whose content identifies the element types allowed in addition to parsed character data (see below).

<elementType id="ARTIST">
  <pcdata/>
</elementType>

More complex content models are created using group:

<elementType id="animalFriends" >
  <group groupType="OR" occurs="STAR">
    <group groupType="OR" occurs="PLUS">
      <elt href="#cat"/>
      <elt href="#dog"/>
    </group>
    <elt href="#bird"/>
    <elt href="#rabbit"/>
    <elt href="#pig"/>
    <elt href="#fish"/>
  </group>
</elementType>
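
To make the correspondence with DTD content models concrete, here is a Python sketch that renders an elt/group structure in DTD notation. It is an illustration only: the function name `to_dtd` is hypothetical, the occurs values handled are the ones that appear in this paper (STAR, PLUS, OPTIONAL), and treating any groupType other than OR as a sequence is an assumption.

```python
import xml.etree.ElementTree as ET

OCCURS = {"STAR": "*", "PLUS": "+", "OPTIONAL": "?"}

ANIMAL_FRIENDS = ET.fromstring("""
<elementType id="animalFriends">
  <group groupType="OR" occurs="STAR">
    <group groupType="OR" occurs="PLUS">
      <elt href="#cat"/>
      <elt href="#dog"/>
    </group>
    <elt href="#bird"/>
    <elt href="#rabbit"/>
    <elt href="#pig"/>
    <elt href="#fish"/>
  </group>
</elementType>
""")

def to_dtd(node):
    """Render an elt or group element as a DTD content-model fragment."""
    suffix = OCCURS.get(node.get("occurs"), "")
    if node.tag == "elt":
        return node.get("href").lstrip("#") + suffix
    if node.tag == "group":
        # Assumption: anything that is not an OR group is a sequence.
        separator = "|" if node.get("groupType") == "OR" else ","
        return "(" + separator.join(to_dtd(child) for child in node) + ")" + suffix
    raise ValueError("unsupported content-model element: " + node.tag)

model = to_dtd(ANIMAL_FRIENDS.find("group"))
```

For the declaration above this yields the familiar DTD form `((cat|dog)+|bird|rabbit|pig|fish)*`.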

3.3 Relations

Key terms used here: relationType, relation, XML-Link locator, href.

Relation element types express a relationship between one element (usually the relation's parent) and either another element or an atomic value (such as a simple number, string or date). Relations use the XML-Link locator without implying navigation. The target of a relation is the element referenced by the href attribute if one is present, else the element contents. This single convention unifies graphs and trees.

Including a relation in an elementType makes it an implicit part of that element's content model, with the default for occurs being OPTIONAL. Relations must occur (in a valid document instance) after any other content. RelationTypes are elements, and the full content model is as if there were a sequential group containing first the explicitly provided content model, then a starred OR group with all the relations as content.

Two element types are used in the schema to effect a relation: the relationType is a specialized kind of elementType, while relation has the same function as elt (but validates that it refers to a relationType).

If a default attribute is specified for a relation, it becomes the default of the value attribute of the relation elt. The range element, if present, declares a restriction on the valid target of a relation. Each range element references one elementType, and any of the referenced element types is a valid target.

 <relationType id="favoriteFood"><mixed/></relationType>
 <relationType id="chases"><any/></relationType>

 <elementType id="dog" >
   <any/>
   <attribute name="name"/>
   <relation href="#favoriteFood"/>
   <relation href="#chases"/>
 </elementType>

3.4 Attributes

Key terms used here: attribute, values, default.

After the content model, attribute declarations may occur, which are divided into attributes with enumerated or notation values, and all other kinds.

<elementType id="p1">       <!-- element p1 ((#PCDATA|p1)*) -->
    <mixed><elt href="#p1"/></mixed>
    <!-- attlist p1 id  ID      #IMPLIED
                    exm (a|b|c) 'c'
                    x   CDATA   #FIXED 'y' -->
    <attribute name='id' type='ID'/>
    <attribute name='exm' type='ENUMERATION' values='a b c' default='c'/>
    <attribute name='x' defType='FIXED' default='y'/>
</elementType>

An attribute may be given a default value. Whether it is required or optional is signaled by presence. (Presence ordinarily defaults to IMPLIED, but if omitted and there is an explicit default, presence is set to SPECIFIED.)

Attributes with enumerated (and notation) values permit a values attribute, a space-separated list of legal values. The values attribute is required when the type is ENUMERATION or NOTATION, else it is forbidden. In these cases, if a default is specified it must be one of the specified values.
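
The three rules in the paragraph above are simple enough to check mechanically. The following Python sketch is one possible checker, not part of the proposal; it takes the attributes of a single attribute declaration as a dictionary, and the name `check_attribute` and the wording of the messages are hypothetical.

```python
def check_attribute(decl):
    """Return a list of rule violations for one attribute declaration."""
    problems = []
    enumerated = decl.get("type") in ("ENUMERATION", "NOTATION")
    values = decl.get("values")
    if enumerated and values is None:
        problems.append("values is required for ENUMERATION and NOTATION")
    if not enumerated and values is not None:
        problems.append("values is forbidden for other types")
    if enumerated and values is not None and "default" in decl:
        # values is a space-separated list of legal values
        if decl["default"] not in values.split():
            problems.append("default must be one of the listed values")
    return problems

ok = check_attribute({"name": "exm", "type": "ENUMERATION", "values": "a b c", "default": "c"})
bad_default = check_attribute({"name": "exm", "type": "ENUMERATION", "values": "a b c", "default": "d"})
```

An empty list means the declaration satisfies all three rules.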

Similar to the facility of multiple ATTLISTs, we sometimes need to have attributesDcls declared separately from the elementType they refer to. We can do this with the propertyOf element, discussed later.

3.5 The internal and external entity declaration element type: intEntityDcl and extEntityDcl

Key terms used here: entity, internal entity, external entity, notation.

This and the next two declarations cover entities in general. Entities are a powerful shorthand mechanism, similar to macros in a programming language.

<intEntityDcl name="LTG">
    <entityDef>Language Technology Group</entityDef>
</intEntityDcl>
<extEntityDcl name="dilbert">
    <notation href="#gif"/>
    <systemId href="http://www.ltg.ed.ac.uk/~ht/dilb.gif"/>
</extEntityDcl>

Here as elsewhere, following XML, systemId must be a URL, absolute or relative, and publicId, if present, must be a Public Identifier as defined in ISO/IEC 9070:1991, Information technology -- SGML support facilities -- Registration procedures for public text owner identifiers. If a notation is given, it must be declared (see below) and the entity will be treated as binary, i.e., not substituted directly in place of references.

<notationDcl name="gif">
    <systemId href='http://who.knows.where/'/>
</notationDcl>
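
The macro analogy can be made concrete with a short Python sketch. It is a rough illustration under stated assumptions: the enclosing schema element, the function names entity_table and expand, and the regular expression used to find references are all inventions of the example. Entities with a notation are recorded as binary and left unexpanded, as described above; other external entities would need to be fetched, which the sketch does not attempt.

```python
import re
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <intEntityDcl name="LTG">
    <entityDef>Language Technology Group</entityDef>
  </intEntityDcl>
  <extEntityDcl name="dilbert">
    <notation href="#gif"/>
    <systemId href="http://www.ltg.ed.ac.uk/~ht/dilb.gif"/>
  </extEntityDcl>
</schema>
"""

def entity_table(schema_xml):
    """Map entity name -> (kind, value), where kind is text, binary or external."""
    table = {}
    root = ET.fromstring(schema_xml)
    for dcl in root.iter("intEntityDcl"):
        table[dcl.get("name")] = ("text", dcl.findtext("entityDef", default=""))
    for dcl in root.iter("extEntityDcl"):
        # A notation marks the entity as binary.
        kind = "binary" if dcl.find("notation") is not None else "external"
        table[dcl.get("name")] = (kind, dcl.find("systemId").get("href"))
    return table

def expand(text, table):
    """Substitute internal entities in place; leave every other reference untouched."""
    def replace(match):
        kind, value = table.get(match.group(1), (None, None))
        return value if kind == "text" else match.group(0)
    return re.sub(r"&([A-Za-z_][\w.-]*);", replace, text)
```

With this table, expand("The &LTG; drew &dilbert;.", entity_table(SCHEMA)) replaces only the LTG reference, because dilbert is binary.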

3.6. The external declarations element type: extDcls

Key terms used here: external entity with declarations.

Although we allow an external entity with declarations to be included, we recommend a different declaration for schema modularization. The extDcls declaration gives a clean mechanism for importing (fragments of) other schemata. It replaces the common SGML idiom of declaring an external parameter entity and then immediately referring to it, and has the same import, namely, that the text referred to by the combination of systemId and publicId is included in the schema in place of the extDcls element, and that replacement text is then subject to the same validity constraints and interpretation as the rest of the schema.
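
The replacement semantics of extDcls can be sketched in Python as follows. This is only one possible reading: the function name include_ext_dcls and the load callback are hypothetical, and a real processor would also have to resolve publicId, guard against circular imports, and validate the merged result.

```python
import xml.etree.ElementTree as ET

def include_ext_dcls(schema, load):
    """Return a copy of `schema` with each <extDcls> child replaced by the
    declarations it names.

    `load(system_id, public_id)` is a hypothetical resolver that returns the
    root element of the referenced schema fragment.
    """
    result = ET.Element(schema.tag, dict(schema.attrib))
    for child in schema:
        if child.tag == "extDcls":
            fragment = load(child.get("systemId"), child.get("publicId"))
            # Imported declarations may themselves contain extDcls elements,
            # so the fragment is processed the same way before being spliced in.
            result.extend(include_ext_dcls(fragment, load))
        else:
            result.append(child)
    return result
```

Because the imported declarations are spliced in where the extDcls element stood, they end up in document order with the rest of the schema and can be checked by the same rules.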

3.7. Type Extension

Key terms used here: type (class), typeOf, extension (inheritance, subclassing), implements, extends, typeOf (genus).

Schemata of all types can benefit from a subtyping mechanism: indicating that one class of object is a specialization of another more general class. For example, cat and dog both have the type pet as their more general category. To make more effective use of such classes, we introduce one new schema attribute, extends, which can be used to declare explicitly that an element type is a subclass of another:

<xml:schema>
  <elementType id="animalFriends" >
    <elt href="#pet" occurs="PLUS" />
  </elementType>

  <elementType id="pet" >
    <any/>
  </elementType>

  <elementType id="cat" extends="#pet"/>

  <elementType id="dog"  extends="#pet"/>

</xml:schema>

This schema says that the animalFriends element class can contain one or more elements from the pet class, such as a cat or a dog. It also says that each cat and dog instance is a pet (that is, any cat is semantically a pet, and any valid cat is also a valid pet). So the following data is now valid under this schema:

<animalFriends>
  <cat/>
  <dog/>
  <cat/>
</animalFriends>
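
A validator can decide whether an element is acceptable where a pet is required by following the extends references. The Python sketch below is a minimal illustration, assuming a plain schema root element (as in Appendix A), local "#id" references only, and hypothetical function names.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <elementType id="animalFriends"><elt href="#pet" occurs="PLUS"/></elementType>
  <elementType id="pet"><any/></elementType>
  <elementType id="cat" extends="#pet"/>
  <elementType id="dog" extends="#pet"/>
</schema>
"""

def supertypes(schema_xml):
    """Map each element type id to the id it extends (or None)."""
    parents = {}
    for et in ET.fromstring(schema_xml).iter("elementType"):
        ext = et.get("extends")
        parents[et.get("id")] = ext.lstrip("#") if ext else None
    return parents

def is_subtype(name, ancestor, parents):
    """True if `name` is `ancestor` or reaches it by following extends."""
    seen = set()
    while name is not None and name not in seen:
        if name == ancestor:
            return True
        seen.add(name)
        name = parents.get(name)
    return False

def valid_children(child_names, allowed, parents):
    """Check a PLUS content model: one or more children, each a subtype of `allowed`."""
    return bool(child_names) and all(is_subtype(c, allowed, parents) for c in child_names)
```

For the instance above, valid_children(["cat", "dog", "cat"], "pet", supertypes(SCHEMA)) holds, while an empty animalFriends element would be rejected.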

Type Extension

It is frequently necessary to add new attributes to a subclass. This requires no extra machinery, because XML already permits multiple attribute list declarations, which cumulatively add attributes to element types. So each subclass may easily add any new attributes desired, as shown here:

<elementType id="dog" extends="#pet">
  <attribute name="age"/>
</elementType>

If the super type has a content model, attributes, etc., these are inherited; that is, they are also declared implicitly for the derived class. In the following example, we give name and owner attributes to pet. These are inherited, so both cat and dog now also have an owner attribute.

<xml:schema>
  <elementType id="animalFriends" >
    <elt href="#pet" occurs="PLUS" />
  </elementType>

  <elementType id="pet">
    <any/>
    <attribute id='name'/>
    <attribute id='owner'/>
  </elementType>

  <elementType id="cat" extends="#pet">
    <elt href='#kittens'/>
    <attribute id='lives' type='NMTOKEN'/>
  </elementType>

  <elementType id="dog" extends="#pet">
    <elt href='#puppies'/>
    <attribute id='breed'/>
  </elementType>
</xml:schema>

This schema says that the animalFriends element class can contain one or more pet elements. Because cat and dog are subtypes of pet, they can occur as well. So the following instance fragment is now valid under this schema:

<animalFriends>
  <cat name="Fluffy" lives='9'/>
  <pet name="Diego"/>
  <dog name="Gromit" owner='Wallace' breed='mutt'/>
</animalFriends>
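
The inheritance of attributes can be illustrated by collecting the attributes declared on a type together with those of every type it extends. This Python sketch assumes a plain schema root element, local "#id" references, and a hypothetical helper name; it covers attributes only, not content models.

```python
import xml.etree.ElementTree as ET

SCHEMA = """
<schema>
  <elementType id="pet">
    <any/>
    <attribute id="name"/>
    <attribute id="owner"/>
  </elementType>
  <elementType id="cat" extends="#pet">
    <attribute id="lives" type="NMTOKEN"/>
  </elementType>
  <elementType id="dog" extends="#pet">
    <attribute id="breed"/>
  </elementType>
</schema>
"""

def effective_attributes(type_id, types):
    """Sorted attribute names declared on `type_id` or inherited through extends."""
    names = set()
    seen = set()
    while type_id is not None and type_id not in seen:
        seen.add(type_id)
        element_type = types[type_id]
        for attr in element_type.findall("attribute"):
            # An attribute's name defaults to its id when name is absent.
            names.add(attr.get("name") or attr.get("id"))
        ext = element_type.get("extends")
        type_id = ext.lstrip("#") if ext else None
    return sorted(names)

# Index the element types by id so extends references can be followed.
types = {et.get("id"): et for et in ET.fromstring(SCHEMA).iter("elementType")}
```

Here effective_attributes("dog", types) includes breed, name and owner, which is why the Gromit element above may carry an owner attribute.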

Additional relations can also be added, but only if the content model of the super type consists of a single list of optional, repeatable element types.

When defining a derived element class, one can also override existing attributes and relations. The following example adds a height relation and overrides the favoriteFood relation, giving it a default value of "Fish." (We also do something fancy here. Making this overridden element itself have its super type favoriteFood ensures that the derived element is in all other respects identical.)

<relationType id="height">
  <any/>
</relationType>

<relationType id="favoriteCatFood" extends="#favoriteFood"/>

<elementType id="cat" extends="#pet">
  <relation href="#height"/>
  <relation href="#favoriteCatFood" default="Fish"/>
</elementType>

Schema Extension

We can also use subtyping to extend an existing schema without editing it. Suppose that we cannot edit the schema defining pet, cat or dog, but want to use elements with those names and semantics in our document. The following adds the "eyeColor" property to cat.

<relationType id="eyeColor" extends="http://whereever.org/#eyeColor">
    <pcdata/>
</relationType>

<elementType id="cat" extends="http://whereever.org/#cat">
  <relation href="#eyeColor"/>
</elementType>

The rules for allowable subtyping must enforce certain constraints, which are in principle that a subtype can have additional relations and attributes (provided this is consistent with the super type's content model) but never fewer, and can add restrictions but never relax them. In practice, this principle leads to rules such as that default values can be added if there are none, changed, or converted to FIXED if DEFAULT.

Implements

Subtyping as we have described it here is actually a combination of two effects: First, we assert that an element of one type is also of another (as in a cat is a pet).

Second, we achieve economies and maintainability in the declarations to make sure that the first is true. That is, the derived element class is automatically provided with all the properties of the super type. Sometimes it is valuable to have the first effect without the second. (This is equivalent to the Java implements facility.) We indicate this by using the implements element, as in

<relationType id="favoriteFood" >
  <mixed/>
</relationType>

<relationType id="weight" >
  <mixed/>
</relationType>

<elementType id="cat" >
  <implements href="http://whereever.org/#pet" />
  <attribute name="name"/>
  <relation href="#favoriteFood" />
  <relation href="#weight" />
</elementType>

This has no effect on the attributes or relations of instances of cat, but asserts in the schema that every cat is also a pet (that is, any cat is semantically a pet, and any valid cat is also a valid pet).

Relation of Type Extension to Parameter Entities

Sophisticated DTDs often make complex use of parameter entities in an attempt to consolidate common structures in one, reusable place. Such parameter entities often represent implicit classes.

The need is real, but the approach often leads to obscurity, and reduced maintainability. Further, expansion of entities loses all connection with their source: once expanded, the fact that some set of element types was a co-declared set, re-used in multiple places, is lost.

3.8 Lexical Data Types

Information such as dates and numbers is often expressed in a format that requires some further parsing. For example, the same date can be written "October 22, 1954" or "19541022". (And from what I've seen, about 300 other ways.) The lextype attribute discriminates formats. Appearing on instance elements, it describes the format of the remainder of the element. The value of the lextype attribute is always by reference to a URI identifying the parsing rules. XML-Data should define a small number of these. We propose NUMBER, INTEGER, REAL and DATE.ISO8061.

<birthday lextype="DATE.ISO8061">19541022</birthday>

These are declared in the schema as follows:

<relationType id="birthday">
  <attribute name="lextype" default="DATE.ISO8061" presence="fixed"/>
</relationType>

When giving the lexical type of an attribute in the schema, lextypeIs is used, as in:

<attribute name="price" presence="REQUIRED" lextypeIs="number"/>

Some patterns will indicate that several properties or attributes should be used in combination to arrive at a value. For example, a custom pattern could indicate a date expressed as the following:

<relationType id="birthday">
  <attribute name="lextype" default="DATE.ATTR-YMD" presence="specified"/>
</relationType>
...
<birthday year="1954" month="10" day="22"/>
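
How a processor might dispatch on lextype can be sketched in Python. Everything here beyond the lextype names used in the text is an assumption: the PARSERS registry, the function names, and the idea of passing the schema's default lextype as an argument. The proposal identifies parsing rules by URI, which a real processor would resolve rather than look up in a local table.

```python
from datetime import date
import xml.etree.ElementTree as ET

def parse_compact_date(elem):
    """DATE.ISO8061 as used above: element content of the form YYYYMMDD."""
    text = elem.text.strip()
    return date(int(text[0:4]), int(text[4:6]), int(text[6:8]))

def parse_attr_ymd(elem):
    """DATE.ATTR-YMD: the value is spread over year, month and day attributes."""
    return date(int(elem.get("year")), int(elem.get("month")), int(elem.get("day")))

# Hypothetical registry keyed by lextype name.
PARSERS = {
    "DATE.ISO8061": parse_compact_date,
    "DATE.ATTR-YMD": parse_attr_ymd,
    "INTEGER": lambda elem: int(elem.text),
    "REAL": lambda elem: float(elem.text),
}

def typed_value(elem, schema_default=None):
    """Parse an element by its own lextype, falling back to the schema's default."""
    lextype = elem.get("lextype", schema_default)
    return PARSERS[lextype](elem)
```

Under these assumptions both birthday forms shown above yield the same date value, which is the point of separating the lexical format from the underlying type.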

3.9. Basic Semantic Data Types

We need to define here a small number of basic types and their hierarchy, corresponding to simple data types such as Number and Date. (Dates are a subtype of numbers.)

We also need to define the expression of each of the basic Java and SQL data types in terms of these basic ones, plus additional properties giving units, precision, min, max, default pattern, and other properties. For example, an INTEGER typically is a number with certain min and max property values. Note that units should be an element type with possible structure, so that things like "miles/hours" or "feet/(sec*sec)" can be represented and used for automatic conversions.

4. Standard Vocabulary

We expect standard libraries of vocabulary to be developed to capture common semantics used in vertical applications and particularly in industry and application domains. Dublin Core and CDF are two examples of such standard libraries.

5. Relations to other proposed standards

The W3C site at http://www.w3.org/PICS/Member/NG contains links to several related papers, including Ora Lassila's PICS-NG document, Renato Ianella's small PICS extension proposal, CDF, MCF in XML, and the Web Collections using XML proposal. Specific notes on some of these follow:

5.1 XML-LINK

All relations use href in a manner consistent with XML-LINK working draft dated April 6, 1997 (the most recent as of the time of this writing). XML-Links are a type of relation (with extra attributes, elements, and semantics indicating traversal).

5.2 PICS-NG

PICS-NG Metadata Model and Label Syntax describes a set of requirements for structured data to be used on the Internet. XML-Data is an application of XML concepts to those requirements.

5.3 CDF

The Channel Definition Format (CDF) is a natural application of XML-Data and is fully compatible with the syntax and the ideas presented in this document. Its format is a validatable grammar given a proper schema. The existing use of href in CDF is consistent with XML-LINK and XML-Data usage. CDF defines a number of basic element types that would be appropriate for a standard library.

5.4 MCF in XML

MCF in XML has two principal components: The ability to represent a "directed labeled graph" and also a set of predefined element types. The first of these is effected by a convention on use of the href attribute (the same convention used in XML-Data relations, with the same effect). Of the second, some element types are genuinely necessary to represent schemata and a type system (these are also present in XML-Data) while others would be appropriate for a standard library.

XML-Data has a number of features not in MCF:

  • Principally, XML-Data permits tree structures in cases when MCF only permits a graph. (MCF requires that the target of all relations must be out-of-line when it is an element. XML-Data allows in-line targets.)
  • XML-Data hrefs are explicitly URIs. (Though MCF units can be URIs, it is not clear from the current document when they are and when they are not.)
  • XML-Data schemata can represent all the information in an XML DTD, while it is not clear that MCF can do this.
  • XML-Data has additional capabilities for expressing relationships in the schema (relation, relationType, extends, implements).
  • XML-Data proposes lextypes as a basic element type, a feature not discussed in MCF.
  • Finally, names in XML-Data were chosen for more compatibility with existing XML usage (or at least that is the intention).

This chart tabulates the MCF "bootstrap" element types and describes their equivalence in XML-Data.

MCF element type    Equivalent in XML-Data
----------------    ----------------------
Category            "elementType" in XML-Data.
typeOf              "typeOf" relation in XML-Data. Also, "extends" and "implements" in XML-Data assert the relationship in the schema.
Unit                "href" in XML-Data.
domain              "propertyOf" in XML-Data.
range               "range" in XML-Data. This gives the allowed type of the target of a property.
superType           This may correspond to "implements" in XML-Data. However the MCF document is not clear on this point.
Property            This corresponds to the abstract concept of a link class expressed in schemata by relation and relationType.
FunctionalProperty  This appears to be a relation with occurs = OPTIONAL or REQUIRED (that is, occurs at most once).
mutuallyDisjoint    This is a relationship asserted among the members of an enumeration. XML-Data does not contain a predefined propertyType for this. It could be added easily if this is useful.
parent              A generic property, whose meaning appears to be contextual. XML-Data does not contain a predefined elementType for this. It is unneeded because parentage is expressed by containment, while when out-of-line, specific meanings are conveyed by more precise relationship types such as propertyOf.
name                "name" in XML-Data. However, note that like parent, the interpretation of name in MCF seems to be contextual.
description         XML-Data does not contain a predefined elementType for this. We think that this belongs to a standard library and not in this specification.
Sequence            This is a special arc type in MCF that expresses the same fact as lexical order in XML.
ord                 This is an MCF helper element type for Sequence.

The following are comparative examples of the MCF in XML and XML-Data representations of an order for several books. (All persons in this example are assumed to be not in the document, but elsewhere.) The id attribute is on all elements representing real-world objects, in both models. In the MCF model id also appears on elements needed artificially for reference.

MCF in XML:

<ORDER id="order1">
  <SOLD-TO unit="http:/people#person1"/>
  <SOLD-ON value="19970317"/>
  <ITEMS unit="sequence1"/>
</ORDER>

<BOOK id="book1">
  <TITLE value="Number, the Language of Science"/>
  <AUTHOR unit="http:/people#person2"/>
</BOOK>

<SEQUENCE id="sequence1">
  <ORD unit="book1">
    <PRICE value="5.95"/>
  </ORD>
  <ORD unit="cd1">
    <PRICE value="12.95"/>
  </ORD>
  <ORD unit="book2">
    <PRICE value="6.95"/>
  </ORD>
  <ORD unit="food1">
    <PRICE value="1.50"/>
  </ORD>
</SEQUENCE>

<COFFEE id="food1">
  <size value="small"/>
  <style value="cafe macchiato"/>
</COFFEE>

<RECORD id="cd1">
  <TITLE value="Rachmaninoff's Second Piano Concerto"/>
  <ARTIST unit="http:/people#person3"/>
</RECORD>

<BOOK id="book2">
  <TITLE value="The Evolution of Complexity"/>
  <AUTHOR unit="http:/people#person4"/>
</BOOK>

XML-Data:

<ORDER id="order1">
  <SOLD-TO href="http:/people#person1"/>
  <SOLD-ON value="19970317"/>
  <ITEM>
    <PRICE>5.95</PRICE>
    <BOOK id="book1">
      <TITLE>Number, the Language of Science</TITLE>
      <AUTHOR href="http:/people#person2"/>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>12.95</PRICE>
    <RECORD id="cd1">
      <TITLE>Rachmaninoff's Second Piano Concerto</TITLE>
      <ARTIST href="http:/people#person3"/>
    </RECORD>
  </ITEM>
  <ITEM>
    <PRICE>6.95</PRICE>
    <BOOK id="book2">
      <TITLE>The Evolution of Complexity</TITLE>
      <AUTHOR href="http:/people#person4"/>
    </BOOK>
  </ITEM>
  <ITEM>
    <PRICE>1.50</PRICE>
    <COFFEE>
      <SIZE>small</SIZE>
      <STYLE>cafe macchiato</STYLE>
    </COFFEE>
  </ITEM>
</ORDER>

 

6. Conclusion

Future applications of the Internet will focus on adding user value to information through semantic annotation. Semantics will permit information to be discovered, targeted, reused, and integrated. Not only does this make the content more usable, but it opens up opportunities for software developers to build components that exploit these semantics. Such components could include applications as prosaic as application or user logging, or as futuristic as user agents that assist in finding or organizing contents, World-Wide Web "surf buddies" that accompany a user's browsing and add valuable or entertaining comments, or natural language query systems. Semantic annotation turns the Internet into a platform for programming powerful and valuable applications.

This proposal lays the foundation for how applications can annotate their information content. The proposal adds powerful new constructs for representing semantics, sufficiently advanced for use in artificial intelligence and natural language systems, yet retains the architecture and investment of existing XML and the efficiency of its representation.


Appendix A - The XML DTD for a schema


<!ENTITY % nodeattrs 'id ID #IMPLIED'  >
<!-- href is as per XML-LINK, but is not required unless there is
      no content -->

<!ENTITY % exattrs   'extends CDATA #IMPLIED'  >

<!ENTITY % linkattrs 'id ID #IMPLIED
                      href CDATA #IMPLIED' >

<!-- The shared content model of elementType, linkType and
relationType -->
<!-- Omitted element type same as "empty." -->
<!ENTITY % extendedmodel 'implements*,
                          (elt|group|empty|any|pcdata|mixed)?,
                          (relation|attribute)*'>

<!-- The top-level container -->
<!element schema         ((elementType|propertyOf|linkType|
                          relationType|extendType|augmentElementType|
                          intEntityDcl|extEntityDcl|
                          notationDcl|extDcls|c)*)>
<!attlist schema %nodeattrs;>

<!-- Element Type Declarations -->
<!element elementType   (%extendedmodel;)>
<!-- Either name or id must be present - - absent name defaults to id
-->
<!attlist elementType %nodeattrs;
                      %exattrs;
                name    CDATA      #IMPLIED>

<!-- Element types allowed in content model -->
<!-- Note this is just short for a model group with only one elt in
it -->
<!element elt           EMPTY>
<!-- Elements can have exponents as well as groups -->
<!-- The href is required -->
<!attlist elt   %linkattrs;
                occurs     (required|optional|star|plus) 'required'>

<!-- A group in a content model, sequential or disjunctive -->
<!element group         ((group|elt)+)>
<!attlist group         %nodeattrs;
                groupType (seq|or) 'seq'
                occurs  (required|optional|plus) 'required'>

<!element any           EMPTY>
<!element empty         EMPTY>
<!element pcdata        EMPTY>

<!-- mixed content is just a flat, non-empty list of elts -->
<!-- We don't need to say anything about #pcdata, it's implied -->
<!element mixed         (elt+)>
<!attlist mixed         %nodeattrs;> 

<!-- Attributes -->
<!-- default value must be present iff presence is specified or fixed
-->
<!-- presence defaults to specified if default is present, else
implied -->
<!-- name attribute is locally unique, defaults to id if absent
-->
<!element attribute  EMPTY>
<!attlist attribute  %linkattrs;
                name    CDATA #IMPLIED
                type    (id|idref|idrefs|entity|entities|nmtoken|nmtokens|
                         enumeration|notation|cdata) 'cdata'
                default CDATA #IMPLIED
                values NMTOKENS #IMPLIED
                presence (implied|specified|required|fixed) #IMPLIED 
                lextypeIs CDATA #IMPLIED>

<!-- Relations - - relationTypes are pointed to from relations,
            just as elementTypes are pointed to from elts -->
<!element relationType  (%extendedmodel;,
                         range*)>
<!attlist relationType  %nodeattrs;
                        %exattrs;
                        name CDATA #IMPLIED >

<!element range EMPTY>
<!attlist range %linkattrs; >

<!element relation  EMPTY>
<!attlist relation  %linkattrs;
                    default CDATA #IMPLIED
                    occurs (required|optional|star|plus) 'optional'>

<!-- For adding attributes to existing element types -->
<!element propertyOf    EMPTY>
<!attlist propertyOf    href CDATA #REQUIRED>

<!element augmentElementType ((relation|attribute)*)>
<!attlist augmentElementType %linkattrs;
                             %exattrs;>

<!-- Shorthand for simple XML-LINKs -->
<!element linkType (%extendedmodel;)>
<!attlist linkType %nodeattrs;
                   %exattrs;
                   name CDATA #IMPLIED
                   role CDATA #IMPLIED
                   title CDATA #IMPLIED
                   show (embed|replace|new) #IMPLIED
                   actuate (auto|user) #IMPLIED
                   behaviour CDATA #IMPLIED >

<!element implements EMPTY>
<!attlist implements href CDATA #REQUIRED>

<!-- Entity Declarations -->
<!-- Note as this is written only external entities
      can have structure without escaping it -->
<!-- Name defaults to id if absent -->
<!element intEntityDcl     (#PCDATA)>
<!attlist intEntityDcl %nodeattrs;
                name    CDATA #IMPLIED>

<!-- The entity will be treated as binary if a notation is present
-->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element extEntityDcl    ( systemId, publicId?)>
<!attlist extEntityDcl %nodeattrs;
                name    CDATA #IMPLIED
                notation CDATA #IMPLIED>

<!-- Pointers for above -->
<!element systemId      EMPTY>
<!attlist systemId      %linkattrs;>
<!-- Must be empty if href is used -->
<!element publicId      (#PCDATA) >
<!attlist publicId      %linkattrs;>

<!-- Notation Declarations -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element notationDcl        (systemId, publicId?)>
<!attlist notationDcl   %linkattrs;
                name    CDATA #IMPLIED>

<!-- External entity with declarations to be included -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element extDcls       EMPTY>
<!attlist extDcls
                systemId CDATA #REQUIRED
                publicId CDATA #IMPLIED>

<!-- Namespace Declarations -->
<!-- systemID and publicId (if present) must have the required syntax
-->
<!element namespaceDcl  EMPTY>
<!attlist namespaceDcl  %linkattrs;
                name    CDATA #IMPLIED>

xml-dev: A list for W3C XML Developers
Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/
To unsubscribe, send to majordomo@ic.ac.uk the following message;
unsubscribe xml-dev
List coordinator, Henry Rzepa (rzepa@ic.ac.uk)

From lex at www.copsol.com Mon Jun 23 15:19:57 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun 7 16:58:03 2004
Subject: XML Property Set
In-Reply-To: <01BC7EC3.70CDF6C0.jtauber@jtauber.com> from "James K. Tauber" at Jun 22, 97 04:11:02 am
Message-ID: <199706231317.IAA16211@copsol.com>

> > In my grove-illiterate opinion, yes! The PropertySet is a sword of Damocles
> > hanging over these discussions. It's clear that we can't have all 70+
> > properties. IF (and I hope it's not a big IF) we can agree on a subset
> > of the property set then we don't have this problem dissipating the
> > discussion every time we get close :-)
>
> It shouldn't be a big IF at all. Deciding what to rip out isn't too difficult.
> The only problem lies in agreeing on how to do the additional classes (like
> XMLDECL) needed and how (or if) the properties should be modularised.
>
> > James Clark came up with a grove subset about 3 months back (have a look in
> > March xml-dev) in response to one of my typical blunderings for information.
>
> I'll go back and check that. JamesC would be in a MUCH better position to write
> an XML property set than me!

Well, I'm going to make an offer. I've spent the better part of a year
working on and with a Java-based API for groves. I am certain that I can
create an interface from this (if not take it wholesale) for the XAPI and
groves. So, my offer is that I can come up with a draft and "the James's"
and the lot can validate if I am on the right track.

I am fairly certain that at this point in time we should not say "maybe later"
to groves. We should standardize parser access, event interfaces, and groves
at the same time. We have enough developers with experience in all of these.
An API architecture that I propose is:

   |---------------|
   | Grove API     |
   |---------------|
   | Grove Builder |
   |     API       |
   |----------------------------------|
   | XML Event API                    |
   |----------------------------------|
   | XML Parser API                   |
   |----------------------------------|

They are described as follows:

XML Parser API: Provides interfaces to instantiation and use of XML parsers
such that a new XML parser can be integrated with existing applications
potentially with their knowledge. This might allow a user to configure an
application with (in Java) the class name of the XML Factory or whatever.

XML Event API: Provides an interface to allow XML parsers to deliver events
to arbitrary applications. My suggestion here is that we consider two kinds
of APIs or at least constructs. First, there is the idea of the "document
string" which is the exact character for character representation of each
construct. Second, there is a semantic event like "start element". Both are
useful depending on what one is doing.

Grove Builder API: This API bridges the gap between the event API and a
grove. Essentially, the algorithm for building a grove is most likely the
same regardless of the implementation technology used to create the grove.
Hence, a standard event handler could be defined as well as an interface to
allow different grove implementations to be used (for example, a JDBC grove
and an in memory grove).

Grove API: This API, obviously, provides access to XML groves!

Again, my suggestion is that we take advantage of interfaces in XAPI.
Interfaces will allow us to mix inheritance hierarchies in the above four
APIs.

Now, I feel strongly that the above APIs or what they become are developed
together. They can certainly affect how each other is designed.

If we have these four APIs, we have the fundamental building blocks for all
kinds of XML applications--both simple and complex. In addition, we have
the basic infrastructure for DSSSL! (Ah, you can see my motivation now!)
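
The layering proposed above can be illustrated with a short Python sketch. It is not the API under discussion: Python's standard xml.sax module stands in for the parser and event layers, and the Node and GroveBuilder classes are hypothetical, far simpler than a real grove (for instance, each node's character data is collected into one string, losing its position relative to child elements).

```python
import xml.sax

class Node:
    """A minimal tree node standing in for a grove node."""
    def __init__(self, name, attrs):
        self.name = name
        self.attrs = attrs
        self.children = []
        self.text = ""

class GroveBuilder(xml.sax.ContentHandler):
    """Consumes parser events and builds an in-memory tree."""
    def __init__(self):
        super().__init__()
        self.root = None
        self._stack = []

    def startElement(self, name, attrs):
        node = Node(name, dict(attrs.items()))
        if self._stack:
            self._stack[-1].children.append(node)
        else:
            self.root = node
        self._stack.append(node)

    def characters(self, content):
        if self._stack:
            self._stack[-1].text += content

    def endElement(self, name):
        self._stack.pop()

def build_grove(xml_text):
    """Parser layer -> event layer -> builder -> tree."""
    builder = GroveBuilder()
    xml.sax.parseString(xml_text.encode("utf-8"), builder)
    return builder.root
```

A different storage model (the JDBC grove mentioned above, say) would replace only the builder, leaving the parser and event layers untouched.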
==============================================================================
R. Alexander Milowski     http://www.copsol.com/     alex@copsol.com
Copernican Solutions Incorporated     (612) 379 - 3608

From jtauber at jtauber.com Mon Jun 23 18:49:26 1997
From: jtauber at jtauber.com (James K. Tauber)
Date: Mon Jun 7 16:58:03 2004
Subject: XML Property Set
Message-ID: <01BC8038.C146D560.jtauber@jtauber.com>

On Monday, June 23, 1997 6:18 AM, Alex Milowski [SMTP:lex@www.copsol.com] wrote:
> Well, I'm going to make an offer. I've spent the better part of a year
> working on and with a Java-based API for groves. I am certain that I can
> create an interface from this (if not take it wholesale) for the XAPI and
> groves. So, my offer is that I can come up with a draft and "the James's"
> and the lot can validate if I am on the right track.

Sounds good. JamesC's post from March pretty much outlines properties and
classes for a document instance which leaves prolog and also the sort of
nodes that would be necessary (for editors, etc) to ensure that a processor
can output character-for-character what was input.

JamesT

From k.grimes at liant.com Mon Jun 23 20:12:35 1997
From: k.grimes at liant.com (Kevin Grimes)
Date: Mon Jun 7 16:58:03 2004
Subject: XAPI
Message-ID: <97Jun23.101149edt.32261-1@stelmo.liant.com>

From: Kevin Grimes@LIANT on 06/23/97 02:14 PM

May I suggest that the XML API be expressed in language neutral IDL rather
than Java.
I believe the main impact this would have on current
interfaces/implementations would be to the member function that actually
loads/processes the document--you'd probably want to replace the Java
InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in
the following lines from my xml.idl...

   HRESULT processDocument([in] BSTR filename);
   HRESULT processDocumentURL([in] BSTR url);

...and let processDocument create the InputStream or whatever.

I currently have APIs defined by IDL, with the XML processor implemented in
Java, and clients written in C++ and Java. The C++ client-Java processor
combination uses COM and the Microsoft Java Virtual Machine, but the Java
client-Java processor pair runs under either Sun or Microsoft (same XML
processor).

The client can use an IGrove or IXMLApplication (callback) interface or
both. I haven't attempted events yet, but believe this will require making
the XML processor into a Java Bean.

Regards,
Kevin Grimes
Liant Software
(k.grimes@liant.com)

From Peter at ursus.demon.co.uk Mon Jun 23 21:46:29 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:03 2004
Subject: XML Property Set
Message-ID: <8420@ursus.demon.co.uk>

In message <199706231317.IAA16211@copsol.com> lex@www.copsol.com (Alex Milowski) writes:
[...]
>
> Well, I'm going to make an offer. I've spent the better part of a year
> working on and with a Java-based API for groves. I am certain that I can
> create an interface from this (if not take it wholesale) for the XAPI and
> groves. So, my offer is that I can come up with a draft and "the James's"
> and the lot can validate if I am on the right track.

I think this is an excellent way forward and many thanks to all who are
contributing to this effort.
I am prepared to make the effort to understand it and find ways of
interfacing it with JUMBO.

>
> I am fairly certain that at this point in time we should not say "maybe later"
> to groves. We should standardize parser access, event interfaces, and groves
> at the same time. We have enough developers with experience in all of these.
>

Just to check I have it right...

> An API architecture that I propose is:
>
>   |---------------|
>   | Grove API     |  <<< I assume this has similarities to JamesClark's
>   |---------------|      ReallySimple API ...
>   | Grove Builder |
>   |     API       |  <<< different memory/storage models are implemented here.
>   |----------------------------------|
>   | XML Event API                    | << presumably fairly similar to NXP?
>   |----------------------------------|
>   | XML Parser API                   | << Corresponds to John Tigue's analysis?
>   |----------------------------------|
>
[...]
> Now, I feel strongly that the above APIs or what they become are developed
> together. They can certainly affect how each other is designed.

I'd agree with this. Can they be developed rapidly or in
parallel so that there aren't bottlenecks/hold-ups?

> If we have these four APIs, we have the fundamental building blocks for all
> kinds of XML applications--both simple and complex. In addition, we have
> the basic infrastructure for DSSSL! (Ah, you can see my motivation now!)

If I get this right it makes the DSSSL approach and the JavaClass-per-Element
(as in JUMBO), very closely connected. The Grove API serves both purposes?
If so, that looks very exciting.

	P.
--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From paul at arbortext.com Mon Jun 23 22:20:40 1997
From: paul at arbortext.com (Paul Grosso)
Date: Mon Jun 7 16:58:03 2004
Subject: XML internal text entity replacement text
Message-ID: <3.0.32.19970623151756.00686b44@pophost.arbortext.com>

In the XML spec (31-Mar-97), the paragraph in section 1.5 just prior to
production [9] says:

   Literal data is any quoted string containing neither a left angle
   bracket nor the quotation mark used as a delimiter for that string.
   It may contain entity and character references. Literals are used
   for specifying the replacement text of internal entities
   (EntityValue)....

Production [9] itself, which defines EntityValue, doesn't forbid "<". The
paragraph following productions 9-15 talks about parameter entity and
character refs, but not about element markup.

Section 4.3 [production 64] uses EntityValue, and section 4.3.1 talks about
internal entities, but says nothing about whether the replacement text can
contain elements.

I don't remember hearing that internal entities couldn't contain element
markup, and appendix A doesn't list it as a difference from SGML, so I
suspect the production is correct and the wording that says "literal data
can't have '<' and EntityValue is literal data" is wrong.

Can anyone provide confirmation or denial of my assumption that the above
quoted text is wrong in suggesting that internal text entity replacement
text cannot contain element markup?
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lex at www.copsol.com Mon Jun 23 22:27:12 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:03 2004 Subject: XML Property Set In-Reply-To: <8420@ursus.demon.co.uk> from "Peter Murray-Rust" at Jun 23, 97 08:04:52 pm Message-ID: <199706232025.PAA16524@copsol.com> > Just to check I have it right... > > > An API architecture that I propose is: > > > > |---------------| > > | Grove API | <<< I assume this has similarities to JamesClark's > > |---------------| ReallySimple API ... > > | Grove Builder | > > | API | <<< different memory/storage models are implemented here. > > |----------------------------------| > > | XML Event API | << presumably fairly similar to NXP? > > |----------------------------------| > > | XML Parser API | << Corresponds to John Tigue's analysis? > > |----------------------------------| > > > [...] > > Now, I feel strongly that above APIs or what they become are developed > > together. They can certain affect how each other is designed. > > I'd agree with this. Can they be developed rapidly or in > parallel so that there aren't bottlenecks/hold-ups? Yes, I believe that they can be developed in parallel. I for one can make the commitment that we can develop a reference implementation of groves in Java given that the Event API is standardized across XML Java parsers. > > If we have these four APIs, we have the fundamental building blocks for all > > kinds of XML applications--both simple and complex. In addition, we have > > the basic infrastructure for DSSSL! (Ah, you can see my motivation now!) > > If I get this right it makes the DSSSL approach and the JavaClass-per-Element > (as in JUMBO), very closely connected. The Grove API serves both purposes? > If so, that looks very exciting. 
Well, almost. The Grove API is *one* component of the infrastructure necessary for a DSSSL system. In some senses, it is the most important. In the DSSSLTK I opted for several APIs--one for groves, one for flow objects and flow object trees, and one for the DSSSL engine. In the next version there will be one for the parser implementation as well.

Groves allow us to deliver SDQL and DSSSL engines with minimal effort, but there is still more to standardize. For example, a simple DSSSL engine API might be:

    public interface Processor {
        SGMLDocument transform(SGMLDocument transformation, SGMLDocument doc);
        FlowObject format(SGMLDocument style, SGMLDocument doc);
    }

The DSSSLTK is a little more complex than this because it provides the ability to "compile" transformations and stylesheets into Transformation and StyleSheet objects.

My main point was that other APIs (the DSSSL engine, for example) are *users* of the Parser and Grove APIs. Hence, the Parser and Grove APIs should be able to be developed first, independent of the others.

==============================================================================
R. Alexander Milowski     http://www.copsol.com/     alex@copsol.com
Copernican Solutions Incorporated     (612) 379 - 3608

From lex at www.copsol.com Mon Jun 23 22:33:37 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun 7 16:58:03 2004
Subject: XAPI
In-Reply-To: <97Jun23.101149edt.32261-1@stelmo.liant.com> from "Kevin Grimes" at Jun 23, 97 02:14:27 pm
Message-ID: <199706232031.PAA16544@copsol.com>

>
> From: Kevin Grimes@LIANT on 06/23/97 02:14 PM
>
> May I suggest that the XML API be expressed in language neutral IDL rather
> than Java.
I believe the main impact this would have on current > interfaces/implementations would be to the member function that actually > loads/processes the document--you'd probably want to replace the Java > InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in > the following lines from my xml.idl... > I like the idea of using IDL. I have to confess that I haven't had much of an opportunity to use it (although I would have liked to have). So, where are the IDL "experts" that we can bring onboard to get that part correct? ...hey, I still have to support C++ no matter how much I like Java! ;-) ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jwrobie at mindspring.com Mon Jun 23 23:06:37 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:03 2004 Subject: XAPI Message-ID: <1.5.4.32.19970623210523.008c6594@pop.mindspring.com> At 03:31 PM 6/23/97 -0500, Alex Milowski wrote: >> >> From: Kevin Grimes@LIANT on 06/23/97 02:14 PM >> >> May I suggest that the XML API be expressed in language neutral IDL rather >> than Java. I believe the main impact this would have on current >> interfaces/implementations would be to the member function that actually >> loads/processes the document--you'd probably want to replace the Java >> InputStream or URL parameter with a String (BSTR in Microsoft's IDL)--as in >> the following lines from my xml.idl... > >I like the idea of using IDL. I have to confess that I haven't had much >of an opportunity to use it (although I would have liked to have). So, where >are the IDL "experts" that we can bring onboard to get that part correct? 
Hmmmm...I just talked to one of our IDL experts, who wasn't convinced that this would be a helpful direction. Is there really an advantage to defining it in IDL first? The IDL could be created after the specification is finished in Java, and the Java-based specification is probably easier to create, understand, and test. I *like* making things language independent, but at this stage, I'm leery of adding complexity that doesn't add any new conceptual power. A Java-based specification can be translated into IDL later, and until the specification has actually been *implemented* in more than one language, the IDL doesn't buy you much. Jonathan *************************************************************************** Jonathan Robie jwrobie@mindspring.com http://www.mindspring.com/~jwrobie POET Software, 3207 Gibson Road, Durham, N.C., 27703 http://www.poet.com *************************************************************************** xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Mon Jun 23 23:24:11 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:03 2004 Subject: Documentation for DTDs Message-ID: <8438@ursus.demon.co.uk> In message <199706231819.LAA28781@mehitabel.eng.sun.com> Murray Altheim writes: [...] > > I would imagine that a small hack to the perl code would allow for a minor > translation to XML: > > >

> Description of identifier here. >

> >

> Description of identifier here. >

> ... > > I haven't checked to see what other changes might be necessary in the > change from full SGML to XML DTDs, but I suspect these might be minor. > Since this is free, functional, and suits the purpose, I'd say go > with the leader... I would also agree - I don't know if Earl Hood reads this list - were you suggesting we ask him to think about the problem? P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From k.grimes at liant.com Tue Jun 24 00:26:27 1997 From: k.grimes at liant.com (Kevin Grimes) Date: Mon Jun 7 16:58:03 2004 Subject: XAPI Message-ID: <97Jun23.105823edt.32261-1@stelmo.liant.com> From: Kevin Grimes@LIANT on 06/23/97 06:30 PM Limiting the types used in the Java XAPI to basic Java types like int, boolean, String, plus the Java interfaces that you've actually implemented in Java (IElement, IXMLProcessor etc.) will make the job of porting the Java XAPI to IDL easier. InputStream was the first thing I tripped over when I translated my Java APIs into IDL--to avoid defining an InputStream interface in IDL I rewrote the member function processDocument to use String. Here is my version of xml.idl. This is my first use of IDL, and XML, and Java for that matter, so please let me know if I'm doing something stupid. I compile this with the MIDL compiler that comes with Microsoft's Visual Studio. We have C++ and Java clients that implement the IXMLApplication callbacks or use the IGrove and INode interfaces to traverse the parse tree. 
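[Editorial sketch, not part of Kevin's message.] The approach described above, keeping the API to basic types plus its own interfaces and letting clients walk the parse tree through a node interface, can be illustrated with a few lines of Java. The method names follow the INode interface in the xml.idl attached to this message; the SimpleNode class, the outline() helper, and the use of toString() as a label are assumptions made only for this illustration and are not taken from Kevin's implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a Java rendering of an INode-style interface that uses
// only int, String, and the interface itself, plus a recursive traversal.
class TraversalSketch {
    interface INode {
        void addChild(INode child);
        INode getChild(int i);
        int getNumberOfChildren();
        INode getParent();
        void setParent(INode parent);
    }

    // Illustrative in-memory node; the label exists only so output is readable.
    static class SimpleNode implements INode {
        private final String label;
        private final List<INode> children = new ArrayList<>();
        private INode parent;

        SimpleNode(String label) { this.label = label; }

        public void addChild(INode child) { children.add(child); child.setParent(this); }
        public INode getChild(int i) { return children.get(i); }
        public int getNumberOfChildren() { return children.size(); }
        public INode getParent() { return parent; }
        public void setParent(INode parent) { this.parent = parent; }
        @Override public String toString() { return label; }
    }

    // Depth-first walk that writes one indented line per node.
    static void walk(INode node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) out.append("  ");
        out.append(node).append('\n');
        for (int i = 0; i < node.getNumberOfChildren(); i++) {
            walk(node.getChild(i), depth + 1, out);
        }
    }

    static String outline(INode root) {
        StringBuilder out = new StringBuilder();
        walk(root, 0, out);
        return out.toString();
    }

    public static void main(String[] args) {
        INode doc = new SimpleNode("doc");
        INode para = new SimpleNode("para");
        doc.addChild(new SimpleNode("title"));
        doc.addChild(para);
        para.addChild(new SimpleNode("#text"));
        System.out.print(outline(doc));
    }
}
```

Because nothing in the interface refers to a Java-only type such as InputStream, each method maps directly onto an IDL operation.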
Regards,
Kevin (k.grimes@liant.com)

// xml.idl
[
    uuid (14859300-E953-11d0-B96A-00A024f2C5E0),
    version (0.0),
    helpstring("LXMLProcessor Type Library")
]
library LXMLProcessor
{
    importlib("stdole32.tlb");

    interface INodeList;

    [
        object,
        uuid (14859302-E953-11d0-B96A-00A024f2C5E0),
        helpstring("INode Interface"),
    ]
    interface INode : IDispatch
    {
        HRESULT addChild([in] INode* child);
        HRESULT getChild([in] int i, [out, retval] INode** child);
        HRESULT getChildren([out, retval] INodeList** children);
        HRESULT getNumberOfChildren([out, retval] int* count);
        HRESULT getParent([out, retval] INode** parent);
        HRESULT setParent([in] INode* parent);
    }

    [
        object,
        uuid (14859303-E953-11d0-B96A-00A024f2C5E0),
        helpstring("INodeList Interface"),
    ]
    interface INodeList : IDispatch
    {
        HRESULT addItem([in] INode* item);
        HRESULT getCount([out, retval] int* count);
        HRESULT getItem([in] int i, [out, retval] INode** item);
    }

    [
        object,
        uuid (14859304-E953-11d0-B96A-00A024f2C5E0),
        helpstring("ICharacterData Interface"),
    ]
    interface ICharacterData : IDispatch
    {
        HRESULT toString([out, retval] BSTR* cdata);
    }

    [
        object,
        uuid (14859305-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IElement Interface"),
    ]
    interface IElement : IDispatch
    {
        HRESULT addAttribute([in] BSTR name, [in] BSTR value);
        HRESULT getAttributeValue([in] BSTR name, [out, retval] BSTR* value);
        HRESULT getId([out, retval] BSTR* id);
        HRESULT getType([out, retval] BSTR* type);
        HRESULT isEmpty([out, retval] VARIANT_BOOL* empty);
        HRESULT setId([in] BSTR id);
        HRESULT setIsEmpty();
        HRESULT toString([out, retval] BSTR* retval);
    }

    [
        object,
        uuid (14859306-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IGrove Interface"),
    ]
    interface IGrove : IDispatch
    {
        HRESULT getDocumentRoot([out, retval] INode** root);
        HRESULT setDocumentRoot([in] INode* root);
    }

    [
        object,
        uuid (14859307-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IXMLApplication Interface"),
    ]
    interface IXMLApplication : IDispatch
    {
        HRESULT doBinaryEntity([in] BSTR systemId, [in] BSTR notationName, [in] BSTR notationSystemId);
        HRESULT doCharacterData([in] BSTR data);
        HRESULT doEmptyElement([in] IElement* e);
        HRESULT doEndOfDocument([in] BSTR docname);
        HRESULT doEndTag([in] IElement* e);
        HRESULT doFatalError([in] BSTR error);
        HRESULT doProcessingInstruction([in] BSTR pi);
        HRESULT doReportableError([in] BSTR error);
        HRESULT doStartOfDocument([in] BSTR docname);
        HRESULT doStartTag([in] IElement* e);
        HRESULT doWarning([in] BSTR warning);
    }

    [
        object,
        uuid (14859308-E953-11d0-B96A-00A024f2C5E0),
        helpstring("IXMLProcessor Interface"),
    ]
    interface IXMLProcessor : IDispatch
    {
        HRESULT buildParseTree([in] VARIANT_BOOL build);
        HRESULT checkValidity([in] VARIANT_BOOL check);
        HRESULT getGrove([out, retval] IGrove** grove);
        HRESULT processExternalEntities([in] VARIANT_BOOL process);
        HRESULT processDocument([in] BSTR filename);
        HRESULT processDocumentURL([in] BSTR spec);
        HRESULT setApplication([in] IXMLApplication* app);
    }

    [
        uuid (1485930A-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Node Class"),
        appobject
    ]
    coclass Node
    {
        interface INode;
    }

    [
        uuid (1485930B-E953-11d0-B96A-00A024f2C5E0),
        helpstring("NodeList Class"),
        appobject
    ]
    coclass NodeList
    {
        interface INodeList;
    }

    [
        uuid (1485930C-E953-11d0-B96A-00A024f2C5E0),
        helpstring("CharacterData Class"),
        appobject
    ]
    coclass CharacterData
    {
        interface INode;
        interface ICharacterData;
    }

    [
        uuid (1485930D-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Element Class"),
        appobject
    ]
    coclass Element
    {
        interface INode;
        interface IElement;
    }

    [
        uuid (1485930E-E953-11d0-B96A-00A024f2C5E0),
        helpstring("Grove Class"),
        appobject
    ]
    coclass Grove
    {
        interface IGrove;
    }

    [
        uuid (1485930F-E953-11d0-B96A-00A024f2C5E0),
        helpstring("XMLApplication Class"),
        appobject
    ]
    coclass XMLApplication
    {
        interface IXMLApplication;
    }

    [
        uuid (14859310-E953-11d0-B96A-00A024f2C5E0),
        helpstring("XMLProcessor Class"),
        appobject
    ]
    coclass XMLProcessor
    {
        interface IXMLProcessor;
    }
};

xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To
unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lauren at sqwest.bc.ca Tue Jun 24 00:46:47 1997 From: lauren at sqwest.bc.ca (Lauren Wood) Date: Mon Jun 7 16:58:03 2004 Subject: XML API and the DOM Message-ID: I thought I would post some clarification of the DOM work here, since it's an acronym that's been mentioned a couple of times. W3C has a working group called the Document Object Model Working Group. See http://www.w3.org/MarkUp/DOM/. To quote the activity statement on the User Interface domain page on the W3C site: (http://www.w3.org/UI/) "The Document Object Model is a platform- and language-neutral interface that will allow programs and scripts to dynamically access and update the content, structure and style of documents. The document can be further processed and the results of that processing can be incorporated back into the presented page." The name of the DOM group is a little misleading, since we are trying to standardize an interface rather than the underlying model. Obviously there will be more than a little overlap between the DOM and XAPI. The DOM will be more general - it has to work with HTML documents as well as XML documents, and it has to be platform- and language-independent. We are writing the interface in IDL, and will also do language bindings to Java and probably C++ and JavaScript. Most of the people on the DOM group are also on the xml-dev mailing list, as we want to be sure that whatever API is decided on here flows into the DOM specification. The full DOM specification will contain a lot more and take a lot longer than the basic XAPI being talked about here. The first draft of level one is due to be ready by the end of August and I will post the URL here when it is ready. cheers, Lauren --- Lauren Wood, SoftQuad, Inc. 
(posting as chair of the W3C DOM WG) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From sbb at Eng.Sun.COM Tue Jun 24 04:19:14 1997 From: sbb at Eng.Sun.COM (Steve Byrne) Date: Mon Jun 7 16:58:03 2004 Subject: XAPI In-Reply-To: <97Jun23.105823edt.32261-1@stelmo.liant.com> References: <97Jun23.105823edt.32261-1@stelmo.liant.com> Message-ID: <199706240218.TAA07038@javinator.eng.sun.com> Kevin, Thank you for sending out your initial interface definition. Unfortunately, I believe that the IDL that people are talking about using as an interface definition standard is OMG's IDL, which is different from the proprietary Microsoft IDL MIDL. If you retrieve the CORBA 2.0 specification from www.omg.org, you'll find that Chapter 3 defines the OMG IDL, and this is, I believe, what you should be using to define your interfaces with. Steve xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Tue Jun 24 10:55:30 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:03 2004 Subject: XML internal text entity replacement text In-Reply-To: Paul Grosso's message of Mon, 23 Jun 1997 15:19:09 -0500 References: <3.0.32.19970623151756.00686b44@pophost.arbortext.com> Message-ID: <559.199706240855@grogan.cogsci.ed.ac.uk> Paul asks: > [Question about '<' in EntityValue] I believe the text is out of sync. with the productions, and the productions are correct, and general entities can contain markup (or rather, characters which will be treated as markup in the right context). That's the way we've implemented it in LT XML. 
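[Editorial sketch, not part of Henry's message.] The reading given above, that an internal entity's replacement text may contain characters which are treated as markup where the entity is referenced, can be checked with a small Java program. This is only a sketch under stated assumptions: it uses the JAXP DocumentBuilder API shipped with current JDKs, which did not exist when this thread was written, and it exercises XML 1.0 as finally published rather than the 31-Mar-97 draft under discussion. The entity name "warn" and the element names "doc" and "emph" are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Sketch: an internal general entity whose literal value contains "<".
class EntityMarkupSketch {
    // The entity "warn" carries an <emph> element inside its replacement text.
    static final String XML =
          "<?xml version=\"1.0\"?>\n"
        + "<!DOCTYPE doc [\n"
        + "  <!ENTITY warn \"<emph>Careful</emph> now\">\n"
        + "]>\n"
        + "<doc>&warn;</doc>";

    // Parses the string into a DOM tree with entity references expanded.
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setExpandEntityReferences(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException("parse failed", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        // If the replacement text was treated as markup, <emph> is a real element.
        System.out.println("emph elements: " + doc.getElementsByTagName("emph").getLength());
        System.out.println("doc text: " + doc.getDocumentElement().getTextContent());
    }
}
```

If the parser accepted the prose reading instead (no "<" in literal data), the DOCTYPE declaration itself would be rejected rather than producing an emph element.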
ht xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From digitome at iol.ie Tue Jun 24 12:36:14 1997 From: digitome at iol.ie (Sean Mc Grath) Date: Mon Jun 7 16:58:03 2004 Subject: XAPI Message-ID: <199706241036.LAA19984@GPO.iol.ie> [Jonathan Robie] > >Is there really an advantage to defining it in IDL first? The IDL could be >created after the specification is finished in Java, and the Java-based >specification is probably easier to create, understand, and test. I *like* >making things language independent, but at this stage, I'm leery of adding >complexity that doesn't add any new conceptual power. Hmmm. IDL == Language independent spec of an API....might this be better approached as an XML application? I.e. a DTD for the XML API spec. A doc conforming to that spec. that can be down-translated to Java, C++, Python and (gasp) IDL! APIs are stuctured docs. Let's practice what we are preaching and capture the API in XML. Unless there are compelling reasons why this does not make sense. Just thinking out loud and looking forward to a discussion on the issue. Sean xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From akirkpatrick at ims-global.com Tue Jun 24 15:05:56 1997 From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com) Date: Mon Jun 7 16:58:03 2004 Subject: DTD invented by Microsoft?! Message-ID: The following extract is from the MS white paper on XML. Are they describing what we all understand as the DTD here or is it something else? If the former, what are Microsoft doing taking credit for it, I wonder... 
> Microsoft has proposed a "Document Type Definition" (DTD) syntax for expressing the schema for an > XML document directly within XML itself, allowing XML data to describe its own structure. Expressing > schemata within XML adds great power to the XML format because it makes it possible for software > examining certain data to understand its structure without earlier knowledge about the data or its > meaning. The section on white-space also seems oversimplified at best. BTW: I don't have anything against Microsoft, even if their developments do seem to be all over the place at the moment. Alfie. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From nmikula at edu.uni-klu.ac.at Tue Jun 24 15:40:18 1997 From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula) Date: Mon Jun 7 16:58:03 2004 Subject: XAPI References: <1.5.4.32.19970623210523.008c6594@pop.mindspring.com> Message-ID: <33B0135D.6163@edu.uni-klu.ac.at> Jonathan Robie wrote: > Hmmmm...I just talked to one of our IDL experts, who wasn't convinced that > this would be a helpful direction. > > Is there really an advantage to defining it in IDL first? The IDL could be > created after the specification is finished in Java, and the Java-based > specification is probably easier to create, understand, and test. I would agree with this statement. IDL is fine but not really necessary right now. I personally would be too much afraid that we loose momentum if we introduce yet another obstacle - meaning having to get a full understanding of IDL. If there is somebody here that is willing to take our material and transform it into an IDL spec., that'd be great, though. Nevertheless, let's talk Java and then we proceed from there. IMHO :) -- Best regards, Norbert H. 
Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From nmikula at edu.uni-klu.ac.at Tue Jun 24 15:40:37 1997 From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula) Date: Mon Jun 7 16:58:04 2004 Subject: XML Property Set References: <199706232025.PAA16524@copsol.com> Message-ID: <33B02038.495A@edu.uni-klu.ac.at> |---------------| | Grove API | <<< I assume this has similarities to JamesClark's |---------------| ReallySimple API ... | Grove Builder | | API | <<< different memory/storage models are implemented here. |----------------------------------| | XML Event API | << presumably fairly similar to NXP? |----------------------------------| | XML Parser API | << Corresponds to John Tigue's analysis? |----------------------------------| I guess the two bottom layers and the connection between Event API and Grove Builder API are my call. Let me ask you : Should we go for a pure event oriented API, like it is now implemented in NXP (and leave it up to the next layer to create the objects) or should we have creator methods that would be set in the event API like now the Esis object is set (setEsis) in the parser. These methods would be called in case of a specific event and the result of this method call would be send to the application via the event interface. For instance we would have an interface : public interface Constructors { public Element createElement(); public Attribute createAttribute(); ... } Element and Attribute etc. will probably be subclasses of Node, as per James' simple API. 
Node, however, should be defined very generally so that we don't *have* to think about DSSSL when we want to talk about/use a node.

An event-producer class conforming to the Esis(++) interface would need to implement a method :

    public void setCreator(Constructors constr);

It would work then like :

a.) parser recognises a certain tag
b.) calls the appropriate creator method to create an object of class element
c.) sends the created object to the next layer via the event producer

An alternative to the creator methods would be to set the objects to be created via a Hashtable of Strings. Then the objects would be created via their "name".

For instance an entry in the hashtable would look like : "Element" -----> "dsssl.Element" and the result would be the creation of an object of type dsssl.Element for the "event" Element.

--
Best regards,
Norbert H. Mikula

=====================================================
= SGML, XML, DSSSL, Intra- & Internet, AI, Java
=====================================================
= mailto:nmikula@edu.uni-klu.ac.at
= http://www.edu.uni-klu.ac.at/~nmikula
=====================================================

From lex at www.copsol.com Tue Jun 24 16:21:04 1997
From: lex at www.copsol.com (Alex Milowski)
Date: Mon Jun 7 16:58:04 2004
Subject: XML Property Set
In-Reply-To: <33B02038.495A@edu.uni-klu.ac.at> from "Norbert H. Mikula" at Jun 24, 97 12:30:00 pm
Message-ID: <199706241418.JAA16855@copsol.com>

>
> |---------------|
> | Grove API | <<< I assume this has similarities to JamesClark's
> |---------------| ReallySimple API ...

I'm not certain what the ReallySimple API is. I was thinking along the lines of the DSSSLTK dsssl.grove package.
> | Grove Builder | > | API | <<< different memory/storage models are implemented > here. Yes, exactly. The Grove API is abstract and the Grove Builder API hides the exact implementation methodologies for constructing grove objects from the rest of the world. > |----------------------------------| > | XML Event API | << presumably fairly similar to > NXP? Potentially. I assume there will be some "grand convergence" on this between parsers and the needs of a grove. > |----------------------------------| > | XML Parser API | << Corresponds to John Tigue's > analysis? > |----------------------------------| Yes, and maybe more infrastructure if we are up to it. IMHO, this API should provide the ability for different parsers to be *configured* for use within an application. Hence, this should be an abstract component that is complete enough to allow most (if not all) applications to not have to know the implementation details. We could use a factory design pattern here. > I guess the two bottom layers and the connection between > Event API and Grove Builder API are my call. Potentially. I would guess that there could be a reference implementation of an event handler that "knows" how to interface a grove builder. It is probably not true that *all* grove builders can be accessed the same. For example, in a database situation, extra work may be necessary in the connection of the events to the grove builder. > Let me ask you : > > Should we go for a pure event oriented API, like it is now > implemented in NXP (and leave it up to the next layer to create the > objects) or should we have creator methods that would be set in the > event > API like now the Esis object is set (setEsis) in the parser. These > methods > would be called in case of a specific event and the result of this > method call would be send to the application via the event interface. I think the event API is the most abstract and lowest level for a parser. 
In this manner, applications that do not need "grove objects" will not have to have them created within some implementation.

SP, for example, has quite an extensive event-oriented API. Each event has a great deal of detail (basically, everything there is to know). It is fairly easy to access the high-level semantics of these events. Low-level semantics like document strings--character-for-character representations of the event--are a little more work.

This is a design decision that we have to make. We could have two event APIs--one for document string access and one for high-level access including document string information, but that could get far too complex. One might also ask why we need the document string separated out when you can get it from the high-level events.

>
> For instance we would have an interface :
>
> public interface Constructors
> {
>     public Element createElement();
>     public Attribute createAttribute();
>     ...
> }
>
> Element and Attribute etc. will probably be subclasses of Node,
> as per James' simple API. Node, however, should be defined
> very generally so that we don't *have* to think about
> DSSSL when want to talk about/use a node.
>
> An event-producer class conforming to the Esis(++) interface would need
> to implement a method :
>
> public void setCreator(Constructors constr);
>
> It would work then like :
>
> a.) parser recognises a certain tag
> b.) calls the appropriate creator method to create an
>     object of class element
> c.) sends the created object to the next layer via
>     the event producer

Well, the above example is similar to the GroveConstructor class in the DSSSLTK. The GroveConstructor is different in that it tries to allow sub-node objects to be created only from appropriate parents. For example, the document element can only be created by passing in the SGMLDocument node. An element can only be created by passing in the parent of the element.
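[Editorial sketch, not part of Alex's message.] The two ideas discussed above, creator methods that are set on the event producer and called when a tag is recognised, and the restriction that a node can only be created by passing in its parent, can be combined in a short Java sketch. Everything here is hypothetical: the Node, Constructors, EventHandler, and EventProducer types are invented for illustration and are not the NXP Esis interface or the DSSSLTK GroveConstructor, and the "parser" is simulated by calling startTag/endTag by hand.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the creator-method flow (steps a, b, c in the quote)
// with the parent-passing rule: every element is created from its parent.
class CreatorSketch {
    // Deliberately general node: a name, a parent, and children.
    static class Node {
        final String name;
        final Node parent;
        final List<Node> children = new ArrayList<>();

        Node(String name, Node parent) {
            this.name = name;
            this.parent = parent;
            if (parent != null) parent.children.add(this);
        }
    }

    // Creator methods; the parent argument is the assumed constraint.
    interface Constructors {
        Node createElement(Node parent, String name);
    }

    // The "next layer" that receives the created objects.
    interface EventHandler {
        void startElement(Node element);
        void endElement(Node element);
    }

    // Stand-in for a parser's event side; tags are fed in manually.
    static class EventProducer {
        private final EventHandler handler;
        private Constructors creator;
        private Node current;

        EventProducer(Node documentNode, EventHandler handler) {
            this.current = documentNode;
            this.handler = handler;
        }

        void setCreator(Constructors creator) { this.creator = creator; }

        void startTag(String name) {
            if (creator == null) throw new IllegalStateException("no creator set");
            Node element = creator.createElement(current, name); // step b
            current = element;
            handler.startElement(element);                       // step c
        }

        void endTag() {
            handler.endElement(current);
            current = current.parent;
        }
    }

    // Runs a tiny document through the producer and records the events.
    static List<String> demo() {
        final List<String> log = new ArrayList<>();
        Node document = new Node("#document", null);
        EventProducer producer = new EventProducer(document, new EventHandler() {
            public void startElement(Node e) { log.add("start " + e.name + " in " + e.parent.name); }
            public void endElement(Node e) { log.add("end " + e.name); }
        });
        producer.setCreator((parent, name) -> {
            if (parent == null) throw new IllegalArgumentException("an element needs a parent node");
            return new Node(name, parent);
        });
        producer.startTag("doc");
        producer.startTag("para");
        producer.endTag();
        producer.endTag();
        return log;
    }

    public static void main(String[] args) {
        for (String line : demo()) System.out.println(line);
    }
}
```

An application that does not need node objects could skip the creator entirely and consume plain events, which is the trade-off raised earlier in this message.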
> > An alternative to the creator methods would be to > set the objects to be created via a Hashtable of Strings. > Then the objects would be create via their "name". > > For instance an entry in the hashtable would look > like : "Element" -----> "dsssl.Element" and the > result would be the creation of an object of type > dsssl.Element for the "event" Element. I'm not certain I understand what you mean. Can you give a more detailed example? ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Tue Jun 24 17:05:42 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:04 2004 Subject: DTD invented by Microsoft?! Message-ID: <3.0.32.19970624095340.0077f968@swbell.net> At 02:06 PM 6/24/97 +0000, akirkpatrick@ims-global.com wrote: >> Microsoft has proposed a "Document Type Definition" (DTD) syntax for >expressing the schema for an > XML document directly within XML itself, >allowing XML data to describe its own structure. In Microsoft's defense, they have correctly used the term "document type definition", which is what the acronym "DTD" expands to, to mean the overall definition of a document type. As SGML only defines part of the total mechanism one needs to define a document type (the declarations allowed within a DOCTYPE declarations, what we are now calling "DTD declarations"), you are free to define additional formalisms for defining schemas however you want. Many people have defined "DTDs for DTDs" (including myself)--the only thing you can't do is claim to be *replacing* the declarations defined by 8879. 
Of course, since XML (and the WebSGML TC) allow the DOCTYPE declaration (or its contained declarations) to be omitted, there's nothing preventing the use of some alternate syntax for schema representation as an *application convention*. The XML ERB is on record as stating that while it might be useful to have a "better" syntax for DTD declarations, the definition of such is out of scope for XML, and in any case is a tar pit second only to name spaces (and thus best left to the SGML revision). Cheers, E. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From matthewg at poet.de Tue Jun 24 17:41:49 1997 From: matthewg at poet.de (Matthew Gertner) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI Message-ID: <01BC80C5.7984F300@matthewg@poet.de> > [Jonathan Robie] > > > >Is there really an advantage to defining it in IDL first? The IDL could be > >created after the specification is finished in Java, and the Java-based > >specification is probably easier to create, understand, and test. I *like* > >making things language independent, but at this stage, I'm leery of adding > >complexity that doesn't add any new conceptual power. > > Hmmm. IDL == Language independent spec of an API....might this be better > approached > as an XML application? I.e. a DTD for the XML API spec. A doc conforming to > that spec. > that can be down-translated to Java, C++, Python and (gasp) IDL! > > APIs are stuctured docs. Let's practice what we are preaching and capture the > API in XML. Unless there are compelling reasons why this does not make sense. > > Just thinking out loud and looking forward to a discussion on the issue. > > Sean If I may be so bold, this sounds like a great idea to me. Producing an API in Java is a valid approach and is more than defensible considering the current Internet climate. 
However, there is also an argument to be made for a language-independent approach (as evidenced by the discussion in this thread). If this approach is to be favored, it seems to me to make far more sense to develop a generalized DTD for API specifications and make the specification itself in XML. This would have the following advantages: 1) Make a truly language-independent spec which conforms to the XML philosophy. (I am not going to talk about the "spirit of XML". :-) 2) Produce a reusable DTD which would have significant value in its own right. 3) Provide the perfect basis for generating documentation directly from the API specification. 4) Ensure that every "user" has the necessary expertise to understand the formulation of the spec. I am not sure how many people really master IDL. Presumably anyone using XAPI will be able to read and understand XML. 5) Provide a demonstration to the outside world as to how XML can be used to facilitate language/application independence and information reuse. It couldn't be that hard to write a DSSSL app to produce a concrete language implementation from the XML-based spec, right? Cheers, Matthew ------------------------------------------------ Matthew Gertner Project Manager/Architect, Internet/Document Management POET Software GmbH Tel: +49 (40) 609 90254 Fax: +49 (40) 609 90115 E-mail: matthewg@poet.de ------------------------------------------------ From clloyd at gorge.net Tue Jun 24 17:52:49 1997 From: clloyd at gorge.net (Chris Lloyd) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI and IDL Message-ID: <01BC807B.A8998F60@chaosmobile.com.chaos> Kevin, I might be wrong but I think we are getting mixed up between pure IDL and implementing an API that uses Microsoft's IDispatch interface.
IDispatch is limited by VARIANT types and LPUNKNOWNs but IDL and even COM for that matter are not. Even if Alex passes a pointer to a stream or some other object, that object can be wrapped with IUnknown if you are trying to provide a COM interface for the object. I know that it's easier to provide a COM interface using IDispatch and basic types but I think it's just far too limiting for an API. Maybe we should let the interface be designed in Java first and worry about the IDL later. :) Chris Lloyd From: Kevin Grimes@LIANT on 06/23/97 06:30 PM Limiting the types used in the Java XAPI to basic Java types like int, boolean, String, plus the Java interfaces that you've actually implemented in Java (IElement, IXMLProcessor etc.) will make the job of porting the Java XAPI to IDL easier. InputStream was the first thing I tripped over when I translated my Java APIs into IDL--to avoid defining an InputStream interface in IDL I rewrote the member function processDocument to use String. Here is my version of xml.idl. This is my first use of IDL, and XML, and Java for that matter, so please let me know if I'm doing something stupid. I compile this with the MIDL compiler that comes with Microsoft's Visual Studio. We have C++ and Java clients that implement the IXMLApplication callbacks or use the IGrove and INode interfaces to traverse the parse tree. Regards, Kevin (k.grimes@liant.com) From akirkpatrick at ims-global.com Tue Jun 24 17:58:50 1997 From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI Message-ID: Matthew Gertner wrote... If I may be so bold, this sounds like a great idea to me.
Producing an API in Java is a valid approach and is more than defensible considering the current Internet climate. However, there is also an argument to be made for a language-independent approach (as evidenced by the discussion in this thread). If this approach is to be favored, it seems to me to make far more sense to develop a generalized DTD for API specifications and make the specification itself in XML. This would have the following advantages: ------------------------ I think the concern was that this kind of overhead might hold up the API development (some were already arguing that it is too early to think about a grove API at all). I agree that Java may be too web-orientated but would rather see the API take shape in this language than not at all. Having said that, the people doing the API work should try to make it easy to get the structures/methods/documentation into other formats and should certainly minimise any language specific areas (I guess Java is a good environment in this respect?!). Alfie. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From peter at techno.com Tue Jun 24 19:39:08 1997 From: peter at techno.com (Peter Newcomb) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI Message-ID: <199706241729.NAA16378@exocomp.techno.com> > [Jonathan Robie] > > Hmmm. IDL == Language independent spec of an API....might this be > better approached as an XML application? I.e. a DTD for the XML API > spec. A doc conforming to that spec. that can be down-translated to > Java, C++, Python and (gasp) IDL! Designing a generic language (or DTD) for API description is non-trivial (consider the work that has gone into creating IDL) and therefore cannot be done satisfactorily in time for XAPI, given that XAPI is needed _now_. Having said that, I agree with you completely. 
In fact, this has already been done to some extent for SGML: it's called the SGML property set. A property set is nothing more than interface specifications for classes of objects using the "grove" object model. Property sets are not, however, suited for generic interface descriptions. There is no way, for instance, to describe an action method such as "parse this" in a property set. Property sets are abstract interfaces to static groves. In my own work, I have developed another SGML language (otherwise known as a DTD) that allows me to describe both interfaces and implementations of object classes using a more generic object model. This language also allows me to tie some of these classes and methods to properties in property sets, thus providing a framework for implementation of property sets, but also giving me a platform and language-neutral representation of my entire API and implementation. This representation is then compiled down to APIs and implementations for specific platforms and languages. This system has taken some time to develop (and is still under development), but has already shown its worth in terms of ease of coding, maintenance, porting, and documentation. However, it is still not a fully generic and complete system (it may never be), as I have geared it towards implementing property sets, and have only added and implemented those features needed for doing so. In the long run, I suggest taking a similar approach: create an XML language for describing APIs and use it to describe the XAPI, linking it to the relevant classes and properties from the SGML and/or XML property sets (for use with DSSSL and/or HyTime). Develop the API description language along with the XAPI described with it; add to the API description language only those features needed for XAPI, while leaving the door open for further enhancements needed for other applications.
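A rough sketch of the approach suggested above, and only a sketch: it is not the system described in this message. It assumes an invented XML vocabulary for describing an API (the names `api`, `interface`, `method`, `param` and the methods `getName` and `getChild` are hypothetical; only `IElement` appears elsewhere in this thread) and shows the description being compiled down to a Java interface declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical API description; the vocabulary is invented for this example.
API_SPEC = """
<api>
  <interface name="IElement">
    <method name="getName" returns="String"/>
    <method name="getChild" returns="IElement">
      <param name="index" type="int"/>
    </method>
  </interface>
</api>
""".strip()


def to_java(spec_text):
    """Emit Java interface source for each <interface> in the description."""
    lines = []
    for iface in ET.fromstring(spec_text).findall("interface"):
        lines.append(f"public interface {iface.get('name')} {{")
        for method in iface.findall("method"):
            # Each <param> becomes one "type name" pair in the signature.
            params = ", ".join(
                f"{p.get('type')} {p.get('name')}" for p in method.findall("param")
            )
            lines.append(f"    {method.get('returns')} {method.get('name')}({params});")
        lines.append("}")
    return "\n".join(lines)


print(to_java(API_SPEC))
```

A second generator over the same description could emit IDL or C++ headers, which is the point of keeping the description language-neutral.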
Study the object models used by IDL, Java, C++, Python, and others, especially with regards to how they impact API development. In the short term, let us develop XAPI-J with the above in mind (somewhere near the back) so that people can use it now, and so that it can be used as a model for future development. -peter -- Peter Newcomb TechnoTeacher, Inc. 233 Spruce Avenue P.O. Box 23795 Rochester, NY 14611-4041 USA Rochester, New York 14692-3795 USA +1 716 529 4303 (home) +1 716 464 8696 (direct) +1 716 755 8698 (cell) +1 716 271 0796 (main) +1 716 529 4304 (fax) +1 716 271 0129 (fax) peter@petes-house.rochester.ny.us peter@techno.com http://www.petes-house.rochester.ny.us http://www.techno.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jwrobie at mindspring.com Tue Jun 24 20:08:23 1997 From: jwrobie at mindspring.com (Jonathan Robie) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI Message-ID: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com> At 01:29 PM 6/24/97 -0400, Peter Newcomb wrote: >> [Jonathan Robie] >> >> Hmmm. IDL == Language independent spec of an API....might this be >> better approached as an XML application? I.e. a DTD for the XML API >> spec. A doc conforming to that spec. that can be down-translated to >> Java, C++, Python and (gasp) IDL! For the record, I didn't say that. Sean McGrath did, quoting my earlier message, which went like this: >Is there really an advantage to defining it in IDL first? The IDL could be >created after the specification is finished in Java, and the Java-based >specification is probably easier to create, understand, and test. I *like* >making things language independent, but at this stage, I'm leery of adding >complexity that doesn't add any new conceptual power. 
So not only am I in agreement with the rest of your message, your message actually agrees with what I said earlier! Incidentally, Alex Milowski referred to the "factory design pattern". Using design patterns as a basis for the design is really helpful, because there is a book which describes each of these patterns in detail, complete with diagrams, scenarios, etc. For instance, there are 9 pages on the factory design pattern that Alex mentioned. This makes it much easier to communicate about design choices on a high conceptual level. POET's Wildflower API, which was developed completely independently of Alex's software, also uses a design patterns approach to parse, manage, and navigate SGML documents in the document repository. I wonder if some of the rest of us are also design patterns critters? Jonathan *************************************************************************** Jonathan Robie jwrobie@mindspring.com http://www.mindspring.com/~jwrobie POET Software, 3207 Gibson Road, Durham, N.C., 27703 http://www.poet.com *************************************************************************** xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jtigue at datachannel.com Tue Jun 24 21:31:09 1997 From: jtigue at datachannel.com (John Tigue) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI References: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com> Message-ID: <33B020B2.65235836@datachannel.com> Jonathan Robie wrote: > Incidentally, Alex Milowski referred to the "factory design pattern". > Using > design patterns as a basis for the design is really helpful, because > there > is a book which describes each of these patterns in detail, complete > with > diagrams, scenarios, etc. For instance, there are 9 pages on the > factory > design pattern that Alex mentioned. 
This makes it much easier to > communicate > about design choices on a high conceptual level. POET's Wildflower > API, > which was developed completely independently of Alex's software, also > uses a > design patterns approach to parse, manage, and navigate SGML documents > in > the document repository. I wonder if some of the rest of us are also > design > patterns critters? The book is Design Patterns: Elements of Reusable Object-Oriented Software by Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides (Addison-Wesley) ISBN: 0-201-63361-2 or see http://st-www.cs.uiuc.edu/users/patterns/ for a blurb. XAPI included IXMLProcessorFactory which uses the Factory Method pattern on page 107. Having common handles to design concepts definitely helps the conversation and I have used Gamma et al. where I can in XAPI. -- John Tigue Programmer jtigue@datachannel.com DataChannel (http://www.datachannel.com) 206-462-1999 From cbullard at hiwaay.net Wed Jun 25 01:45:28 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:58:04 2004 Subject: DTD invented by Microsoft?! References: Message-ID: <33B05BF1.47B3@hiwaay.net> akirkpatrick@ims-global.com wrote: > > The following extract is from the MS white paper on XML. Are they > describing > what we all understand as the DTD here or is it something else? If the > former, > what are Microsoft doing taking credit for it, I wonder... > > > Microsoft has proposed a "Document Type Definition" (DTD) syntax for > expressing the schema for an > XML document directly within XML itself, > allowing XML data to describe its own structure.
Expressing > schemata > within XML adds great power to the XML format because it makes it > possible for software > > examining certain data to understand its structure without earlier > knowledge about the data or its > > meaning. > > The section on white-space also seems oversimplified at best. > > BTW: I don't have anything against Microsoft, even if their developments > do seem to be all over the place at the moment. This appears to be the long awaited and somewhat dreaded attempt to use instance syntax for type definitions. It is an idea that has been floated several times on the XML WG list and generally resisted. It is a bad idea and may be the reason SGML community members finally withdraw from XML development. len From Peter at ursus.demon.co.uk Wed Jun 25 01:51:22 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:04 2004 Subject: XML Property Set Message-ID: <8492@ursus.demon.co.uk> I think we have the basis of agreement on the overall architecture and it's important to move reasonably quickly with it. From what I can gather the main players are in agreement with the 4-block structure and - as a typical webhacker - it seems to make sense to me. [There are two places that I would hook JUMBO into - the Event API and the Grove API.] It's VERY important to retain focus. The 'Grove' approach should be similar to the ReallySimple API [This was a proposed Interface from James Clark on this list back in March under the 'Simple API' thread. James produced an interface that even I can understand, and that's what I hope we are taking forward (in spirit at least).]
In message <199706241418.JAA16855@copsol.com> lex@www.copsol.com (Alex Milowski) writes: > > > > |---------------| > > | Grove API | <<< I assume this has similarities to James Clark's > > |---------------| ReallySimple API ... > > I'm not certain what the ReallySimple API is. I was thinking along the lines > of the DSSSLTK dsssl.grove package. I think it's very important to keep this as lightweight as possible at this stage. We're building prototypes (the language isn't stable - we don't know what July 1 might include/omit :-). So this API must make sense to a wide range of people - it will be their main interaction with a parser. > > > | Grove Builder | > > | API | <<< different memory/storage models are implemented > > here. > > Yes, exactly. The Grove API is abstract and the Grove Builder API hides the > exact implementation methodologies for constructing grove objects from the > rest of the world. I suspect that it will be quite a small community that needs to interact with this; specialist developers who care about the memory model, caching, interaction with OBDs etc. > > > > |----------------------------------| > > | XML Event API | << presumably fairly similar to > > NXP? > > Potentially. I assume there will be some "grand convergence" on this between > parsers and the needs of a grove. Good. > > > > |----------------------------------| > > | XML Parser API | << Corresponds to John Tigue's > > analysis? > > |----------------------------------| > > Yes, and maybe more infrastructure if we are up to it. IMHO, this API should > provide the ability for different parsers to be *configured* for use within > an application. Hence, this should be an abstract component that is > complete enough to allow most (if not all) applications to not have to > know the implementation details. Yes. It hasn't been difficult to interact with the current parsers, but as more come we shall get terminological slippage and this may cause confusion.
These APIs should hold the terminology fixed. > [...] > > I'm not certain I understand what you mean. Can you give a more > detailed example? I think it would be very valuable to have some examples as soon as reasonable. We shall then get a feel for the size of the property set (hopefully very small) the factory model and so forth. It would be very useful to have a V0.1 to concentrate discussion and to get a feel for scale. [WRT other discussions, I agree with those who are suggesting pure Java at present. Although the extensions to other languages are probably fairly straightforward, it all adds effort. Java will prove the concept, show the problems, and it is then much easier to extend and generalise.] Remember also that there is/will_be a lot of work on XML-LINK, XML-TYPE, XML-STYLE and we shall all get diverted when these crystallise. A relatively solid processing API will help all of these efforts as well. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From peter at techno.com Wed Jun 25 05:44:26 1997 From: peter at techno.com (Peter Newcomb) Date: Mon Jun 7 16:58:04 2004 Subject: XAPI In-Reply-To: <1.5.4.32.19970624180706.00933ee0@pop.mindspring.com> (message from Jonathan Robie on Tue, 24 Jun 1997 14:07:06 -0400) Message-ID: <199706250321.XAA16541@exocomp.techno.com> > Date: Tue, 24 Jun 1997 14:07:06 -0400 > From: Jonathan Robie > Cc: xml-dev@ic.ac.uk > > At 01:29 PM 6/24/97 -0400, Peter Newcomb wrote: > >> [Jonathan Robie] > >> > >> Hmmm. IDL == Language independent spec of an API....might this be > >> better approached as an XML application? I.e. a DTD for the XML API > >> spec. A doc conforming to that spec. 
that can be down-translated to > >> Java, C++, Python and (gasp) IDL! > > For the record, I didn't say that. Sean McGrath did, quoting my earlier > message, which went like this: Sorry about that... I receive CTS as email and had already deleted your post before I read Sean's post and was moved to write mine. I got confused by the quoting. > >Is there really an advantage to defining it in IDL first? The IDL could be > >created after the specification is finished in Java, and the Java-based > >specification is probably easier to create, understand, and test. I *like* > >making things language independent, but at this stage, I'm leery of adding > >complexity that doesn't add any new conceptual power. > > So not only am I in agreement with the rest of your message, your message > actually agrees with what I said earlier! Yes... I had read your post, and meant mine as a second to yours. -peter -- Peter Newcomb TechnoTeacher, Inc. 233 Spruce Avenue P.O. Box 23795 Rochester, NY 14611-4041 USA Rochester, New York 14692-3795 USA +1 716 529 4303 (home) +1 716 464 8696 (direct) +1 716 755 8698 (cell) +1 716 271 0796 (main) +1 716 529 4304 (fax) +1 716 271 0129 (fax) peter@petes-house.rochester.ny.us peter@techno.com http://www.petes-house.rochester.ny.us http://www.techno.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ht at cogsci.ed.ac.uk Wed Jun 25 10:17:11 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:04 2004 Subject: DTD invented by Microsoft?! In-Reply-To: len bullard's message of Tue, 24 Jun 1997 18:44:50 -0500 References: <33B05BF1.47B3@hiwaay.net> Message-ID: <730.199706250817@grogan.cogsci.ed.ac.uk> Len writes: > This appears to be the long awaited and somewhat dreaded > attempt to use instance syntax for type definitions. 
It > is an idea that has been floated several times on the > XML WG list and generally resisted. It was resisted, correctly in my view, as a component of XML-lang itself, and in the decision the point was made several times that the right place for this was one level up, as a generic application. That's what the schema proposal in the XML-data document is aimed at providing. > > It is a bad idea and may be the reason SGML community > members finally withdraw from XML development. I'd be interested to hear your reasons for thinking it's a bad idea -- not surprisingly I think it's a good idea -- it puts flexibility in the place it ought to be, and provides clean mechanisms for dealing with precisely the tasks which PEs are messily used for now. Why disagreement about this point should implicate the relation between XML and SGML is unclear to me. ht From ht at cogsci.ed.ac.uk Wed Jun 25 10:27:52 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:04 2004 Subject: XML Property Set In-Reply-To: "Norbert H. Mikula"'s message of Tue, 24 Jun 1997 12:30:00 -0700 References: <199706232025.PAA16524@copsol.com> <33B02038.495A@edu.uni-klu.ac.at> Message-ID: <734.199706250827@grogan.cogsci.ed.ac.uk> I think this is all useful discussion, but I also think that it's getting too monolithic. In our experience with using a simple API to access a (normalised) SGML document stream, which led to our LT XML tool, we found that most of the quick and simple tools (often DTD-specific) we wanted to build were most easily constructed on top of an I/O model, not an event model or a grove model. That is, one where the basic APPLICATION structure was while (bit=GetBit(xmlStream)) { select (bit.type) { case startTag: ... case endTag: ...
case textData: ... case PI: ... } } I'd be sorry to lose this level, but am not clear where it fits in the developing picture. Is this the 'XML Parser API'? ht xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From akirkpatrick at ims-global.com Wed Jun 25 11:08:56 1997 From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com) Date: Mon Jun 7 16:58:04 2004 Subject: DTD invented by Microsoft?! Message-ID: Could someone explain what "instance syntax for type definitions" means. Thanks! ht@cogsci.ed.ac.uk[SMTP:PC @INTERNET {ht@cogsci.ed.ac.uk}] writes: -------------------------------------------------------------------------- ------------- Len writes: > This appears to be the long awaited and somewhat dreaded > attempt to use instance syntax for type definitions. It > is an idea that has been floated several times on the > XML WG list and generally resisted. It was resisted, correctly in my view, as a component of XML-lang itself, and in the decision the point was made several times that the right place for this was one level up, as a generic application. That's what the schema proposal in the XML-data document is aimed at providing. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From lex at www.copsol.com Wed Jun 25 15:38:21 1997 From: lex at www.copsol.com (Alex Milowski) Date: Mon Jun 7 16:58:05 2004 Subject: XML Property Set In-Reply-To: <734.199706250827@grogan.cogsci.ed.ac.uk> from "Henry S. Thompson" at Jun 25, 97 09:27:44 am Message-ID: <199706251336.IAA00363@copsol.com> > I think this is all useful discussion, but I also think that it's > getting too monolithic. 
In our experience with using a simple API to > access a (normalised) SGML document stream, which led to our LT XML > tool, we found that most of the quick and simple tools (often > DTD-specific) we wanted to build were most easily constructed on top > of an I/O model, not an event model or a grove model. That is, one > where the basic APPLICATION structure was > > while (bit=GetBit(xmlStream)) { > select (bit.type) { > case startTag: ... > case endTag: ... > case textData: ... > case PI: ... > } > } > > I'd be sorry to lose this level, but am not clear where it fits in the > developing picture. Is this the 'XML Parser API'? IMHO, this would be the XML Event API. Somewhere previous to this you would use the XML Parser API to set up the parser and tell it what document to parse. ============================================================================== R. Alexander Milowski http://www.copsol.com/ alex@copsol.com Copernican Solutions Incorporated (612) 379 - 3608 From nmikula at edu.uni-klu.ac.at Wed Jun 25 16:28:01 1997 From: nmikula at edu.uni-klu.ac.at (Norbert H. Mikula) Date: Mon Jun 7 16:58:05 2004 Subject: XML Property Set References: <199706232025.PAA16524@copsol.com> <33B02038.495A@edu.uni-klu.ac.at> <734.199706250827@grogan.cogsci.ed.ac.uk> Message-ID: <33B185E1.3D25@edu.uni-klu.ac.at> Henry S. Thompson wrote: > I'd be sorry to lose this level, but am not clear where it fits in the > developing picture. Is this the 'XML Parser API'? What you have in mind is pretty much the level of the event based API. To my understanding the XML parser API is meant to tell the parser how to behave (i.e. validate/non-validate, set optional error reporting..)
and to set things like the object dealing with the Esis stream, set the input stream etc. -- Best regards, Norbert H. Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From jeanpa at microsoft.com Wed Jun 25 19:04:24 1997 From: jeanpa at microsoft.com (Jean Paoli) Date: Mon Jun 7 16:58:05 2004 Subject: XML-Data Message-ID: <78DFE33066ABD0118B9200805FD431BA93299D@RED-16-MSG.dns.microsoft.com> The current version of XML-Data, the Microsoft Position Paper, is at http://www.microsoft.com/standards/xml, along with a white paper on XML. Together, these documents present our vision for the use of structured data on the web. I hope these are easier to use than the version I mailed to you on Sunday. The XML-Data paper is also in http://www.w3.org/XML/Group/9706/xml-data (thanks to Dan Conolly). -Jean Paoli > ---------- > From: Jean Paoli > Sent: Sunday, June 22, 1997 10:37 PM > To: 'w3c-sgml-wg@w3.org'; 'xml-dev@ic.ac.uk'; > 'w3c-sgml-erb@hpsgml.fc.hp.com' > Cc: Andrew Layman; Thomas Reardon; Adam Bosworth; Hadi Partovi > Subject: XML-Data > > I am pleased to present XML-Data, a Position Paper from Microsoft. > XML-Data is an application of XML for exchanging > structured data and metadata on the Internet. > This position paper is sent to multiple working groups > in the W3C dealing with this subject (XML, meta-data) > and we expect this paper to be discussed and improved > by these working groups. > The current proposal needs namespaces and uses the Layman/Bray > proposal. 
> > The URL of this paper (on the Microsoft site) will be posted tomorrow. > -Jean Paoli > > ---------------- > > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Ingo.Macherius at TU-Clausthal.de Wed Jun 25 19:54:44 1997 From: Ingo.Macherius at TU-Clausthal.de (Ingo Macherius) Date: Mon Jun 7 16:58:05 2004 Subject: DTD invented by Microsoft?! In-Reply-To: Message-ID: <199706251753.TAA18214@sinfonix.rz.tu-clausthal.de> > Could someone explain what "instance syntax for type definitions" > means. Thanks! I am aware this is a beginner's question. Is xml-dev the right place to answer ? If not, where is the place for such Q/A ? Anyway: In valid XML there are two distinct parts of a document, the DTD and the "document instance". Both serve different purposes. The "instance" is the marked up text the user produces. (So any valid HTML page is an "instance" of the HTML DTD). The tags allowed in the instance are declared in the DTD using a different syntax. The term "instance syntax for type definitions" means, that the same syntax is used for both DTD and instance. Compare: ]> with aaa bbb bbb ccc ccc Using the second case there has to be a mechanism to tell meta-structure-defining tags (, , ...) from user-defined ones, e.g. 1. namespaces (proposed mechanism for XML) 2. reserved attributes (like the current XML-Link draft) 3. reserved names (like with HTML) 4. processing instructions (shudder) 5 ... Q: Has Microsoft published the intended syntax for "Schemata" (the MS name for "marked up" DTD) to the public ? I can't find the link, help is welcome. 
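To make the distinction above concrete, here is a minimal sketch of why instance syntax appeals to tool writers: a schema written as ordinary elements can be read with the same parser as the document it describes. The vocabulary (`schema`, `define`, `child`, `name`) is invented for this example and is not taken from XML-Data or any other proposal; the check covers only which children an element may contain, not their order or number.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema in instance syntax; all names here are made up.
SCHEMA = """
<schema>
  <define name="a"><child name="b"/><child name="c"/></define>
  <define name="b"/>
  <define name="c"/>
</schema>
""".strip()


def allowed_children(schema_text):
    """Map each declared element name to the child names it allows."""
    root = ET.fromstring(schema_text)
    return {
        d.get("name"): [c.get("name") for c in d.findall("child")]
        for d in root.findall("define")
    }


def misplaced(instance_text, rules):
    """Return (parent, child) pairs that the rules do not allow."""
    bad = []
    for parent in ET.fromstring(instance_text).iter():
        for child in parent:
            if child.tag not in rules.get(parent.tag, []):
                bad.append((parent.tag, child.tag))
    return bad


rules = allowed_children(SCHEMA)
print(rules)
print(misplaced("<a><b>bbb</b><c>ccc</c></a>", rules))
print(misplaced("<a><b><c>ccc</c></b></a>", rules))
```

Here the schema and the instance are separate documents, so nothing has to tell schema tags from user tags; once both live in one document, one of the mechanisms listed above becomes necessary.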
++im -- Ingo Macherius // L'Aigler Platz 4 // D-38678 Clausthal-Zellerfeld mailto:Ingo.Macherius@tu-clausthal.de http://www.tu-clausthal.de/~inim/ Information!=Knowledge!=Wisdom!=Truth!=Beauty!=Love!=Music==BEST (Frank Zappa) xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From marcus at lab.com Wed Jun 25 20:13:53 1997 From: marcus at lab.com (Wendell Piez) Date: Mon Jun 7 16:58:05 2004 Subject: XML DTD for HTML? Message-ID: <33B160F0.5091363B@lab.com> List members: Is there an XML DTD for HTML publicly available? We would be much obliged to take a look.... Regards, Wendell Piez HuskyLabs marcus@lab.com xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ebaatz at barbaresco.East.Sun.COM Wed Jun 25 21:26:17 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:05 2004 Subject: Character encoding questions Message-ID: I was struck by the following sentence in the Microsoft XML White Paper: XML supports a range of encodings...subject only to the restriction that an entire document must share the same encoding. My immediate reaction was that that wasn't correct, although the definition of "document" above isn't obvious to me (for example, are external entities part of a document?). However, when checking into the XML April specification, I got in over my head. I am hoping that someone here will help me out of my hole. If my XML document is a simple Unicode text file then I begin it like the following a Byte Order Mark ... with the Byte Order Mark being required even though an EncodingDecl is used? 
(I would have said "yes" until I got to Appendix E "Autodetection of Character Sets," which worries about detecting UCS-2 when there is no Byte Order Mark.) Is the EncodingDecl necessary if the file starts with a Byte Order Mark? Where can I have an EncodingPI? Section 4.3.3 talks about their being "at the beginning of a system entity, before any other character data or markup" but doesn't define "system entity" (perhaps one that has an ExternalID that contains "SYSTEM"?). If my document references an external entity, then I believe that the external entity must start with an EncodingPI (see Appendix E "Autodetection of Character Sets") if it isn't in UTF-8 or start with a Byte Order Mark. If I wanted to take the external entity and, for portability reasons, bundle it into my XML document as an internal entity, what do I do with the external entity's EncodingPI? It doesn't seem to be allowed in the internal entity declaration, somewhat like: "text here"> I presume that the answer is that I cannot convert an external entity into an internal unless the external entity and my XML document have the same encoding. What is the motivation for not allowing a change of encoding within an entity? The mechanism for handling that seems no different than that needed to handle different encodings in external entities, which I think of as being logically a part of the referencing document. 
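As a rough illustration of the autodetection idea referred to above, the Python sketch below guesses an entity's encoding from its first bytes: a Byte Order Mark if present, then the byte pattern of '<?' in a 16-bit encoding, then an in-document encoding label. It makes two assumptions: it uses the declaration syntax of the later XML 1.0 Recommendation (<?xml ... encoding="..."?>), which may differ from the EncodingPI of the April 1997 draft, and it treats UCS-2 as Python's UTF-16 codecs. sniff_encoding is a hypothetical helper, not part of any parser.

```python
import codecs
import re

# Encoding label inside an ASCII-compatible XML declaration (XML 1.0 syntax).
_LABEL = re.compile(
    rb"<\?xml[^>]*?encoding\s*=\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']"
)


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML entity from its first bytes."""
    # A Byte Order Mark settles the question for 16-bit text.
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16"
    # 16-bit text without a BOM: '<?' appears as 00 3C 00 3F or 3C 00 3F 00.
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    # ASCII-compatible bytes: the declaration can be read to find the label.
    match = _LABEL.match(data)
    if match:
        return match.group(1).decode("ascii").lower()
    # No BOM and no label: fall back to UTF-8.
    return "utf-8"


sample = b'<?xml version="1.0" encoding="ISO-8859-1"?><doc/>'
print(sniff_encoding(sample))  # prints iso-8859-1
```

Under these assumptions a Byte Order Mark and an encoding label are not strictly both needed for 16-bit text, since either one is enough for the sniffing to succeed. The sketch does not settle what the draft actually requires.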
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed Jun 25 23:53:04 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:05 2004 Subject: XML-Data Message-ID: <8518@ursus.demon.co.uk> In message <78DFE33066ABD0118B9200805FD431BA93299D@RED-16-MSG.dns.microsoft.com> Jean Paoli writes: > The current version of XML-Data, the Microsoft Position Paper, is at > http://www.microsoft.com/standards/xml, along with a white paper on XML. > Together, these documents present our vision for the use of structured > data on the web. > I hope these are easier to use than the version I mailed to you on > Sunday. Thanks very much Jean, In fact the mail that was sent last week was unreadable on my mailer (completely) although it still had to be downloaded. It would be appreciated if *very* large documents were mounted on the WWW and not directly mailed to this list, because not everyone can read them easily, and some of us have to pay for connect time. I discovered the URL independently and it's certainly a useful resource for the XML community - abstracters and curators will no doubt add it to their pages. P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed Jun 25 23:53:10 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:05 2004 Subject: DTD invented by Microsoft?! 
Message-ID: <8519@ursus.demon.co.uk> In message <199706251753.TAA18214@sinfonix.rz.tu-clausthal.de> "Ingo Macherius" writes: > > Could someone explain what "instance syntax for type definitions" > > means. Thanks! > > > I am aware this is a beginner's question. Is xml-dev the right place > to answer ? If not, where is the place for such Q/A ? The first place to go would be Peter Flynn's FAQ if it is relevant to XML. www.ucc.ie/xml/ Peter has a form where you can post questions and/or answers. When the FAQ was set up he was urging people to post. It's now a very impressive site with colours for versions, etc. and I'm not sure whether there is still a call for material. ???Peter. There can be times when "beginners' questions" are appropriate on this list (*I* ask enough :-). This is when there is a real danger that a general lack of understanding might lead to fuzziness in *implementation*. I think this is particularly true when terminology is not clear. So, for example, 'What is a resource in XML-LINK?' is worth discussing, and EliotK has given an excellent working reference document for us. > > P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Wed Jun 25 23:53:37 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:05 2004 Subject: XML-LINK Message-ID: <8520@ursus.demon.co.uk> I recently posted some concerns about XML-LINK on XML-WG and it was suggested that XML-DEV would be more appropriate; I agree. The main question is to what extent *generic* XML-link-processors can be built which are application-independent. They would rely totally on the XML-LINK spec for their implementation. 
I have been rereading Eliot Kimber's posting of 1997-05-31 on this list and have found it extremely helpful. Since no-one has challenged any of the ideas or terminology there, I shall take that as a reference point and try to use his terms consistently. (I am aware that July 1 may bring additional clarification, but discussion here will help.)

In many respects, XML-LINK behaviour has parallels to our discussions on XML-LANG APIs. The draft specifies what goes in, but leaves more fluid 'what comes out'. It's critical that we have consistent terminology in all of these endeavours and outline the areas of complete agreement.

My primary concern is with the terms 'resource' and 'embed', where I believe there is scope for added precision, and where I am not clear that all the discussion on XML-WG about these has been consistent with Eliot's document.

It seems clear that link traversal requires us to have parsed documents in 'memory' (this could also mean persistent storage, etc.). A link connects 'nodes in trees' and 'resource' is essentially synonymous with 'node' (EK, P1.). I will build a simple example, and then ask how it might be implemented:

a.xml:

This is a link in a paragraph

can be parsed to a tree (I use '-' to indicate childOf in a TOC-like structure, and PC(string) indicates a child with #PCDATA content; whitespace problems are ignored):

P
-PC(This is )
-A
--PC(a )
--B
---PC(link)
-PC( in a paragraph)

Now, from Eliot's posting I identify the node A as the resource at one end of the link l1. The content of A is not relevant to the resource, since a node is a point. [However some XML-WG postings seemed to imply that the content of A is a resource, which is at variance with Eliot's explanation.]

For EXTENDED, INLINE="TRUE" I am less clear what the resource is in:

Here is the father and the mother and the baby

which parses to:

MYLINK
-P
--PC(Here is the)
--A
---PC(father)
--PC(and the)
--A
---PC(mother)
--PC(and the)
--A
---PC(baby)

Now this is a single link, with (presumably) a single end at the INLINE end. So does this mean that the 'resource' of this link is the MYLINK node with ID=family? Or are there three 'resources' at this end, the A nodes with IDs of 'father', 'mother' and 'baby'?

*-*-*-*

Now for the other end of the link, and EMBED. I have implemented EMBED in JUMBO like IMG in HTML: would locate the ID=MOL in foo.xml, process() it to create an object, which would then display() itself in the document at the position where the A link would be rendered. But I am more concerned about the case where the located node is a (sub)tree which [XML-LINK] 'should be embedded, for the purposes of display or processing in the body of the resource and at the location where the traversal started'.

Take the first example (a.xml), which links to b.xml, and assume b.xml contains:

b.xml:

This is a node in a paragraph

which is parsed to:

P
-PC(This is )
-NODE
--PC(a )
--B
---PC(node)
-PC( in a paragraph)

The link from ID="A" in a.xml links to node ID="B1" in b.xml. In one interpretation, that's it - 'embed'ding is up to the application. But is there any reasonable default behaviour?

(A) It could be traversed as if it were physically part of the a.xml document (i.e. as if NODE were a child of A, and presumably the eldest sibling). The processor would encounter A, process the node (only), find it had a LINK, process that, then find A had content and process that. Note that the content of A remains and it would be application-dependent whether the *content* of A was hidden or remains.

(B) Nothing happens unless BEHAVIOR is set. In which case are there reasonable values for it? And does the concept of embedding have any meaning?

My own feeling is that (A) is the most reasonable default. With NEW we have a separate window, with a separate namespace (so it doesn't matter if b.xml has a different DTD from a.xml). So this 'window' has to be transported into the current 'window'.

I'd value comments. If this seems to be a consensus view then I'll try to implement it in JUMBO. At present I suspect that JUMBO has got this partly right and partly wrong.

P. -- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/

From cbullard at hiwaay.net Thu Jun 26 02:06:11 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:58:05 2004 Subject: DTD invented by Microsoft?! References: <33B05BF1.47B3@hiwaay.net> <730.199706250817@grogan.cogsci.ed.ac.uk> Message-ID: <33B1B254.1A55@hiwaay.net>

Henry S.
Thompson wrote: > > Len writes: > > > This appears to be the long awaited and somewhat dreaded > > attempt to use instance syntax for type definitions. It > > is an idea that has been floated several times on the > > XML WG list and generally resisted. > > It was resisted, correctly in my view, as a component of XML-lang > itself, and in the decision the point was made several times that the > right place for this was one level up, as a generic application. > That's what the schema proposal in the XML-data document is aimed at > providing. > > > > > It is a bad idea and may be the reason SGML community > > members finally withdraw from XML development. > > I'd be interested to hear your reasons for thinking it's a bad idea -- Why do we need two ways to do the same thing? Rick Jeliffe provided the example in the SGML DTD syntax we know now. If simplicity is the goal, why introduce this now? len xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From galiard at let.rug.nl Thu Jun 26 06:52:52 1997 From: galiard at let.rug.nl (Harry Gaylord) Date: Mon Jun 7 16:58:05 2004 Subject: Character encoding questions Message-ID: <199706260452.GAA22614@odur.let.rug.nl> > I was struck by the following sentence in the Microsoft XML White Paper: > > XML supports a range of encodings...subject only to the restriction > that an entire document must share the same encoding. > > My immediate reaction was that that wasn't correct, although the > definition of "document" above isn't obvious to me (for example, are > external entities part of a document?). However, when checking into the > XML April specification, I got in over my head. I am hoping that someone > here will help me out of my hole. 
>
> If my XML document is a simple Unicode text file then I begin it like
> the following
>
> a Byte Order Mark
>
> ...
>
> with the Byte Order Mark being required even though an EncodingDecl is
> used? (I would have said "yes" until I got to Appendix E "Autodetection
> of Character Sets," which worries about detecting UCS-2 when there
> is no Byte Order Mark.) Is the EncodingDecl necessary if the file
> starts with a Byte Order Mark?
>
> Where can I have an EncodingPI? Section 4.3.3 talks about their being
> "at the beginning of a system entity, before any other character data or
> markup" but doesn't define "system entity" (perhaps one that has an
> ExternalID that contains "SYSTEM"?). If my document references an
> external entity, then I believe that the external entity must start
> with an EncodingPI (see Appendix E "Autodetection of Character Sets")
> if it isn't in UTF-8 or start with a Byte Order Mark.
>

In classical SGML this info is contained in the system declaration where one or more character sets can be declared and the control characters used to switch between them, using the ISO 2022 and related standard systems. These are read in before the dtd. However, if I understand the XML proposals correctly, they do not envisage a system declaration.

The best info on system declarations is in a white paper from omnimark and an article in TAG by Wayne Wohler. On character sets you might have a look at my article in CHUM a couple of years ago. I have a preprint in ps available by ftp if you want to see it. It does not have the character set tables which ISO claims the copyright for.

With the implementation of unicode/ucs we don't need all those things with control characters, which are too susceptible to corruption. All the characters you need (or almost all in my case) are in the new character set.

The other option in classic SGML is to use a subdoc; as far as I can remember it can contain its own dtd, but I don't think it can have a system declaration.
My docs are at the office. > Harry Gaylord former chair TEI committee on character sets member ISO SC2 and NNI shadow committee > xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From north at synopsys.com Thu Jun 26 09:29:17 1997 From: north at synopsys.com (Simon North) Date: Mon Jun 7 16:58:05 2004 Subject: PUBLIC identifiers in XML? In-Reply-To: <78DFE33066ABD0118B9200805FD431BA932987@RED-16-MSG.dns.microsoft.com> Message-ID: <199706260728.JAA12969@cadis.de> This is mostly likely a RTFM question, but the XML FAQ says: "No public identifiers in entity and notation declarations" While the XML-lang says (page 22 in the dead tree version or see http://www.w3.org/pub/WWW/TR/WD-xml-lang#secA. for the borrowed electrons version): "No public identifiers in ENTITY, DOCTYPE, and NOTATION declarations". but at the same time, XML-lang explicitly includes PUBLIC in the production rule (Section 4.3.2, page 18 or see http://www.w3.org/pub/WWW/TR/WD-xml-lang#sec4.3.2) AND has an example of an external entity declaration that *does* use a public identifier. I've also seen public identifiers used in DOCTYPE declarations for XML, and I had understood that this was OK but should still be supported by a SYSTEM identifier. I had also heard/read somewhere that a resolution mechanism for public identifiers was being worked on and that the restriction might then go away. Could someone please enlighten me on this? Thanks. Simon North - COSSAP Technical Writer, Synopsys Synopsys GmbH, Kaiserstr. 100, 52134 Herzogenrath Germany. 
+49 2407 955873 -- north@synopsys.com Voice mail: +1 415 694 4141 55055

From ht at cogsci.ed.ac.uk Thu Jun 26 11:44:53 1997 From: ht at cogsci.ed.ac.uk (Henry S. Thompson) Date: Mon Jun 7 16:58:06 2004 Subject: DTD invented by Microsoft?! In-Reply-To: len bullard's message of Wed, 25 Jun 1997 19:05:40 -0500 References: <33B05BF1.47B3@hiwaay.net> <730.199706250817@grogan.cogsci.ed.ac.uk> <33B1B254.1A55@hiwaay.net> Message-ID: <1113.199706260944@grogan.cogsci.ed.ac.uk>

Len writes:
> > Len wrote
> > > It is a bad idea and may be the reason SGML community
> > > members finally withdraw from XML development.
> >
> Henry S. Thompson wrote:
> > I'd be interested to hear your reasons for thinking it's a bad idea --
>
> Why do we need two ways to do the same thing? Rick Jeliffe
> provided the example in the SGML DTD syntax we know now.
> If simplicity is the goal, why introduce this now?

Just because I can express any logical formula using the Sheffer stroke, or any program in assembler, doesn't mean I should. Using PEs to encode an element-type hierarchy not only obscures the author's intention, it invites accidental error, encourages hacking at the margins, and makes it harder for non-specialist users to augment the hierarchy cleanly. Compare, for example with . . .

Understanding why and how the first of these does its work requires considerable specialist knowledge, and if you don't believe me ask Lou Burnard and Michael Sperberg-McQueen how easy they have found it to educate TEI users to make such extensions themselves.

An explicit type hierarchy also simplifies things for the original author, making the schema easier to maintain, to explain, and to read for the ordinary user. Compare: with . . . . . .
Note finally that the PE method only easily allows a single layer of specialisation -- once you've defined x.phrase in the above example, you can't give the results to someone else and say "And to add stuff to paragraph's content model, define x.phrase to what you want". If you want to do that, you have to and so on. The element-type hierarchy approach allows multiple independent specialisations, with a free choice of attachment points (i.e. extends='phrase' or extends='myCrystal').

Hope this helps communicate the value I see in this approach.

ht

From ebaatz at barbaresco.East.Sun.COM Thu Jun 26 16:04:18 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:06 2004 Subject: Character encoding questions Message-ID:

Thanks for your wide-ranging and graceful reply.

> (There have been some suggestions that ... encoding
> declarations [be] optional if there is an external carrier with a
> character-encoding label...

I hope those sentiments are resisted. Having something like a declaration that is transport or operating environment independent seems a lot simpler, reliable, and understandable. If the declaration is redundant, it is harmless.

> The reason I, for one, didn't lobby for allowing change of encoding
> within an entity...

For what it is worth, I'm in agreement. In practice I don't see much need for such a feature and there exist straightforward ways of handling such problems when they do exist.
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From flammia at sls.lcs.mit.edu Thu Jun 26 16:49:51 1997 From: flammia at sls.lcs.mit.edu (Giovanni Flammia) Date: Mon Jun 7 16:58:06 2004 Subject: DTD invented by Microsoft?! References: <33B05BF1.47B3@hiwaay.net> <730.199706250817@grogan.cogsci.ed.ac.uk> <33B1B254.1A55@hiwaay.net> <1113.199706260944@grogan.cogsci.ed.ac.uk> Message-ID: <33B278E4.4867D782@sls.lcs.mit.edu> As someone who is not used to write DTDs, I appreciate the simplifications proposed by Henry Thompson. With XML, less is more. So, for example, I can see why constraining XML documents to be trees is better than allowing people to encode arbitrary object graphs. Isn't XML and its extensions to become "SGML for the masses, without DTDs"? If you keep a gentle learning curve for people to create new tags, I am sure the popularity of XML will spread like wildfire. I apologize if this comment might seem misplaced, but if one has to learn full-blown SGML syntax and how to write DTDs, then most people who are afraid to get into SGML now (and are currently occasional users of SGML w/o dwelling into DTDs) will be also afraid to work with XML. I am a little bit confused about how much power of expression should XML have. If an XML document encodes detailed semantics about how to process its elements, like a full blown programming language, and you have to use an IDL for it, isn't XML competing with distributed object communication (e.g., CORBA), and distributed object databases (e.g., ObjectStore) but much less efficient (requiring parsing to communicate with objects, rather than calling the objects' methods directly)? How does all this fit together? 
Shouldn't XML be specialized to expose just enough of the semantics necessary to improve indexing, searching, and multi-modal display of Web documents? Giovanni Flammia flammia@sls.lcs.mit.edu -------------- next part -------------- A non-text attachment was scrubbed... Name: vcard.vcf Type: text/x-vcard Size: 280 bytes Desc: Card for Giovanni Flammia Url : http://mailman.ic.ac.uk/pipermail/xml-dev/attachments/19970626/75d92eb1/vcard.vcf From north at synopsys.com Thu Jun 26 17:08:34 1997 From: north at synopsys.com (Simon North) Date: Mon Jun 7 16:58:06 2004 Subject: DTD invented by Microsoft?! In-Reply-To: <33B278E4.4867D782@sls.lcs.mit.edu> Message-ID: <199706261508.RAA18154@cadis.de> Giovanni Flammia wrote: > With XML, less is more. My 2 cents ... yes, and possibly even less than HTML. While it isn't (yet) an XML application, the HDML DTD gives a hint of how you can use a very meagre set of elements to create an application. > if one has to learn > full-blown SGML syntax and how to write DTDs, then most people who > are afraid to get into SGML now (and are currently occasional users > of SGML w/o dwelling into DTDs) will be also afraid to work with > XML. Why should we/they be exposed to it? ... come on, tool developers! Writing CSS style sheets (I'll leave DSSSL style sheets out of this), can be pretty complicated but there are some fair WYSIWYG tools coming on the market already. > If an XML document encodes detailed semantics about how to > process its elements, But wasn't that the whole point of dropping LINK, LINKTYPE and USELINK? XML doesn't need the semantics, IMHO, these need to be provided externally to xml-lang; hence the need for XAPI. > How does all this fit together? Java? > Shouldn't XML be specialized to expose just enough of the semantics > necessary to improve indexing, searching, and multi-modal display of > Web documents? isn't it already? Simon North - COSSAP Technical Writer, Synopsys Synopsys GmbH, Kaiserstr. 
100, 52134 Herzogenrath Germany. +49 2407 955873 -- north@synopsys.com Voice mail: +1 415 694 4141 55055

From ebaatz at barbaresco.East.Sun.COM Thu Jun 26 17:41:03 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:06 2004 Subject: Character encoding questions Message-ID:

> >Having something like
> >a declaration that is transport or operating environment
> >independent seems a lot simpler, reliable, and understandable.
> >If the declaration is redundant, it is harmless.
>
> If they are in conflict, it can be harmful.

Quite true if a program that is confronted with the conflict does something harmful. I'm speculating that fewer harmful results will occur in the real world if #1 below occurs than if #2 occurs.

#1. A program trusts the XML information unless it results in the XML document not looking like an XML document. Then the program can give up or try some environment-driven methods.

#2. A program keeps a lot of specialized information about environments it might run in (let's assume that the program is somewhat portable so the environments include at least Windows, Apple, and some sort of Unix) and how an XML document might reach it. If the program has the right information and correctly winds its way through it, then it gets a good XML document.

I think that #2 is much harder to do than being told what to do by the document.
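Policy #1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: decode_entity is a hypothetical helper, declared stands for the label found inside the document, transport_charset stands for whatever the environment reports (for example a MIME charset), and "looks like XML" is reduced to the crude test that the decoded text starts with '<'.

```python
def decode_entity(data, declared, transport_charset=None):
    """Trust the in-document label first; fall back to the environment's."""
    candidates = [declared]
    if transport_charset:
        candidates.append(transport_charset)
    for name in candidates:
        try:
            text = data.decode(name)
        except (LookupError, UnicodeDecodeError):
            # Unknown label, or bytes that are not valid in that encoding.
            continue
        # Crude sanity check: does the result look like an XML document?
        if text.lstrip("\ufeff").lstrip().startswith("<"):
            return text
    raise ValueError("entity does not look like XML under any candidate encoding")


raw = "<doc>\u00e9</doc>".encode("utf-16")
# The document's own label is wrong here, so the transport label is tried next.
print(decode_entity(raw, "utf-8", "utf-16"))
```

The program needs no table of operating environments for this. It only needs the document's label and, optionally, one hint from the transport, which is the simplicity argued for above.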
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From matt at wdi.disney.com Thu Jun 26 18:12:18 1997 From: matt at wdi.disney.com (Matthew Fuchs) Date: Mon Jun 7 16:58:06 2004 Subject: DTD invented by Microsoft?! In-Reply-To: Giovanni Flammia "Re: DTD invented by Microsoft?!" (Jun 26, 10:12am) References: <33B05BF1.47B3@hiwaay.net> <730.199706250817@grogan.cogsci.ed.ac.uk> <33B1B254.1A55@hiwaay.net> <1113.199706260944@grogan.cogsci.ed.ac.uk> <33B278E4.4867D782@sls.lcs.mit.edu> Message-ID: <9706260914.ZM15268@scrumpox.rd.wdi.disney.com> On Jun 26, 10:12am, Giovanni Flammia wrote: > Subject: Re: DTD invented by Microsoft?! > > As someone who is not used to write DTDs, I appreciate the > simplifications > proposed by Henry Thompson. With XML, less is more. So, for example, I > can see > why constraining XML documents to be trees is better than allowing > people to encode > arbitrary object graphs. > You can't constrain them to be trees. However the element structure has to be a tree because that is the only graph structure which can be linearized to a text document. A major use of attributes is to indicate the back and cross edges in the original graph. > Isn't XML and its extensions to become "SGML for the masses, without > DTDs"? > No. DTDs were not created just to cause pain and suffering. They are actually not hard to create. Well-formedness was to allow useful processing to occur without the parser requiring the DTD and to make parsers easier to write, not to inspire tag salad. > If you keep a gentle learning curve for people to create new tags, I am > sure > the popularity of XML will spread like wildfire. 
I apologize if this > comment > might seem misplaced, but if one has to learn > full-blown SGML syntax and how to write DTDs, then most people who > are afraid to get into SGML now (and are currently occasional users of > SGML w/o dwelling into > DTDs) will be also afraid to work with XML. > Lack of XML has not prevented people from introducing new tags. But tag salad is like spaghetti code. Tags (elements) are not independent of each other if they have any semantics. DTDs help keep this under control. > I am a little bit confused about how much power of expression should XML > have. I have an acquaintance who loves to say that ASCII is computationally complete. You can express arbitrary computations in ASCII. Why should XML be less? > If an XML document encodes detailed semantics about how to process its > elements, like a full blown programming language, and you have to use an > IDL for it, isn't XML competing with distributed object communication > (e.g., CORBA), and distributed object databases (e.g., ObjectStore) but > much less efficient (requiring parsing to communicate with objects, > rather than calling the objects' methods directly)? How does all this > fit together? > No. The document doesn't encode these semantics, but you need an API to allow semantics to be applied to the document. Your contrast with distributed objects is also false. Look at CORBA under the hood, and you'll see there's parsing going on. Also, one of the points of mobile agent technology is that distributed invocation is not necessarily cheap. Sending a document can be very much like sending an agent. (Check out my paper "Let's Talk" at http://cs.nyu.edu/phd_students/fuchs). > Shouldn't XML be specialized to expose just enough of the semantics > necessary to improve > indexing, searching, and multi-modal display of Web documents? > Then it's just HTML++ Yuck! 
:-( Matthew Fuchs matt@wdi.disney.com -- xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From rrseibel at att.com Thu Jun 26 18:56:48 1997 From: rrseibel at att.com (Seibel, Robert R) Date: Mon Jun 7 16:58:06 2004 Subject: XML and HTML Intermixed Message-ID: <9706261658.AB03565@hoccson.ho.att.com> XML Dev. Team: In my application, I see the need to be able to mix XML (my own tags) and HTML tags in a core content database. I plan on using a DTD at various authoring points to validate structure and tags. Do you see mixing tags as reasonable? The XML tags could be converted to the appropriate HTML tags if sent to a browser. Then again all of the tags or information could be formatted for the appropriate output device on the fly. For instance, I may have a tag called PROBLEM and another called SOLUTION. As I'm explaining the solution, it would be nice to use HTML tags to explain the solution. Example: Problem description
  1. Do this first
  2. This is second

Call me on questions.

Let's say I used a style sheet to display the contents. It seems to me that using HTML tags intermixed with XML tags is a good thing. I don't have to reinvent my own tags when HTML already defines them. Comments? Thanks, Bob Seibel xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Thu Jun 26 23:59:09 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:06 2004 Subject: PUBLIC identifiers in XML? Message-ID: <3.0.32.19970626145528.00a8d560@pop.intergate.bc.ca> At 09:29 AM 26/06/97 +0000, Simon North wrote: >This is mostly likely a RTFM question, but the XML FAQ says: > > "No public identifiers in entity and notation declarations" ... > >Could someone please enlighten me on this? Yes, XML has public identifiers. The docs are wrong. In the errata file, will get fixed. -T. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From tbray at textuality.com Fri Jun 27 02:07:04 1997 From: tbray at textuality.com (Tim Bray) Date: Mon Jun 7 16:58:06 2004 Subject: Lark 0.90 available, with an application Message-ID: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> Hi - Lark 0.90 is now available at http://www.textuality.com/Lark Differences: - now does entity references in attribute values - does &#X style hex character references - has draconian error handling - the Handler has an element() method to serve as an element factory - lots of bug fixes - it's all in a package, textuality.lark Doesn't do PE's yet. It's now over 40k, sigh. For me, the interesting thing is that it now comes with an application named XH. 
It was bothering me that I was writing but not using the software, so I created xh, which reads the XML form of all the docs I'm working on (XML-lang, XML-link, MCF, etc etc etc) and generates the HTML. This used to be done with a mouldy tumorous perl program - nothing against perl, but xh is a lot cleaner and nicer. Also it produces valid HTML, which the perl didn't.

Xh is interesting as it is probably a canonical customer for XAPI (why did we lose JAX, I liked it?) - it doesn't use the event stream, it lets the parser build the tree and then just runs around the elements and attributes.

After getting Xh working, I also realized that I had re-used Peter Murray-Rust's trick of just having a .class per element-type (Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if this is just a coincidence or is this the basic paradigm on which XML software is going to be built? If so, it might make sense to wire a standard class-finder call into XAPI.

Cheers, Tim Bray tbray@textuality.com http://www.textuality.com/ +1-604-708-9592

From cbullard at hiwaay.net Fri Jun 27 04:20:55 1997 From: cbullard at hiwaay.net (len bullard) Date: Mon Jun 7 16:58:06 2004 Subject: Lark 0.90 available, with an application References: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> Message-ID: <33B32355.14F0@hiwaay.net>

Tim Bray wrote:
>
> If so, it might make sense to wire
> a standard class-finder call into XAPI.

I'm reading "Late Night VRML 2.0 with Java". The same approach (class per nodeType) seems to be recommended there. Granted, VRML is designed to be an object-oriented format, but I suspect you are right. Good going.
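The class-per-element-type idea discussed in this thread can be sketched as follows. This is a Python analogue, not the Java Class.forName() mechanism itself and not Lark's or JUMBO's actual code: the class names ElementBase, P and Emph and the instantiate function are all invented for illustration, and the lookup uses the subclasses of a common base class rather than a class loader.

```python
import xml.etree.ElementTree as ET


class ElementBase:
    """Default behaviour for element types that have no class of their own."""

    def __init__(self, element):
        self.element = element

    def render(self):
        # Text, then each child rendered by its own class, then the child's tail.
        parts = [self.element.text or ""]
        for child in self.element:
            parts.append(instantiate(child).render())
            parts.append(child.tail or "")
        return "".join(parts)


class P(ElementBase):
    def render(self):
        return "<p>" + super().render() + "</p>"


class Emph(ElementBase):
    def render(self):
        return "<em>" + super().render() + "</em>"


def instantiate(element):
    """Class-finder: pick the subclass named after the element type."""
    by_name = {cls.__name__.lower(): cls for cls in ElementBase.__subclasses__()}
    return by_name.get(element.tag.lower(), ElementBase)(element)


doc = ET.fromstring("<p>Lark is <emph>small</emph> and <code>fast</code>.</p>")
print(instantiate(doc).render())  # prints <p>Lark is <em>small</em> and fast.</p>
```

Supporting a new element type means adding one class, and element types with no class fall through to the default. A standard class-finder call would presumably fix the naming rule that instantiate hard-codes here.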
len xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From galiard at let.rug.nl Fri Jun 27 06:34:33 1997 From: galiard at let.rug.nl (Harry Gaylord) Date: Mon Jun 7 16:58:06 2004 Subject: character encoding questions Message-ID: <199706270434.GAA10659@odur.let.rug.nl> Paul Grosso has sent me a note that I was talking about the SGML declaration, not the system declaration yesterday. He is right. The preprint is available from the following address: ftp://let.rug.nl/pub/Galiard/chum7.ps Let me know if you have any problems in getting it. Harry Gaylord xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at Eng.Sun.COM Fri Jun 27 06:38:58 1997 From: Jon.Bosak at Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:06 2004 Subject: XML and HTML Intermixed In-Reply-To: <9706261658.AB03565@hoccson.ho.att.com> (rrseibel@att.com) Message-ID: <199706270437.VAA05158@boethius.eng.sun.com> [Bob Seibel:] | Let's say I used a style sheet to display the contents. It seems to me | that using HTML tags intermixed with XML tags is a good thing. I don't | have to reinvent my own tags when HTML already defines them. You can mix tags all you want; with the exception of a handful of reserved names, the XML name space belongs to you. But this means that "UL" has no more meaning to an XML processor than "SOLUTION" does; in both cases, you must use some other mechanism to specify the semantics. The most common mechanisms are going to be Java classes or stylesheets. Some people have suggested the definition of an "HTXML" to grandfather existing HTML tags but let you define any other tag that is not HTML. 
One great big problem with this approach is that if HTXML is based on HTML 4.0 (say), and HTML 4.0 has no SOLUTION tag, and therefore you use SOLUTION all through your documents assuming a particular meaning for SOLUTION, and then in HTML 4.1 a tag named SOLUTION with a different meaning is defined, you're hosed. A possible way out of this would be to define a reserved attribute to tell an XML browser that you want some element type to have the semantics of some HTML tag: ... This has some advantages, but given the speed with which XML is moving, I personally am not persuaded that it's worth the trouble. Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Jon.Bosak at Eng.Sun.COM Fri Jun 27 07:11:52 1997 From: Jon.Bosak at Eng.Sun.COM (Jon Bosak) Date: Mon Jun 7 16:58:06 2004 Subject: Lark 0.90 available, with an application In-Reply-To: <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> (message from Tim Bray on Thu, 26 Jun 1997 17:04:52 -0700) Message-ID: <199706270510.WAA05183@boethius.eng.sun.com> [Tim Bray:] > (why did we lose JAX, I liked it?) Because we were informed that it means "toilet" in Ireland -- clearly a variant of "jakes," which is why JAX bothered me for reasons that I couldn't identify until the Irish usage was pointed out. Jon xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri Jun 27 09:31:42 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:06 2004 Subject: XML and HTML Intermixed Message-ID: <8543@ursus.demon.co.uk> In message <9706261658.AB03565@hoccson.ho.att.com> "Seibel, Robert R" writes: > XML Dev. 
Team: There is no 'team' other than the public-spirited members of this list and others :-). Everyone is invited to join in - no entrance qualifications - just a willingness to help the development process. > > In my application, I see the need to be able to mix XML (my own tags) > and HTML tags in a core content database. I plan on using a DTD > at various authoring points to validate structure and tags. This is an absolutely key question - which some of us raise at regular intervals. My analysis - which I hope others will challenge or amplify - is something like this: HTML2.0 and HTML3.2 *at present* are SGML-compatible (if properly authored, with balanced tags, quoted attributes, etc.) They are not XML-compatible for reasons which have been discussed here (inclusions/exclusions, '&' content models, etc. in the DTD, and some EMPTY tags which require the syntax in XML). We all expect that 'someone' will convert common DTDs to XML and HTML is a leading candidate but so far no-one has actually done it. (IMO it needs to have the (in)formal blessing of the W3C, since HTML is a W3C protegee). So the question might break down to: (a) can I mix HTML(non-XML) with XML in the same document? This would not be a valid XML document overall, but it might be valid input to an HTML browser which recognised XML markup. It's up to the browser (or other software) creator as to whether that's meaningful. (b) can I refer to an XML document from an HTML document? This is simple if there is a MIME type for XML, since standard helper technology can be used. [This is what I do for CML (Chemical Markup Language) and I use the browser to call a viewer for text/xml or chemical/x-cml]. It is generally believed that 'someone' is submitting an application to IETF/IANA for registration of the text/xml MIME type (??Progress??). (c) can I XML-ise HTML and mix it with my own DTD? Yes. It depends on how this is done. I have edited HTML2.0 to be XML-compliant for my own purposes. 
CML 'contains' HTML2.0 as part of the CML DTD. This guarantees there are no namespace problems (i.e. CML cannot have identical ELEMENTs to those in HTML). So this allows CML documents to contain chunks of XML-ised HTML. Rendering these is non trivial, because it is not easy to pass HTML to the browser without using Javascript and I do not like doing this (non-portable, flaky, etc.) Moreover I have tweaked my HTML to use the full XML-LINK syntax for tags such as . (d) Can I use HTML with my document if I have an ElementType which clashes with one in HTML? Not easily. The question of combining DTDs and document fragments has exercised the ERB/WG and generated megabytes of opinion. A solution will appear at some time in the future. (e) Can I use XML-ised HTML and include XML-LINKs to other XML documents? Yes, if the HTML has been extended to use XML-LINK. This is what I do to avoid namespace clashes. It may have its detractors. Be warned that there is not much software which can display XML documents using two different DTDs at the same time; I'm working out how JUMBO will do this - if I get some answers to my LINK queries it should be fairly straighforward. > > Do you see mixing tags as reasonable? The XML tags could be converted > to the appropriate HTML tags if sent to a browser. Then again There are normally no default 'appropriate HTML tags'. How would you convert 276+354/872=6354? to HTML? One way to tackle this is through stylesheets (CSS1 or DSSSL) where appropriate formatting/rendering is applied to each tag, including context. Alternatively (as in JUMBO) Java classes can be supplied for each ElementType which might convert to HTML. (For example, MOLecule in CML has 1500 lines of Java which among many other things will render it as HTML). > all of the tags or information could be formatted for the appropriate > output device > on the fly. > > For instance, I may have a tag called PROBLEM and another called > SOLUTION. 
> As I'm explaining the solution, it would be nice to use HTML tags to > explain the > solution. > > Example: > > Problem description > >
>   1. Do this first
>   2. This is second
>
> Call me on questions.
>
> Let's say I used a style sheet to display the contents. It seems to me
> that using HTML tags intermixed with XML tags is a good thing. I don't
> have to reinvent my own tags when HTML already defines them.
> Comments?

I am strongly in favour of re-using DTDs and document fragments. So many
chemical documents will draw from 3 DTDs:
    - HTML for the main text
    - MathML for the mathematics
    - CML for the chemistry

The ERB/WG has debated this at great length and accepts it as very desirable
and high-priority. No actual mechanism is given at present. An additional
character has been reserved for NAMEs in case we need to use it for namespaces
in the future, but we're not allowed to use it yet [I think that is the
correct position??].

To summarise, I believe that mix-and-match from different DTDs is a valid and
useful approach to XML. It means that there can be 'islands of validity'
[an idea from the WG] within XML documents, so that XML-WF docs will not
be semantically void tag soup. The difficulty at present is how those
islands are identified - there is no consensus yet.

P.
-- Peter Murray-Rust, domestic net connection Virtual School of Molecular Sciences http://www.vsms.nottingham.ac.uk/ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri Jun 27 10:40:39 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:06 2004 Subject: Lark 0.90 available, with an application Message-ID: <8554@ursus.demon.co.uk> In message <3.0.32.19970626170447.00a71490@pop.intergate.bc.ca> Tim Bray writes: > Hi - Lark 0.90 is now available at > http://www.textuality.com/Lark > > Differences: > - now does entity references in attribute values > - does &#X style hex character references > - has draconian error handling > - the Handler has an element() method to serve as an element factory > - lots of bug fixes > - it's all in a package, textuality.lark Great!! I was waiting for the 'package' to bolt it into JUMBO. [I'm writing this before I have downloaded it.] > > Doesn't do PE's yet. > It's now over 40k, sigh. We can't easily get round this problem. XML takes a *lot* of code. I have found that JUMBO has huge classes (e.g. 100 member functions) for Node, Tree and TOC. Trouble is that they all have to be loaded even if only a small amount of functionality is used - e.g. you have to have mouseDrag(), mouseMove() even if the user might not drag the mouse :-) > > For me, the interesting thing is that it now comes with an application > named XH. It was bothering me that I was writing but not using the > software, so I created xh, which reads the XML form of all the docs > I'm working on (XML-lang, XML-link, MCF, etc etc etc) and generates > the HTML. This used to be done with a mouldy tumerous perl program - > nothing against perl, but xh is a lot cleaner and nicer. Also it > produces valid HTML, which the perl didn't. 
> > Xh is interesting as it is probably a canonical customer for XAPI > (why did we lose JAX, I liked it?) - it doesn't use the event stream, So did I! There are 30K+ references to JAX on the net including jax.org (where the mouse genome is being explored). > it lets the parser build the tree and then just runs around the > elements and attributes. Yes. JUMBO does this by having a generic SGMLNode (named before XML was invented) which has default actions for attributes, contents, etc. It has routines such as process(), toHTML(), toString(), display(Graphics g), etc. So that reading a DTD-less XML document it can still do something with it. > > For Xh, I also, after getting it working, realized that I had re-used > Peter Murray-Rust's trick of just having a .class per element-type > (Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if > this is just a coincidence or is this the basic paradigm on which XML > software is going to be built? If so, it might make sense to wire > a standard class-finder call into XAPI. I suspected we were quite close to this with ElementFactory. I've been slightly reluctant to post JUMBO code for this part because JUMBO has evolved rather than been planned (it wasn't intended to be graphical to start with :-) The basic steps are: - parse the document into a Tree of Nodes (actually Elements at present) This is all that can be done with a DTD-less document. If NXP or Lark is given as an argument, JUMBO will use them as the parser. It creates Elements as it encounters them (even with Lark - this is historical). - if a DTD is given, it downloads a *.class file for that DTD. [This is resolved locally at present, but if we agree on catalogs and other naming conventions, then we can resolve it globally. - the class file gives a list of ElementTypes ('GI's) for which there are *.class files available. Thus in PLAYDTD.class there are references to STAGEDIRNode.class, SPEECHnode.class. 
**This does NOT have to have a class for each type unless that is seen as
essential. The default Node methods are used.
    - if a Node has a GI in the DTD class, it is specifically created.

Thus the PLAYDTD.class has code like:

    if (gi.equals("SPEECH")) {
        node = DTD.createSubclassedNode("SPEECH", content, attributes);
    } else {
        node = new Node(content, attributes);
    }

Then the subclassed Nodes have node-specific methods, and display() will
show specific icons, etc. This is done at, or immediately after, parse time.
So JUMBO will create a subclassed Node from a generic Lark element if
required. If this is what ElementFactory does, then great!

There are the following performance hits:
    (a) it is slower to parse since the specialised nodes are created
        at that time
    (b) all the specialised code is loaded at parse time even if the user
        doesn't require it. Since performance is hit by code size, some
        applications run very slowly.

So perhaps there needs to be a lazy creation of specialised Elements?? IOW
everything is generic until it's actually referenced, when it gets a
specialised Element from the factory.

Maybe I will post the code for PLAYDTD if it would help the process.

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From n.bradley at pindar.co.uk Fri Jun 27 10:43:30 1997
From: n.bradley at pindar.co.uk (Neil Bradley)
Date: Mon Jun 7 16:58:06 2004
Subject: XML and HTML Intermixed
Message-ID:

>[Bob Seibel:]
>
>| Let's say I used a style sheet to display the contents. It seems to me
>| that using HTML tags intermixed with XML tags is a good thing. I don't
>| have to reinvent my own tags when HTML already defines them.
> >You can mix tags all you want; with the exception of a handful of >reserved names, the XML name space belongs to you. But this means >that "UL" has no more meaning to an XML processor than "SOLUTION" >does; in both cases, you must use some other mechanism to specify the >semantics. The most common mechanisms are going to be Java classes or >stylesheets. Is there a general assumption that the browser vendors will support XML in the near future? If this is so, I would think that HTML tags can be avoided. Just use cascading style sheets on XML tags instead. The one big exception to this would be tables, which are not covered by CSS (yet?). I know I have mentioned this before, but if browser vendors are going to lead the XML revolution, and I think (or at least hope) they are, then we should expect that they would want to retain their investment in HTML tables, rather than adopt some new standard. Could we not therefore accept this reality, but to maintain flexibility state that the Table element definition must include a fixed attribute, say -XML-TABLE, when this assumption should be made. _________________________________________________________ Neil Bradley, SGML Consultant, Pindar plc Author of "The Concise SGML Companion" Addison-Wesley Longman (ISBN: 0-201-41999-8) The third-rate mind thinks with the majority; the second-rate mind thinks with the minority; the first-rate mind is only happy thinking (A. A. 
Milne) Tel: +44 (0)1904 330162 EMail: neil@bradley.co.uk URL: http://www.bradley.co.uk _________________________________________________________ xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From richard at light.demon.co.uk Fri Jun 27 13:44:33 1997 From: richard at light.demon.co.uk (Richard Light) Date: Mon Jun 7 16:58:07 2004 Subject: XML and HTML Intermixed In-Reply-To: <8543@ursus.demon.co.uk> Message-ID: In message <8543@ursus.demon.co.uk>, Peter Murray-Rust writes >... >(e) Can I use XML-ised HTML and include XML-LINKs to other XML documents? > Yes, if the HTML has been extended to use XML-LINK. This is what I >do to avoid namespace clashes. It may have its detractors. Be warned that >there is not much software which can display XML documents using two different >DTDs at the same time; I'm working out how JUMBO will do this - if I get some >answers to my LINK queries it should be fairly straighforward. >... >To summarise, I believe that mix-and-match from different DTDs is a valid and >useful approach to XML. It means that there can be 'islands of validity' >[an idea from the WG] within XML documents, so that XML-WF docs will not >be semantically void tag soup. The difficulty at present is how those >islands are identified - there is no consensus yet. I would suggest, looking at the XML-Link spec, that the clean way to mix and match is to use simple links with the attribute specifications: SHOW="EMBED" ACTUATE="AUTO" Have your chunk of HTML as a separate document (which can be valid or well-formed, as you wish), and just point to it. 
This is a fragment from an object record in a museum catalogue, where the artist's biographical details are stored in a separate XML document with a different DTD: Mathias, William On hitting the empty element, the XML processor will go off and read mathias.xml. This will be parsed separately, and probably held separately in memory. It is a genuinely separate document with its own namespace and so on. But _for_the_purposes_of_display_and_processin g_ it is 'inserted' into the source document at the point the element occurs. And ACTUATE="AUTO" says that it is, in effect, a necessary part of the catalogue record. If you want to physically embed a chunk of HTML into your document which conforms to a non-HTML DTD, surely you have to extend your DTD as Peter has done. And once you have put the HTML element types into your DTD, you're not really 'mixing and matching' in the sense of this discussion: just borrowing a bunch of element types from another DTD. Richard Light xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From akirkpatrick at ims-global.com Fri Jun 27 15:57:42 1997 From: akirkpatrick at ims-global.com (akirkpatrick@ims-global.com) Date: Mon Jun 7 16:58:07 2004 Subject: Lark 0.90 available, with an applica Message-ID: Sorry if this has been over before, but these are my thoughts on the class-per-element-type idea (mentioned recently in Tim Bray's post about Lark). I did something very similar recently (admittedly in C++) and abandoned it. My application was an SGML->RTF convertor. It read the events using SP and created a tree of elements derived from SGMLElement but specialised towards RTF. The hierarchy looked something like: SGMLElement RtfFile RtfContainer RtfPara RtfTitle RtfTitleTarget RtfAdmonition RtfInline (parametrised) RtfLink etc. I found the following drawbacks: 1. 
Leads to "class spaghetti" with similar code being spread all over
the place.

2. There is usually a large degree of dependence between the elements
and the driving application. Often the elements need to access the
driving application directly and there is no obvious and efficient
way to provide this interface.

3. You need to create a new class for each new element type (less of a
problem in Java?). For C++, this means recompiling the application.

It was actually when I looked at the prospect of creating a whole new
raft of classes for the HTML output that I decided to start again.
I rewrote my application to use the following process:

1. SgmlReader reads document and creates tree of generic elements.
Each element has an SgmlRule member variable/class.

2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
properties with the elements based on gi, position, etc. These
properties are added to the SgmlRule for each element.

3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
deciding what to do based on the properties applied by the stylesheet.

(I realise this is similar to the way Jade operates but our RTF writer
also handles WinHelp and has other output/app-specific features).

Ideally, this should be generalised further with a SgmlElementPlusRule
class which just contains a pointer to the SgmlElement and the SgmlRule
(otherwise the SgmlElement has a dependency on SgmlRule).

The stylesheet mechanism is (just about) independent of the output
format. All the code to handle RTF/HTML/whatever is centralised in the
XxxWriter class. I've found this much easier to enhance and maintain
than the previous implementation. I've also found that 90% of the time
we can do things with the stylesheet without recompiling the
application.

I'd be really interested to hear views in favour of the class approach.

Alfie.
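
For readers following the thread, here is a minimal sketch of the three-step process described above (generic element tree, stylesheet attaching properties, one writer per output format), written in Java rather than the C++ of the original application. None of the actual SgmlReader/SgmlStylesheet/XxxWriter code was posted, so every name below (Element, Stylesheet, HtmlWriter, the "html-tag" property) is hypothetical, the tree is built by hand instead of being parsed, and rules match on GI only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StylesheetSketch {

    /** Step 1: a generic element; there is no subclass per element type. */
    static class Element {
        final String gi;
        final String text;
        final List<Element> children = new ArrayList<>();
        // Properties attached by the stylesheet (stands in for "SgmlRule").
        final Map<String, String> rule = new HashMap<>();

        Element(String gi, String text) {
            this.gi = gi;
            this.text = text;
        }

        Element add(Element child) {
            children.add(child);
            return this;
        }
    }

    /** Step 2: a hypothetical stylesheet mapping a GI to properties. */
    static class Stylesheet {
        private final Map<String, Map<String, String>> byGi = new HashMap<>();

        Stylesheet set(String gi, String property, String value) {
            byGi.computeIfAbsent(gi, k -> new HashMap<>()).put(property, value);
            return this;
        }

        /** Walks the tree and copies matching properties onto each element. */
        void apply(Element e) {
            Map<String, String> props = byGi.get(e.gi);
            if (props != null) {
                e.rule.putAll(props);
            }
            for (Element c : e.children) {
                apply(c);
            }
        }
    }

    /** Step 3: all HTML-specific code lives here (no escaping, for brevity). */
    static class HtmlWriter {
        String write(Element e) {
            String tag = e.rule.get("html-tag");
            StringBuilder sb = new StringBuilder();
            if (tag != null) sb.append('<').append(tag).append('>');
            sb.append(e.text);
            for (Element c : e.children) {
                sb.append(write(c));
            }
            if (tag != null) sb.append("</").append(tag).append('>');
            return sb.toString();
        }
    }

    static String demo() {
        Element doc = new Element("SECTION", "")
            .add(new Element("TITLE", "Intro"))
            .add(new Element("PARA", "Hello"));
        Stylesheet sheet = new Stylesheet()
            .set("TITLE", "html-tag", "h1")
            .set("PARA", "html-tag", "p");
        sheet.apply(doc);
        return new HtmlWriter().write(doc);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Supporting a second output format in this sketch would mean adding another writer class and more stylesheet properties, while Element stays untouched. That seems to be the maintenance benefit the post describes.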
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From eliot at isogen.com Fri Jun 27 16:17:01 1997 From: eliot at isogen.com (W. Eliot Kimber) Date: Mon Jun 7 16:58:07 2004 Subject: Lark 0.90 available, with an application Message-ID: <3.0.32.19970627075258.00f35098@mail.swbell.net> At 05:04 PM 6/26/97 -0700, Tim Bray wrote: >For Xh, I also, after getting it working, realized that I had re-used >Peter Murray-Rust's trick of just having a .class per element-type >(Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if >this is just a coincidence or is this the basic paradigm on which XML >software is going to be built? If so, it might make sense to wire >a standard class-finder call into XAPI. It makes sense to have one class per element type (after all, you usually have distinct types because they have distinct semantics and thus distinct behavior). You can take this one step further if you add architectures: your element-type-specific classes can themselves be derived from arch-form-specific classes. In other words, given an architecture hierarchy, it should be natural to define a corresponding implementing class hierarchy. If your processor has a fallback scheme for mapping element types to objects that includes mapping element types to the objects for the architectural forms from which they are derived when they don't have their own object, it should be possible to build fairly generic processors for common architectures that can be quickly applied to new documents simply by adding the architectural mapping to the documents. For example, in Tim's case he's mapping to HTML. 
It's probably the case that most of the mapping is a simple one-to-one
mapping, which can be represented by deriving the base document from HTML
as an "architecture". [I say "architecture" because HTML is not really
suitable as an architecture as it is not sufficiently general--in
particular, the lack of generic, nesting divisions with generic titles
makes it difficult, if not impossible, to derive from HTML document types
that themselves use recursive divisions because the mapping to HTML is
dependent on the nesting context. You could use the SGML implicit LINK
feature to do such a mapping, but I'm not suggesting that as a general
solution.] Not all the mappings are this simple, but probably 80% are.

Given this, instead of having one object class per element type in the base
documents, you could have one class per HTML "form" plus unique classes
only for those base elements that require more complex mappings.

If you've got a DTD, you can do most of the mapping there:

Thus, expressed in procedural syntax (I don't know Java), you can have
logic like:

    switch (element_type()) {
      case "specialized-A":
        map_specialized_A();
        break;
      case "specialized-B":
        map_specialized_B();
        break;
      default:
        # Is element derived from the HTML arch?
        if (is_derived_from_arch(current_node(), "HTML")) {
          switch (arch_form(current_node(), "HTML")) {
            case "html":
              print "<html>";
              break;
            case "h1":
              print "<h1>";
              break;
            default:
              # no mapping
          }
        }
    }

Of course, without a formal mechanism to refer to a hierarchy of
architectures in XML (e.g., using data attributes as defined by the AFDR of
the HyTime standard [review temporarily at
www.drmacro.com/hythtml/clause-A.3.html]), you can only define one level of
architectural inheritance, but that's probably enough to simplify 80% of
the mappings people need to do.

It should be clear as well that browsers can provide easy-to-invoke default
styles by defining an HTML-like architecture that reflects their formatting
semantics and then provide built-in architectural recognition for at least
that architecture (this is essentially what we do when we down-translate to
HTML--we're invoking the browsers' built-in formatting semantics associated
with the HTML "architecture"--but there's no reason the transform can't be
done in the browser). But note that HTML itself (at least in its current
form) won't work for the reasons given above.

Cheers,

E.
--
W. Eliot Kimber, Senior Consulting SGML Engineer Highland Consulting, a division of ISOGEN International Corp. 2200 N. Lamar St., Suite 230, Dallas, TX 95202. 214.953.0004 www.isogen.com
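
The fallback scheme described in the message above (use a handler for the specific element type if one exists, otherwise the handler for the architectural form it derives from, otherwise a default) can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the Handler interface, the registration methods and the element names are all hypothetical, the derivation table is filled in by hand rather than read from a DTD, and plain maps stand in for the Class.forName() lookup mentioned elsewhere in the thread.

```java
import java.util.HashMap;
import java.util.Map;

public class ArchFallbackSketch {

    /** Hypothetical handler: turns one element's content into output text. */
    interface Handler {
        String render(String content);
    }

    // Handlers for specific element types (the complex mappings).
    private final Map<String, Handler> byElementType = new HashMap<>();
    // One handler per architectural form (the generic mappings).
    private final Map<String, Handler> byArchForm = new HashMap<>();
    // Which architectural form each element type derives from; assumed to be
    // supplied by hand here, where a real processor would read it from the DTD.
    private final Map<String, String> archFormOf = new HashMap<>();

    void registerElementType(String gi, Handler h) { byElementType.put(gi, h); }
    void registerArchForm(String form, Handler h) { byArchForm.put(form, h); }
    void derive(String gi, String form) { archFormOf.put(gi, form); }

    /** Element-type handler first, then arch-form handler, then a default. */
    Handler lookup(String gi) {
        Handler h = byElementType.get(gi);
        if (h != null) {
            return h;
        }
        String form = archFormOf.get(gi);
        if (form != null) {
            h = byArchForm.get(form);
            if (h != null) {
                return h;
            }
        }
        return content -> content; // no mapping: pass the content through
    }

    static ArchFallbackSketch demoProcessor() {
        ArchFallbackSketch p = new ArchFallbackSketch();
        p.registerArchForm("h1", c -> "<h1>" + c + "</h1>");
        p.registerArchForm("p", c -> "<p>" + c + "</p>");
        p.derive("chapter-title", "h1");
        p.derive("para", "p");
        p.derive("warning", "p");
        // "warning" needs a more complex mapping, so it gets its own handler.
        p.registerElementType("warning", c -> "<p><b>Warning:</b> " + c + "</p>");
        return p;
    }

    public static void main(String[] args) {
        ArchFallbackSketch p = demoProcessor();
        System.out.println(p.lookup("chapter-title").render("Links"));
        System.out.println(p.lookup("warning").render("Hot"));
        System.out.println(p.lookup("unknown").render("text"));
    }
}
```

In this sketch a new document type only needs derive() entries for the element types that map one-to-one, which corresponds to the "80%" case in the message. Only the remaining element types need handlers of their own.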
xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From nmikula at edu.uni-klu.ac.at Fri Jun 27 16:33:01 1997 From: nmikula at edu.uni-klu.ac.at (Norbert Mikula) Date: Mon Jun 7 16:58:07 2004 Subject: Lark 0.90 available, with an application In-Reply-To: <3.0.32.19970627075258.00f35098@mail.swbell.net> Message-ID: On Fri, 27 Jun 1997, W. Eliot Kimber wrote: > At 05:04 PM 6/26/97 -0700, Tim Bray wrote: > >For Xh, I also, after getting it working, realized that I had re-used > >Peter Murray-Rust's trick of just having a .class per element-type > >(Class.forName() and Class.newInstance(), gotta love 'em) - I wonder if > >this is just a coincidence or is this the basic paradigm on which XML > >software is going to be built? If so, it might make sense to wire > >a standard class-finder call into XAPI. > > It makes sense to have one class per element type (after all, you usually > have distinct types because they have distinct semantics and thus distinct > behavior). You can take this one step further if you add architectures: > your element-type-specific classes can themselves be derived from > arch-form-specific classes. I have also had a similar idea a few days ago.*1 I would like to know whether you guys think it makes sense to go even further and have also this kind of calls for attributes and other potential "nodes" in our parse tree. I would think so. Now the question remains if this approach should substitute the event base stream that built the bottom layer of our XAPI-J discussion. I think the event based approach should still form the base. Many people, I believe, feel still very comfortable with it. *1 http://www.lists.ic.ac.uk/hypermail/xml-dev/9706/0133.html Best regards, Norbert H. 
Mikula ===================================================== = SGML, XML, DSSSL, Intra- & Internet, AI, Java ===================================================== = mailto:nmikula@edu.uni-klu.ac.at = http://www.edu.uni-klu.ac.at/~nmikula ===================================================== xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From ebaatz at barbaresco.East.Sun.COM Fri Jun 27 16:41:37 1997 From: ebaatz at barbaresco.East.Sun.COM (Eric Baatz - Sun Microsystems Labs BOS) Date: Mon Jun 7 16:58:07 2004 Subject: Character encoding questions Message-ID: Oh, oh (as my 18-month old daughter says :-). Your email message addresses me as if I was an expert on SGML's and XML's use of character sets. I am not, so I will not be attempting to answer your queries. Fortunately, your email copied xml-dev, and many SGML and XML experts exist there. I just want to explicitly invite them to respond to your queries, as I, alas, cannot. xml-dev: A list for W3C XML Developers Archived as: http://www.lists.ic.ac.uk/hypermail/xml-dev/ To unsubscribe, send to majordomo@ic.ac.uk the following message; unsubscribe xml-dev List coordinator, Henry Rzepa (rzepa@ic.ac.uk) From Peter at ursus.demon.co.uk Fri Jun 27 20:45:27 1997 From: Peter at ursus.demon.co.uk (Peter Murray-Rust) Date: Mon Jun 7 16:58:07 2004 Subject: XML and HTML Intermixed Message-ID: <8576@ursus.demon.co.uk> In message Richard Light writes: > In message <8543@ursus.demon.co.uk>, Peter Murray-Rust > writes > >... > >(e) Can I use XML-ised HTML and include XML-LINKs to other XML > documents? > > Yes, if the HTML has been extended to use XML-LINK. This is > what I > >do to avoid namespace clashes. It may have its detractors. 
Be warned > that > >there is not much software which can display XML documents using two > different > >DTDs at the same time; I'm working out how JUMBO will do this - if I > get some > >answers to my LINK queries it should be fairly straighforward. > >... > >To summarise, I believe that mix-and-match from different DTDs is a valid and > >useful approach to XML. It means that there can be 'islands of validity' > >[an idea from the WG] within XML documents, so that XML-WF docs will not > >be semantically void tag soup. The difficulty at present is how those > >islands are identified - there is no consensus yet. > > I would suggest, looking at the XML-Link spec, that the clean way to mix > and match is to use simple links with the attribute specifications: > > SHOW="EMBED" ACTUATE="AUTO" > > Have your chunk of HTML as a separate document (which can be valid or > well-formed, as you wish), and just point to it. This is a fragment from > an object record in a museum catalogue, where the artist's biographical > details are stored in a separate XML document with a different DTD: This is exactly what I was suggesting above in (e). I only didn't put in the details because I have posted them in gory detail a few postings ago under 'Re: XML-LINK'. > > > Mathias, William > > > > On hitting the empty element, the XML processor will go > off and read mathias.xml. This will be parsed separately, and probably > held separately in memory. It is a genuinely separate document with its > own namespace and so on. But _for_the_purposes_of_display_and_processin > g_ it is 'inserted' into the source document at the point the > element occurs. And ACTUATE="AUTO" says that it is, in This is what I am waiting for guidance on :-). Some people such as Eliot (and I as a humble follower), see 'resource' as a point. Others appear to use 'resource' to represent a finite piece of information. 
If the latter is, in fact, the ERB's view, then the question of where to 'insert' the other information is critical.

If you find my earlier analysis useful, I'd be grateful for comments as this would give me confidence to implement it (or not!).

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Fri Jun 27 20:45:37 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:07 2004
Subject: Lark 0.90 available, with an application
Message-ID: <8577@ursus.demon.co.uk>

In message Norbert Mikula writes:
> On Fri, 27 Jun 1997, W. Eliot Kimber wrote:
> [...]
>
> I have also had a similar idea a few days ago.*1
> I would like to know whether you guys think it makes
> sense to go even further and have also this kind of
> calls for attributes and other potential "nodes" in
> our parse tree. I would think so.

Yes. Definitely. The more of this that can be generalised, the better. Essentially quite a lot of JUMBO is involved in this sort of processing and I'd be more than happy to try to migrate JUMBO's ideas towards an API. My Node class (== Element, more or less) has nearly 100 member functions, and I'll try to post them as a javadoc API (just needs locating on the WWW).

>
> Now the question remains if this approach should substitute
> the event base stream that built the bottom layer of
> our XAPI-J discussion. I think the event based approach
> should still form the base. Many people, I believe,
> feel still very comfortable with it.

I feel very comfortable with the event stream API and I would certainly not substitute it.
Essentially JUMBO can consume Elements from either an NXP-like event stream, or a Lark-like tree structure. It may be useful to identify those objects such as Element and Attribute that are relevant to both environments, and there could be an Element/Attribute Factory sitting on top of both (but leaving them exposed as well for those who need the lower level).

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Fri Jun 27 20:46:03 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:07 2004
Subject: Lark 0.90 available, with an applica
Message-ID: <8578@ursus.demon.co.uk>

In message akirkpatrick@ims-global.com writes:
> Sorry if this has been over before, but these are my

No, it's a new and useful discussion :-)

> thoughts on the class-per-element-type idea (mentioned
> recently in Tim Bray's post about Lark).
>
> I did something very similar recently (admittedly in C++)
> and abandoned it. My application was an SGML->RTF
> convertor. It read the events using SP and created a tree
> of elements derived from SGMLElement but specialised
> towards RTF. The hierarchy looked something like:
>
> SGMLElement
> RtfFile
> RtfContainer
> RtfPara
> RtfTitle
> RtfTitleTarget
> RtfAdmonition
> RtfInline (parametrised)
> RtfLink
> etc.
>
> I found the following drawbacks:

I think the primary problem is that the mapping of SGML to RTF is formally impossible. If the SGML application was MathML, the content might be a second order differential equation; if CML, it might be the active site of HIV protease.
Neither of these has the concept of 'paragraph' :-)

It's very common that people use 'SGML' as a shorthand for 'a-conventional-human-readable-textual-document-marked-up-with-a-set-of-tags-that-make-textual-sense'. They then devise SGML2XYZ translators. These can only be generic if they have heuristics about how commonly encountered markup maps onto XYZ constructs. JUMBO has a small number of such heuristics. It tries to find the title of an Element (for display) as follows:
- use the TITLE attribute
- else find a child with TITLE elementType
- else use the ID attribute
- else take the first 30 characters of PCDATA
- else take the elementType
but this is only to try to help human navigators - it's not a formal transformation.

>
> 1. Leads to "class spaghetti" with similar code being spread
> all over the place.

This isn't necessary if inheritance is used. JUMBO has a superclass Node which has default procedures (e.g. getTitle() above). By default all Elements display or are processed using this. There are a lot of useful defaults a Node can have.

>
> 2. There is usually a large degree of dependence between the
> elements and the driving application. Often the elements need
> to access the driving application directly and there is no obvious
> and efficient way provide this interface.

No. In JUMBO there is very little coupling between subclassed Nodes and JUMBO. Yes, they have to be subclassed from Node, because that's what they are, but beyond that they have their own behaviour (or none).

>
> 3. You need to create a new class for each new element type
> (less of a problem in Java?). For C++, this means recompiling
> the application.

My MOLNode class is 1500 lines of Java because molecules are complex. There are routines like orthogonaliseFractionalCoordinates, getMolecularWeight, countHydrogenAtoms, etc. These would have to be written whatever structure was used. There is actually very little duplicated code.
Similarly Matrix, Graph and so forth require distinct code. If classes share common functions then they can be subclasses of an intermediate class. Thus in PLAYDTD, both ACT and SCENE could be subclassed from PlayDivision. This class would know that both ACT and SCENE had a child TITLE. [Indeed they might both be instances of PlayDivision directly.] Many elements can get by with just the generic Node class.

>
> It was actually when I looked at the prospect of creating a whole
> new raft of classes for the HTML output that I decided to start again.
> I rewrote my application to use the following process:
>
> 1. SgmlReader reads document and creates tree of generic elements.
> Each element has an SgmlRule member variable/class.
>
> 2. SgmlStylesheet reads a stylesheet (also in SGML) and associates
> properties with the elements based on gi, position, etc. These properties
> are added to the SgmlRule for each element.
>
> 3. RtfWriter/HtmlWriter/XxxWriter recursively processes the elements
> deciding what to do based on the properties applied by the stylesheet.
>
> (I realise this is similar to the way Jade operates but our RTF writer
> also handles WinHelp and has other output/app-specific features).

It sounds as if you would be better off using DSSSL, since it handles transformations. It's possible to do the same thing in Java - and probably takes the same amount of code - but you may need to define some formatting classes (Div, Para, etc.).

> [...]
>
> I'd be really interested to hear views in favour of the class approach.

I hope I've given some above. Wherever the object is complex, then it makes sense for its behaviour to be attached closely to it. I wouldn't like to write a 3-D geometry program in DSSSL (though it would be possible) just as I'd prefer not to do typesetting in Java.

The difficult part comes with element-in-context. If an element has different behaviours in different contexts, then code can become hairy.
This is often a problem with CML-like DTDs where there are only 10-20 elements per DTD.

The other difficult bit is with relations between objects. This can be managed generically with XML-LINK, but usually semantics have to be added. I am trying to make XML-LINK as generic as possible in JUMBO, but I suspect there will be places within one Node where links to another have to be specifically considered.

HTH

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From Peter at ursus.demon.co.uk Fri Jun 27 21:27:10 1997
From: Peter at ursus.demon.co.uk (Peter Murray-Rust)
Date: Mon Jun 7 16:58:07 2004
Subject: API and JUMBO
Message-ID: <8584@ursus.demon.co.uk>

Following the discussion of the API and the elements-as-classes I have posted my API for JUMBO as javadoc classes. Since there are over 3000 member functions please excuse some awful documentation - some of it was done at an early stage! (Also I haven't been able to copy the javadoc icons). It's at:

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/api/

and the most relevant file there is

http://www.vsms.nottingham.ac.uk/vsms/java/jumbo/api/sgml.html

(It's only called SGML because I started it before XML:-)

The key class is SGMLNode, possibly followed by SGMLTree and DTD and SGMLAttlist. If you read the API I hope you will get some feel for the sorts of member functions that I have found necessary. Note that JUMBO is tree-oriented. [The graphical functions are in DrawableSGMLNode and SGMLTOC, so avoid those if you are interested in abstract functions only.]

I'd value comments, and I'll try to help with the admittedly bad documentation.
In my defence, I had no idea where all this was going when I started - JUMBO was not intended to be graphical, an editor, or to support hyperlinks :-)

P.

--
Peter Murray-Rust, domestic net connection
Virtual School of Molecular Sciences
http://www.vsms.nottingham.ac.uk/

From richard at light.demon.co.uk Fri Jun 27 22:55:51 1997
From: richard at light.demon.co.uk (Richard Light)
Date: Mon Jun 7 16:58:07 2004
Subject: XML and HTML Intermixed
In-Reply-To: <8576@ursus.demon.co.uk>
Message-ID:

In message <8576@ursus.demon.co.uk>, Peter Murray-Rust writes
>
>This is what I am waiting for guidance on :-). Some people such as Eliot
>(and I as a humble follower), see 'resource' as a point. Others appear to
>use 'resource' to represent a finite piece of information. If the latter
>is, in fact, the ERB's view, then the question of where to 'insert' the
>other information is critical.

Well, the XML-Link draft defines 'resource' as "an addressable unit of information or service which is participating in a link", e.g. "files, images, documents, programs and query results". In that sense, surely, a 'resource' is definitely a finite piece of information rather than a point.

I think I can understand the 'point' point-of-view, in that linking (to XML documents or elements within them) always addresses nodes, which are points. However, the XML locator syntax carefully ensures that the target of a link within an XML document is always an element (or sometimes a little clutch of elements) - it can never be part of an element (as it can with TEI extended pointers). So your 'point' is always an element node(s) in the tree structure.
This being the case, I have assumed myself that the intention is to _be able to_ treat the target resource (element) as a finite thing which can be delivered to the client. Surely the wording for ?XML-XPTR= syntax shows an intent to actually deliver the whole element: "... the host should perform the XPointer processing to extract the sub-resource [= element], and that only the sub-resource should be transmitted to the client".

Richard Light.

From tbray at textuality.com Fri Jun 27 23:42:01 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:07 2004
Subject: Lark 0.90 available, with an applica
Message-ID: <3.0.32.19970627143941.00a6c890@pop.intergate.bc.ca>

At 02:08 PM 27/06/97 +0000, akirkpatrick@ims-global.com wrote:
>I did something very similar recently (admittedly in C++)
>and abandoned it...
>
>1. Leads to "class spaghetti" with similar code being spread
>all over the place.

In the XtoH application, the ElementLogic class from which all the element classes are subclassed has an atStart(), an atEnd(), and a doText(). In a lot of cases, the atStart/atEnd amounted to "emit the following string, interpolating the following attribute values". So yes, a lot of parallelism, but this seemed a fair price to pay for the independence and modularity.

>2. There is usually a large degree of dependence between the
>elements and the driving application. Often the elements need
>to access the driving application directly and there is no obvious
>and efficient way provide this interface.

Not always true. I don't do C++, but in Java, after the controller cooks up the per-element object, he calls its method registerController(this) - the per-element classes all have a mController member, thus they can call back to the controller.
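A minimal Java sketch of the callback arrangement just described, under stated assumptions. The names ElementLogic, atStart(), atEnd(), doText(), registerController() and mController are taken from the message; Controller, TitleLogic, emit(), create() and render() are hypothetical names invented for illustration, and none of this is the actual XtoH source.

```java
public class ElementLogicSketch {

    /** Hypothetical controller: collects whatever the per-element objects emit. */
    static class Controller {
        private final StringBuilder out = new StringBuilder();

        void emit(String s) { out.append(s); }

        String output() { return out.toString(); }

        /** Cook up the per-element object, then hand it a reference back to this controller. */
        ElementLogic create(String elementType) {
            ElementLogic logic = "title".equals(elementType) ? new TitleLogic() : new ElementLogic();
            logic.registerController(this);
            return logic;
        }
    }

    /** Base class: defaults that most element types can inherit unchanged. */
    static class ElementLogic {
        protected Controller mController;

        void registerController(Controller controller) { mController = controller; }

        void atStart() { }                       // default: emit nothing at the start tag
        void atEnd() { }                         // default: emit nothing at the end tag
        void doText(String text) { mController.emit(text); }
    }

    /** One specialised element type: "emit the following string" at start and end. */
    static class TitleLogic extends ElementLogic {
        @Override void atStart() { mController.emit("<h1>"); }
        @Override void atEnd() { mController.emit("</h1>"); }
    }

    /** Drives one element with a single text child through the three callbacks. */
    static String render(String elementType, String text) {
        Controller controller = new Controller();
        ElementLogic logic = controller.create(elementType);
        logic.atStart();
        logic.doText(text);
        logic.atEnd();
        return controller.output();
    }

    public static void main(String[] args) {
        System.out.println(render("title", "Lark"));       // <h1>Lark</h1>
        System.out.println(render("para", "plain text"));  // plain text
    }
}
```

In this sketch the only coupling between an element class and the driver is the single mController reference, which is the point being made in the reply to drawback 2.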
The amount they had to do so was pretty small.

>3. You need to create a new class for each new element type
>(less of a problem in Java?). For C++, this means recompiling
>the application.

Non-problem in Java... in fact, you don't even need to know what you've got when you start; when you find a new element, you can dynamically see if there's a class for it.

>I'd be really interested to hear views in favour of the class approach.

Why I wrote this. I would say that while we'd all prefer a declarative stylesheet approach, it is my belief that in a lot of cases it's going to be common to use, at least occasionally, some per-element custom logic. Java makes this easy enough to be very appealing as a general framework.

- Tim

From tbray at textuality.com Sun Jun 29 03:39:07 1997
From: tbray at textuality.com (Tim Bray)
Date: Mon Jun 7 16:58:07 2004
Subject: Lark 0.90 refreshed, argh
Message-ID: <3.0.32.19970628183455.00aa22f0@pop.intergate.bc.ca>

It wanted to use the olde-fashioned comments. Big hole in the test suite. -T.

From north at synopsys.com Mon Jun 30 11:35:22 1997
From: north at synopsys.com (Simon North)
Date: Mon Jun 7 16:58:07 2004
Subject: Request: more SGML restriction explanations
Message-ID: <199706300935.LAA25658@cadis.de>

Please accept my apologies if these are non-developmental questions, and my heartfelt thanks to all that have answered my earlier question.

I am trying to establish a few points and would appreciate some enlightened answers ...

1. Is allowed?
This was really useful SGML and would probably be pretty handy in XML.

2. Are MS, MD, STARTTAG and ENDTAG forbidden in declarations?

3. What exactly is meant by "no attribute value specs on ENTITY declarations"?

4. I'm not allowed data attributes on NOTATIONs, and I'm not allowed name groups in ATTLISTs, but can I cheat and use the following:

or would this be illegal too? Could I get round it by using a data attribute spec?

Thanks in advance,
Simon North

From elm at arbortext.com Mon Jun 30 16:07:24 1997
From: elm at arbortext.com (Eve L. Maler)
Date: Mon Jun 7 16:58:07 2004
Subject: Request: more SGML restriction explanations
Message-ID: <3.0.32.19970630100938.00ad3220@village.doctools.com>

At 11:35 AM 6/30/97 +0000, Simon North wrote:
>Please accept my apologies if these are non-developmental questions,
>and my heartfelt thanks to all that have answered my earlier
>question.
>
>I am trying to establish a few points and would appreciate some
>enlightened answers ...
>
>1. Is allowed? This was really useful SGML
>and would probably be pretty handy in XML.

#DEFAULT entity declaration isn't allowed as part of XML.

>2. Are MS, MD, STARTTAG and ENDTAG forbidden in declarations?

Yes; bracketed text entities aren't allowed as part of XML.

>3. What exactly is meant by "no attribute value specs on ENTITY
>declarations"?

In SGML, data attributes are declared for notations and their values specified as part of NDATA entity declarations. Data attributes are not allowed as part of XML.

>4. I'm not allowed data attributes on NOTATIONs, and I'm not allowed
>name groups in ATTLISTs, but can I cheat and use the following:
>
>
>
>or would this be illegal too? Could I get round it by using a data
>attribute spec?

Nope, not allowed as part of XML. Sorry!
Eve

From jtigue at datachannel.com Mon Jun 30 17:26:27 1997
From: jtigue at datachannel.com (John Tigue)
Date: Mon Jun 7 16:58:07 2004
Subject: XAPI-J refinement proposal
Message-ID: <33B7D06D.4A0DDD57@datachannel.com>

Currently in XAPI there is the interface IContent which has the methods relevant to being a "node" in a parse tree/grove; accessors for parent and children. Recent work has shown that there would be some benefit to breaking IContent into IContent (child) and IContainer (parent). Also there was feedback asking for addContent() to be extended to appendContent() and insertContent().

This would look like:

    package xml;

    import java.util.Enumeration;

    public interface IContainer
    {
        public Enumeration getContents();
        public void insertContent( IContent aContent, IContent preceedingContent );
        // appendContent() puts aContent at the end of the list
        public void appendContent( IContent aContent );
        public void removeContent( IContent aContent );
    }

AND

    package xml;

    import java.util.Enumeration;

    public interface IContent
    {
        public void setParent( IContainer aContainer );
        public IContainer getParent();
        public String getData();
    }

So a Document class (not currently part of XAPI-J) would implement IContainer but not IContent. IElement would implement both. A Text or Data class would implement only IContent. I don't see how this interferes with any existing processors. I hope I have not missed anything.

Another point is IContent.getData(). This is how, for example, PCData would be retrieved. Grove terminology refers to non marked up text as "data" so we have getData(). Except for this detail the method could just as well have been called getText() (which was my first choice), getString(), or some such.

Any comments?
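To make the proposed split concrete, here is a rough Java sketch of how a Document, an element and a text node might implement the two interfaces. The interfaces are repeated (nested, so the block compiles on its own) with one modern-Java liberty: Enumeration carries a type parameter. Everything else is an assumption made for illustration and not part of the proposal: the class names ContainerBase, Document, Element and Text, the reading of insertContent() as "insert directly after preceedingContent" (with null meaning "at the front"), and an element's getData() returning the concatenated data of its children.

```java
import java.util.Enumeration;
import java.util.Vector;

public class XapiTreeSketch {

    interface IContainer {
        Enumeration<IContent> getContents();
        void insertContent(IContent aContent, IContent preceedingContent);
        void appendContent(IContent aContent);
        void removeContent(IContent aContent);
    }

    interface IContent {
        void setParent(IContainer aContainer);
        IContainer getParent();
        String getData();
    }

    /** Hypothetical shared parent-side behaviour for Document and Element. */
    static abstract class ContainerBase implements IContainer {
        private final Vector<IContent> children = new Vector<>();

        public Enumeration<IContent> getContents() { return children.elements(); }

        // Assumption: the new content goes directly after preceedingContent; null means "first".
        public void insertContent(IContent aContent, IContent preceedingContent) {
            int index = 0;
            if (preceedingContent != null) {
                int at = children.indexOf(preceedingContent);
                if (at < 0) throw new IllegalArgumentException("preceedingContent is not a child");
                index = at + 1;
            }
            children.insertElementAt(aContent, index);
            aContent.setParent(this);
        }

        public void appendContent(IContent aContent) {
            children.addElement(aContent);
            aContent.setParent(this);
        }

        public void removeContent(IContent aContent) {
            if (children.removeElement(aContent)) aContent.setParent(null);
        }
    }

    /** Container only: a document has children but no parent. */
    static class Document extends ContainerBase { }

    /** Both roles: an element is a child and a parent. */
    static class Element extends ContainerBase implements IContent {
        private IContainer parent;
        public void setParent(IContainer aContainer) { parent = aContainer; }
        public IContainer getParent() { return parent; }
        // Assumption: an element's data is the concatenation of its children's data.
        public String getData() {
            StringBuilder sb = new StringBuilder();
            for (Enumeration<IContent> e = getContents(); e.hasMoreElements(); ) {
                sb.append(e.nextElement().getData());
            }
            return sb.toString();
        }
    }

    /** Content only: character data is a leaf. */
    static class Text implements IContent {
        private final String data;
        private IContainer parent;
        Text(String data) { this.data = data; }
        public void setParent(IContainer aContainer) { parent = aContainer; }
        public IContainer getParent() { return parent; }
        public String getData() { return data; }
    }

    /** Builds a one-paragraph document and returns the paragraph's data. */
    static String demo() {
        Document doc = new Document();
        Element para = new Element();
        doc.appendContent(para);
        Text hello = new Text("Hello");
        para.appendContent(hello);
        para.appendContent(new Text("!"));
        para.insertContent(new Text(", world"), hello);
        return para.getData();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // Hello, world!
    }
}
```

The sketch keeps parent pointers consistent by having the container call setParent() itself, which is one possible answer to who owns that bookkeeping; the proposal leaves the question open.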
--
John Tigue
Programmer
jtigue@datachannel.com
DataChannel (http://www.datachannel.com)
206-462-1999